了解Database Replay Capture内部原理

Database Replay是11g中很酷的特性,对于workload capture的内部工作原理大家理解的不多,这里就介绍一下。

对于Workload Capture需要考虑的因素:

  • 负载捕获文件消耗定量的磁盘空间,这些捕获文件是2进程文件,无法直接阅读,有测试表明在大并发量的OLTP环境中可以达到capture 10分钟占用1G磁盘空间
  • 数据库重启:
    • 可能是保证捕获所有事务的可靠重放的唯一路径
      • 使用startup restrict启动实例,避免不完整的事务捕获
      • 启动capture会取消restrict模式
    • 基于负载类型重启不是必要的
  • 为重放目的恢复数据库的多种方法:
    • 基于scn或时间的物理恢复
    • 逻辑恢复应用数据
    • 闪回或者快照数据
  • Capture可以指定过滤器作为捕获workload子集的方法
  • 需要的权限包括SYSDBA、SYSOPER和合适的OS权限
  • 性能消耗:
    • 在TPCC测试中capture的性能损耗为4.5%
    • 对于每个session会多消耗64KB的内存
    • 必要的Workload Capture耗费文件系统磁盘空间

 

 

这里需要注意的有2点, 对于RAC集群存在workload capture  file的目录必须位于共享文件系统上,否则start_capture时会报错。

对于每个session会多消耗64KB的内存,这是由于本质上写出负载信息到workload  capture file的同样是Server Process服务进程本身,但是这种写出并非在parse解析或execution执行阶段,Server Process将其LOGON、LOGOFF、SQL执行等信息记录存放在PGA中,主要是WCR Capture PG、WCR Capture PGA中,当PGA中的工作负载历史记录达到一定数目时,Server Process本身负责写出这些数据到WCR文件中,在写出这些WCR文件时Server Process进入’WCR: capture file IO write’等待事件。

与WCR相关的等待事件另有:

 

SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
PL/SQL Release 11.2.0.3.0 - Production
CORE    11.2.0.3.0      Production
TNS for Linux: Version 11.2.0.3.0 - Production
NLSRTL Version 11.2.0.3.0 - Production

SQL> select name  from v$event_name where name like '%WCR%';

NAME
----------------------------------------------------------------
WCR: replay client notify
WCR: replay clock
WCR: replay lock order
WCR: replay paused
WCR: RAC message context busy
WCR: capture file IO write
WCR: Sync context busy
latch: WCR: sync
latch: WCR: processes HT

11g中还多出了部分为WCR而设的LATCH 

  1* select name,gets from v$latch where name like '%WCR%'
SQL> /

NAME                                 GETS
------------------------------ ----------
WCR: kecu cas mem                       3
WCR: kecr File Count                   37
WCR: MMON Create dir                    1
WCR: ticker cache                       0
WCR: sync                             495
WCR: processes HT                       0
WCR: MTS VC queue                       0

7 rows selected.

 

 

我们通过如下演示具体说明Database Replay Capture的内部原理

 

 

1. 首先打开capture dbms_workload_capture.start_capture

 CREATE OR REPLACE DIRECTORY dbcapture AS '/home/oracle/dbcapture';

execute dbms_workload_capture.start_capture('CAPTURE','DBCAPTURE',default_action=>'INCLUDE');

SQL> select id,name,status,start_time,end_time,connects,user_calls,dir_path from 
dba_workload_captures where id = (select max(id) from dba_workload_captures) ;

        ID
----------
NAME
--------------------------------------------------------------------------------
STATUS                                   START_TIM END_TIME    CONNECTS
---------------------------------------- --------- --------- ----------
USER_CALLS
----------
DIR_PATH
--------------------------------------------------------------------------------
         1
CAPTURE
IN PROGRESS                              08-DEC-12                   11

        ID
----------
NAME
--------------------------------------------------------------------------------
STATUS                                   START_TIM END_TIME    CONNECTS
---------------------------------------- --------- --------- ----------
USER_CALLS
----------
DIR_PATH
--------------------------------------------------------------------------------
       167
/home/oracle/dbcapture

2. 窥探 capture file目录

[oracle@mlab2 dbcapture]$ ls -lR
.:
total 8
drwxr-xr-x 2 oracle oinstall 4096 Dec  8 07:24 cap
drwxr-xr-x 3 oracle oinstall 4096 Dec  8 07:24 capfiles
-rw-r--r-- 1 oracle oinstall    0 Dec  8 07:24 wcr_cap_00001.start

./cap:
total 4
-rw-r--r-- 1 oracle oinstall 91 Dec  8 07:24 wcr_scapture.wmd

./capfiles:
total 4
drwxr-xr-x 12 oracle oinstall 4096 Dec  8 07:24 inst1

./capfiles/inst1:
total 40
drwxr-xr-x 2 oracle oinstall 4096 Dec  8 08:31 aa
drwxr-xr-x 2 oracle oinstall 4096 Dec  8 07:24 ab
drwxr-xr-x 2 oracle oinstall 4096 Dec  8 07:24 ac
drwxr-xr-x 2 oracle oinstall 4096 Dec  8 07:24 ad
drwxr-xr-x 2 oracle oinstall 4096 Dec  8 07:24 ae
drwxr-xr-x 2 oracle oinstall 4096 Dec  8 07:24 af
drwxr-xr-x 2 oracle oinstall 4096 Dec  8 07:24 ag
drwxr-xr-x 2 oracle oinstall 4096 Dec  8 07:24 ah
drwxr-xr-x 2 oracle oinstall 4096 Dec  8 07:24 ai
drwxr-xr-x 2 oracle oinstall 4096 Dec  8 07:24 aj

./capfiles/inst1/aa:
total 316
-rw-r--r-- 1 oracle oinstall   1762 Dec  8 07:25 wcr_c6cdah0000001.rec
-rw-r--r-- 1 oracle oinstall  16478 Dec  8 07:28 wcr_c6cf1h0000002.rec
-rw-r--r-- 1 oracle oinstall   1772 Dec  8 07:29 wcr_c6cjdh0000004.rec
-rw-r--r-- 1 oracle oinstall   1535 Dec  8 07:29 wcr_c6cnah0000005.rec
-rw-r--r-- 1 oracle oinstall   1821 Dec  8 07:41 wcr_c6cpfh0000007.rec
-rw-r--r-- 1 oracle oinstall   1815 Dec  8 07:33 wcr_c6cq6h000000a.rec
-rw-r--r-- 1 oracle oinstall   1535 Dec  8 07:34 wcr_c6cxmh000000h.rec
-rw-r--r-- 1 oracle oinstall   1427 Dec  8 07:41 wcr_c6cxvh000000j.rec
-rw-r--r-- 1 oracle oinstall   1425 Dec  8 07:41 wcr_c6czph000000k.rec
-rw-r--r-- 1 oracle oinstall   2398 Dec  8 07:49 wcr_c6dqfh000000q.rec
-rw-r--r-- 1 oracle oinstall 259321 Dec  8 08:35 wcr_c6du7h000000r.rec
-rw-r--r-- 1 oracle oinstall      0 Dec  8 07:55 wcr_c6f6yh000000t.rec
-rw-r--r-- 1 oracle oinstall      0 Dec  8 08:28 wcr_c6h3qh0000013.rec

./capfiles/inst1/ab:
total 0

./capfiles/inst1/ac:
total 0

./capfiles/inst1/ad:
total 0

./capfiles/inst1/ae:
total 0

./capfiles/inst1/af:
total 0

./capfiles/inst1/ag:
total 0

./capfiles/inst1/ah:
total 0

./capfiles/inst1/ai:
total 0

./capfiles/inst1/aj:
total 0

[oracle@mlab2 dbcapture]$ cd ./capfiles/inst1/aa
[oracle@mlab2 aa]$ ls -l
total 316
-rw-r--r-- 1 oracle oinstall   1762 Dec  8 07:25 wcr_c6cdah0000001.rec
-rw-r--r-- 1 oracle oinstall  16478 Dec  8 07:28 wcr_c6cf1h0000002.rec
-rw-r--r-- 1 oracle oinstall   1772 Dec  8 07:29 wcr_c6cjdh0000004.rec
-rw-r--r-- 1 oracle oinstall   1535 Dec  8 07:29 wcr_c6cnah0000005.rec
-rw-r--r-- 1 oracle oinstall   1821 Dec  8 07:41 wcr_c6cpfh0000007.rec
-rw-r--r-- 1 oracle oinstall   1815 Dec  8 07:33 wcr_c6cq6h000000a.rec
-rw-r--r-- 1 oracle oinstall   1535 Dec  8 07:34 wcr_c6cxmh000000h.rec
-rw-r--r-- 1 oracle oinstall   1427 Dec  8 07:41 wcr_c6cxvh000000j.rec
-rw-r--r-- 1 oracle oinstall   1425 Dec  8 07:41 wcr_c6czph000000k.rec
-rw-r--r-- 1 oracle oinstall   2398 Dec  8 07:49 wcr_c6dqfh000000q.rec
-rw-r--r-- 1 oracle oinstall 259321 Dec  8 08:35 wcr_c6du7h000000r.rec
-rw-r--r-- 1 oracle oinstall      0 Dec  8 07:55 wcr_c6f6yh000000t.rec
-rw-r--r-- 1 oracle oinstall      0 Dec  8 08:28 wcr_c6h3qh0000013.rec
[oracle@mlab2 aa]$ ls -l |wc -l
14

以上负载目录共14个文件

3. 我们LOGON一个新的Server Process

[oracle@mlab2 ~]$ sqlplus  / as sysdba

SQL*Plus: Release 11.2.0.3.0 Production on Sat Dec 8 08:37:40 2012

Copyright (c) 1982, 2011, Oracle.  All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Automatic Storage Management, OLAP, Data Mining
and Real Application Testing options

会多出一个wcr文件

[oracle@mlab2 aa]$ ls -ltr
total 316
-rw-r--r-- 1 oracle oinstall   1762 Dec  8 07:25 wcr_c6cdah0000001.rec
-rw-r--r-- 1 oracle oinstall  16478 Dec  8 07:28 wcr_c6cf1h0000002.rec
-rw-r--r-- 1 oracle oinstall   1772 Dec  8 07:29 wcr_c6cjdh0000004.rec
-rw-r--r-- 1 oracle oinstall   1535 Dec  8 07:29 wcr_c6cnah0000005.rec
-rw-r--r-- 1 oracle oinstall   1815 Dec  8 07:33 wcr_c6cq6h000000a.rec
-rw-r--r-- 1 oracle oinstall   1535 Dec  8 07:34 wcr_c6cxmh000000h.rec
-rw-r--r-- 1 oracle oinstall   1425 Dec  8 07:41 wcr_c6czph000000k.rec
-rw-r--r-- 1 oracle oinstall   1427 Dec  8 07:41 wcr_c6cxvh000000j.rec
-rw-r--r-- 1 oracle oinstall   1821 Dec  8 07:41 wcr_c6cpfh0000007.rec
-rw-r--r-- 1 oracle oinstall   2398 Dec  8 07:49 wcr_c6dqfh000000q.rec
-rw-r--r-- 1 oracle oinstall      0 Dec  8 07:55 wcr_c6f6yh000000t.rec
-rw-r--r-- 1 oracle oinstall      0 Dec  8 08:28 wcr_c6h3qh0000013.rec
-rw-r--r-- 1 oracle oinstall 259321 Dec  8 08:35 wcr_c6du7h000000r.rec
-rw-r--r-- 1 oracle oinstall      0 Dec  8 08:37 wcr_c6hp4h0000018.rec

但是新出现的wcr_c6hp4h0000018.rec 是空的

SQL> select spid from v$process where addr = ( select paddr from v$session where sid=(select distinct sid from v$mystat));

SPID
------------------------
14293

该新的服务进程的操作系统进程号为14293, 可以看到该进程的打开文件描述符,其中就包含了wcr_c6hp4h0000018.rec

[oracle@mlab2 ~]$ ls -l /proc/14293/fd
total 0
lr-x------ 1 oracle oinstall 64 Dec  8 08:39 0 -> /dev/null
l-wx------ 1 oracle oinstall 64 Dec  8 08:39 1 -> /dev/null
lrwx------ 1 oracle oinstall 64 Dec  8 08:39 10 -> /u01/app/oracle/product/11201/db_1/rdbms/audit/CRMV_ora_14293_1.aud
l-wx------ 1 oracle oinstall 64 Dec  8 08:39 11 -> /u01/app/oracle/diag/rdbms/crmv/CRMV/trace/CRMV_ora_14293.trc
l-wx------ 1 oracle oinstall 64 Dec  8 08:39 12 -> pipe:[34585895]
l-wx------ 1 oracle oinstall 64 Dec  8 08:39 13 -> /u01/app/oracle/diag/rdbms/crmv/CRMV/trace/CRMV_ora_14293.trm
l-wx------ 1 oracle oinstall 64 Dec  8 08:39 2 -> /dev/null
lr-x------ 1 oracle oinstall 64 Dec  8 08:39 3 -> /dev/null
lr-x------ 1 oracle oinstall 64 Dec  8 08:39 4 -> /dev/null
lr-x------ 1 oracle oinstall 64 Dec  8 08:39 5 -> /u01/app/oracle/product/11201/db_1/rdbms/mesg/oraus.msb
lr-x------ 1 oracle oinstall 64 Dec  8 08:39 6 -> /proc/14293/fd
lr-x------ 1 oracle oinstall 64 Dec  8 08:39 7 -> /dev/zero
lrwx------ 1 oracle oinstall 64 Dec  8 08:39 8 -> /home/oracle/dbcapture/capfiles/inst1/aa/wcr_c6hp4h0000018.rec
lr-x------ 1 oracle oinstall 64 Dec  8 08:39 9 -> pipe:[34585894]

也可以通过lsof查到

[root@mlab2 ~]# lsof|grep wcr_c6hp4h0000018.rec
oracle    14293    oracle    8u      REG                8,1          0   17629644 /home/oracle/dbcapture/capfiles/inst1/aa/wcr_c6hp4h0000018.rec

可以得出一个结论,一个Server Process对应一个WCR REC文件,当Server Process LOGON时生成该文件

3.执行少量SQL语句:

SQL> select 1 from dual;

         1
----------
         1

SQL> /

         1
----------
         1

[oracle@mlab2 aa]$ strings wcr_c6hp4h0000018.rec

==》执行少量SQL无任何时, 不写出任何数据

执行某个超长SQL语句后,可以在下面看到其文件内容包含该WCR文件的版本号,LOGON的登录信息和所执行过的SQL语句历史,但是不包含执行计划

[oracle@mlab2 aa]$ strings wcr_c6hp4h0000018.rec
11.2.0.3.0
*File header info. (Shadow process='14293')
 D0576B5D710A34F4E043B201A8C0ECFE
SYS;
NLS_LANGUAGE?
AMERICAN>
NLS_TERRITORY?
AMERICA>
NLS_CURRENCY?
NLS_ISO_CURRENCY?
AMERICA>
NLS_NUMERIC_CHARACTERS?
NLS_CALENDAR?   GREGORIAN>
NLS_DATE_FORMAT?        DD-MON-RR>
NLS_DATE_LANGUAGE?
AMERICAN>
NLS_CHARACTERSET?
AL32UTF8>
NLS_SORT?
BINARY>
NLS_TIME_FORMAT?
HH.MI.SSXFF AM>
NLS_TIMESTAMP_FORMAT?
DD-MON-RR HH.MI.SSXFF AM>
NLS_TIME_TZ_FORMAT?
HH.MI.SSXFF AM TZR>
NLS_TIMESTAMP_TZ_FORMAT?
DD-MON-RR HH.MI.SSXFF AM TZR>
NLS_DUAL_CURRENCY?
NLS_SPECIAL_CHARS?
NLS_NCHAR_CHARACTERSET?
UTF8>
NLS_COMP?
BINARY>
NLS_LENGTH_SEMANTICS?
BYTE>
NLS_NCHAR_CONV_EXCP?
FALSE
(DESCRIPTION=(ADDRESS=(PROTOCOL=beq)(PROGRAM=/u01/app/oracle/product/11201/db_1/bin/oracle)(ARGV0=oracleCRMV)(ARGS='(DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))')(DETACH=NO))(CONNECT_DATA=(CID=(PROGRAM=sqlplus)(HOST=mlab2.oracle.com)(USER=oracle))))
,T$sqlplus@mlab2.oracle.com (TNS V1-V3)U
        tselect spid from v$process where addr = ( select paddr from v$session where sid=(select distinct sid from v$mystat))
        `       _
select 1 from dual
select 1 from dual

执行某些错误语句时甚至会记录其错误信息
[oracle@mlab2 aa]$ strings wcr_c6hp4h0000018.rec

9`9_^B
create table vva(t1 int)
`:_i
:`:_iB
`;_^
;`;_^B
create table vva(t1 int)
`_i
>`>_iB
FusC
`?_^
?`?_^B
FvWC
        _begin
for i in 1..50000 loop
execute immediate 'select 1 from dual where 2='||i;
end loop;
end;

若SERVER PROCESS LOGOFF 则会有如下信息

C`E_    B
k^2C

以上说明Server Process并不胡在parse或execution阶段写WCR文件,而是将这些数据存放在PGA中,达到一定大小阀值后才写出,所以其性能影响小,但是如果在WCR数据写出前就意外终止则,这些数据将丢失。

4. 窥视进程信息

SQL> oradebug setmypid
Statement processed.
SQL> oradebug dump processstate 10;
Statement processed.
SQL> oradebug tracefile_name
/u01/app/oracle/diag/rdbms/crmv/CRMV/trace/CRMV_ora_14293.trc

从processstate 文件中可以发现历史等待信息包括 WCR: capture file IO write,这是Server process在写WCR 文件

     3: waited for 'SQL*Net message to client'
        driver id=0x62657100, #bytes=0x1, =0x0
        wait_id=139 seq_num=140 snap_id=1
        wait times: snap=0.000007 sec, exc=0.000007 sec, total=0.000007 sec
        wait times: max=infinite
        wait counts: calls=0 os=0
        occurred after 0.934091 sec of elapsed time
     4: waited for 'latch: shared pool'
        address=0x60106b20, number=0x133, tries=0x0
        wait_id=138 seq_num=139 snap_id=1
        wait times: snap=0.000066 sec, exc=0.000066 sec, total=0.000066 sec
        wait times: max=infinite
        wait counts: calls=0 os=0
        occurred after 1.180690 sec of elapsed time
     5: waited for 'WCR: capture file IO write'
        =0x0, =0x0, =0x0
        wait_id=137 seq_num=138 snap_id=1
        wait times: snap=0.000189 sec, exc=0.000189 sec, total=0.000189 sec
        wait times: max=infinite
        wait counts: calls=0 os=0
        occurred after 3.122783 sec of elapsed time
     6: waited for 'WCR: capture file IO write'
        =0x0, =0x0, =0x0
        wait_id=136 seq_num=137 snap_id=1
        wait times: snap=0.000191 sec, exc=0.000191 sec, total=0.000191 sec
        wait times: max=infinite
        wait counts: calls=0 os=0
        occurred after 3.053132 sec of elapsed time
     7: waited for 'WCR: capture file IO write'

5.窥视PGA内存信息

SQL> oradebug dump heapdump 536870917;
Statement processed.

grep WCR /u01/app/oracle/diag/rdbms/crmv/CRMV/trace/CRMV_ora_14293.trc

  Chunk     7fb1b606bfc0 sz=    65600    freeable  "WCR Capture PG "  ds=0x7fb1b6115f90
  Chunk     7fb1b6111e18 sz=     4224    freeable  "WCR Capture PG "  ds=0x7fb1b6115f90
  Chunk     7fb1b6112e98 sz=     4184    freeable  "WCR Capture PG "  ds=0x7fb1b6115f90
  Chunk     7fb1b6113ef0 sz=     4224    freeable  "WCR Capture PG "  ds=0x7fb1b6115f90
  Chunk     7fb1b6114f70 sz=     4104    recreate  "WCR Capture PG "  latch=(nil)
  Chunk     7fb1b6115f78 sz=      160    freeable  "WCR Capture PGA"
  Chunk     7fb1b6116018 sz=     3248    freeable  "WCR Capture PGA"
  Subheap ds=0x7fb1b6115f90  heap name=  WCR Capture PG  size=           82336
HEAP DUMP heap name="WCR Capture PG"  desc=0x7fb1b6115f90
FIVE LARGEST SUB HEAPS for heap name="WCR Capture PG"   desc=0x7fb1b6115f9

PGA中存在WCR Capture PG 和WCR Capture PGA的freeable or recreate内存chunk,这些内存可以从Server Process返回给OS

  Chunk     7fb1b606bfc0 sz=    65600    freeable  "WCR Capture PG "  ds=0x7fb1b6115f90

sz=    65600=》 64k 这可能就是文档描述的64k内存,但是实际可以看到使用总量超过了64k

WCR是否可能引起内存泄露呢?:)!

6.隐藏参数

控制WCR CAPTURE的参数至少有以下2个

SQL> SELECT x.ksppinm NAME, y.ksppstvl VALUE, x.ksppdesc describ
  2   FROM SYS.x$ksppi x, SYS.x$ksppcv y
  3   WHERE x.inst_id = USERENV ('Instance')
  4   AND y.inst_id = USERENV ('Instance')
  5   AND x.indx = y.indx
  6  AND x.ksppinm in ('_capture_buffer_size','_wcr_control');

NAME                 VALUE                DESCRIB
-------------------- -------------------- ------------------------------------------------------------
_wcr_control         0                    Oracle internal test WCR parameter used ONLY for testing!
_capture_buffer_size 65536                To set the size of the PGA I/O recording buffers

其中_capture_buffer_size 控制PGA中WCR BUFFER的SIZE,默认为64k
_wcr_control 控制WCR的内部行为,具体尚不清晰

7.结束capture 

SQL> execute dbms_workload_capture.finish_capture;

PL/SQL procedure successfully completed.

DBMS_WORKLOAD_CAPTURE.START_CAPTURE(): Starting database capture at 12/08/2012 07:24:54
DBMS_WORKLOAD_CAPTURE.FINISH_CAPTURE(): Stopped database capture (not all sessions  could flush their capture buffers) at 12/08/2012 10:29:49

 

 

综上所述,我们得出结论:

1. 负责写WCR WORKLOAD CAPTURE文件的不是什么后台进程,而是Server Process服务进程(前台进程)
2. 每一个server process对应一个WCR文件
3. Server Proess会写出LOGON、LOGOFF、SQL执行以及错误信息到WCR文件中
4. Server Process的写不是立即模式Immediate mode,而是将数据存放在PGA的(WCR Capture) subheap中,当数据达到一定大小或一定时间(timeout才写出一次)
5. 再次强调, Server Process的写不是立即模式Immediate mode,即capture不发生在parse或者execution阶段(很多人认为capture发生在parse时这种说法可以证明为不正确,parse并不受到capture的影响),而仅仅是将LOGON、SQL历史(不包括执行计划)放在PGA的WCR Capture内存子堆中,条件触发才写出,所以其性能损耗小,按照tpcc的测试结论为4.5%
6. 隐藏参数_capture_buffer_size 控制PGA中WCR BUFFER的SIZE,默认为64k
7. WCR Capture文件虽然是binrary 2进制文件,但并未加密,所以第三方软件理论上也可以读取该WCR capture file
8. WCR: capture file IO write等待事件是Server Process在写WCR文件
9. 执行dbms_workload_capture.finish_capture;会让现有Server Process FLUSH WCR buffer,但是就ALERT.LOG的自解释信息看”Stopped database capture (not all sessions  could flush their capture buffers)”,这种flush不是必然能回收内存的 

 

Real Application Testing Database Replay、SPA的价格和介绍

Real Application Testing = SPA + Database Replay

SPA和 Database Replay的区别:

Database Replay好比是给鸭梨山大的锤子:

 

SPA: SQL focused and unit testing SQL component
Database Replay: Workload focused, comprehensive testing of database stack

Database Replay
Load testing solution for performance and scalability testing
“Capture” entire workload (queries, DML, DDL, PL/SQL, etc.) in production, including concurrency
“Replay” entire workload in test with exact production characteristics including concurrency
Ideal for system upgrades, configuration changes (SI to RAC), storage changes, etc.

 

SPA还比是好用的起钉锤:

SQL Performance Analyzer
Solution for identifying performance regressions/improvements resulting from execution plan changes
“Capture” SQL queries from production, including executions plans & stats
“Test-Execute” SQL queries serially “before” and “after” changes
Compare “before” and “after” SQL execution plans and performance stats
Ideal for changes impacting query plans such as db upgrades, optimizer statistics refresh, new index creation, etc.

 

Real Application Testing RAT作为DBEE(企业版)数据库的选项来销售, 单独使用SPA或Database Replay均要求购买完整的RAT使用许可证。

RAT的具体价格为$11,500/cpu ($230/NUP Named User Plus),无论是产品还是测试数据库。

注意当从9iR2或10gR1升级到10gR2时,SPA要求使用在9iR2或10gR1的产品数据库上使用SQL TRACE功能以便捕获SQL语句。仅在此场景中,RAT license仅对测试系统有效而非产品系统。

 

SPA Applicable for Pre-11g Database Releases

 

SQL Performance Analyzer
Capture on 9i, 10.1, 10.2 database releases
Test changes in 10.2 & above
Database Replay
Capture on 9i, 10.2 database releases
Test changes in 11.1 & above

 

11g Real Application Testing:Database Replay使用方法

.  Database Replay使用方法

 

1.1       捕获性能负载

 

1. 针对需要捕获性能负载时段执行如下PL/SQL脚本:

 

   execute dbms_workload_capture.start_capture(‘&CAPTURE_NAME’,’&DIRECT_NAME’,default_action=>’INCLUDE’); 

CAPTURE_NAME=> 本次capture的名字

可以通过 DBA_WORKLOAD_CAPTURES 视图监控

 

DIRECTORY_NAME=> ORACLE目录对象名,该目录用以存放catpure文件,现有测试表明在繁忙的OTLP数据中收集10分钟数据消耗磁盘空间1GB,建议为该目录分配足够的磁盘空间

 

 

 

 

 

 

2. 可以通过如下SQL监控capture的情况

 

查询1:select id,name,status,start_time,end_time,connects,user_calls,dir_path from dba_workload_captures where id = (select max(id) from dba_workload_captures) ; 

set pagesize 0 long 30000000 longchunksize 1000

select dbms_workload_capture.report(&ID,’TEXT’) from dual;

 

其中ID是查询1获得的ID值

 

 

 

3. 当不再需要捕获更多负载时通过如下脚本结束capture:

 

execute dbms_workload_capture.finish_capture; 

 

 

 

 

 

1.2       预处理捕获

1. 将捕获到的capture file传输到目标数据库主机上,并创建必要的Oracle Directory 对象

 

 

 

2.  在目标数据库预处理capture file

 

execute DBMS_WORKLOAD_REPLAY.PROCESS_CAPTURE(‘&DIRECTORY_NAME’); 

注意 Capture Preprocess By Process_capture Can Not Complete [ID 1265686.1]

Bug 9742032  Database replay: dbms_workload_replay.process_capture takes a lot of time

 

 

— ***********************************************************

—  PROCESS_CAPTURE

—    Processes the workload capture found in capture_dir in place.

—    Analyzes the workload capture found in the capture_dir and

—    creates new workload replay specific metadata files that are

—    required to replay the given workload capture.

—    This procedure can be run multiple times on the same

—    capture directory – useful when this procedure encounters

—    unexpected errors or is cancelled by the user.

—    Once this procedure runs successfully, the capture_dir can be used

—    as input to INITIALIZE_REPLAY() in order to replay the captured

—    workload present in capture_dir.

—    Before a workload capture can be replayed in a particular database

—    version, the capture needs to be “processed” using this

—    PROCESS_CAPTURE procedure in that same database version.

—    Once created, a processed workload capture can be used to replay

—    the captured workload multiple times in the same database version.

—    For example:

—      Say workload “foo” was captured in “rec_dir” in Oracle

—      database version 10.2.0.4

—      In order to replay the workload “foo” in version 11.1.0.1

—      the workload needs to be processed in version 11.1.0.1

—      The following procedure needs to be executed in a 11.1.0.1 database

—      in order to process the capture directory “rec_dir”

—        DBMS_WORKLOAD_REPLAY.PROCESS_CAPTURE(‘rec_dir’);

—      Now, rec_dir contains a valid 11.1.0.1 processed workload capture

—      that can be used to replay the workload “foo” in 11.1.0.1 databases

—      as many number of times as required.

—    The procedure will take as input the following parameters:

—      capture_dir – name of the workload capture directory object.

—                    (case sensitive)

—                    The directory object must point to a valid OS

—                    directory that has appropriate permissions.

—                    New files will be added to this directory.

—                    (MANDATORY)

—      parallel_level – number of oracle processes used to process the

—                       capture in a parallel fashion.

—                       The NULL default value will auto-compute the

—                       parallelism level, whereas a value of 1 will enforce

—                       serial execution.

 

 

 

3. 以上预处理可能因为bug:8919603

 

 

 

 

1.3       开始REPLAY重放

 

通过 wrc工具的calibrate模式评估需要多少个客户端机

 

 

wrc mode=calibrate replaydir=$REPLAY_DIR$REPLAY_DIR指定预处理过的目录

 

Recommendation:

Consider using at least 75 clients divided among 19 CPU(s)

You will need at least 142 MB of memory per client process.

If your machine(s) cannot match that number, consider using more clients.

 

Workload Characteristics:

– max concurrency: 2830 sessions

– total number of sessions: 70362

 

Assumptions:

– 1 client process per 50 concurrent sessions

– 4 client process per CPU

– 256 KB of memory cache per concurrent session

– think time scale = 100

– connect time scale = 100

– synchronization = TRUE

 

 

准备数据库环境,将数据库闪回到capture时间点并创建用户:

 

shutdown abort;      ==》关闭2个实例startup mount;        ==》 启动一个实例到mountflashback database to restore point prereplay;

alter database open resetlogs;

startup;               ==>启动另一个实例

 

create user orasupport identified by oracle;

grant dba to orasupport;

 

 

 

 

 

执行如下脚本准备capture:

 

 

exec DBMS_WORKLOAD_REPLAY.INITIALIZE_REPLAY(replay_name =>’&REPLAY_NAME’,replay_dir => ‘&REPLAY_DIR’); 

 

exec DBMS_WORKLOAD_REPLAY.PREPARE_REPLAY(synchronization=>false,

connect_time_scale=>80,think_time_scale=>25, SCALE_UP_MULTIPLIER=>1);

 

 

synchronization=》指定了commit order是否要求一致 , 对于压力测试可以为false,

 

connect_time_scale=》连接时间比例,设置为80%,意为原本要capture 10分钟之后才会登录的session,现在8分钟后就会登录 ,注意设置该值过低会导致大量session登录,可能出现ORA-18/ORA-20错误; 这里设置为80为了加大负载压力

 

think_time_scale=》指2个SQL CALL之间间隔的时间比例,如本来2个SQL之间为100s,设置think_time_scale为25后,其间隔变为25s。

 

 

SCALE_UP_MULTIPLIER=>指定查询的倍数,建议一开始设置为1:1,

 

 

— ***********************************************************

—  PREPARE_REPLAY

—    Puts the DB state in REPLAY mode. The database

—    should have been initialized for replay using

—    DBMS_WORKLOAD_REPLAY.INITIALIZE_REPLAY(), and optionally any

—    capture time connection strings that require remapping have been

—    already done using DBMS_WORKLOAD_REPLAY.REMAP_CONNECTION().

—    One or more external replay clients (WRC) can be started

—    once the PREPARE_REPLAY procedure has been executed.

—    The procedure will take as input the following parameters:

—      synchronization – Turns synchronization to the given scheme during

—                        workload replay.

—                        When synchronization is SCN, the COMMIT order

—                        observed during the original workload capture

—                        will be preserved during replay.

—                        Every action that is replayed will be executed

—                        ONLY AFTER all of it’s dependent COMMITs (all

—                        COMMITs that were issued before the given action

—                        in the original workload capture) have finished

—                        execution.

—                        When synchronization is OBJECT_ID, a more advanced

—                        synchronization scheme is used.

—                        Every action that is replayed will be executed

—                        ONLY AFTER the RELEVANT COMMITs have finished

—                        executing. The relevant commits are the ones that

—                        were issued before the given action  in the

—                        orginal workload capture and that had modified

—                        at least one of the database objects the given

—                        action is referencing (either implicitely or

—                        explicitely).

—                        This OBJECT_ID scheme has the same logical

—                        property of making sure that any action will see

—                        the same data it saw during capture, but will

—                        allow more concurrency during replays for the

—                        actions that do not touch the same objects/tables.

—                        DEFAULT VALUE: SCN, preserve commit order.

—                        For legacy reason, there is a boolean version of

—                        this procedure:

—                          TRUE  means ‘SCN’

—                          FALSE means ‘OFF’

—      connect_time_scale       – Scales the time elapsed between the

—                                 instant the workload capture was started

—                                 and session connects with the given value.

—                                 The input is interpreted as a % value.

—                                 Can potentially be used to increase or

—                                 decrease the number of concurrent

—                                 users during the workload replay.

—                                 DEFAULT VALUE: 100

—                                 For example, if the following was observed

—                                 during the original workload capture:

—                                 12:00 : Capture was started

—                                 12:10 : First session connect  (10m after)

—                                 12:30 : Second session connect (30m after)

—                                 12:42 : Third session connect  (42m after)

—                                 If the connect_time_scale is 50, then the

—                                 session connects will happen as follows:

—                                 12:00 : Replay was started

—                                         with 50% connect time scale

—                                 12:05 : First session connect  ( 5m after)

—                                 12:15 : Second session connect (15m after)

—                                 12:21 : Third session connect  (21m after)

—                                 If the connect_time_scale is 200, then the

—                                 session connects will happen as follows:

—                                 12:00 : Replay was started

—                                         with 200% connect time scale

—                                 12:20 : First session connect  (20m after)

—                                 13:00 : Second session connect (60m after)

—                                 13:24 : Third session connect  (84m after)

—      think_time_scale         – Scales the time elapsed between two

—                                 successive user calls from the same

—                                 session.

—                                 The input is interpreted as a % value.

—                                 Can potentially be used to increase or

—                                 decrease the number of concurrent

—                                 users during the workload replay.

—                                 DEFAULT VALUE: 100

—                                 For example, if the following was observed

—                                 during the original workload capture:

—                                 12:00 : User SCOTT connects

—                                 12:10 : First user call issued

—                                         (10m after completion of prevcall)

—                                 12:14 : First user call completes in 4mins

—                                 12:30 : Second user call issued

—                                         (16m after completion of prevcall)

—                                 12:40 : Second user call completes in 10m

—                                 12:42 : Third user call issued

—                                         ( 2m after completion of prevcall)

—                                 12:50 : Third user call completes in 8m

—                                 If the think_time_scale is 50 during the

—                                 workload replay, then the user calls

—                                 will look something like below:

—                                 12:00 : User SCOTT connects

—                                 12:05 : First user call issued 5 mins

—                                         (50% of 10m) after the completion

—                                         of prev call

—                                 12:10 : First user call completes in 5m

—                                         (takes a minute longer)

—                                 12:18 : Second user call issued 8 mins

—                                         (50% of 16m) after the completion

—                                         of prev call

—                                 12:25 : Second user call completes in 7m

—                                         (takes 3 minutes less)

—                                 12:26 : Third user call issued 1 min

—                                         (50% of 2m) after the completion

—                                         of prev call

—                                 12:35 : Third user call completes in 9m

—                                         (takes a minute longer)

—      think_time_auto_correct  – Auto corrects the think time between calls

—                                 appropriately when user calls takes longer

—                                 time to complete during replay than

—                                 how long the same user call took to

—                                 complete during the original capture.

—                                 DEFAULT VALUE: TRUE, reduce

—                                 think time if replay goes slower

—                                 than capture.

—                                 For example, if the following was observed

—                                 during the original workload capture:

—                                 12:00 : User SCOTT connects

—                                 12:10 : First user call issued

—                                         (10m after completion of prevcall)

—                                 12:14 : First user call completes in 4m

—                                 12:30 : Second user call issued

—                                         (16m after completion of prevcall)

—                                 12:40 : Second user call completes in 10m

—                                 12:42 : Third user call issued

—                                         ( 2m after completion of prevcall)

—                                 12:50 : Third user call completes in 8m

—                                 If the think_time_scale is 100 and

—                                 the think_time_auto_correct is TRUE

—                                 during the workload replay, then

—                                 the user calls will look something

—                                 like below:

—                                 12:00 : User SCOTT connects

—                                 12:10 : First user call issued 10 mins

—                                         after the completion of prev call

—                                 12:15 : First user call completes in 5m

—                                         (takes 1 minute longer)

—                                 12:30 : Second user call issued 15 mins

—                                         (16m minus the extra time of 1m

—                                          the prev call took) after the

—                                         completion of prev call

—                                 12:44 : Second user call completes in 14m

—                                         (takes 4 minutes longer)

—                                 12:44 : Third user call issued immediately

—                                         (2m minus the extra time of 4m

—                                          the prev call took) after the

—                                         completion of prev call

—                                 12:52 : Third user call completes in 8m

—      scale_up_multiplier      – Defines the number of times the query workload

—                                 is scaled up during replay. Each captured session

—                                 is replayed concurrently as many times as the

—                                 value of the scale_up_multiplier. However, only

—                                 one of the sessions in each set of identical

—                                 replay sessions executes both queries and updates.

—                                 The remaining sessions only execute queries.

—                                 More specifically note that:

—                                   1. One replay session (base session) of each set

—                                      of identical sessions will replay every call

—                                      from the capture as usual

—                                   2. The remaining sessions (scale-up sessions) will

—                                      only replay calls that are read-only.

—                                      Thus, DDL, DML, and PLSQL calls that

—                                      modified the database will be

—                                      skipped. SELECT FOR UPDATE statements are also skipped.

—                                   3. Read-only calls from the scale-up are

—                                      synchronized appropriately and obey the

—                                      timings defined by think_time_scale, connect_time_scale,

—                                      and think_time_auto_correct. Also the queries

—                                      are made to wait for the appropriate commits.

—                                   4. No replay data or error divergence

—                                      records will be generated for the

—                                      scale-up sessions.

—                                   5. All base or scale-up sessions that

—                                      replay the same capture file will connect

—                                      from the same workload replay client.

—          capture_sts – If this parameter is TRUE, a SQL tuning set

—                        capture is also started in parallel with workload

—                        capture. The resulting SQL tuning set can be

—                        exported using DBMS_WORKLOAD_REPLAY.EXPORT_AWR

—                        along with the AWR data.

—                        Currently, parallel STS capture

—                        is not supported in RAC. So, this parameter has

—                        no effect in RAC.

—                        Furthermore capture filters defined using the

—                        dbms_workload_replay APIs do not apply to the

—                        sql tuning set capture.

—                        The calling user must have the approriate

—                        privileges (‘administer sql tuning set’).

—                        DEFAULT VALUE: FALSE

—     sts_cap_interval – This parameter specifies the capture interval

—                        of the SQL set capture from the cursor cache in

—                        seconds. The default value is 300.

—    Prerequisites:

—      -> The database has been initialized for replay using

—         DBMS_WORKLOAD_REPLAY.INITIALIZE_REPLAY().

—      -> Any capture time connections strings that require remapping

—         during replay have already been remapped using

—         DBMS_WORKLOAD_REPLAY.REMAP_CONNECTION().

 

 

 

 

 

 

以上完成后启动WRC 客户端:

 

nohup  wrc  orasupport/oracle replaydir=$REPLAY_DIR  DSCN_OFF=TRUE &

 

建议一个INST开75-100个WRC客户端,使用nohup 后台启动

 

 

MODE=REPLAY (default)

———————

 

Keyword     Description

—————————————————————-

USERID      username

PASSWORD    password

SERVER      server connection identifier (Default: empty string)

REPLAYDIR   replay directory (Default:.)

WORKDIR     directory for trace files

DEBUG       ON, OFF (Default: OFF)

CONNECTION_OVERRIDE  TRUE, FALSE (Default: FALSE)

TRUE   All replay threads connect using SERVER,

settings in DBA_WORKLOAD_CONNECTION_MAP will be ignore

FALSE  Use settings from DBA_WORKLOAD_CONNECTION_MAP

SERIALIZE_CONNECTS  TRUE, FALSE (Default: FALSE)

TRUE   All the replay threads will connect to

the database in a serial fashion one after

another. This setting is recommended when

the replay clients use the bequeath protocol

to communicate to the database server.

FALSE  Replay threads will connect to the database

in a concurrent fashion mimicking the original

capture behavior.

DSCN_OFF    TRUE, FALSE (Default: FALSE)

TRUE   Ignore all dependencies due to block

contention during capture when synchronizing

the replay.

FALSE  Honor all captured dependencies.

 

 

MODE=CALIBRATE

————–

Provides an estimate of the number of replay clients needed

 

Keyword     Description

—————————————————————-

REPLAYDIR   replay directory (Default:.)

 

Advanced parameters:

PROCESS_PER_CPU       Maximum number of client process than can be run

per CPU (Default: 4)

THREADS_PER_PROCESS   Maximum number of threads than can be run within

a client process (Default: 50)

 

 

MODE=LIST_HOSTS

—————

Displays all host names involved in the capture

 

Keyword     Description

—————————————————————-

REPLAYDIR   the workload directory (Default:.)

 

 

MODE=GET_TABLES

—————

Lists all objects referenced by captured SQL statements

 

Keyword     Description

—————————————————————-

REPLAYDIR   the workload directory (Default:.)

 

 

 

 

执行下列脚本正式开始REPLAY

 

 

exec DBMS_WORKLOAD_REPLAY.start_replay();

 

 

建议通过EM或者下面的脚本查询进度:

 

Select id, name,status from dba_workload_replays; 

若replay完成可以在EM或者使用如下脚本获得replay报告:

 

 

set pagesize 0 long 30000000 longchunksize 1000

select dbms_workload_replay.report(&ID,’TEXT’) from dual;

 

&ID可以从上面的查询获得

 

 

沪ICP备14014813号-2

沪公网安备 31010802001379号