Goldengate单向大事务复制性能测试

Goldengate单向数据复制,为了图省事没有配pump,只配了source的extract和target的replicat;实际操作发现gg对大事务的支持还是比streams好一些,streams碰上大事务就只看到SPILL MESSAGES了:

/*源端配置信息*/
GGSCI (rh2.oracle.com) 1> view params mgr

PORT 7809


GGSCI (rh2.oracle.com) 2> view params load1


--extract group--
extract load1
--connection to database--
userid ggate, password ggate
--hostname and port for trail--
rmthost rh3.oracle.com, mgrport 7809
--path and name for trail--
rmttrail /s01/gg
ddl include mapped objname sender.*
table sender.*;

GGSCI (rh2.oracle.com) 4> stats extract load1

Sending STATS request to EXTRACT LOAD1 ...

Start of Statistics at 2010-11-29 17:44:41.

DDL replication statistics (for all trails):

*** Total statistics since extract started     ***
        Operations                                   1.00
        Mapped operations                            1.00
        Unmapped operations                          0.00
        Other operations                             0.00
        Excluded operations                          0.00

Output to /s01/gg:

Extracting from GGATE.GGS_MARKER to GGATE.GGS_MARKER:

*** Total statistics since 2010-11-29 16:22:25 ***

        No database operations have been performed.

*** Daily statistics since 2010-11-29 16:22:25 ***

        No database operations have been performed.

*** Hourly statistics since 2010-11-29 17:00:00 ***

        No database operations have been performed.

*** Latest statistics since 2010-11-29 16:22:25 ***

        No database operations have been performed.

Extracting from SENDER.ABC to SENDER.ABC:

*** Total statistics since 2010-11-29 16:22:25 ***
        Total inserts                           2621440.00
        Total updates                                0.00
        Total deletes                                0.00
        Total discards                               0.00
        Total operations                        2621440.00

*** Daily statistics since 2010-11-29 16:22:25 ***
        Total inserts                           2621440.00
        Total updates                                0.00
        Total deletes                                0.00
        Total discards                               0.00
        Total operations                        2621440.00

*** Hourly statistics since 2010-11-29 17:00:00 ***

        No database operations have been performed.

*** Latest statistics since 2010-11-29 16:22:25 ***
        Total inserts                           2621440.00
        Total updates                                0.00
        Total deletes                                0.00
        Total discards                               0.00
        Total operations                        2621440.00

End of Statistics.

GGSCI (rh2.oracle.com) 5> info all

Program     Status      Group       Lag           Time Since Chkpt

MANAGER     RUNNING                                           
EXTRACT     RUNNING     LOAD1       00:00:00      00:00:08    

/*目标端配置信息*/

GSCI (rh3.oracle.com) 1> view params mgr

PORT 7809
USERID ggate, PASSWORD ggate
PURGEOLDEXTRACTS /s01/gg


GGSCI (rh3.oracle.com) 2> view params rep1

--Replicat group --
replicat rep1
--source and target definitions
ASSUMETARGETDEFS
--target database login --
userid ggate, password ggate
--file for dicarded transaction --
discardfile /s01/discard.txt, append, megabytes 10
--ddl support
DDL
--Specify table mapping ---
map sender.*, target receiver.*;

GGSCI (rh3.oracle.com) 3> stats replicat rep1

Sending STATS request to REPLICAT REP1 ...

Start of Statistics at 2010-11-30 02:44:16.

DDL replication statistics:

*** Total statistics since replicat started     ***
        Operations                                   1.00
        Mapped operations                            1.00
        Unmapped operations                          0.00
        Other operations                             0.00
        Excluded operations                          0.00
        Errors                                       0.00
        Retried errors                               0.00
        Discarded errors                             0.00
        Ignored errors                               0.00

Replicating from SENDER.ABC to RECEIVER.ABC:

*** Total statistics since 2010-11-30 01:21:09 ***
        Total inserts                           2621440.00
        Total updates                                0.00
        Total deletes                                0.00
        Total discards                               0.00
        Total operations                        2621440.00

*** Daily statistics since 2010-11-30 01:21:09 ***
        Total inserts                           2621440.00
        Total updates                                0.00
        Total deletes                                0.00
        Total discards                               0.00
        Total operations                        2621440.00

*** Hourly statistics since 2010-11-30 02:00:00 ***

        No database operations have been performed.

*** Latest statistics since 2010-11-30 01:21:09 ***
        Total inserts                           2621440.00
        Total updates                                0.00
        Total deletes                                0.00
        Total discards                               0.00
        Total operations                        2621440.00

End of Statistics.

GGSCI (rh3.oracle.com) 4> info all

Program     Status      Group       Lag           Time Since Chkpt

MANAGER     RUNNING                                           
REPLICAT    RUNNING     REP1        00:00:00      00:00:01   

接着尝试在源库端sender模式下产生一个大事务操作,观察目标段数据同步情况*/

SQL> delete sender.abc;
2621440 rows deleted.

SQL> commit;
Commit complete.
/* commit操作完成前,EXTRACT进程并不会抽取日志 */

Goldengate日志ggserror.log的相关记录:
010-11-29 17:49:49  INFO    OGG-01026  Oracle GoldenGate Capture for Oracle, load1.prm:  Rolling over remote file /s01/gg000029.
2010-11-29 17:49:50  INFO    OGG-01026  Oracle GoldenGate Capture for Oracle, load1.prm:  Rolling over remote file /s01/gg000030.
2010-11-29 17:49:51  INFO    OGG-01026  Oracle GoldenGate Capture for Oracle, load1.prm:  Rolling over remote file /s01/gg000031.
2010-11-29 17:49:53  INFO    OGG-01026  Oracle GoldenGate Capture for Oracle, load1.prm:  Rolling over remote file /s01/gg000032.
2010-11-29 17:49:54  INFO    OGG-01026  Oracle GoldenGate Capture for Oracle, load1.prm:  Rolling over remote file /s01/gg000033.
2010-11-29 17:49:55  INFO    OGG-01026  Oracle GoldenGate Capture for Oracle, load1.prm:  Rolling over remote file /s01/gg000034.
2010-11-29 17:49:56  INFO    OGG-01026  Oracle GoldenGate Capture for Oracle, load1.prm:  Rolling over remote file /s01/gg000035.
2010-11-29 17:49:57  INFO    OGG-01026  Oracle GoldenGate Capture for Oracle, load1.prm:  Rolling over remote file /s01/gg000036.
2010-11-29 17:49:58  INFO    OGG-01026  Oracle GoldenGate Capture for Oracle, load1.prm:  Rolling over remote file /s01/gg000037.
2010-11-29 17:49:59  INFO    OGG-01026  Oracle GoldenGate Capture for Oracle, load1.prm:  Rolling over remote file /s01/gg000038.
2010-11-29 17:50:00  INFO    OGG-01026  Oracle GoldenGate Capture for Oracle, load1.prm:  Rolling over remote file /s01/gg000039.
2010-11-29 17:50:01  INFO    OGG-01026  Oracle GoldenGate Capture for Oracle, load1.prm:  Rolling over remote file /s01/gg000040.
2010-11-29 17:50:03  INFO    OGG-01026  Oracle GoldenGate Capture for Oracle, load1.prm:  Rolling over remote file /s01/gg000041.
2010-11-29 17:50:04  INFO    OGG-01026  Oracle GoldenGate Capture for Oracle, load1.prm:  Rolling over remote file /s01/gg000042.
2010-11-29 17:50:05  INFO    OGG-01026  Oracle GoldenGate Capture for Oracle, load1.prm:  Rolling over remote file /s01/gg000043.
2010-11-29 17:50:06  INFO    OGG-01026  Oracle GoldenGate Capture for Oracle, load1.prm:  Rolling over remote file /s01/gg000044.
2010-11-29 17:50:07  INFO    OGG-01026  Oracle GoldenGate Capture for Oracle, load1.prm:  Rolling over remote file /s01/gg000045.
2010-11-29 17:50:08  INFO    OGG-01026  Oracle GoldenGate Capture for Oracle, load1.prm:  Rolling over remote file /s01/gg000046.
2010-11-29 17:50:09  INFO    OGG-01026  Oracle GoldenGate Capture for Oracle, load1.prm:  Rolling over remote file /s01/gg000047.
2010-11-29 17:50:10  INFO    OGG-01026  Oracle GoldenGate Capture for Oracle, load1.prm:  Rolling over remote file /s01/gg000048.
2010-11-29 17:50:11  INFO    OGG-01026  Oracle GoldenGate Capture for Oracle, load1.prm:  Rolling over remote file /s01/gg000049.
2010-11-29 17:50:13  INFO    OGG-01026  Oracle GoldenGate Capture for Oracle, load1.prm:  Rolling over remote file /s01/gg000050.
2010-11-29 17:50:14  INFO    OGG-01026  Oracle GoldenGate Capture for Oracle, load1.prm:  Rolling over remote file /s01/gg000051.
2010-11-29 17:50:15  INFO    OGG-01026  Oracle GoldenGate Capture for Oracle, load1.prm:  Rolling over remote file /s01/gg000052.
2010-11-29 17:50:16  INFO    OGG-01026  Oracle GoldenGate Capture for Oracle, load1.prm:  Rolling over remote file /s01/gg000053.
2010-11-29 17:50:17  INFO    OGG-01026  Oracle GoldenGate Capture for Oracle, load1.prm:  Rolling over remote file /s01/gg000054.

/*产生了26个trail文件*/

/*同时备库端开始陆续应用日志*/
REP1.rpt是rep1 replicat复制进程的文本报告,出现相关内容:
Switching to next trail file /s01/gg000029 at 2010-11-30 02:47:54 due to EOF, with current RBA 9999949
Opened trail file /s01/gg000029 at 2010-11-30 02:47:54

Switching to next trail file /s01/gg000030 at 2010-11-30 02:49:53 due to EOF, with current RBA 9999925
Opened trail file /s01/gg000030 at 2010-11-30 02:49:53

GGSCI (rh3.oracle.com) 6> stats replicat rep1

Sending STATS request to REPLICAT REP1 ...

Start of Statistics at 2010-11-30 02:52:20.

DDL replication statistics:

*** Total statistics since replicat started     ***
        Operations                                   1.00
        Mapped operations                            1.00
        Unmapped operations                          0.00
        Other operations                             0.00
        Excluded operations                          0.00
        Errors                                       0.00
        Retried errors                               0.00
        Discarded errors                             0.00
        Ignored errors                               0.00

Replicating from SENDER.ABC to RECEIVER.ABC:

*** Total statistics since 2010-11-30 01:21:09 ***
        Total inserts                           2621440.00
        Total updates                                0.00
        Total deletes                           208150.00
        Total discards                               0.00
        Total operations                        2829590.00

*** Daily statistics since 2010-11-30 01:21:09 ***
        Total inserts                           2621440.00
        Total updates                                0.00
        Total deletes                           208150.00
        Total discards                               0.00
        Total operations                        2829590.00

*** Hourly statistics since 2010-11-30 02:00:00 ***
        Total inserts                                0.00
        Total updates                                0.00
        Total deletes                           208150.00
        Total discards                               0.00
        Total operations                        208150.00

*** Latest statistics since 2010-11-30 01:21:09 ***
        Total inserts                           2621440.00
        Total updates                                0.00
        Total deletes                           208150.00
        Total discards                               0.00
        Total operations                        2829590.00

End of Statistics.


GGSCI (rh3.oracle.com) 7> !
stats replicat rep1

Sending STATS request to REPLICAT REP1 ...

Start of Statistics at 2010-11-30 02:52:26.

DDL replication statistics:

*** Total statistics since replicat started     ***
        Operations                                   1.00
        Mapped operations                            1.00
        Unmapped operations                          0.00
        Other operations                             0.00
        Excluded operations                          0.00
        Errors                                       0.00
        Retried errors                               0.00
        Discarded errors                             0.00
        Ignored errors                               0.00

Replicating from SENDER.ABC to RECEIVER.ABC:

*** Total statistics since 2010-11-30 01:21:09 ***
        Total inserts                           2621440.00
        Total updates                                0.00
        Total deletes                           210767.00
        Total discards                               0.00
        Total operations                        2832207.00

*** Daily statistics since 2010-11-30 01:21:09 ***
        Total inserts                           2621440.00
        Total updates                                0.00
        Total deletes                           210767.00
        Total discards                               0.00
        Total operations                        2832207.00

*** Hourly statistics since 2010-11-30 02:00:00 ***
        Total inserts                                0.00
        Total updates                                0.00
        Total deletes                           210767.00
        Total discards                               0.00
        Total operations                        210767.00

*** Latest statistics since 2010-11-30 01:21:09 ***
        Total inserts                           2621440.00
        Total updates                                0.00
        Total deletes                           210767.00
        Total discards                               0.00
        Total operations                        2832207.00

End of Statistics.


/*由以上对replicat进程的统计信息可知其正在应用delete操作*/

Switching to next trail file /s01/gg000031 at 2010-11-30 02:53:26 due to EOF, with current RBA 9999925
Opened trail file /s01/gg000031 at 2010-11-30 02:53:26

Switching to next trail file /s01/gg000032 at 2010-11-30 02:58:28 due to EOF, with current RBA 9999925
Opened trail file /s01/gg000032 at 2010-11-30 02:58:28

[maclean@rh3 s01]$ ls -lh gg00003[1-2]
-rw-rw-rw- 1 maclean oinstall 9.6M Nov 30 02:47 gg000031
-rw-rw-rw- 1 maclean oinstall 9.6M Nov 30 02:47 gg000032

/*由上列trace信息推断,以这样一台双核cpu主频为2.2GHZ的pc机为例,gg应用一个大小为9.6M的trail文件耗时也要将近5分钟(当然这并不准确)。


/*从进程argument可以看到replicate和extract进程的本质是调用了多个配置文件的$GGATE/extract和$GGATE/replicat*/
[maclean@rh3 ~]$ ps -ef|grep repl
maclean   7817  7476  3 01:18 ?        00:03:29 
/home/maclean/gg/replicat PARAMFILE /home/maclean/gg/dirprm/rep1.prm 
REPORTFILE /home/maclean/gg/dirrpt/REP1.rpt 
PROCESSID REP1 USESUBDIRS

[maclean@rh2 dirrpt]$ ps -ef|grep ex
maclean    544 32432  1 16:21 ?        00:01:12 
/home/maclean/gg/extract PARAMFILE /home/maclean/gg/dirprm/load1.prm 
REPORTFILE /home/maclean/gg/dirrpt/LOAD1.rpt 
PROCESSID LOAD1 USESUBDIRS

Poor Data Export Performance(revised)

这是一个十分简单的单表导出作业,表上大约有200万条数据;但实际exp导出时dump文件每秒增长不足1MB,速度十分缓慢。

SQL> show user;
USER is "SH"

SQL> select count(*) from orders;

  COUNT(*)
----------
   2319232

SQL> select bytes/1024/1024 from dba_segments where owner='SH' and segment_name='ORDERS';

BYTES/1024/1024
---------------
            261

/*该表大小为261MB*/

/*这里我们使用直接路径导出*/
[maclean@rh2 ~]$ exp sh/sh file=/s01/orders.dmp direct=y tables=orders
Export: Release 11.2.0.1.0 - Production on Wed Aug 18 21:52:56 2010
Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.
Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Export done in US7ASCII character set and AL16UTF16 NCHAR character set
server uses WE8MSWIN1252 character set (possible charset conversion)
About to export specified tables via Direct Path ...
. . exporting table                         ORDERS

/* 我们通过脚本观察导出文件大小增长速度,每秒大约0.4MB */

[maclean@rh2 s01]$ while (1>0) do ls -lh orders.dmp ; sleep 1; done
-rw-r--r-- 1 maclean oinstall 912K Aug 18 21:54 orders.dmp
-rw-r--r-- 1 maclean oinstall 1.4M Aug 18 21:54 orders.dmp
-rw-r--r-- 1 maclean oinstall 1.8M Aug 18 21:54 orders.dmp
-rw-r--r-- 1 maclean oinstall 2.2M Aug 18 21:54 orders.dmp
-rw-r--r-- 1 maclean oinstall 2.6M Aug 18 21:54 orders.dmp
-rw-r--r-- 1 maclean oinstall 3.0M Aug 18 21:54 orders.dmp
-rw-r--r-- 1 maclean oinstall 3.5M Aug 18 21:54 orders.dmp
-rw-r--r-- 1 maclean oinstall 3.9M Aug 18 21:54 orders.dmp
-rw-r--r-- 1 maclean oinstall 4.3M Aug 18 21:54 orders.dmp
-rw-r--r-- 1 maclean oinstall 4.7M Aug 18 21:54 orders.dmp
-rw-r--r-- 1 maclean oinstall 5.1M Aug 18 21:54 orders.dmp

/*是什么导致了exp如此缓慢呢,有必看一下该导出服务进程的trace文件*/

SQL> select spid from v$process where addr=(select paddr from v$session where module like 'exp@%');
SPID
------------------------
22624

SQL> oradebug setospid 22624;
Oracle pid: 37, Unix process pid: 22624, image: oracle@rh2 (TNS V1-V3)
SQL> oradebug tracefile_name;
/rdbms/sprod/SPROD/trace/SPROD_ora_22624.trc
SQL> host
[maclean@rh2 trace]$ ls -lh SPROD_ora_22624.trc
-rw-r----- 1 maclean oinstall 1.2G Aug 18 21:58 SPROD_ora_22624.trc

/* 这个trace文件居然有1.2G之巨,会存放了些什么呢?*/

[maclean@rh2 trace]$ tail -100 SPROD_ora_22624.trc
2010-08-18 21:59:14.748194 : nsbasic_bsd:41 20 43 4C 41 53 53 07  |A.CLASS.|
2010-08-18 21:59:14.748201 : nsbasic_bsd:00 78 6D 08 0F 01 1A 2E  |.xm.....|
2010-08-18 21:59:14.748212 : nsbasic_bsd:07 00 78 6D 08 0F 01 1A  |..xm....|
2010-08-18 21:59:14.748219 : nsbasic_bsd:2E 13 00 32 30 30 39 2D  |...2009-|
2010-08-18 21:59:14.748225 : nsbasic_bsd:30 38 2D 31 35 3A 30 30  |08-15:00|
2010-08-18 21:59:14.748232 : nsbasic_bsd:3A 32 35 3A 34 35 05 00  |:25:45..|
2010-08-18 21:59:14.748239 : nsbasic_bsd:43 48 45 43 4B 01 00 4E  |CHECK..N|
2010-08-18 21:59:14.748246 : nsbasic_bsd:01 00 4E 01 00 4E 02 00  |..N..N..|
2010-08-18 21:59:14.748253 : nsbasic_bsd:C1 02 FE FF 06 00 50 55  |......PU|
2010-08-18 21:59:14.748260 : nsbasic_bsd:42 4C 49 43 15 00 2F 39  |BLIC../9|

/*绝大多数是Oracle sqlnet trace的内容,是不是因为设置了Oracle SqlNet端的trace选项,从而导致了导出服务进程写出大量sqlnet trace,最终引发exp的缓慢?*/

[maclean@rh2 trace]$ cat $ORACLE_HOME/network/admin/sqlnet.ora
TRACE_LEVEL_SERVER = 16
TRACE_FILE_SERVER = SERVER
TRACE_TIMESTAMP_ SERVER = ON
TRACE_UNIQUE_SERVER = ON

/*的确启用了server端sqlnet trace*/

[maclean@rh2 trace]$ echo "" >  /s01/11gdb/network/admin/sqlnet.ora

/*清空sqlnet.ora配置文件,禁用server端sqlnet trace*/

/*再次尝试exp导出*/

[maclean@rh2 ~]$ exp sh/sh file=/s01/orders.dmp direct=y tables=orders
Export: Release 11.2.0.1.0 - Production on Wed Aug 18 22:00:35 2010
Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.
Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Export done in US7ASCII character set and AL16UTF16 NCHAR character set
server uses WE8MSWIN1252 character set (possible charset conversion)

About to export specified tables via Direct Path ...
. . exporting table                         ORDERS

[maclean@rh2 s01]$ while (1>0) do ls -lh orders.dmp ; sleep 1; done
-rw-r--r-- 1 maclean oinstall 61M Aug 18 22:00 orders.dmp
-rw-r--r-- 1 maclean oinstall 94M Aug 18 22:00 orders.dmp
-rw-r--r-- 1 maclean oinstall 108M Aug 18 22:00 orders.dmp
-rw-r--r-- 1 maclean oinstall 140M Aug 18 22:00 orders.dmp
-rw-r--r-- 1 maclean oinstall 162M Aug 18 22:00 orders.dmp
-rw-r--r-- 1 maclean oinstall 181M Aug 18 22:00 orders.dmp

/*dump文件每秒增长20MB左右,exp导出速度恢复正常*/

/*验证了之前设置server端16级的sqlnet trace会引起exp性能下降的猜测*/

/*如果是expdp数据泵导出工具是否会受到该sqlnet trace的影响呢*/

[maclean@rh2 s01]$ echo "TRACE_LEVEL_SERVER = 16
> TRACE_FILE_SERVER = SERVER
> TRACE_TIMESTAMP_ SERVER = ON
> TRACE_UNIQUE_SERVER = ON" > $ORACLE_HOME/network/admin/sqlnet.ora

[maclean@rh2 dump]$ expdp sh/sh directory=ordump dumpfile=orders.dmp tables=orders
Export: Release 11.2.0.1.0 - Production on Wed Aug 18 22:05:17 2010
Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.
Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Starting "SH"."SYS_EXPORT_TABLE_01":  sh/******** directory=ordump dumpfile=orders.dmp tables=orders
Estimate in progress using BLOCKS method...
Processing object type TABLE_EXPORT/TABLE/TABLE_DATA
Total estimation using BLOCKS method: 261 MB
Processing object type TABLE_EXPORT/TABLE/TABLE
Processing object type TABLE_EXPORT/TABLE/STATISTICS/TABLE_STATISTICS
. . exported "SH"."ORDERS"                               223.8 MB 2319232 rows
Master table "SH"."SYS_EXPORT_TABLE_01" successfully loaded/unloaded
******************************************************************************
Dump file set for SH.SYS_EXPORT_TABLE_01 is:
  /s01/dump/orders.dmp
Job "SH"."SYS_EXPORT_TABLE_01" successfully completed at 22:05:29

/* expdp数据泵在12s内完成了导出工作,可见其并不受到sqlnet trace的不良影响*/

/* 如果我们降低Sqlnet trace的级别,是否可以降低其对exp导出性能的影响呢?*/

/*尝试将trace level改成1*/

[maclean@rh2 s01]$ echo "TRACE_LEVEL_SERVER = 1
> TRACE_FILE_SERVER = SERVER
> TRACE_TIMESTAMP_ SERVER = ON
> TRACE_UNIQUE_SERVER = ON"   >  $ORACLE_HOME/network/admin/sqlnet.ora

maclean@rh2 ~]$ exp sh/sh file=/s01/orders.dmp direct=y tables=orders
Export: Release 11.2.0.1.0 - Production on Wed Aug 18 22:11:57 2010
Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.
Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Export done in US7ASCII character set and AL16UTF16 NCHAR character set
server uses WE8MSWIN1252 character set (possible charset conversion)

About to export specified tables via Direct Path ...
. . exporting table                         ORDERS    2319232 rows exported
EXP-00091: Exporting questionable statistics.
Export terminated successfully with warnings.

[maclean@rh2 s01]$ while (1>0) do ls -lh orders.dmp ; sleep 1; done
-rw-r--r-- 1 maclean oinstall 0 Aug 18 22:11 orders.dmp
-rw-r--r-- 1 maclean oinstall 33M Aug 18 22:11 orders.dmp
-rw-r--r-- 1 maclean oinstall 74M Aug 18 22:11 orders.dmp
-rw-r--r-- 1 maclean oinstall 108M Aug 18 22:12 orders.dmp
-rw-r--r-- 1 maclean oinstall 146M Aug 18 22:12 orders.dmp
-rw-r--r-- 1 maclean oinstall 183M Aug 18 22:12 orders.dmp

/* 导出速度没有明显下降,较低级别的server端sqlnet trace不会对exp导出性能造成影响*/

这个例子告诉我们在使用高级别的sqlnet trace诊断Oracle网络问题后,一定要记得要还原现场的配置,否则可能造成”莫名的性能问题”。

Script:收集RAC性能诊断信息

以下脚本可以用于收集RAC性能诊断信息:


spool rac_diag.log

SELECT B1.INST_ID,
       B2.VALUE blocks,
       Round(B1.VALUE / 100) total_time,
       round((B1.VALUE / B2.VALUE) * 10, 2) avg_time
  FROM GV$SYSSTAT B1, GV$SYSSTAT B2
 WHERE B1.NAME = 'gc cr block receive time'
   AND B2.NAME = 'gc cr blocks received'
   AND B1.INST_ID = B2.INST_ID
   AND B2.VALUE > 0
 Order by 1
/

select v.banner, i.instance_name
  from gv$version v, gv$instance i
 where v.inst_id = i.inst_id
   and v.banner in
       (select banner
          from (select banner, count(*) cnt from gv$version group by banner)
         where cnt <> 2)
 order by 1, 2
/

SELECT INST_ID,
       SND_Q_LEN,
       SND_Q_MAX,
       SND_Q_TOT,
       TCKT_AVAIL,
       TCKT_LIMIT,
       TCKT_RCVD,
       TCKT_WAIT
  FROM GV$DLM_TRAFFIC_CONTROLLER
 WHERE (SND_Q_LEN > 0)
    OR ((TCKT_LIMIT - TCKT_AVAIL) >= (TCKT_LIMIT * 0.6))
    OR TCKT_WAIT = 'YES'
/


SELECT A1.INST_ID,
       A1.VALUE blocks_lost,
       A2.VALUE blocks_corrupt
FROM   GV$SYSSTAT A1,
       GV$SYSSTAT A2
WHERE  A1.NAME = 'gc blocks lost'
AND    A2.NAME = 'gc blocks corrupt'
AND    A1.INST_ID = A2.INST_ID
AND    (a1.value > 0 or a2.value > 0)
/


select *
  from (SELECT INST_ID,
               OWNER#,
               NAME,
               KIND,
               FILE#,
               SUM(FORCED_READS) READS,
               SUM(FORCED_WRITES) WRITES,
               MAX(XNC) XNC
          FROM GV$CACHE_TRANSFER
         GROUP BY INST_ID, OWNER#, NAME, KIND, FILE#
         ORDER BY 8 DESC)
 where rownum <= 10
/


select o.parameter, o.value, i.instance_name
  from gv$option o, gv$instance i
 where o.inst_id = i.inst_id
   and o.Parameter in (select Parameter
                         from (select Parameter, value, count(*) cnt
                                 from gv$option
                                group by Parameter, value)
                        where cnt <> 2)
 order by 1, 2
/

select p.name, p.value, i.instance_name
  from gv$parameter p, gv$instance i
 where p.inst_id = i.inst_id
   and p.name in (select name
                    from (select name, value, count(*) cnt
                            from gv$parameter
                           where name in ('archive_lag_target',
                                          'control_management_pack_access',
                                          'diagnostic_dest',
                                          'redo_transport_user',
                                          'trace_enabled',
                                          'license_max_users',
                                          'log_archive_format',
                                          'spfile',
                                          'undo_retention')
                           group by name, value)
                   where cnt <> 2)
 order by 1, 2
 /
 
 
select p.name, p.value, i.instance_name
  from gv$parameter p, gv$instance i
 where p.inst_id = i.inst_id
   and p.name in (select name
                    from (select name, value, count(*) cnt
                            from gv$parameter
                           where name in ('active_instance_count',
                                          'cluster_database',
                                          'cluster_database_instances',
                                          'compatible',
                                          'control_files',
                                          'db_block_size',
                                          'db_domain',
                                          'db_files',
                                          'db_name',
                                          'db_recovery_file_dest',
                                          'db_recovery_file_dest_size',
                                          'db_unique_name',
                                          'instance_type',
                                          'max_parallel_servers',
                                          'parallel_execution_message_size',
                                          'dml_locks',
                                          'remote_login_passwordfile',
                                          'result_cache_max_size',
                                          'undo_management')
                             and not ((name = 'dml_locks') and (value = '0'))
                           group by name, value)
                   where cnt <> 2)
 order by 1, 2
/
 
 
select p.name, p.value, i.instance_name
  from gv$parameter p, gv$instance i
 where p.inst_id = i.inst_id
   and p.name in
       (select name
          from (select name, value, count(*) cnt
                  from gv$parameter
                 where name in ('instance_name',
                                'instance_number',
                                'thread',
                                'rollback_segments',
                                'undo_tablespace')
                   and not ((name = 'rollback_segments') and (value = null))
                   and not ((name = 'instance_name') and (value = null))
                 group by name, value)
         where cnt <> 1)
 order by 1, 2
/

select s.inst_id,
       s.blocks_served,
       Round(1000000 * s.pin_time / s.blocks_served) / 1000 avg_pin_time,
       Round(1000000 * s.flush_time / s.blocks_served) / 1000 avg_flush_time,
       Round(1000000 * s.send_time / s.blocks_served) / 1000 avg_send_time,
       Round((1000000 * (s.pin_time + s.flush_time + s.send_time)) /
             s.blocks_served) / 1000 avg_service_time
  from (select inst_id,
               sum(decode(name, 'gc current block pin time', value, 0)) pin_time,
               sum(decode(name, 'gc current block pin flush', value, 0)) flush_time,
               sum(decode(name, 'gc current block pin send', value, 0)) send_time,
               sum(decode(name, 'gc current block blocks served', value, 0)) blocks_served
          from gv$sysstat
         where name in ('gc current block pin time',
                        'gc current block pin flush',
                        'gc current block pin send',
                        'gc current block blocks served')
         group by inst_id) s
 where s.blocks_served > 0
/

spool off

Script to Collect Log File Sync Diagnostic Information (lfsdiag.sql)

Script

set echo off
set feedback off
column timecol new_value timestamp
column spool_extension new_value suffix
select to_char(sysdate,'Mondd_hh24mi') timecol,
'.out' spool_extension from sys.dual;
column output new_value dbname
select value || '_' output
from v$parameter where name = 'db_name';
spool lfsdiag_&&dbname&&timestamp&&suffix
set trim on
set trims on
set lines 130
set pages 100
set verify off
alter session set optimizer_features_enable = '10.2.0.4';

PROMPT LFSDIAG DATA FOR &&dbname&&timestamp
PROMPT Note: All timings are in milliseconds (1000 milliseconds = 1 second)

PROMPT
PROMPT IMPORTANT PARAMETERS RELATING TO LOG FILE SYNC WAITS:
column name format a40 wra
column value format a40 wra
select inst_id, name, value from gv$parameter
where ((value is not null and name like '%log_archive%') or
name like '%commit%' or name like '%event=%' or name like '%lgwr%')
and name not in (select name from gv$parameter where (name like '%log_archive_dest_state%'
and value = 'enable') or name = 'log_archive_format')
order by 1,2,3;

PROMPT
PROMPT HISTOGRAM DATA FOR LFS AND OTHER RELATED WAITS:
PROMPT
PROMPT APPROACH: Look at the wait distribution for log file sync waits
PROMPT by looking at "wait_time_milli". Look at the high wait times then
PROMPT see if you can correlate those with other related wait events.
column event format a40 wra
select inst_id, event, wait_time_milli, wait_count
from gv$event_histogram
where event in ('log file sync','gcs log flush sync',
'log file parallel write','wait for scn ack',
'log file switch completion','gc cr grant 2-way',
'gc buffer busy','gc current block 2-way') or
event like '%LGWR%' or event like '%LNS%'
order by 2 desc,1,3;

PROMPT
PROMPT ORDERED BY WAIT_TIME_MILLI
select inst_id, event, wait_time_milli, wait_count
from gv$event_histogram
where event in ('log file sync','gcs log flush sync',
'log file parallel write','wait for scn ack',
'log file switch completion','gc cr grant 2-way',
'gc buffer busy','gc current block 2-way')
or event like '%LGWR%' or event like '%LNS%'
order by 3,1,2 desc;

PROMPT
PROMPT REDO WRITE STATS
PROMPT
PROMPT "redo write time" in centiseconds (100 per second)
PROMPT 11.1: "redo write broadcast ack time" in centiseconds (100 per second)
PROMPT 11.2: "redo write broadcast ack time" in microseconds (1000 per millisecond)
column value format 99999999999999999999
column milliseconds format 99999999999999.999
select v.version, ss.inst_id, ss.name, ss.value,
decode(substr(version,1,4),
'11.1',decode (name,'redo write time',value*10,
'redo write broadcast ack time',value*10),
'11.2',decode (name,'redo write time',value*10,
'redo write broadcast ack time',value/1000),
decode (name,'redo write time',value*10)) milliseconds
from gv$sysstat ss, v$instance v
where name like 'redo write%' and value > 0
order by 1,2,3;

PROMPT
PROMPT ASH THRESHOLD...
PROMPT
PROMPT This will be the threshold in milliseconds for average log file sync
PROMPT times. This will be used for the next queries to look for the worst
PROMPT 'log file sync' minutes. Any minutes that have an average log file
PROMPT sync time greater than the threshold will be analyzed further.
column threshold_in_ms new_value threshold format 999999999.999
select min(threshold_in_ms) threshold_in_ms
from (select inst_id, to_char(sample_time,'Mondd_hh24mi') minute,
avg(time_waited)/1000 threshold_in_ms
from gv$active_session_history
where event = 'log file sync'
group by inst_id,to_char(sample_time,'Mondd_hh24mi')
order by 3 desc)
where rownum <= 5;

PROMPT
PROMPT ASH WORST MINUTES FOR LOG FILE SYNC WAITS:
PROMPT
PROMPT APPROACH: These are the minutes where the avg log file sync time
PROMPT was the highest (in milliseconds).
column event format a30 tru
column program format a35 tru
column total_wait_time format 999999999999.999
column avg_time_waited format 999999999999.999
select to_char(sample_time,'Mondd_hh24mi') minute, inst_id, event,
sum(time_waited)/1000 TOTAL_WAIT_TIME , count(*) WAITS,
avg(time_waited)/1000 AVG_TIME_WAITED
from gv$active_session_history
where event = 'log file sync'
group by to_char(sample_time,'Mondd_hh24mi'), inst_id, event
having avg(time_waited)/1000 > &&threshold
order by 1,2;

PROMPT
PROMPT ASH LFS BACKGROUND PROCESS WAITS DURING WORST MINUTES:
PROMPT
PROMPT APPROACH: What is LGWR doing when 'log file sync' waits
PROMPT are happening? LMS info may be relevent for broadcast
PROMPT on commit and LNS data may be relevant for dataguard.
PROMPT If more details are needed see the ASH DETAILS FOR WORST
PROMPT MINUTES section at the bottom of the report.
column inst format 999
column event format a30 tru
column program format a35 wra
select to_char(sample_time,'Mondd_hh24mi') minute, inst_id inst, program, event,
sum(time_waited)/1000 TOTAL_WAIT_TIME , count(*) WAITS,
avg(time_waited)/1000 AVG_TIME_WAITED
from gv$active_session_history
where to_char(sample_time,'Mondd_hh24mi') in (select to_char(sample_time,'Mondd_hh24mi')
from gv$active_session_history
where event = 'log file sync'
group by to_char(sample_time,'Mondd_hh24mi'), inst_id
having avg(time_waited)/1000 > &&threshold)
and (program like '%LGWR%' or program like '%LMS%' or
program like '%LNS%' or event = 'log file sync')
group by to_char(sample_time,'Mondd_hh24mi'), inst_id, program, event
order by 1,2,3,5 desc, 4;

PROMPT
PROMPT AWR WORST AVG LOG FILE SYNC SNAPS:
PROMPT
PROMPT APPROACH: These are the AWR snaps where the average 'log file sync'
PROMPT times were the highest.
column begin format a12 tru
column end format a12 tru
column name format a13 tru
select dhs.snap_id, dhs.instance_number inst, to_char(dhs.begin_interval_time,'Mondd_hh24mi') BEGIN,
to_char(dhs.end_interval_time,'Mondd_hh24mi') END,
en.name, se.time_waited_micro/1000 total_wait_time, se.total_waits,
se.time_waited_micro/1000 / se.total_waits avg_time_waited
from dba_hist_snapshot dhs, wrh$_system_event se, v$event_name en
where (dhs.snap_id = se.snap_id and dhs.instance_number = se.instance_number)
and se.event_id = en.event_id and en.name = 'log file sync' and
dhs.snap_id in (select snap_id from (
select se.snap_id, se.time_waited_micro/1000 / se.total_waits avg_time_waited
from wrh$_system_event se, v$event_name en
where se.event_id = en.event_id and en.name = 'log file sync'
order by avg_time_waited desc)
where rownum < 4)
order by 1,2;

PROMPT
PROMPT AWR REDO WRITE STATS
PROMPT
PROMPT "redo write time" in centiseconds (100 per second)
PROMPT 11.1: "redo write broadcast ack time" in centiseconds (100 per second)
PROMPT 11.2: "redo write broadcast ack time" in microseconds (1000 per millisecond)
column stat_name format a30 tru
select v.version, ss.snap_id, ss.instance_number inst, sn.stat_name, ss.value,
decode(substr(version,1,4),
'11.1',decode (stat_name,'redo write time',value*10,
'redo write broadcast ack time',value*10),
'11.2',decode (stat_name,'redo write time',value*10,
'redo write broadcast ack time',value/1000),
decode (stat_name,'redo write time',value*10)) milliseconds
from wrh$_sysstat ss, wrh$_stat_name sn, v$instance v
where ss.stat_id = sn.stat_id
and sn.stat_name like 'redo write%' and ss.value > 0
and ss.snap_id in (select snap_id from (
select se.snap_id, se.time_waited_micro/1000 / se.total_waits avg_time_waited
from wrh$_system_event se, v$event_name en
where se.event_id = en.event_id and en.name = 'log file sync'
order by avg_time_waited desc)
where rownum < 4)
order by 1,2,3;

PROMPT
PROMPT AWR LFS AND OTHER RELATED WAITS FOR WORST LFS AWRs:
PROMPT
PROMPT APPROACH: These are the AWR snaps where the average 'log file sync'
PROMPT times were the highest. Look at related waits at those times.
column name format a40 tru
select se.snap_id, se.instance_number inst, en.name,
se.total_waits, se.time_waited_micro/1000 total_wait_time,
se.time_waited_micro/1000 / se.total_waits avg_time_waited
from wrh$_system_event se, v$event_name en
where se.event_id = en.event_id and (en.name in
('log file sync','gcs log flush sync',
'log file parallel write','wait for scn ack',
'log file switch completion','gc cr grant 2-way',
'gc buffer busy','gc current block 2-way')
or en.name like '%LGWR%' or en.name like '%LNS%')
and se.snap_id in (select snap_id from (
select se.snap_id, se.time_waited_micro/1000 / se.total_waits avg_time_waited
from wrh$_system_event se, v$event_name en
where se.event_id = en.event_id and en.name = 'log file sync'
order by avg_time_waited desc)
where rownum < 4)
order by 1, 6 desc;

PROMPT
PROMPT AWR HISTOGRAM DATA FOR LFS AND OTHER RELATED WAITS FOR WORST LFS AWRs:
PROMPT Note: This query won't work on 10.2 - ORA-942
PROMPT
PROMPT APPROACH: Look at the wait distribution for log file sync waits
PROMPT by looking at "wait_time_milli". Look at the high wait times then
PROMPT see if you can correlate those with other related wait events.
select eh.snap_id, eh.instance_number inst, en.name, eh.wait_time_milli, eh.wait_count
from wrh$_event_histogram eh, v$event_name en
where eh.event_id = en.event_id and
(en.name in ('log file sync','gcs log flush sync',
'log file parallel write','wait for scn ack',
'log file switch completion','gc cr grant 2-way',
'gc buffer busy','gc current block 2-way')
or en.name like '%LGWR%' or en.name like '%LNS%')
and snap_id in (select snap_id from (
select se.snap_id, se.time_waited_micro/1000 / se.total_waits avg_time_waited
from wrh$_system_event se, v$event_name en
where se.event_id = en.event_id and en.name = 'log file sync'
order by avg_time_waited desc)
where rownum < 4)
order by 1,3 desc,2,4;

PROMPT
PROMPT ORDERED BY WAIT_TIME_MILLI
PROMPT Note: This query won't work on 10.2 - ORA-942
select eh.snap_id, eh.instance_number inst, en.name, eh.wait_time_milli, eh.wait_count
from wrh$_event_histogram eh, v$event_name en
where eh.event_id = en.event_id and
(en.name in ('log file sync','gcs log flush sync',
'log file parallel write','wait for scn ack',
'log file switch completion','gc cr grant 2-way',
'gc buffer busy','gc current block 2-way')
or en.name like '%LGWR%' or en.name like '%LNS%')
and snap_id in (select snap_id from (
select se.snap_id, se.time_waited_micro/1000 / se.total_waits avg_time_waited
from wrh$_system_event se, v$event_name en
where se.event_id = en.event_id and en.name = 'log file sync'
order by avg_time_waited desc)
where rownum < 4)
order by 1,4,2,3 desc;

PROMPT
PROMPT ASH DETAILS FOR WORST MINUTES:
PROMPT
PROMPT APPROACH: If you cannot determine the problem from the data
PROMPT above, you may need to look at the details of what each session
PROMPT is doing during each 'bad' snap. Most likely you will want to
PROMPT note the times of the high log file sync waits, look at what
PROMPT LGWR is doing at those times, and go from there...
column program format a45 wra
column sample_time format a25 tru
column event format a30 tru
column time_waited format 999999.999
column p1 format a40 tru
column p2 format a40 tru
column p3 format a40 tru
select sample_time, inst_id inst, session_id, program, event, time_waited/1000 TIME_WAITED,
p1text||': '||p1 p1,p2text||': '||p2 p2,p3text||': '||p3 p3
from gv$active_session_history
where to_char(sample_time,'Mondd_hh24mi') in (select
to_char(sample_time,'Mondd_hh24mi')
from gv$active_session_history
where event = 'log file sync'
group by to_char(sample_time,'Mondd_hh24mi'), inst_id
having avg(time_waited)/1000 > &&threshold)
order by 1,2,3,4,5;

select to_char(sysdate,'Mondd hh24:mi:ss') TIME from dual;

spool off

PROMPT
PROMPT OUTPUT FILE IS: lfsdiag_&&dbname&&timestamp&&suffix
PROMPT

沪ICP备14014813号-2

沪公网安备 31010802001379号