7月最新发布11.2.0.1.2 Patch set update

7月13日,11g release 2 的第二个补丁集更新发布了;9i的最终版本为9.2.0.8,10g上10.2.0.5很有可能成为最终版本,我们预期今后(11g,12g)中Patch set数量会有效减少,而patch set update数量可能大幅增加;这样的更新形式可以为Oracle Database提升一定的软件形象。可以猜想11gr2的最终版本号可能是11.2.0.2/3.x。

附该psu的readme note:

Released: July 13, 2010

This document is accurate at the time of release. For any changes and additional information regarding PSU 11.2.0.1.2, see these related documents that are available at My Oracle Support (http://support.oracle.com/):

  • Note 854428.1 Patch Set Updates for Oracle Products
  • Note 1089071.1 Oracle Database Patch Set Update 11.2.0.1.2 Known Issues

This document includes the following sections:

1 Patch Information

Patch Set Update (PSU) patches are cumulative. That is, the content of all previous PSUs is included in the latest PSU patch.

PSU 11.2.0.1.2 includes the fixes listed in Section 5, “Bugs Fixed by This Patch”.

Table 1 describes installation types and security content. For each installation type, it indicates the most recent PSU patch to include new security fixes that are pertinent to that installation type. If there are no security fixes to be applied to an installation type, then “None” is indicated. If a specific PSU is listed, then apply that or any later PSU patch to be current with security fixes.

Table 1 Installation Types and Security Content

Installation Type Latest PSU with Security Fixes
Server homes PSU 11.2.0.1.2


Client-Only Installations None
Instant Client Installations None

(The Instant Client installation is not the same as the client-only Installation. For additional information about Instant Client installations, see Oracle Database Concepts.)

2 Patch Installation and Deinstallation

This section includes the following sections:

2.1 Platforms for PSU 11.2.0.1.2

For a list of platforms that are supported in this Patch Set Update, see My Oracle Support Note 1060989.1 Critical Patch Update July 2010 Patch Availability Document for Oracle Products.

2.2 OPatch Utility Information

You must use the OPatch utility version 11.2.0.1.0 or later to apply this patch. Oracle recommends that you use the latest released OPatch 11.2, which is available for download from My Oracle Support patch 6880880 by selecting the 11.2.0.0.0 release.

For information about OPatch documentation, including any known issues, see My Oracle Support Note 293369.1 OPatch documentation list.

2.3 Patch Installation

These instructions are for all Oracle Database installations.

2.3.1 Patch Pre-Installation Instructions

Before you install PSU 11.2.0.1.2, perform the following actions to check the environment and to detect and resolve any one-off patch conflicts.

2.3.1.1 Environments with ASM

If you are installing the PSU to an environment that has Automatic Storage Management (ASM), note the following:

  • For Linux x86 and Linux x86-64 platforms, install either (A) the bug fix for 8898852 and the Database PSU patch 9654983, or (B) the Grid Infrastructure PSU patch 9343627.
  • For all other platforms, no action is required. The fix for 8898852 was included in the base 11.2.0.1.0 release.

2.3.1.2 Environment Checks
  1. Ensure that the $PATH definition has the following executables: make, ar, ld, and nm.The location of these executables depends on your operating system. On many operating systems, they are located in /usr/ccs/bin, in which case you can set your PATH definition as follows:
    export PATH=$PATH:/usr/ccs/bin
    

2.3.1.3 One-off Patch Conflict Detection and Resolution

For an introduction to the PSU one-off patch concepts, see “Patch Set Updates Patch Conflict Resolution” in My Oracle Support Note 854428.1 Patch Set Updates for Oracle Products.

The fastest and easiest way to determine whether you have one-off patches in the Oracle home that conflict with the PSU, and to get the necessary conflict resolution patches, is to use the Patch Recommendations and Patch Plans features on the Patches & Updates tab in My Oracle Support. These features work in conjunction with the My Oracle Support Configuration Manager. Recorded training sessions on these features can be found in Note 603505.1.

However, if you are not using My Oracle Support Patch Plans, follow these steps:

  1. Determine whether any currently installed one-off patches conflict with the PSU patch as follows:
    unzip p9654983_11201_<platform>.zip
    opatch prereq CheckConflictAgainstOHWithDetail -phBaseDir ./9654983
    
  2. The report will indicate the patches that conflict with PSU 9654983 and the patches for which PSU 9654983 is a superset.Note that Oracle proactively provides PSU 11.2.0.1.2 one-off patches for common conflicts.
  3. Use My Oracle Support Note 1061295.1 Patch Set Updates – One-off Patch Conflict Resolution to determine, for each conflicting patch, whether a conflict resolution patch is already available, and if you need to request a new conflict resolution patch or if the conflict may be ignored.
  4. When all the one-off patches that you have requested are available at My Oracle Support, proceed with Section 2.3.2, “Patch Installation Instructions”.

2.3.2 Patch Installation Instructions

Follow these steps:

  1. If you are using a Data Guard Physical Standby database, you must first install this patch on the primary database before installing the patch on the physical standby database. It is not supported to install this patch on the physical standby database before installing the patch on the primary database. For more information, see My Oracle Support Note 278641.1.
  2. Do one of the following, depending on whether this is a RAC environment:
    • If this is a RAC environment, choose one of the patch installation methods provided by OPatch (rolling, all node, or minimum downtime), and shutdown instances and listeners as appropriate for the installation method selected.This PSU patch is rolling RAC installable. Refer to My Oracle Support Note 244241.1 Rolling Patch – OPatch Support for RAC.
    • If this is not a RAC environment, shut down all instances and listeners associated with the Oracle home that you are updating. For more information, see Oracle Database Administrator’s Guide.
  3. Set your current directory to the directory where the patch is located and then run the OPatch utility by entering the following commands:
    unzip p9654983_11201_<platform>.zip
    cd 9654983
    opatch apply
    
  4. If there are errors, refer to Section 3, “Known Issues”.

2.3.3 Patch Post-Installation Instructions

After installing the patch, perform the following actions:

  1. Apply conflict resolution patches as explained in Section 2.3.3.1.
  2. Load modified SQL files into the database, as explained in Section 2.3.3.2.

2.3.3.1 Applying Conflict Resolution Patches

Apply the patch conflict resolution one-off patches that were determined to be needed when you performed the steps in Section 2.3.1.3, “One-off Patch Conflict Detection and Resolution”.

2.3.3.2 Loading Modified SQL Files into the Database

The following steps load modified SQL files into the database. For a RAC environment, perform these steps on only one node.

  1. For each database instance running on the Oracle home being patched, connect to the database using SQL*Plus. Connect as SYSDBA and run the catbundle.sql script as follows:
    cd $ORACLE_HOME/rdbms/admin
    sqlplus /nolog
    SQL> CONNECT / AS SYSDBA
    SQL> STARTUP
    SQL> @catbundle.sql psu apply
    SQL> QUIT
    

    The catbundle.sql execution is reflected in the dba_registry_history view by a row associated with bundle series PSU.

    For information about the catbundle.sql script, see My Oracle Support Note 605795.1 Introduction to Oracle Database catbundle.sql.

  2. Check the following log files in $ORACLE_HOME/cfgtoollogs/catbundle for any errors:
    catbundle_PSU_<database SID>_APPLY_<TIMESTAMP>.log
    catbundle_PSU_<database SID>_GENERATE_<TIMESTAMP>.log
    

    where TIMESTAMP is of the form YYYYMMMDD_HH_MM_SS. If there are errors, refer to Section 3, “Known Issues”.

2.3.4 Patch Post-Installation Instructions for Databases Created or Upgraded after Installation of PSU 11.2.0.1.2 in the Oracle Home

These instructions are for a database that is created or upgraded after the installation of PSU 11.2.0.1.2.

You must execute the steps in Section 2.3.3.2, “Loading Modified SQL Files into the Database” for any new database only if it was created by any of the following methods:

  • Using DBCA (Database Configuration Assistant) to select a sample database (General, Data Warehouse, Transaction Processing)
  • Using a script that was created by DBCA that creates a database from a sample database

2.4 Patch Deinstallation

These instructions apply if you need to deinstall the patch.

2.4.1 Patch Deinstallation Instructions for a Non-RAC Environment

Follow these steps:

  1. Verify that an $ORACLE_HOME/rdbms/admin/catbundle_PSU_<database SID>_ROLLBACK.sql file exists for each database associated with this ORACLE_HOME. If this is not the case, you must execute the steps in Section 2.3.3.2, “Loading Modified SQL Files into the Database” against the database before deinstalling the PSU.
  2. Shut down all instances and listeners associated with the Oracle home that you are updating. For more information, see Oracle Database Administrator’s Guide.
  3. Run the OPatch utility specifying the rollback argument as follows.
    opatch rollback -id 9654983
    
  4. If there are errors, refer to Section 3, “Known Issues”.

2.4.2 Patch Post-Deinstallation Instructions for a Non-RAC Environment

Follow these steps:

  1. Start all database instances running from the Oracle home. (For more information, see Oracle Database Administrator’s Guide.)
  2. For each database instance running out of the ORACLE_HOME, connect to the database using SQL*Plus as SYSDBA and run the rollback script as follows:
    cd $ORACLE_HOME/rdbms/admin
    sqlplus /nolog
    SQL> CONNECT / AS SYSDBA
    SQL> STARTUP
    SQL> @catbundle_PSU_<database SID>_ROLLBACK.sql
    SQL> QUIT
    

    In a RAC environment, the name of the rollback script will have the format catbundle_PSU_<database SID PREFIX>_ROLLBACK.sql.

  3. Check the log file for any errors. The log file is found in $ORACLE_HOME/cfgtoollogs/catbundle and is named catbundle_PSU_<database SID>_ROLLBACK_<TIMESTAMP>.log where TIMESTAMP is of the form YYYYMMMDD_HH_MM_SS. If there are errors, refer to Section 3, “Known Issues”.

2.4.3 Patch Deinstallation Instructions for a RAC Environment

Follow these steps for each node in the cluster, one node at a time:

  1. Shut down the instance on the node.
  2. Run the OPatch utility specifying the rollback argument as follows.
    opatch rollback -id 9654983
    

    If there are errors, refer to Section 3, “Known Issues”.

  3. Start the instance on the node as follows:
    srvctl start instance
    

2.4.4 Patch Post-Deinstallation Instructions for a RAC Environment

Follow the instructions listed in Section Section 2.4.2, “Patch Post-Deinstallation Instructions for a Non-RAC Environment” only on the node for which the steps in Section 2.3.3.2, “Loading Modified SQL Files into the Database” were executed during the patch application.

All other instances can be started and accessed as usual while you are executing the deinstallation steps.

3 Known Issues

For information about OPatch issues, see My Oracle Support Note 293369.1 OPatch documentation list.

For issues documented after the release of this PSU, see My Oracle Support Note 1089071.1 Oracle Database Patch Set Update 11.2.0.1.2 Known Issues.

Other known issues are as follows.

Issue 1
The following ignorable errors may be encountered while running the catbundle.sql script or its rollback script:

ORA-29809: cannot drop an operator with dependent objects
ORA-29931: specified association does not exist
ORA-29830: operator does not exist
ORA-00942: table or view does not exist
ORA-00955: name is already used by an existing object
ORA-01430: column being added already exists in table
ORA-01432: public synonym to be dropped does not exist
ORA-01434: private synonym to be dropped does not exist
ORA-01435: user does not exist
ORA-01917: user or role 'XDB' does not exist
ORA-01920: user name '<user-name>' conflicts with another user or role name
ORA-01921: role name '<role name>' conflicts with another user or role name
ORA-01952: system privileges not granted to 'WKSYS'
ORA-02303: cannot drop or replace a type with type or table dependents
ORA-02443: Cannot drop constraint - nonexistent constraint
ORA-04043: object <object-name> does not exist
ORA-29832: cannot drop or replace an indextype with dependent indexes
ORA-29844: duplicate operator name specified
ORA-14452: attempt to create, alter or drop an index on temporary table already in use
ORA-06512: at line <line number>. If this error follow any of above errors, then can be safely ignored.
ORA-01927: cannot REVOKE privileges you did not grant

4 References

The following documents are references for this patch.

Note 293369.1 OPatch documentation list

Note 360870.1 Impact of Java Security Vulnerabilities on Oracle Products

Note 468959.1 Enterprise Manager Grid Control Known Issues

Note 9352237.8 Bug 9352237 – 11.2.0.1.1 Patch Set Update (PSU)

5 Bugs Fixed by This Patch

This patch includes the following bug fixes.

5.1 CPU Molecules

CPU molecules in PSU 11.2.0.1.2:

PSU 11.2.0.1.2 contains the following new CPU molecules:

9676419 – DB-11.2.0.1-MOLECULE-004-CPUJUL2010

9676420 – DB-11.2.0.1-MOLECULE-005-CPUJUL2010

5.2 Bug Fixes

PSU 11.2.0.1.2 contains the following new fixes:

Automatic Storage Management

8755082 – ORA-00600: [KCFIS_TRANSLATE4:VOLUME LOOKUP], [2], [WRONG DEVICE NAME], [], [], [

8890026 – ASM PARTNERING CREATES IMBALANCED PARTNERSHIPS

9170608 – STBH:DD BLOCKS PINNED FOR QUERIES THAT DO NOT REQUEST USED SPACE

9363145 – STBH:DB INSTANCES TERMINATED BY ASMB DUE TO ORA-00600 [KFDSKALLOC0]

Buffer Cache

8330783 – HANGING DB WITH “CACHE BUFFER CHAINS” AND “BUFFER DEADLOCK” WAITS DURING INSERT

8822531 – TAKING AWR SNAP HANGS

Data Guard Broker

8918433 – UNPERSISTED FSFO STATE BITS CAN GET PERSISTED

9363384 – PHYSICAL STANDBY SERVICES NOT STARTED AFTER CONVERT FROM SNAPSHOT

9467635 – BROKER’S METADATA FILE UPGRADE TO 11.2 IS BROKEN

9467727 – GETSTATUS DOC YIELDS INCORRECT RESULT IF DBRESOURCE_ID PROP VALUE IS USED

Data Guard Logical

8774868 – LGSBFSFO: ORA-600 [3020], [3], [138] RAISED IN RECOVERY SLAVE

8822832 – V$ARCHIVE_DEST_STATUS HAS INCORRECT VALUE FOR APPLIED_SEQ#

DataGuard Redo Transport

8872096 – ARCHIVING FORCED DURING CLOSE WHEN NO STANDBY IS PRESENT

9399090 – STBH: CONSTANT/HIGH FREQUENT LOG SWITCHES ON BEEHIVE DATABASE IN THE LAST 3 DAYS

Shared Cursors

8865718 – RECURSIVE CURSORS CONTAINING “AS OF SNAPSHOT” CLAUSE ARE NOT SHARED

8981059 – HIGH VERSION COUNT:BIND_MISMATCH,USER_BIND_PEEK_MISMATCH,OPTIMIZER_MODE_MISMATCH

9010222 – APPS ST 11G ORA-00600 [KKSFBC-REPARSE-INFINITE-LOOP]

9067282 – TB:SH:ORA-00600:[KKSFBC-WRONG-KKSCSFLGS] WHILE RUNNING TPC-H

DML Drivers

9255542 – ARRAY INSERT TO PARTITIONED TABLE LOOSES ROWS DUE TO CONCURRENT DDL (ORA-14403)

9488887 – FORIEGN KEY VIOLATION WITH ARRAY-INSERT AND ONLINE IDX REBUILD AFTER BUG-9255542

Flashback Database

8834425 – ORA-240 IN RVWR PROCESS CAUSING 5MIN TRANSACTIONAL HANG

PLSQL

9210925 – AFTER MANUAL UPGRADE TO 11.1.0.7 PL/SQL CALLS INCORRECT FUNCTION

Automatic Memory Management

8505803 – PRE_PAGE_SGA RESULTS IN EXCESSIVE PAGE TABLE SIZE WHEN USING MEMORY_TARGET [AMM]

Partitioning

9165206 – PARTITIONING ORA-600 [KKPOLLS1] / [KKDOILSF1] – DURING PARTITION MAINTANANCE

Real Application Cluster

8875671 – LX64: ORA-600 ARGS [KJPNP_CHK:!MASTER_READY],

9093300 – LOTS OF REPEATED KJXOCDR: DROP DUPLICATE OPEN MESSAGE IN LMD TRACE

Row Access Method

8544696 – TABLE GROWTH – BLOCKS ARE NOT REUSED

Streams

8650719 – DOWNSTREAM CAPTURE ABORTS WITH ORA-26766

Secure Files

8856478 – RAM SECUREFILE PERF DEGRADATION WITH SF COMPRESSION ON SMALL LOBS DURING ATB MOVE

9272086 – STBH: DATA PUMP WRITER SEEMS TO BE WAITING ON WAIT FOR UNREAD MESSAGE ON BROADCA

DB Recovery

8909984 – APPSST GSI 11G: GAPS IN AWR SNAPSHOTS

9068088 – MEDIA RECOVERY WAS HUNG ON STANDBY

9145541 – ORA-600 [25027] / ORA-600 [4097] FOR ACTIVE TX IN A PLUGGED TABLESPACE

9167285 – PKT-BUGOLTP: ORA-07445: [KCRALC()+87]

Space Management

7519406 – ‘J000’ TRACE FILE REGARDING GATHER_STATS_JOB INTERMITTENTLY SINCE 10.2.0.4

8815639 – [11GR2-LNX-090813] MULTIPLE INSERT CAUSE DATA ALLOCATION ABOVE HHWM

9216806 – HIGH “ENQ: TS – CONTENTION” FOR TEMPORARY SEGMENT WHILE SQLLDR DIRECT PATH LOAD

9242411 – STRESS-BIGBH: LOTS OF OR-3113S IN BIGBH STRESS TEST

9461782 – ORA-7445 [KTSLF_SUMFSG()+54] [SIGSEGV] AND KTSLFSUM_CFS ON CALL STACK

Compression

9011088 – [11GR2]ADDING COLUMN TO COMPRESSED TABLE, DATA LOSS OCCURED.

9275072 – APPSST GSI 11G : BUFFER BUSY WAITS INSERTING INTO TABLES

9341448 – APPSST GSI 11G : BUFFER BUSY WAITS AND LATCH: CACHE BUFFERS WAITS WHEN INSERTING

9637033 – ORA-07445[KDR9IR2RST0] INSERT AS SELECT IN A COMPRESSED TABLE WITH > 255 COLUMNS

SQL Execution

8664189 – ORA-00600 [KDISS_UNCOMPRESS: BUFFER LENGTH]

9119194 – PSRC: DISTRIBUTED QUERY SLOWER IN 10.2.0.4 COMPARED TO 10.2.0.3

Transaction Management

8268775 – PERF: HIGH US ENQUEUE CONTENTION DURING A LOGIN STORM OR SESSION FAILOVER

8803762 – ORA-00600 [KDSGRP1] BLOCK CORRUPTION ON 11G DATABASE UPGRADE

Memory Management

8431487 – INSTANCE CRASH ORA-07445 [KGGHSTFEL()+192] ORA-07445[KGGHSTMAP()+241]

Message

9713537 – ENHANCE CAUSE/ACTION FIELDS OF THE INTERNAL ERROR ORA-00600

9714832 – ENHANCE CAUSE/ACTION FIELDS OF THE INTERNAL ERROR ORA-07445

x$ksusecst 内部视图详解

9i 中v$session_wait 是Oracle wait interface的一个主要用户接口,而该动态视图的内容来源于x$ksusecst内部视图:


SQL> select view_definition from v$fixed_view_definition where view_name='GV$SESSION_WAIT';

VIEW_DEFINITION
--------------------------------------------------------------------------------
select s.inst_id,s.indx,s.ksussseq,e.kslednam, e.ksledp1,s.ksussp1,s.ksussp1r,e.
ksledp2, s.ksussp2,s.ksussp2r,e.ksledp3,s.ksussp3,s.ksussp3r, decode(s.ksusstim,
0,0,-1,-1,-2,-2,   decode(round(s.ksusstim/10000),0,-1,round(s.ksusstim/10000)))
, s.ksusewtm, decode(s.ksusstim, 0, 'WAITING', -2, 'WAITED UNKNOWN TIME',  -1, '
WAITED SHORT TIME', 'WAITED KNOWN TIME')  from x$ksusecst s, x$ksled e where bit
and(s.ksspaflg,1)!=0 and bitand(s.ksuseflg,1)!=0 and s.ksussseq!=0 and s.ksussop
c=e.indx

SQL> desc x$ksusecst
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 ADDR                                               RAW(4)
//即 v$session中 saddr 会话的起始地址
 INDX                                               NUMBER
//即 instance_id
 INST_ID                                            NUMBER
//即 sid
 KSSPAFLG                                           NUMBER
 KSUSEFLG                                           NUMBER
//该session是否仍活着, 1 为 alive
 KSUSENUM                                           NUMBER
//另一个固有编号
 KSUSSSEQ                                           NUMBER
// 相当于v$session 视图的SERIAL#列
 KSUSSOPC                                           NUMBER
// 对应x$ksled视图indx列,等待事件列表的一个序列号
 KSUSSP1                                            NUMBER
// 即v$session_wait表的p1列
 KSUSSP1R                                           RAW(4)
// 即v$session_wait表的p1raw
 KSUSSP2                                            NUMBER
// 即v$session_wait表的p2
 KSUSSP2R                                           RAW(4)
// 即v$session_wait表的p2raw
 KSUSSP3                                            NUMBER
// 即v$session_wait表的p3
 KSUSSP3R                                           RAW(4)
// 即v$session_wait表的p3raw
 KSUSSTIM                                           NUMBER
// 即v$session_wait表的wait_time,但单位为微秒
 KSUSEWTM                                           NUMBER
// 即v$session_wait表的seconds_in_wait,单位仍为秒

粗略写了一个可以代替v$session_wait视图的查询语句,过滤了可能出现的空闲等待事件,同时细化wait_time列到us级别:


select s.inst_id,
       s.indx sid,
       s.ksussseq seq#,
       e.kslednam event,
       e.ksledp1 p1text,
       s.ksussp1 p1,
       s.ksussp1r p1raw,
       e.ksledp2 p2text,
       s.ksussp2 p2,
       s.ksussp2r p2raw,
       e.ksledp3 p3text,
       s.ksussp3 p3,
       s.ksussp3r p3raw,
       s.ksusstim wait_time,
       s.ksusewtm seconds_in_wait,
       decode(s.ksusstim,
              0,
              'WAITING',
              -2,
              'WAITED UNKNOWN TIME',
              -1,
              'WAITED SHORT TIME',
              'WAITED KNOWN TIME') state
 from x$ksusecst s, x$ksled e
 where bitand(s.ksspaflg, 1) != 0
   and bitand(s.ksuseflg, 1) != 0
   and s.ksussseq != 0
   and s.ksussopc = e.indx
   and e.kslednam not in ('pmon timer',
                          'VKTM Logical Idle Wait',
                          'VKTM Init Wait for GSGA',
                          'IORM Scheduler Slave Idle Wait',
                          'rdbms ipc message',
                          'i/o slave wait',
                          'VKRM Idle',
                          'wait for unread message on broadcast channel',
                          'wait for unread message on multiple broadcast channels',
                          'class slave wait',
                          'KSV master wait',
                          'PING',
                          'watchdog main loop',
                          'DIAG idle wait',
                          'ges remote message',
                          'gcs remote message',
                          'heartbeat monitor sleep',
                          'SGA: MMAN sleep for component shrink',
                          'MRP redo arrival',
                          'LNS ASYNC archive log',
                          'LNS ASYNC dest activation',
                          'LNS ASYNC end of log',
                          'simulated log write delay',
                          'LGWR real time apply sync',
                          'parallel recovery slave idle wait',
                          'LogMiner builder: idle',
                          'LogMiner builder: branch',
                          'LogMiner preparer: idle',
                          'LogMiner reader: log (idle)',
                          'LogMiner reader: redo (idle)',
                          'LogMiner client: transaction',
                          'LogMiner: other',
                          'LogMiner: activate',
                          'LogMiner: reset',
                          'LogMiner: find session',
                          'LogMiner: internal',
                          'Logical Standby Apply Delay',
                          'parallel recovery coordinator waits for slave cleanup',
                          'parallel recovery control message reply',
                          'parallel recovery slave next change',
                          'PX Deq: Txn Recovery Start',
                          'PX Deq: Txn Recovery Reply',
                          'fbar timer',
                          'smon timer',
                          'PX Deq: Metadata Update',
                          'Space Manager: slave idle wait',
                          'PX Deq: Index Merge Reply',
                          'PX Deq: Index Merge Execute',
                          'PX Deq: Index Merge Close',
                          'PX Deq: kdcph_mai',
                          'PX Deq: kdcphc_ack',
                          'shared server idle wait',
                          'dispatcher timer',
                          'cmon timer',
                          'pool server timer',
                          'JOX Jit Process Sleep',
                          'jobq slave wait',
                          'pipe get',
                          'PX Deque wait',
                          'PX Idle Wait',
                          'PX Deq: Join ACK',
                          'PX Deq Credit: need buffer',
                          'PX Deq Credit: send blkd',
                          'PX Deq: Msg Fragment',
                          'PX Deq: Parse Reply',
                          'PX Deq: Execute Reply',
                          'PX Deq: Execution Msg',
                          'PX Deq: Table Q Normal',
                          'PX Deq: Table Q Sample',
                          'Streams fetch slave: waiting for txns',
                          'Streams: waiting for messages',
                          'Streams capture: waiting for archive log',
                          'single-task message',
                          'SQL*Net message from client',
                          'SQL*Net vector message from client',
                          'SQL*Net vector message from dblink',
                          'PL/SQL lock timer',
                          'Streams AQ: emn coordinator idle wait',
                          'EMON slave idle wait',
                          'Streams AQ: waiting for messages in the queue',
                          'Streams AQ: waiting for time management or cleanup tasks',
                          'Streams AQ: delete acknowledged messages',
                          'Streams AQ: deallocate messages from Streams Pool',
                          'Streams AQ: qmn coordinator idle wait',
                          'Streams AQ: qmn slave idle wait',
                          'Streams AQ: RAC qmn coordinator idle wait',
                          'HS message to agent',
                          'ASM background timer',
                          'auto-sqltune: wait graph update',
                          'WCR: replay client notify',
                          'WCR: replay clock',
                          'WCR: replay paused',
                          'JS external job',
                          'cell worker idle',
                          'SQL*Net message to client');

直接路径读取对于延迟块清除的影响

在Oracle 11g版本中串行的全表扫描可能使用直接路径读取(direct path read)的方式取代之前版本中一直使用的DB FILE SCATTERED READ, 显然direct path read具备更多的优势:

1. 减少了对栓的使用,避免可能的栓争用

2.物理IO的大小不再取决于buffer_cache中所存在的块;试想某个8个块的extent中1,3,5,7号块在高速缓存中,而2,4,6,8块没有被缓存,传统的方式在读取该extent时将会是对2,4,6,8块进行4次db file sequential read,这是一种十分可怕的状况,其效率往往要比单次读取这个区间的所有8个块还要低得多,虽然Oracle为了避免这种情况总是尽可能的不缓存大表的块(读入后总是放在队列最冷的一端);而direct path read则可以完全避免这类问题,尽可能地单次读入更多的物理块。

当然直接路径读取也会引入一些缺点:

1.在直接路径读取某段前需要对该对象进行一次段级的检查点(A segment checkpoint).

2.可能导致重复的延迟块清除操作(我们假设你了解delayed block cleanout是什么).

metalink 文档[ID 793845.1] 对该新版本中的变化进行了描述:

Applies to:

Oracle Server – Enterprise Edition – Version: 11.1.0.6 to 11.1.0.7
This problem can occur on any platform.

Symptoms

After migrating an 11g database from a standalone to a 4-node RAC,  a noticeable
increase of 'direct path read' waits were observed at times.
Here are the Cache sizes and Top 5 events.
waits

Cache Sizes                       Begin        End
~~~~~~~~~~~                  ---------- ----------
               Buffer Cache:     3,232M     3,616M  Std Block Size:         8K
           Shared Pool Size:     6,736M     6,400M      Log Buffer:     8,824K
Top 5 Timed Foreground Events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                                           Avg
                                                          wait   % DB
Event                                 Waits     Time(s)   (ms)   time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
DB CPU                                           13,916          42.1
direct path read                  1,637,344      13,359      8   40.4 User I/O
db file sequential read              47,132       1,111     24    3.4 User I/O
DFS lock handle                     301,278       1,028      3    3.1 Other
db file parallel read                14,724         554     38    1.7 User I/O

Changes

Migrated from a standalone database to a 4-node RAC.
Moved from Unix file system storage to ASM.

Using Automatic Shared Memory Management (ASMM).
The setting of db_cache_size in spfile/pfile is low compared to normal workload requirements.

Cause

There have been changes in 11g in the heuristics to choose between direct path reads or reads through buffer cache for serial table scans.
In 10g, serial table scans for “large” tables used to go through cache (by default) which is not the case anymore.  In 11g, this decision to read via direct path or through cache is based on the size of the table, buffer cache size and various other stats.
Direct path reads are faster than scattered reads and have less impact on other processes because they avoid latches.

Solution

When using Automatic Shared Memory Management (ASMM) and with buffer cache low limit set at a low end compared to the normal workload requirements and usually after startup, 11g might choose to do serial direct path read scans for large tables that do not fit in the SGA. When ASMM increases the buffer cache due to increased demand, 11g might not again do serial direct path read scans for these same large tables.  If you like to avoid this from happening, you should note the buffer cache and share pool requirements for a normal workload and set the low limits of buffer cache and shared pool in spfile/pfile close to these normal workload values.
db_cache_size
shared_pool_size

下面我们对直接路径读取对于延迟块清除造成的影响进行测试:

SQL> create table tv as select rownum rn,rpad('A',600,'Z') rp from dual
2       connect by level <=300000;

表已创建。

新建一个会话a:

SQL> set linesize 200 pagesize 1400;
SQL> select count(*) from tv;

COUNT(*)
----------
300000

SQL> select vm.sid, vs.name, vm.value
2    from v$mystat vm, v$sysstat vs
3   where vm.statistic# = vs.statistic#
4     and vs.name in ('cleanouts only - consistent read gets',
5                     'session logical reads',
6                     'physical reads',
7                     'physical reads direct');

SID NAME                                                                  VALUE
---------- ---------------------------------------------------------------- ----------
25 session logical reads                                          27281
25 physical reads                                                 27273
25 physical reads direct                                          27273         
25 cleanouts only - consistent read gets                            0

-- 显然查询采用了直接路径读取方式

SQL> update tv set rn=rn+1;                        -- 尝试批量更新

SQL> alter system flush buffer_cache;             
-- 刷新高速缓存,造成延迟块清除的情景,并提交

系统已更改。

SQL> commit;

提交完成。

新建一个会话b:

SQL> set linesize 200 pagesize 1400;
SQL> select count(*) from tv;

COUNT(*)
----------
300000

SQL> select vm.sid, vs.name, vm.value
2    from v$mystat vm, v$sysstat vs
3   where vm.statistic# = vs.statistic#
4     and vs.name in ('cleanouts only - consistent read gets',
5                     'session logical reads',
6                     'physical reads',
7                     'physical reads direct','redo size');

SID NAME                                                                  VALUE
---------- ---------------------------------------------------------------- ----------
25 session logical reads                                                 54554
25 physical reads                                                        27273
25 physical reads direct                                                 27273
25 redo size                                                                 0
25 cleanouts only - consistent read gets                           27273      
--查询采用direct path read时产生了延迟块清除操作,但不产生redo

SQL> select count(*) from tv;

COUNT(*)
----------
300000

SQL> select vm.sid, vs.name, vm.value
2    from v$mystat vm, v$sysstat vs
3   where vm.statistic# = vs.statistic#
4     and vs.name in ('cleanouts only - consistent read gets',
5                     'session logical reads',
6                     'physical reads',
7                     'physical reads direct','redo size');

SID NAME                                                                  VALUE
---------- ---------------------------------------------------------------- ----------
25 session logical reads                                                109104
25 physical reads                                                        54546
25 physical reads direct                                                 54546
25 redo size                                                                 0
25 cleanouts only - consistent read gets                                 54546

再次查询仍采用直接路径读取,产生了相同数目的延迟块清除操作,并没有产生redo;可见direct path read的清除操作仅是针对从磁盘上读取到PGA内存中的镜像,而不对实际的块做任何修改,因而也没有任何redo;

下面我们使用普通串行全表扫描方式,设置event 10949可以避免采用直接路径读取方式.关于该事件可以参见这里.

SQL> ALTER SESSION SET EVENTS '10949 TRACE NAME CONTEXT FOREVER';

会话已更改。

SQL> select count(*) from tv;

COUNT(*)
----------
300000

SQL> select vm.sid, vs.name, vm.value
2    from v$mystat vm, v$sysstat vs
3   where vm.statistic# = vs.statistic#
4     and vs.name in ('cleanouts only - consistent read gets',
5                     'session logical reads',
6                     'physical reads',
7                     'physical reads direct','redo size');

SID NAME                                                                  VALUE
---------- ---------------------------------------------------------------- ----------
25 session logical reads                                                163662
25 physical reads                                                        81819
25 physical reads direct                                                 54546
25 redo size                                                           1966560
25 cleanouts only - consistent read gets                                 81819

SQL> select count(*) from tv;

COUNT(*)
----------
300000

SQL> select vm.sid, vs.name, vm.value
2    from v$mystat vm, v$sysstat vs
3   where vm.statistic# = vs.statistic#
4     and vs.name in ('cleanouts only - consistent read gets',
5                     'session logical reads',
6                     'physical reads',
7                     'physical reads direct','redo size');

SID NAME                                                                  VALUE
---------- ---------------------------------------------------------------- ----------
25 session logical reads                                                190947
25 physical reads                                                        95673
25 physical reads direct                                                 54546
25 redo size                                                           1966560
25 cleanouts only - consistent read gets                                 81819

第一次采用普通全表扫描方式时产生了与direct path read时相同量的延迟块清除操作,并因此产生了大量的redo,这种模式回归到了最经典的延迟块清除情景中;之后的一次读取则不再需要清除块和产生重做了,我们在读取一个“干净”的表段。

从以上测试我们可以了解到,11g中使用更为广泛的direct path read方式对有需要延迟块清除操作的段所可能产生的影响,因为实际没有一个“修改块”的操作,所以虽然延迟块清除操作在该种模式下每次都必须产生,却实际没有产生脏块,因而也就不会有“写块”的必要,故而也没有redo的产生。所产生的负载可能更多的体现在cpu time的使用上。

ORA-00600 [KCBZPB_1], [59033077], [4], [1], [] example

below is the 600 entry in the alertlog:

alert.log:
Hex dump of Absolute File 14, Block 312821 in trace file /u01/ORAHOME/app/oracle/admin/TIGERS7/bdump/tigers7_dbw0_10999.trc
***
Corrupt block relative dba: 0x0384c5f5 (file 14, block 312821)
Bad header found during preparing block for write
Data in bad block -
type: 6 format: 1 rdba: 0x00000384
last change scn: 0xf90b.c5f55f7c seq: 0x9 flg: 0x72
consistency value in tail: 0x0001f90b
check value in block header: 0x102, block checksum disabled
spare1: 0x6, spare2: 0x2, spare3: 0x0
***
Thu Apr 16 18:32:48 2009
Errors in file /u01/ORAHOME/app/oracle/admin/TIGERS7/bdump/tigers7_dbw0_10999.trc:
ORA-00600: internal error code, arguments: [kcbzpb_1], [59033077], [4], [1], [], [], [], []
Thu Apr 16 18:32:49 2009
Errors in file /u01/ORAHOME/app/oracle/admin/TIGERS7/bdump/tigers7_dbw0_10999.trc:
ORA-00600: internal error code, arguments: [kcbzpb_1], [59033077], [4], [1], [], [], [], []
DBW0: terminating instance due to error 600
Instance terminated by DBW0, pid = 10999
Thu Apr 16 19:04:58 2009

After that, We have executed dbverify against the identified file and it produced no errors:

DBVERIFY: Release 9.2.0.8.0 - Production on Thu Apr 16 19:31:32 2009

Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.


DBVERIFY - Verification starting : FILE = /u32/ORAINDX/oradata/TIGERS7/indx01.dbf

DBVERIFY - Verification complete

Total Pages Examined : 1280000
Total Pages Processed (Data) : 0
Total Pages Failing (Data) : 0
Total Pages Processed (Index): 1262823
Total Pages Failing (Index): 0
Total Pages Processed (Other): 8751
Total Pages Processed (Seg) : 0
Total Pages Failing (Seg) : 0
Total Pages Empty : 8426
Total Pages Marked Corrupt : 0
Total Pages Influx : 0
Highest block SCN : 10386833124905 (2418.1602203177)

we do open a sr ,and oracle support suggest to do below query:
ACTION PLAN
===========
1) please describe the sequence of events leading up to the problem

2) please upload the alert.log. ZIP if >2MB. Dot not use RAR.

3) please describe your backup strategy:
a) when was your last valid backup?
b) are you using RMAN to perform this backup?
c) do you have all archivelogs from the last backup to now?
d) was this a hot or cold backup?

4) even if you’re not using RMAN, run the following in RMAN:
$ rman target /
RMAN> backup validate check logical database;

5) Once RMAN validate is completed, run the following in SQL*Plus as SYSDBA:
SQL> select * from v$database_block_corruption;

6) Please run the following query in SQL*Plus as SYSDBA
— db must be in either MOUNT or OPEN mode
— Save the queries to a file, eg. rec_query1.sql, then run it in SQL*Plus
—————– start ——————
set echo on
set pagesize 2000 linesize 200 trimspool on
col name form a60
col status form a10
col dbname form a15
col member form a60
col inst_id form 999
col resetlogs_time form a25
col created form a25
col DB_UNIQUE_NAME form a15
col stat form 9999999999
col thr form 99999
col “Uptime” form a80

spool rec_query1.out
show user
alter session set nls_date_format=’DD-MM-RR hh24:mi:ss’;

select inst_id, instance_name, status,
to_char(STARTUP_TIME,’dd-Mon-yyyy hh24:mi’) || ‘ – ‘ ||
trunc(SYSDATE-(STARTUP_TIME) ) || ‘ day(s), ‘ ||
trunc(24*((SYSDATE-STARTUP_TIME) – trunc(SYSDATE-STARTUP_TIME)))||’ hour(s), ‘ ||
mod(trunc(1440*((SYSDATE-STARTUP_TIME) – trunc(SYSDATE-STARTUP_TIME))), 60) ||’ minute(s), ‘ ||
mod(trunc(86400*((SYSDATE-STARTUP_TIME) – trunc(SYSDATE-STARTUP_TIME))), 60) ||’ seconds’
“Uptime”
from gv$instance
order by inst_id
/

select dbid, name dbname, open_mode, database_role,
to_char(created,’dd-Mon-YYYY hh24:mi:ss’) created,
to_char(resetlogs_time,’dd-Mon-YYYY hh24:mi:ss’) resetlogs_time
from v$database;

archive log list;

select count(*) from v$backup where status = ‘ACTIVE’;

select * from v$log;
select * from v$logfile;
select * from v$recover_file order by 1;

select distinct(status)from v$datafile;
select FILE#,TS# , status, NAME from v$datafile
where status not in (‘SYSTEM’,’ONLINE’)
order by 1;

select fhsta, count(*)
from X$KCVFH group by fhsta;

select min(fhrba_Seq), max(fhrba_Seq)
from X$KCVFH;
select hxfil FILE#,fhsta STAT,fhscn SCN,
fhthr thr, fhrba_Seq SEQUENCE,fhtnm TABLESPACE
from x$kcvfh order by 1;

7) dump the block. Run the following as SYSDBA in SQL*Plus:
SQL> alter session set max_dump_file_size=unlimited;
SQL> oradebug setmypid;
SQL> alter system dump datafile ‘full pathname for file 14’ block 312821;
SQL> oradebug tracefile_name;

==> upload the said trace file

8) run dbv against datafile 14:
$ dbv file= logfile=

spool off
—————– end ——————

RESEARCH
===============
ORA-600 [4519] “Block Corruption Detected – Cache type wrong”
We found a corrupted block when trying to read a block using
consistent read. An invalid block type was found.
Possible Block Corruption in Memory.

ORA-600 [kcbzpb_1] A block has been read cleanly from disk and updated successfully by the
clients of the cache layer.
Before the cache layer writes the block back to disk it does a health
check on the cache header.
If requested to do so (default), it generates a checksum for the block.
The health check is failing.
MEMORY CORRUPTION

ORA-600 [kcbzpb_1] was raised because DBA 59033077 => 14,312821 was found corrupted when read in the cache before we writ eit to disk.
Alert.log shows same block as corrupted, BAD HEADER, meaning blocks was overwriten.
Now DBV doesn’t show any corruption in file 14.

ACTION PLAN
====================

Hi,

I reviewed the information and the crash was caused by in memory corruption.
If restarted your database should be fine.

RESEARCH
================
Db crashed with ORA-600 [KCBZPB_1] because of corrupted block in memory:

STACK: kcbbxsv kcbbwlru kcbbdrv ksbabs ksbrdp

Bug.5866883/5845232 (36) INSTANCE GOES DOWN DUE TO ORA-600 [KCBZPB_1] V9208:
Bug.5845843/5845232 (96) DATABASE CRASH BY ORA-00600 [2032] , ORA-00600 [KCBZPB_1]

Bug:5845232: Block corruption / errors from concurrent dequeue operations
Tags: AQ CORR/PHY DUMP OERI R9208 REGRESSION SUPERCEEDED
Details:
This problem is introduced in 9.2.0.8 by the fix for bug 4144683.
Concurrent dequeue operations can lead to block corruption
/ memory corruption with varying symptoms such as ORA-600 [6033],
ORA-600 [6101] and ORA-600 [kcoapl_blkchk] if DB_BLOCK_CHECKING is enabled.
The fix for this bug is Patch 6401576.

Bug:6401576 ORA-600 [KTBAIR1] / ORA-600 [KCBZPB_1] / CORRUPTION MESSAGES –> DB CRASH
Abstract: OERI[ktbair1] / ORA-600 [6101] index corruption possible
Fixed-Releases: WIN:9208P22
Tags: CORR/IND OERI
Details:
Note: This fix replaces the fix in bug 5845232.
Certain index operations can lead to block corruption
/ memory corruption with varying symptoms such as ORA-600 [6033],
ORA-600 [6101] , ORA-600 [ktbair1] , ORA-600 [kcbzpb_1],
ORA-600 [4519] and ORA-600 [kcoapl_blkchk] if DB_BLOCK_CHECKING is enabled.

ISSUE CLARIFICATION
====================
Db crashed with ORA-600 [KCBZPB_1]

ISSUE VERIFICATION
===================
alert.log and trace file

CAUSE DETERMINATION
======================
in memory corruption

CAUSE JUSTIFICATION
====================
Bug:6401576 ORA-600 [KTBAIR1] / ORA-600 [KCBZPB_1] / CORRUPTION MESSAGES –> DB CRASH

POTENTIAL SOLUTION(S)
======================
apply patch for Bug:6401576

POTENTIAL SOLUTION JUSTIFICATION(S)
====================================
to fi x the issue

SOLUTION / ACTION PLAN
=======================

Hi,

These errors looks very similar to Bug:6401576 ORA-600 [KTBAIR1] / ORA-600 [KCBZPB_1] / CORRUPTION MESSAGES –> DB CRASH

Please download and apply one-off patch for Bug:6401576 from

Metalink->Patches->patch#=6401576 ->Platform=Hp_UX

Thanks, Rodica

ORA-600 [kddummy_blkchk] [18038] 一例

一位客户的Oracle告警日志中出现了ORA-600 [kddummy_blkchk] [18038]故障,alert中的具体信息:

Errors in file /u01/app/oracle/admin/prdw014a/udump/prdw014a_ora_4377.trc:
ORA-00600: internal error code, arguments: [kddummy_blkchk], [222], [5792], [18038], [], [], [], []
Mon May 17 15:27:53 2010
Trace dumping is performing id=[cdmp_20100517152753]
Mon May 17 15:27:53 2010
Doing block recovery for file 2 block 504365
Block recovery from logseq 159276, block 166357 to scn 10934615778284
Mon May 17 15:27:53 2010
Recovery of Online Redo Log: Thread 1 Group 4 Seq 159276 Reading mem 0
Mem# 0: /u01/app/oracle/dataPRDW014/redo04a_1.log
Mem# 1: /u01/app/oracle/dataPRDW014/redo04a_2.log
Block recovery completed at rba 159276.167277.16, scn 2545.3924010007
Doing block recovery for file 222 block 5792
Block recovery from logseq 159276, block 84741 to scn 10934615778283
Mon May 17 15:27:53 2010
Recovery of Online Redo Log: Thread 1 Group 4 Seq 159276 Reading mem 0
Mem# 0: /u01/app/oracle/dataPRDW014/redo04a_1.log
Mem# 1: /u01/app/oracle/dataPRDW014/redo04a_2.log
Block recovery completed at rba 159276.167277.16, scn 2545.3924009964
Mon May 17 15:27:55 2010
Block recovery completed at rba 159276.167277.16, scn 2545.3924009964
Mon May 17 15:27:55 2010
Corrupt Block Found
TSN = 67, TSNAME = OBA_DATA
RFN = 222, BLK = 5792, RDBA = 931141280
OBJN = 1657288, OBJD = 1699775, OBJECT = W_ORG_DS, SUBOBJECT =
SEGMENT OWNER = BMS_OBA_DW, SEGMENT TYPE = Table Segment
Mon May 17 15:32:56 2010
Trace dumping is performing id=[cdmp_20100517153255]

附600错误产生的trace信息:
prdw014a_ora_4377.trc

/u01/app/oracle/admin/prdw014a/udump/prdw014a_ora_4377.trc
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining Scoring Engine
and Real Application Testing options
ORACLE_HOME = /u01/app/oracle/product/102prdw014
System name: SunOS
Node name: v08k405
Release: 5.9
Version: Generic_122300-29
Machine: sun4u
Instance name: prdw014a
Redo thread mounted by this instance: 1
Oracle process number: 109
Unix process pid: 4377, image: oracle@v08k405

*** 2010-05-17 15:23:15.391
*** ACTION NAME:() 2010-05-17 15:23:15.389
*** MODULE NAME:(pmdtm@v04k413 (TNS V1-V3)) 2010-05-17 15:23:15.389
*** SERVICE NAME:(prdw014_taf) 2010-05-17 15:23:15.389
*** SESSION ID:(789.48811) 2010-05-17 15:23:15.389
TYP:0 CLS: 4 AFN:222 DBA:0x378016a0 OBJ:1699775 SCN:0x09f1.e9e3a3eb SEQ: 2 OP:14.4
kteop redo - redo operation on extent map
RESIZE: entry:0 delta:

...
..
..
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [kddummy_blkchk], [222], [5792], [18038], [], [], [], []
Current SQL statement for this session:
INSERT /*+ SYS_DL_CURSOR */ INTO bms_oba_dw.W_ORG_DS ("W_CUSTOMER_CLASS","NAME","ST_ADDRESS","CITY","STATE","ZIPCODE","COUNTRY","CUST_TYPE_CODE","CUST_TYPE_NAME","ACTIVE_FLG","DOM_ULT_DUNS_NUM","DUNS_NUM","EMP_COUNT","FORMED_DT","GLBLULT_DUNS_NUM","ANNUAL_REVENUE","BRANCH_FLG","BIRTH_DT","NO_OF_CHILDREN","LEGAL_NAME","FAMILY_NAME","OTHER_NAME","PREFERRED_NAME","INDV_ADDNL_TITLE","INDV_TITLE","INDV_MARITAL_STATE","INDV_GENDER","EMAIL_ADDRESS","RELATIONSHIP_STATE","INDV_EMP_STATUS","FAX_NUM","PAGER_NUM","MOBILE_NUM","LIFE_CYCLE_STATE","CUST_CAT_CODE","CUST_CAT_NAME","SIC_CODE","SIC_NAME","GOVT_ID_TYPE","GOVT_ID_VALUE","DUNNS_SITE_NAME","DUNNS_GLOBAL_NAME","DUNNS_LEGAL_NAME","CUSTOMER_NUM","ALT_CUSTOMER_NUM","ALT_PHONE_NUM","INTERNET_HOME_PAGE","LEGAL_STRUCT_CODE","LEGAL_STRUCT_NAME","DIRECT_MKTG_FLG","SOLICITATION_FLG","CUSTOMER_HIER1_CODE","CUSTOMER_HIER1_NAME","CUSTOMER_HIER2_CODE","CUSTOMER_HIER2_NAME","CUSTOMER_HIER3_CODE","CUSTOMER_HIER3_NAME","CUSTOMER_HIER4_CODE","CUSTOMER_HIER4_NAME","CUSTOMER_HIER5_CODE","CUSTOMER_HIER5_NAME","CUSTOMER_HIER6_CODE","CREATED_BY_ID","CHANGED_BY_ID","CREATED_ON_DT","CHANGED_ON_DT","AUX1_CHANGED_ON_DT","AUX2_CHANGED_ON_DT","AUX3_CHANGED_ON_DT","AUX4_CHANGED_ON_DT","SRC_EFF_FROM_DT","SRC_EFF_TO_DT","DELETE_FLG","DATASOURCE_NUM_ID","INTEGRATION_ID","TENANT_ID","X_CUSTOM","MOT_ATTRIBUTE1","MOT_ATTRIBUTE2","MOT_ATTRIBUTE3","MOT_ATTRIBUTE4","MOT_ATTRIBUTE5","MOT_ATTRIBUTE6","MOT_ATTRIBUTE7","MOT_ATTRIBUTE8","MOT_ATTRIBUTE9","MOT_ATTRIBUTE10","MOT_ATTRIBUTE11","MOT_ATTRIBUTE12","MOT_ATTRIBUTE13","MOT_ATTRIBUTE14","MOT_ATTRIBUTE15","MOT_ATTRIBUTE16","MOT_ATTRIBUTE17","MOT_ATTRIBUTE18","MOT_ATTRIBUTE19","MOT_ATTRIBUTE20","MOT_PARTY_TYPE","MOT_PHONE_AREA_CODE","MOT_ORIG_SYSTEM_REFERENCE","MOT_PER_EMAIL_ADDR","MOT_PERSON_FIRST_NAME","MOT_PHONE_EXTENSION","MOT_ALTERNATE_NAME","MOT_TELEPHONE_TYPE","MOT_SALES_CHANNEL_CODE","MOT_ACCOUNT_NAME","MOT_ATTRIBUTE_CATEGORY","MOT_INTERCOMPANY_FLAG","MOT_PARTY_NUMBER","MOT_PARTY_ID","MOT_LAST_UPDATE_LOGIN","MOT_CUST_CLASS_DESC","MOT_RECEIPT_METHOD_NAME","MOT_PHONE_NUMBER","MOT_CONTACT_POINT_PURPOSE","MOT_SALESREP_NAME","MOT_PAY_TERMS_CODE","MOT_PAY_TERMS_NAME") VALUES (NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL)
----- Call Stack Trace -----
calling              call     entry                argument values in hex
location             type     point                (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedmp()+744         CALL     ksedst()             000000840 ?
                                                   FFFFFFFF7FFF620C ?
                                                   000000000 ?
                                                   FFFFFFFF7FFF2D00 ?
                                                   FFFFFFFF7FFF1A68 ?
                                                   FFFFFFFF7FFF2468 ?
kgerinv()+200        PTR_CALL 0000000000000000     000106800 ? 10681C1C4 ?
                                                   10681C000 ? 00010681C ?
                                                   000106800 ? 10681C1C4 ?
kseinpre()+96        CALL     kgerinv()            106816B18 ? 000000000 ?
                                                   1064564C0 ? 000000003 ?
                                                   FFFFFFFF7FFF6750 ?
                                                   000001430 ?
ksesin()+52          CALL     kseinpre()           000106800 ? 000000003 ?
                                                   00000025F ? 10681C1B8 ?
                                                   FFFFFFFF7FFF6750 ?
                                                   1068167D8 ?
kco_blkchk()+2568    CALL     ksesin()             1064564C0 ? 000000003 ?
                                                   000106800 ? 0000000DE ?
                                                   000000000 ? 000106800 ?
kcoapl()+1284        CALL     kco_blkchk()         0001900DE ? 0378016A0 ?
                                                   0000016A0 ? 00000FC00 ?
                                                   000000000 ?
                                                   FFFFFFFF7FFF89F8 ?
kcbapl()+412         CALL     kcoapl()             000000002 ? 000002300 ?
                                                   000105800 ? 583DBC000 ?
                                                   106816C98 ? 00010598F ?
kcrfw_redo_gen()+16  CALL     kcbapl()             FFFFFFFF7FFF89B8 ?
376                                                583FB7870 ?
                                                   FFFFFFFF7AF3AA3C ?
                                                   B6E9FABD0 ? 000000000 ?
                                                   583DBC000 ?
kcbchg1_main()+1363  CALL     kcrfw_redo_gen()     000000000 ?
2                                                  FFFFFFFF7FFF76C8 ?
                                                   B693A9998 ? 000000000 ?
                                                   3800135A0 ?
                                                   FFFFFFFF7FFF7700 ?
kcbchg1()+1324       CALL     kcbchg1_main()       000100C00 ?
                                                   FFFFFFFF7FFF7850 ?
                                                   000000000 ? 583FB7870 ?
                                                   000000000 ? 00000FFFF ?
ktuchg()+968         CALL     kcbchg1()            000106819 ? 1068195B8 ?
                                                   1068195C8 ? 106819000 ?
                                                   000000000 ? 106819000 ?
ktbchg2nt()+104      CALL     ktuchg()             000000002 ? 000000001 ?
                                                   FFFFFFFF7FFF8928 ?
                                                   B67A76DD8 ? 000000000 ?
                                                   000000000 ?
kteopgen()+728       CALL     ktbchg2nt()          FFFFFFFF7FFF89B8 ?
                                                   FFFFFFFF7FFF87C4 ?
                                                   000000000 ? 000000000 ?
                                                   FFFFFFFF7FFF8928 ?
                                                   FFFFFFFF7FFF9D98 ?
kteopresize()+2276   CALL     kteopgen()           FFFFFFFF7FFF89B8 ?
                                                   000000006 ? 000106800 ?
                                                   000000002 ? 10682247C ?
                                                   106816B18 ?
ktsxbmdelext1()+968  CALL     kteopresize()        FFFFFFFF7FFF9D98 ?
8                                                  FFFFFFFF7FFF9E88 ?
                                                   000000004 ? 000000002 ?
                                                   000000000 ? 000000000 ?
ktsstrm_segment()+6  CALL     ktsxbmdelext1()      FFFFFFFF7AD33A78 ?
308                                                0000016A0 ? 0003FFFFF ?
                                                   FFFFFFFF7AD33A78 ?
                                                   106822000 ? 000000043 ?
ktsmg_trimf()+1208   CALL     ktsstrm_segment()    000000000 ? 000000003 ?
                                                   000000001 ? 000100C00 ?
                                                   106819000 ? 000000000 ?
kdbltrmt()+1916      CALL     ktsmg_trimf()        00010598F ? 0000010E2 ?
                                                   106822478 ? 000000005 ?
                                                   10682247C ? 106816B18 ?
kdblfpl()+96         CALL     kdbltrmt()           000000006 ? 000000000 ?
                                                   FFFFFFFF7AD33918 ?
                                                   000000180 ? 0000010E4 ?
                                                   000000008 ?
kdblfl()+1948        CALL     kdblfpl()            FFFFFFFF7FFFB0AC ?
                                                   FFFFFFFF7AD33918 ?
                                                   000000000 ?
                                                   FFFFFFFF7AD33AE0 ?
                                                   FFFFFFFF7AD33A68 ?
                                                   000000000 ?
klafin()+160         CALL     kdblfl()             FFFFFFFF7FFFB0AC ?
                                                   FFFFFFFF7AD33918 ?
                                                   000000000 ? 000000001 ?
                                                   000000008 ? 000106800 ?
kpodpfin()+76        CALL     klafin()             FFFFFFFF7AF35C40 ?
                                                   1059BF2B8 ? 000000321 ?
                                                   FFFFFFFF7AD33918 ?
                                                   000000000 ? 000400000 ?
kpodpmop()+320       CALL     kpodpfin()           FFFFFFFF7AF35C40 ?
                                                   000106816 ? 000106800 ?
                                                   000000321 ? 000000001 ?
                                                   FFFFFFFF7AF35BC8 ?
opiodr()+1496        PTR_CALL 0000000000000000     000000301 ? 000000321 ?

进过与Oracle support确认,定位为Bug 5386204 – Block corruption / OERI[kddummy_blkchk] after direct load of ASSM segment [ID 5386204.8].
“kteop redo – redo operation on extent map” 记录是确定该Bug的一个重要依据。

该Bug的Oracle note:

Bug 5386204  Block corruption / OERI[kddummy_blkchk] after direct load of ASSM segment
This note gives a brief overview of bug 5386204.
The content was last updated on: 08-FEB-2010
Click here for details of each of the sections below.
This bug is alerted in Note:580561.1
Affects:

Product (Component)    Oracle Server (Rdbms)
Range of versions believed to be affected    Versions < 11
Versions confirmed as being affected

* 9.2.0.8
* 10.2.0.1
* 10.2.0.2
* 10.2.0.3
* 10.2.0.4

Platforms affected    Generic (all / most platforms affected)

Fixed:

This issue is fixed in

* 9.2.0.8 Patch 15 on Windows Platforms
* 10.2.0.2 Patch 15 on Windows Platforms
* 10.2.0.3 Patch 5 on Windows Platforms
* 10.2.0.4.1 (Patch Set Update)
* 10.2.0.4 Patch 2 on Windows Platforms
* 10.2.0.5 (Server Patch Set)
* 11.1.0.6 (Base Release)

Symptoms:

Related To:

* Internal Error May Occur (ORA-600)
* Corruption (Logical)
* ORA-600 [kddummy_blkchk]

* Direct Path Operations
* ASSM Space Management (Bitmap Managed Segments)

Description

Block corruption / ORA-600 [kddummy_blkchk][file#] [block#] [18038]
can occur on a segment which has been direct loaded.

(The corruption shows as a PAGETABLE SEGMENT HEADER
having blocks in the “Auxillary Map” outside of the “Extent Map”
range)

Note:
This bug was previously incorrectly listed as fixed in 10.2.0.4

Further details on this issue can be found in Note:580561.1
ORA-600 [kddummy_blkchk][][][18038] during extent operations like TRUNCATE on ASSM tablespaces [ID 580561.1]

Applies to:
Oracle Server – Enterprise Edition – Version: 9.2.0.8 to 10.2.0.4
Information in this document applies to any platform.
Description

This alert describes the problem in Bug 5386204 / Note 5386204.8.

Block corruption with error ORA-600 [kddummy_blkchk] [file#] [block#] [18038]
may be reported during a DROP/TRUNCATE

The corruption shows as a PAGETABLE SEGMENT HEADER having blocks in the
“Auxillary Map” outside of the “Extent Map” range.

The same operation terminated without any error in previous RDBMS versions
like Oracle9i.

Likelihood of Occurrence

The object is populated by direct path operations such as SQL*Loader using DIRECT=Y for example.
The object is stored in a Locally Managed Tablespace (LMT) that is using ASSM (dba_tablespaces.segment_space_management=’AUTO’).
Bug 5386204 is mostly hit when db_block_size=16384.

Possible Symptoms

One evidence of hitting this bug might be the value 18038 in the third argument of
ORA-600 [kddummy_blkchk] where [18038] is a check error code.

@Error check code 18038 means that the “Data dba” stored in “Auxiliary Map” is out of range

@TYP:0 CLS: 4 AFN:234 DBA:0x3a801554 OBJ:0 SCN:0x000b.290f5e0d SEQ:  1 OP:14.2

@In this case “Data dba: 0x3a801555” stored in the “Auxiliary Map” is equal to 0x3a801551 + 4 which is out of the extent 0, hence the error.
@Note that extent 0 is 4 blocks, so extent 0 starts from 0x3a801551 to 0x3a801554.

Workaround or Resolution

In order to identify objects that are affected by the corruption, use the procedure
DBMS_SPACE_ADMIN.ASSM_TABLESPACE_VERIFY

@DBMS_SPACE_ADMIN.ASSM_SEGMENT_VERIFY is also an option but it requires patch for Bug 6760697 is needed)

How to execute DBMS_SPACE_ADMIN.ASSM_TABLESPACE_VERIFY:

alter system set DB_BLOCK_CHECKSUM = OFF;
— open a new session and run :
exec DBMS_SPACE_ADMIN.assm_tablespace_verify(‘<Tablespace Name>’, DBMS_SPACE_ADMIN.TS_VERIFY_DEEP, DBMS_SPACE_ADMIN.SEGMENT_VERIFY_DEEP);

See if any trace file is generated in the directory defined by user_dump_dest.
The absence of a trace file means that no corrupt segments were found.

Note: DB_BLOCK_CHECKSUM has to be disabled; otherwise the same ORA-600 error may be produced

@Oracle check block type 0x23=PAGETABLE SEGMENT HEADER even if DB_BLOCK_CHECKING is not set.

Example of output from DBMS_SPACE_ADMIN.ASSM_TABLESPACE_VERIFY

Segment header [dba: 0x003a801554, (file 234,block 5460)]
Segment object id: 7825838; inc. no.: 0
*********

verifying extent map and tablespace bitmap consistency
———
Verifying extent map and  auxilliary extent map   consistency in   the segment
Block Corruption in seg hdr / ext map block:                rdba: 0x3a801554, err code: 18038

Identifying the object using the segment header information.

Segment header [dba: 0x003a801554, (file 234,block 5460)]

select *
from DBA_EXTENTS
where FILE_ID = 234
and 5460 between block_id and block_id + blocks – 1;

Identifying the object using the Segment object id information.

Segment object id: 7825838; inc. no.: 0

select *
from DBA_OBJECTS
where DATA_OBJECT_ID = 7825838;

@How to execute DBMS_SPACE_ADMIN.ASSM_SEGMENT_VERIFY

WORKAROUNDs:

Disable DB_BLOCK_CHECKSUM for any action taken.

Note: DB_BLOCK_CHECKSUM has to be disabled; otherwise the same ORA-600 error may be produced

alter system set DB_BLOCK_CHECKSUM = OFF;
— open a new session

DROP TABLE .. PURGE;
ALTER TABLE .. MOVE ..;
Create table as select (CTAS)
export/import, etc

Patches

The patch prevents the corruption from taking place. Affected objects will have to be recreated.

This bug was previously incorrectly listed as fixed in 10.2.0.4.

@This problem is fixed in the 10.2.0.5 Patch Set (not available yet and still without a due date).
This problem is fixed in the 11.1.0.6 rdbms release.

One off patches for this issue are available for some platforms / versions.

See Patch 5386204 for patch availability.

Modification History

03-JUN-2008 – Initial Alert version
04-JUN-2008 – Implemented correction
11-JUN-2008 – Added info about DB_BLOCK_CHECKSUM
13-JUN-2008 – Published

References

BUG:5386204 – ORA-600 [KDDUMMY_BLKCHK] ERRORS WITH CODE 18038

NOTE:5386204.8 – Bug 5386204 – Block corruption / OERI[kddummy_blkchk] after direct load of ASSM segment

Bug 5386204: ORA-600 [KDDUMMY_BLKCHK] ERRORS WITH CODE 18038

Show Bug Attributes Bug Attributes
Type     B – Defect     Fixed in Product Version     11.1
Severity     1 – Complete Loss of Service     Product Version     10.2.0.2
Status     80 – Development to Q/A     Platform     226 – Linux x86-64
Created     12-Jul-2006     Platform Version     2.6.5-7.191-SMP
Updated     20-May-2010     Base Bug     –
Database Version     10.2.0.2
Affects Platforms     Generic
Product Source     Oracle

Show Related Products Related Products
Line     Oracle Database Products     Family     Oracle Database
Area     Oracle Database     Product     5 – Oracle Server – Enterprise Edition

Hdr: 5386204 10.2.0.2 RDBMS 10.2.0.2 SPACE PRODID-5 PORTID-226 ORA-600
Abstract: ORA-600 [KDDUMMY_BLKCHK] ERRORS WITH CODE 18038

*** 07/12/06 12:59 am ***
TAR:
—-

PROBLEM:
——–
1. Clear description of the problem encountered
Customer is getting repeated ORA-600 [kddummy_blkchk] errors reported with
internal check code 18038 on tables which have had bulk deletions made.  This
has occurred on both production and test instances.

2. Pertinent configuration information (MTS/OPS/distributed/etc)
RAC, ASM

3. Indication of the frequency and predictability of the problem
Problem is intermittent but occurs several times a day impacting the
customers ability to work.

4. Sequence of events leading to the problem
Error is typically signalled on a COMMIT most likely following a deletion
from the tables.

5. Technical impact on the customer. Include persistent after effects.
Severe, as it occurs multiple times per day, and corrupt the underlying
tables preventing further data loads.

DIAGNOSTIC ANALYSIS:
——————–
The trace files show that the problem occurs following a bulk deletion from
the underlying tables, which appear to corrupt the extent map, as the segment
header dump shows 1 extent of 4 blks, but the deleteion entry in the redo
stream shows one extent of 8 blks, e.g.:

REDO RECORD – Thread:1 RBA: 0x0005da.000e5e34.01c0 LEN: 0x00fc VLD: 0x01
SCN: 0x000d.37eacce9 SUBSCN:  5 07/11/2006 10:29:53
CHANGE #1 TYP:0 CLS:60 AFN:39 DBA:0x09c322e0 OBJ:4294967295
SCN:0x000d.37eacce9 SEQ:  2 OP:5.1
ktudb redo: siz: 112 spc: 15940 flg: 0x0022 seq: 0x011d rec: 0x06
xid:  0x0016.020.000005b6
ktubu redo: slt: 32 rci: 5 opc: 14.5 objn: 2 objd: 93662 tsn: 12
Undo type:  Regular undo       Undo type:  Last buffer split:  No
Tablespace Undo:  Yes
0x00000000
kteopu undo – undo operation on extent map
segdba: 0x87e3cc  class: 4  mapdba:0x87e3cc  offset: 3
rbr extent – dba: 0x0  nbk: 0x0
kteop redo – redo operation on extent map
ADD: dba:0x803673d len:8 at offset:1
DEFAULT: ???
SETSTAT: exts:2 blks:16 lastmap:0x0 mapcnt:0
CHANGE #2 TYP:0 CLS: 4 AFN:2 DBA:0x0087e3cc OBJ:93662 SCN:0x000d.37eacce9
SEQ:  1 OP:14.4
kteop redo – redo operation on extent map
DELETE: entry:1
shift back: dba:0x0 len:0
SETSTAT: exts:1 blks:8 lastmap:0x0 mapcnt:0

WORKAROUND:
———–
None

RELATED BUGS:
————-
Bug 4949123 – ORA-600: [KDDUMMY_BLKCHK], [541], [147050], [18038]

REPRODUCIBILITY:
—————-
Consistently occurring at customers site.

TEST CASE:
———-
n/a

STACK TRACE:
————
ksedst ksedmp ksfdmp kgerinv kseinpre ksesin kco_blkchk kcoapl kcbapl
kcrfw_redo_gen kcbchg1_main kcbchg1 ktuchg ktbchg2nt kteopgen kteopresize
ktsxbmdelext1 ktsstrm_segment ktsmg_icmt_prepare ktcifc ktucmt ktpcmt ktcrcm
ktdcmt k2lcom k2send xctctl xctcom_with_options kksExecuteCommand opiexe
opipls opiodr rpidrus skgmstack rpidru rpiswu2 rpidrv psddr0 psdnal
pevm_EXECC pfrinstr_EXECC pfrrun_no_tool pfrrun plsql_run peicnt kkxexe
opiexe kpoal8 opiodr ttcpip opitsk opiino opiodr opidrv sou2o opimai_real
main __libc_start_main _start

SUPPORTING INFORMATION:
———————–
alertlogs and trace files

24 HOUR CONTACT INFORMATION FOR P1 BUGS:
—————————————-
n/a

DIAL-IN INFORMATION:
——————–
n/a

IMPACT DATE:
————
21-JUL-2006

*** 07/12/06 02:34 am *** (CHG: Asg->NEW OWNER OWNER)

A redo dump of the segment header during the entire procedure execution was
requested on 06 Aug and supplied on 09 Aug so why are you asking for this
information again when you already have it?  Please check that file
(redo_1.trc in bug5386204_07Aug.zip), and let me know if you need anything
else.
*** 09/19/06 02:39 am *** (CHG: Sta->30)
Uploaded the requested information in file bug5386204_Oct02.zip.

*** 11/27/06 11:13 am ***
Here is one theory we (space group) have on this bug so far:
During direct load one of the segments does not get loaded with any data. The
segment is empty and the first extent has 8 blocks (this is 16k block size).
However it goes through the usual high water mark movement phase (even though
the hwm does not move). During the hwm movement phase, the segment is trimmed
close to 64k boundary. For ASSM segment with 16k block size, this means the
segment will be left with no data blocks after the trim- 4 blocks after the
trim would represent bitmaps and segment header only.

There are two issues here:
(1) Why was ktsstrm_segment called on an empty (or unloaded) segment at first
place?
(2) Even if it was called, why is segment trimmed to 64k boundary?

I’m working on the 2nd issue and will give an update soon.
*** 11/29/06 03:18 am *** (CHG: Pri->1)
*** 11/29/06 03:18 am ***
*** 11/29/06 03:25 am *** -> CLOSED
*** 11/29/06 05:31 pm ***
*** 11/30/06 10:36 pm ***
We ran into some issues (bugs) while testing the code for the diagnostic
patch. I was hoping to have it finished by today but it seems it’ll take some
more time and I’m pretty hopeful of having it ready to go by tomorrow evening
(PST). I’m really sorry for the delay.
*** 12/01/06 07:16 pm ***
*** 12/02/06 05:05 pm ***
Sorry for the delay in replying. I would expect the long regressions to be
complete by sunday afternoon PST. I should be ready to release the patch by
sunday evening if things go fine. Will keep this page updated on my progress.
*** 12/03/06 05:30 pm ***
*** 12/04/06 05:35 pm ***
It seems most of major issues with the long regressions have been taken care
of and I hope to get a clean run on the farm soon, by tomorrow end of day and
the patch should be on its way soonafter.

I had a question though, that will help me in getting the patch out faster. I
wanted to know if the customer has had any diagnosibility patches installed
on their 10.2.0.2.0 release version.

Another thing which I would like to mention here is that my patch modifies
only one file (ktss.c) in the RDBMS code.
*** 12/05/06 02:19 am ***
*** 12/05/06 04:42 pm ***
*** 12/05/06 05:06 pm ***
I was hoping to have all the farm regressions (and the patch) done by today
evening but it seems farm is taking a bit long to finish the regressions.
I’ll work on the patch as soon as I have the regressions done. Sorry for the
delay. I’ll provide an update on that in the next few hours.
*** 12/05/06 09:07 pm ***
My regressions are still moving very slowly through the queue on the farm.
The farm seems to be busy with 11g Beta 4 deadline round the corner. My
regressions have been on the farm for more than a day now. I’ll work on the
patch as soon as I have a clean farm run.
*** 12/06/06 06:15 pm ***
Still waiting on the clean farm runs. Fortunately, I’ve been able to get a
high priority on the farm jobs. So, I expect things to run clean soon. Will
keep things updated here.
*** 12/07/06 05:54 pm ***
Got my farm runs completed last night but got a small number of diffs. Have
been trying to isolate them and hopefully soon, everything should be clean.
Farm has been giving those diffs over and over again though those look
unrelated to my change. Currently, verifying them on my linux workstation.
*** 12/08/06 06:01 pm ***
*** 12/08/06 06:23 pm ***
Have been able to run almost all the long regressions locally and things look
clean. There’s just a couple of long regressions which I’m still running and
I should expect to be ready to go as soon as they are completed. Should be
able to start the patch building soon.
*** 12/11/06 01:07 am ***
There’s one long regression which seems to be broken. I’m currently working
on that to have it run clean. Will update as soon as I have it running clean.
*** 12/11/06 01:23 pm ***
Everything is clean now. Working on starting the patch building process.
*** 12/11/06 02:46 pm ***

The customer has confirmed that following application of the suplied patch
the error no longer occurred when running the testcase, which ran through to
completion after about 8 hours.  They are resetting the testcase, and will
run it again to verify this, but the initial response is that this looks to
have resolved the problem.

Can you confirm if the patch would need to be rebuilt as a permananent fix,
ie. any diagnostics to be removed etc. or is it actually the full fix anyway?
*** 12/13/06 07:19 am ***
The customer has confirmed the following:

1. Rerun the test for the 2nd time with patched rdbms: completed quickly and
without any problems.
2. Rollbacked the patch: the test failed as expected within 30 minutes.
3. Re-applied the patch and ran the test once again: completed ok.

This appears to confirm that the patch resolves the problem so could we have
an answer to the previous update?
*** 12/13/06 07:47 pm ***
That is good news.
No additional diagnostics have been added to the patch. So, it’s not needed
to be rebuilt. I guess the supplied patch should be complete in itself.
*** 12/14/06 12:40 am ***
Thanks for the update.

该文档描述当使用直接路径方式导入数据时一定概率导致该Bug产生,譬如使用Sql loader且DIRECT=Y;
该Bug只会由存贮在本地管理方式(LTM)并自动段管理(ASSM)的对象引发, 并且当标准块大小为16k时出现概率较高(Bug 5386204 is mostly hit when db_block_size=16384.)
一般数据库都会启用db_block_checksum,该参数控制Oracle在读入块时做检验操作,[18038]是kddummy_blkchk的一种错误代码,出现该错误代码说明存储在段头中的辅助区间图中的Data dba越界, 我们举一个段头来看:

Start dump data blocks tsn: 4 file#: 4 minblk 139 maxblk 139
buffer tsn: 4 rdba: 0x0100008b (4/139)
scn: 0x0000.000f327e seq: 0x01 flg: 0x04 tail: 0x327e2301
frmt: 0x02 chkval: 0x619e type: 0x23=PAGETABLE SEGMENT HEADER
Hex dump of block: st=0, typ_found=1
.......
  Extent Control Header
  -----------------------------------------------------------------
  Extent Header:: spare1: 0      spare2: 0      #extents: 9      #blocks: 72
                  last map  0x00000000  #maps: 0      offset: 2716
      Highwater::  0x0101e1f1  ext#: 8      blk#: 8      ext size: 8
  #blocks in seg. hdr's freelists: 0
  #blocks below: 65
  mapblk  0x00000000  offset: 8
                   Unlocked
  --------------------------------------------------------
  Low HighWater Mark :
      Highwater::  0x0101e1f1  ext#: 8      blk#: 8      ext size: 8
  #blocks in seg. hdr's freelists: 0
  #blocks below: 65
  mapblk  0x00000000  offset: 8
  Level 1 BMB for High HWM block: 0x0101e1e9
  Level 1 BMB for Low HWM block: 0x0101e1e9
  --------------------------------------------------------
  Segment Type: 1 nl2: 1      blksz: 8192   fbsz: 0
  L2 Array start offset:  0x00001434
  First Level 3 BMB:  0x00000000
  L2 Hint for inserts:  0x0100008a
  Last Level 1 BMB:  0x0101e1e9
  Last Level II BMB:  0x0100008a
  Last Level III BMB:  0x00000000
     Map Header:: next  0x00000000  #extents: 9    obj#: 51806  flag: 0x10000000
  Inc # 0
  Extent Map
  -----------------------------------------------------------------
   0x01000089  length: 8
   0x0101e1a1  length: 8
   0x0101e1a9  length: 8
   0x0101e1b9  length: 8
   0x0101e1c1  length: 8
   0x0101e1c9  length: 8
   0x0101e1d9  length: 8
   0x0101e1e1  length: 8
   0x0101e1e9  length: 8

  Auxillary Map
  --------------------------------------------------------
   Extent 0     :  L1 dba:  0x01000089 Data dba:  0x0100008c
   Extent 1     :  L1 dba:  0x01000089 Data dba:  0x0101e1a1
   Extent 2     :  L1 dba:  0x0101e1a9 Data dba:  0x0101e1aa
   Extent 3     :  L1 dba:  0x0101e1a9 Data dba:  0x0101e1b9
   Extent 4     :  L1 dba:  0x0101e1c1 Data dba:  0x0101e1c2
   Extent 5     :  L1 dba:  0x0101e1c1 Data dba:  0x0101e1c9
   Extent 6     :  L1 dba:  0x0101e1d9 Data dba:  0x0101e1da
   Extent 7     :  L1 dba:  0x0101e1d9 Data dba:  0x0101e1e1
   Extent 8     :  L1 dba:  0x0101e1e9 Data dba:  0x0101e1ea
  --------------------------------------------------------

   Second Level Bitmap block DBAs
   --------------------------------------------------------
   DBA 1:   0x0100008a

其中辅助区间图( Auxillary Map)列出了该段每个区间(Extent)的一级位图块以及区间中实际数据开始的data block address (Data dba).譬如Extent 0 中的Data dba应在
(0x0100008A ~0x01000090)之间,否则即越界。
DROP或TRUNCATE是触发该Bug的主要操作,原因是这2个操作都需要使用到Pagetable segment header中的Auxiliary Map。
Oracle建议的WorkAround方式主要是通过MOVE TABLESPACE 来”REBUILD”这个PAGETABLE SEGMENT HEADER。
这个Case中Oracle support给出Workaround建议:

1-. Make sure the below query will return the table mentioned above:

SQL> select owner, object_name, object_type, SUBOBJECT_NAME, OBJECT_ID,
DATA_OBJECT_ID, CREATED,LAST_DDL_TIME,TIMESTAMP
from DBA_OBJECTS
where DATA_OBJECT_ID =1699775;

If so continue:

SQL>alter system set DB_BLOCK_CHECKSUM = OFF;

Find all indexes for W_ORG_DS table.

SQL> select owner, index_name, index_type, table_name , table_owner from dba_indexes
Where table_owner = ‘BMS_OBA_DW’ and
Table_name = ‘W_ORG_DS’;

connect as BMS_OBA_DW

SQL> desc W_ORG_DS

if this table does not have LONG column, then Alter table table_name move is like a CTAS but better since is using the same name of the object plus keeping any related object like index, etc. If it has Long column then export/truncate/import need to be use;

SQL>Alter table W_ORG_DS Move;

Then rebuild all indexes for W_ORG_DS table as per above query: .i.e.

SQL>Alter index rebuild

To avoid problem, please apply patch for bug 5386204, see note 580561.1 for further information.

Oracle文档宣称其已在10.2.0.4的第一个patch set update及10.2.0.5中修复了该Bug.

注:最早认为该Bug在10.2.0.4中就已经修复了,但后来确认“This bug was previously incorrectly listed as fixed in 10.2.0.4”。

AIX操作系统启动详细介绍

RS/6000的引导检测过程与机型有一定关系,这里介绍大多数机型共同的启动过程,可以选择不同的系统运行模式:可以从磁带或CD-ROM引导系统进入到维护模式(单用户模式);也可以从硬盘引导系统进入到维护模式(单用户模式)或正常模式(多用户模式);还可以进入到系统管理维护(System Management Services,简写SMS),在其中可以修改系统引导设备列表。

系统的启动分为硬件初始化阶段和AIX核心初始化阶段,硬件初始化阶段称为ROS IPL(Read Only Storage Initial Program Load),基于MCA(Micro Channel Architecture ,即微通道结构)的经典RS/6000和PCI的RS/6000在硬件初始化阶段存在很大的差别,进行硬件初始化时,机器前面板的LCD或LED上显示着数字代码,这些数字表示初始化系统的进度,当系统初始化出现故障时,用户可以根据这个代码来确定错误的原因。当机器硬件检测完成之后,它就会从引导设备上加载软件,正常情况下会从硬盘引导BOS,完成AIX核心初始化后,启动系统的初始化进程/etc/init,而init依据文件/etc/inittab的内容启动其他系统进程。

要关闭AIX操作系统,必须执行关机命令,关机命令会终止系统中正在执行的进程,甚至使机器自动下电。停止机器运行的命令有shutdown,halt和reboot等,一般情况下使用shutdown命令式比较安全的。

系统引导概述

系统引导过程依赖于所使用的硬件平台。最初的硬件引导阶段,MCA机器和PCI机器存在着很大的差别,这些差别使得硬件引导的方法有所不同。系统引导的模式分为Normal(正常)模式和Service(维护)模式,因而,也存在两种不同的引导设备列表,即正常模式引导列表和维护模式引导列表。由于MCA系统和PCI系统的差别,选择这两种引导模式及其列表的方法也不同。

完成硬件引导之后,就进入软件引导阶段。在软件引导阶段,MCA机器和PCI机器的操作系统引导过程完全一样。无论是硬件问题,还是软件问题都能终止系统的引导,引导问题是比较常见的问题,熟悉系统的引导过程对解决引导问题是至关重要的。

一般的引导过程

ssf

一般的引导过程,如图所示。

在机器加电启动时,首先检查机器的硬件,确定主要的硬件是否能够正常工作。对于经典的RS/6000机器(MCA结构的系统),这个阶段分为两个独立的阶段:内置设备自检(Build-In Self Test 简写BIST)和加电自检(Power-On Self 简写为POST)。对于PCI机器,由单独的加电自检(POST)完成硬件检查。

经过硬件检查阶段后,系统尝试加在引导逻辑卷(Boot Logical Volume,简称为BLV),并在内存中形成RAM文件系统,同时把控制权交给BLV。

由于在内存中没有加载任何LVM设备驱动程序,因此系统必须在不使用LVM的情况下查找正确的BLV位置。正确的BLV地址信息(包括偏移量和长度)保存在引导磁盘的第一个扇区(512字节的块)上,这个扇区称为引导记录(Boot Record)。

引导逻辑卷(BLV),也称为引导镜像,它包括下面几个部分:

l  软件ROS(Soft Ros) 软件只读存储器(Read Only Storage ,简写为ROS)只适用于PCI的机器上,在PCI的机器上,软件ROS执行一些系统初始化工作。PCI机器的ROS不提供软件ROS,而是由AIX提供软件ROS。

l  AIX内核 AIX内核总是从BLV加载。/unix(软连接到/usr/lib/boot/unix_mp或unix_up)是一份内核的副本。这个版本被用于构建hd4文件系统。在系统引导的时候,从hd4文件系统读取内核镜像。

l  Rc.boot rc.boot是一个Shell命令文件,它是配置系统的脚本。在系统引导期间,init进程要执行它三次。

l  简化的ODM 简化的ODM只提供最基本的设备配置信息。

l  引导命令 在引导过程中要执行一些命令,例如cfgmgr和bootinfo等。

当加载BLV之后,由于rootvg没有被激活,并不可用,引导所需的信息都包含在用于在内存中创建 RAM文件系统的BLV中。之后,init进程被加载运行,并由init进程开始配置基本设备,这是第一引导阶段(init 在执行rc.boot时带着一个参数1)。

下一步被称为第二引导阶段,主要目的是激活rootvg。这是最常出现问题的阶段,例如一个文件系统或jfslog被破坏。接着控制权交给rootvg中的init进程,然后释放RAM文件系统。

最后,init进程(从磁盘加载的进程,而不是BLV中的init进程)带参数3执行rc.boot脚本配置剩余的设备从/etc/inittab启动其他进程。

硬件检测过程(BIST和POST)

前面已提到,在硬件引导过程中,MAC体系结构的经典RS/6000和当今流行的PCI体系结构RS/6000存在着比较大的差异。下面介绍这些差异。

  1. 经典RS/6000的引导过程

下图是经典RS/6000的启动流程图。经典RS/6000机型(MCA结构的系统)在启动时,当系统加电启动后,就先进行到内置谁被自检(Build_In Self Test简写为BIST),这些测试位于EPROM芯片中。BIST只检测主板上一些最基本的系统组件,如处理器,内存和系统背板(System Planar)等,在这个测试阶段,在前面板的LED上根据硬件设备的状态显示100到195之间的代码。

经典RS/6000启动流程

经典RS/6000启动流程

当BIST阶段完成后,然后再进行加电自检(Power-On Self Test简写为POST)阶段。POST将检查一些其他系统设备和I/O设备。POST完成之后,就开始查找一个可以引导系统的设备(即引导逻辑卷,BLV),然后把引导程序和内核程序从引导设备调入到内存。所有需要加载阴道镜像的硬件都会被检测到,在这个阶段,LED的代码是200到2E7,硬件问题和软件问题都可能中断系统的启动。当所有硬件通过检测之后,接着就是软件系统的启动。

在MCA系统中,加载BLV时就检查引导设备列表,引导设备列表由切换钥匙的位置来确定。

  1. PCI的RS/6000的引导过程

当PCI的RS/6000引导时,与MCA的RS/6000存在很大的差别,前面已提到的差别是基于PCI的RS/6000机型,而没有BIST阶段,只有POST阶段,实际上时将BIST和POST合为一个阶段。另一个差别是PCI的RS/6000没有钥匙切换(切换启动模式的钥匙,MCA机器的前面板上这个钥匙),现代的PCI机器上用一种逻辑钥匙模式切换,即通过按某些功能键来处理这个启动模式的切换。

当PCI系统进行POST时,就检测基本的硬件设备,如处理器,内存,系统背板(System Planar)和I/O设备等。当POST结束时,如果从使用图形控制台启动系统,则在屏幕上依次显示内存,键盘,网络,SCSI适配器和扬声器等5个图标;如果从使用字符控制台启动系统,则在屏幕上依次显示Memory,keyboard,Network,SCSI和Speaker等5个文字单词,表示在POST阶段对这几个硬件部分分别进行了检测。系统检测完成后,就要加载操作系统的引导程序,寻找引导逻辑卷(BLV),即可引导的镜像程序,然后将可引导的镜像程序调入内存。当屏幕上显示”Starting Software…”字样时,表示现在正在启动软件系统,也就是启动AIX系统,到这里整个硬件的引导将完成,接着就是软件系统的启动。

基于PCI的RS/6000启动流程

基于PCI的RS/6000启动流程

早期的PCI机器,例如7020-40P和7248-43P,在引导系统硬件时没有LED显示,因此无法通过LED码来解决引导中的问题。幸运的是,现代的PCI机器已改变了这种情况,但是不同的机型在系统启动阶段产生的错误代码也不相同。因此,要精确地判断错误代码的含义只能参考随机器所带的维护指南或手册,或者通过IBM的网站查找相应机型的引导错误代码的含义。

系统的引导模式

大体上讲,AIX操作系统的启动模式有两种,分别是Normal(正常)模式和Service(维护)模式,有时候将Normal(正常)模式成为多用户模式,将Service(维护)模式成为单用户模式。

  1. 经典RS/6000系统的引导模式

经典的RS/6000(MCA系统)前面板上有一个钥匙状态,它有三个位置,3个状态分别是:

l  Normal(正常)

l  Secure(安全)

l  Service(维护)

钥匙只有处在Normal位置,才能从硬盘中引导AIX操作系统到多用户;当钥匙处于Secure时,系统不会启动,这是系统的安全状态,这样可以锁定系统,不允许任何人来启动系统;而在Service位置,这是系统的维护状态,系统会从磁带,CD-ROM和硬盘等设备,比如如果安装操作系统一定要从磁带或CD-ROM引导,则钥匙必须处于Service模式。

将钥匙处于Normal模式,从硬盘引导BOS,用户程序和系统程序能够正常地运行,而且所有可以登录的终端都被激活,所有的文件系统也可以被访问,网络正常启动并且能够通信,系统处于多用户方式下(即运行级别是2).如果要正常模式引导系统,就将MCA机器的钥匙状态扳到Normal位置,系统就寻找正常模式的而引导设备列表(Boot List),根据该列表上探测出一个可以引导的设备,即找到一个包含引导逻辑卷的设备。

当钥匙处于Service位置时,从磁带或CD-ROM引导BOS,用户程序一般不能运行,只能从系统控制台可以登录,能够执行的命令不多,网络没有启动,所以也无法通信。如果要以维护模式引导系统,就将MCA机器的钥匙状态扳倒Service位置,系统就寻找维护模式的引导设备列表,根据该列表来探测出一个包含引导逻辑卷的设备。

更改引导列表中设备的顺序,可以在AIX命令行下用bootlist命令修改,也可以在diag菜单中修改。

  1. PCI的RS/6000系统的引导模式

基于PCI的RS/6000没有钥匙状态,只有两种引导模式,分别是Normal(正常)模式和Service(维护)模式。多用户模式一般从内置硬盘启动系统,单用户模式可以从磁带,CD-ROM或者硬盘启动。正常模式(多用户模式)引导AIX系统时,是哟有的用户程序和系统进程能够正常运行,而且所有激活的终端可以注册登录,所有的文件也能被访问,网络正常启动并且能够通信,系统处于多用户方式下。

以维护模式引导系统时,就会进入到系统诊断状态,用户可以查看和更改系统的设备和配置,可以进行系统的诊断及维护工作。系统管理员一般进入到这种执行系统维护和管理任务,此时的系统中只有一个root用户,因此被称为单用户模式。PCI系统的维护模式分为两种情况,一种是独立诊断模式(从磁带或CD-ROM引导到维护模式),另一种是在线诊断模式(从硬盘引导到维护模式),这两种模式在今后会提到。

Rman 备份检验功能

RMAN 中 “backup validate database”  语法可以用来扫描数据库中的物理错误,实际验证期间并不产生备份集。

如果需要更多的错误检查,可以使用backup 命令的check logical 选项来配置备份执行逻辑讹误检查,示例如下:

backup validate check logical database;

示例中RMAN 仅执行逻辑数据库验证操作,而不产生实际备份集。

需要注意如果要在给定的错误数内仍然继续执行备份,需设置maxcorrupt 参数值。如下:

run {

set maxcorrupt for datafile 1,2,3,4 to 10;

backup validate check logical database;

}

沪ICP备14014813号-2

沪公网安备 31010802001379号