Advanced Diagnostic using oradebug dumpvar

oradebug工具是Oracle数据库调优和诊断的利器,合理运用oradebug可以大幅减少我们收集诊断信息所花费的时间。当然前提是合理运用,对于初级DBA有这样一个忠告,不要在生产环境中去接触或修改自己所不熟悉的领域的东西,这一点很重要。并不是说中高级DBA在知识和经验上能达到巨细靡遗的状态,实际上中高级Oracle DBA可能每天都在和新事物打交道。恰恰相反,中高级DBA是对以上忠告最忠实的践行者,在生产环境中绝不随意涉险于不确定、不明确、不熟悉的区域,很多时候这会让人觉得是一种”好奇心丧失”的表现。

言归正传,我们所要介绍的是oradebug的dumpvar命令,该命令用于打印转储固定的PGA/UGA/SGA的变量:,其语法如下:

oradebug dumpvar
Print/dump a fixed PGA/SGA/UGA variable.
Syntax                                                              Parameter
oradebug dumpvar <p|s|uga> <variable name> [level]                  <p|s|uga> PGA,SGA or UGA
                                                                    fixed variable name
                                                                    [level]
使用示例
SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bi
PL/SQL Release 10.2.0.4.0 - Production
CORE    10.2.0.4.0      Production
TNS for Linux: Version 10.2.0.4.0 - Production
NLSRTL Version 10.2.0.4.0 - Production

SQL> select * from global_name;

GLOBAL_NAME
--------------------------------------------------------------------------------
www.askmac.cn


SQL> show parameter db_files

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_files                             integer     200

SQL> oradebug setmypid;
Statement processed.

SQL> oradebug dumpvar sga kcfdpk
kfil kcfdpk_ [698B950, 698B954) = 000000C8

这里的KCFDPK变量保存了db_files参数的值,000000C8 == 200

利用oradebug工具我们不仅可以做到对这些内部变量的窥视,还可以做到修改,
但是这对我们实际的诊断没有益处,仅仅可以用来满足某些模拟实验

仅仅了解dumpvar命令的使用方法并没有意义,只有和有意义的内部变量结合起来才能真正起到高级诊断的作用,在这里抛砖引玉罗列出部分诊断变量:

sgauso --  该变量可以用于判断use_stored_outlines

SQL> oradebug setmypid;
Statement processed.

Default use_stored_outlines=false;

SQL> oradebug dumpvar sga sgauso
qolprm sgauso_ [060021418, 06002143C) =
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000

SQL> alter system set use_stored_outlines=true;
System altered.

SQL> oradebug dumpvar sga sgauso
qolprm sgauso_ [060021418, 06002143C) =
00000001 45440007 4C554146 00000054 00000000 00000000 00000000 00000000 00000000

45440007 4C554146 ==>      DEFAULT,意为优化器使用DEFAULT category中存放的outline.


ksmvpa -- the size of the variable part of the pga

SQL> oradebug dumpvar pga ksmvpa
b4 ksmvpa_ [0068AA6B4, 0068AA6B8) = 0000E920          -- 59680 bytes

kkjsre        -- The SGA variable kkjsre must be 1 for jobs to execute automatically.

SQL> oradebug  dumpvar sga kkjsre
word kkjsre_ [060025760, 060025764) = 00000001

SQL> exec  dbms_ijob.set_enabled(false);
PL/SQL procedure successfully completed.

SQL> oradebug  dumpvar sga kkjsre
word kkjsre_ [060025760, 060025764) = 00000000

kcmsmx          --the MAX reasonable scn 

SQL> oradebug dumpvar pga kcmsmx
kscn kcmsmx_ [00B7E811C, 00B7E8124) = CA7F0000 00000C6C         -- 11.2.0.2

SQL> oradebug dumpvar pga kcmsmx
kscn kcmsmx_ [0068AB02C, 0068AB034) = A701C000 00000B3D         -- 10.2.0.4 

Notice the MAX scn allowed on 11.2 is far higher that that at earlier releases
(due to the fix from bug 9254170).
Use the fix for 9254170 on ALL DBs with an SCN rate set to match between versions.
Avoid excessive SCN generation rates.

kcvlcm - logging checkpoints to alert - FALSE

SQL> show parameter log_checkpoints_to_alert

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
log_checkpoints_to_alert             boolean     TRUE

SQL> oradebug dumpvar sga kcvlcm
word kcvlcm_ [060019E88, 060019E8C) = 00000001

SQL> alter system set log_checkpoints_to_alert=false;

System altered.

SQL> oradebug dumpvar sga kcvlcm
word kcvlcm_ [060019E88, 060019E8C) = 00000000

kcvcpr - false (a checkpoint is requested)

SQL> oradebug dumpvar sga  kcvcpr
word kcvcpr_ [060019E90, 060019E94) = 00000000

kcvcpf - false (checkpointing fast)

kcvgcw -false a global checkpointing is waiting

SQL> oradebug dumpvar sga kcvgcw
sword kcvgcw_ [060025748, 06002574C) = 00000000

kcfcpd - true if checkpoint writes done

kcvblg  kcvblg[0] is s 1 incidate some file/s in backup mode

SQL> oradebug dumpvar sga kcvblg
ub4 * kcvblg_ [060019E18, 060019E20) = 9A248508 00000000

SQL> oradebug peek 0x9A248508 100
[09A248508, 09A24856C) =
00000001 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000002 00000000
00000002 00000002 00000000 00000000 ...

alter database begin backup;

SQL>  oradebug peek 0x9A248508 100
[09A248508, 09A24856C) =
00000001 00000003 00000003 00000003 00000003
00000003 00000003 00000003 00000003 00000003
00000003 00000003 00000003 00000003 ...

kcsgscn_           -- current scn 

SQL> oradebug dumpvar sga kcsgscn_
kcslf kcsgscn_ [0600122B0, 0600122E0) =
01C7DB4B 00000000 00000000 00000000              01C7DB4B  -- 29875019
0001E907 00000000 00000000 00000000
00000000 00000000 60011F90 00000000              60011F90 is fixed

SQL> select current_scn from v$database;

CURRENT_SCN
-----------
   29875044

kslwlst_  --  waiter list latch

SQL> oradebug dumpvar sga kslwlst
ksllt kslwlst_ [200040AC, 20004174) =
00000000 00000009 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 ...

kcfdfk -- 2 * db_files

SQL> oradebug dumpvar sga kcfdfk
kfil kcfdfk_ [060017EF0, 060017EF4) = 00000190            --400    when db_files=200 

SQL> alter system set db_files=2000 scope=spfile;
System altered.

SQL> oradebug dumpvar sga kcfdfk
kfil kcfdfk_ [060017EF0, 060017EF4) = 00000FA0             --4000

kcffsz  Address of Datafile Size 

So kcbzib() passes in information about how many blocks to read
( and where to start reading from ) The assertion is 

  if (bno < 2 || bno + nblks - 1 > (kcflsz[fno] ? kcflsz[fno] : kcffsz[fno]))

              ==============================================================

              .......NOTE: This section is removed after 9g..................

  {
     /* if the user gave us the block number then this is an external error,
      * but if the dba was internally generated then this is a corruption of
      * some kind. */
     ASSERTNM5(usrdba, OERINM("kcfrbd_2"),
               fno, bno, nblks, kcflsz[fno], kcffsz[fno]);

SQL> oradebug dumpvar sga kcffsz
ub4 * kcffsz_ [060017F20, 060017F28) = 9988E700 00000000

SQL> oradebug peek 0x9988E700 200
[09988E700, 09988E7C8) =
00000001
0007B200
00030980
0000AF00
00067CA0
00003200
00280000
00000500
00000000
00000A00
00000000
00000000
00000600
00040000 ...

SQL> select blocks,to_char(blocks,'XXXXXXXX') hex_blocks from v$datafile;

    BLOCKS HEX_BLOCK
---------- ---------
    504320     7B200
    199040     30980
     44800      AF00
    425120     67CA0
     12800      3200
   2621440    280000
      1280       500
         0         0
      2560       A00
     38400      9600
      6400      1900

    BLOCKS HEX_BLOCK
---------- ---------
      1536       600
    262144     40000

13 rows selected.

kcflsz -- like kcffsz

SQL> oradebug dumpvar  sga kcflsz
ub4 * kcflsz_ [060017F28, 060017F30) = 99892588 00000000

SQL> oradebug peek 0x99892588 200
[099892588, 099892650) =
00000000 0007B200 00030980 0000AF00 00067CA0
00003200 00280000 00000500 00000000 00000A00
00000000 00000000 00000600 00040000 ...

THe arguments are file number, block number, number of blocks, kcflsz[fno],
kcffsz[fno]
Error comes out of kcf.c.
Can they try a validate structure cascade on this table and see what happens?
Please update the customer field so that it does not read internal.

kcf.c kcf.c Kernel Cache Files component.
ksl.c Kernel Service layer Latching & Wait-post Implement.
ksm.c Kernel Service Memory component implementation.
kkj.c Kernel Kompiletime Job queue.

还原真实的cache recovery

我们在学习Oracle基础知识的时候会了解到实例恢复(Instance Recovery)或者说崩溃恢复(Crash recovery)的概念,有时候甚至于这2个名词在我们日常的语言中表达同样的意思。实际上Instance Recovery与Crash Recovery是存在区别的:针对单实例(single instance)或者RAC中所有节点全部崩溃后的恢复,我们称之为Crash Recovery。而对于RAC中的某一个节点失败,存活节点(surviving instance)试图对失败节点线程上redo做应用的情况,我们称之为Instance Recovery。

不管是Instance Recovery还是Crash Recovery,都由2个部分组成:cache recovery继以transaction recovery。

根据官方文档的介绍,Cache Recovery也叫Rolling Forward,就是我们常说的前滚;而Transaction Recovery也叫Rolling Back,就是我们常说的回滚。前滚和回滚贯穿Oracle恢复的基本概念,是我们入门必要学习的知识,在次不多介绍。

有文事者,必济之以武略。理论学得再好,不实践也无用。所幸Crash Recovery是很容易做成的实验,我们不妨来看一看:

SQL> shutdown abort;
ORACLE instance shut down.

SQL> startup mount;
ORACLE instance started.

Total System Global Area 1065353216 bytes
Fixed Size                  2089336 bytes
Variable Size             486542984 bytes
Database Buffers          570425344 bytes
Redo Buffers                6295552 bytes
Database mounted.

SQL> alter database open;

Crash Recovery将从alter database open开始,我们来观察其日志

==================alert.log====================
alter database open
Tue Jun 14 18:19:53 2011
Beginning crash recovery of 1 threads
 parallel recovery started with 2 processes
Tue Jun 14 18:19:53 2011
Started redo scan
Tue Jun 14 18:19:53 2011
Completed redo scan
 0 redo blocks read, 0 data blocks need recovery
Tue Jun 14 18:19:53 2011
Started redo application at
 Thread 1: logseq 1004, block 1124, scn 17136185
Tue Jun 14 18:19:53 2011
Recovery of Online Redo Log: Thread 1 Group 2 Seq 1004 Reading mem 0
  Mem# 0: /flashcard/oradata/G10R2/onlinelog/o1_mf_2_6v34jokt_.log
  Mem# 1: /s01/flash_recovery_area/G10R2/onlinelog/o1_mf_2_6v34jotq_.log
Tue Jun 14 18:19:53 2011
Completed redo application
Tue Jun 14 18:19:53 2011
Completed crash recovery at
 Thread 1: logseq 1004, block 1124, scn 17156186
 0 data blocks read, 0 data blocks written, 0 redo blocks read
Tue Jun 14 18:19:53 2011
LGWR: STARTING ARCH PROCESSES
ARC0: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
ARC0 started with pid=16, OS id=7829
Tue Jun 14 18:19:53 2011
Thread 1 advanced to log sequence 1005 (thread open)
Thread 1 opened at log sequence 1005
  Current log# 3 seq# 1005 mem# 0: /flashcard/oradata/G10R2/onlinelog/o1_mf_3_6v34jpmp_.log
  Current log# 3 seq# 1005 mem# 1: /s01/flash_recovery_area/G10R2/onlinelog/o1_mf_3_6v34jpyn_.log
Successful open of redo thread 1
Tue Jun 14 18:19:53 2011
ARC0: Becoming the 'no FAL' ARCH
ARC0: Becoming the 'no SRL' ARCH
ARC0: Becoming the heartbeat ARCH
Tue Jun 14 18:19:53 2011
SMON: enabling cache recovery
Tue Jun 14 18:19:53 2011
db_recovery_file_dest_size of 204800 MB is 6.81% used. This is a
user-specified limit on the amount of space that will be used by this
database for recovery-related files, and does not reflect the amount of
space available in the underlying filesystem or ASM diskgroup.
Tue Jun 14 18:19:54 2011
Successfully onlined Undo Tablespace 1.
Tue Jun 14 18:19:54 2011
SMON: enabling tx recovery
Database Characterset is UTF8
Opening with internal Resource Manager plan
where NUMA PG = 1, CPUs = 2
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
QMNC started with pid=17, OS id=7831
Tue Jun 14 18:19:55 2011
Completed: alter database open

注意上述单实例Crash Recovery到数据库打开的整个过程:

  • alter database open
  • Beginning crash recovery of 1 threads
  • Started redo scan
  • Completed redo scan
  • Started redo application at
  • Completed redo application
  • Completed crash recovery at
  • SMON: enabling cache recovery
  • Successfully onlined Undo Tablespace 1
  • SMON: enabling tx recovery
  • Completed: alter database open

从上述步骤中我们可以看到三种恢复名词,即:

  • crash recovery
  • cache recovery
  • tx recovery

这和官方文档所描述的Crash Recovery概念是不一致的,我们现在来理清这几种recovery。

crash recovery包含对redo的scan和application,显然其完成的是Rolling Forward前滚的工作,告警日志中出现的crash recovery等同于官方文档中介绍的”cache recovery”,我们可以将” Completed crash recovery”看做前滚完成的标志。而tx recovery从字面就可以看出实际上是Transaction Recovery,tx recovery发生在Undo Tablespace online之后(回滚事务的前提是Undo可用),数据完成打开操作之前(“Completed: alter database open”)。注意tx recovery并不要求数据库打开前完成,仅仅是在数据库打开之前由smon启动(“SMON: enabling tx recovery”)。

剩下的唯一的问题是,这里的cache recovery是什么?显然它不是官方文档中所描述的”cache recovery”,几乎没有任何文档介绍存在这样一个recovery操作,这也是本文重点要介绍的。

我们来看另一个演示,这个演示用以说明cache recovery还存在于最普通的不包含Crash Recovery的数据库打开过程中:

SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.

SQL> startup mount;
ORACLE instance started.
Total System Global Area 1065353216 bytes
Fixed Size                  2089336 bytes
Variable Size             486542984 bytes
Database Buffers          570425344 bytes
Redo Buffers                6295552 bytes
Database mounted.

SQL> alter database open;
Database altered.

SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bi
PL/SQL Release 10.2.0.4.0 - Production
CORE    10.2.0.4.0      Production
TNS for Linux: Version 10.2.0.4.0 - Production
NLSRTL Version 10.2.0.4.0 - Production

SQL> select * from global_name;

GLOBAL_NAME
--------------------------------------------------------------------------------
www.askmac.cn

==================alert.log====================
alter database open
Tue Jun 14 18:43:52 2011
LGWR: STARTING ARCH PROCESSES
ARC0: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
ARC0 started with pid=14, OS id=8133
Tue Jun 14 18:43:52 2011
Thread 1 opened at log sequence 1005
Current log# 3 seq# 1005 mem# 0: /flashcard/oradata/G10R2/onlinelog/o1_mf_3_6v34jpmp_.log
Current log# 3 seq# 1005 mem# 1: /s01/flash_recovery_area/G10R2/onlinelog/o1_mf_3_6v34jpyn_.log
Successful open of redo thread 1
Tue Jun 14 18:43:52 2011
ARC0: Becoming the 'no FAL' ARCH
ARC0: Becoming the 'no SRL' ARCH
ARC0: Becoming the heartbeat ARCH
Tue Jun 14 18:43:52 2011
SMON: enabling cache recovery
Tue Jun 14 18:43:53 2011
Successfully onlined Undo Tablespace 1.
Tue Jun 14 18:43:53 2011
SMON: enabling tx recovery
Database Characterset is UTF8
Opening with internal Resource Manager plan
where NUMA PG = 1, CPUs = 2
replication_dependency_tracking turned off (no async multimaster replication found)
Tue Jun 14 18:43:53 2011
Incremental checkpoint up to RBA [0x3ed.624.0], current log tail at RBA [0x3ed.944.0]
Tue Jun 14 18:43:53 2011
Starting background process QMNC
QMNC started with pid=15, OS id=8135
Tue Jun 14 18:43:53 2011
Completed: alter database open

因为是clean shutdown,所以这里不存在crash recovery。但这里同样出现了”SMON: enabling cache recovery”,可见cache recovery是每次实例启动instance startup必要执行的一种恢复操作。但问题是,这个恢复操作到底针对何种对象?

实际上cache recovery所要恢复的是rowcache,也就是我们常说的字典缓存(dictionary cache)。关于这个结论,肯定有很多人要问我这样说的依据是什么,对应于这个”cache recovery”的问题,我们很难从google中得到一些启示,因为它和官方文档所描述的”cache recovery-rolling forward”存在重名的关系。

为了证明cache recovery所恢复的是rowcache,我们需要一个实证,从正式的系统中得到验证。要做到这一点是比较困难的,我们需要Oracle愿意把整个database open的过程变成慢动作来供我们参考,验证要用到一些调试工具,例如gdb或者dbx。

我们首先将实例启动到mount状态,并对执行startup的LOCAL进程做gdb的breakpoint断点调试:

SQL> shutdown abort;
ORACLE instance shut down.

SQL> startup mount;
ORACLE instance started.
Total System Global Area 1065353216 bytes
Fixed Size                  2089336 bytes
Variable Size             486542984 bytes
Database Buffers          570425344 bytes
Redo Buffers                6295552 bytes
Database mounted.

找出LOCAL进程的系统进程号SPID

SQL> select spid from v$process
2  where addr in (
3  select paddr from v$session
4  where sid=(select distinct sid from v$mystat))
5  /

SPID
------------
8326

在实例startup nomount/mount后共享池的library cache就是可用的

SQL> select namespace from v$librarycache where gets!=0;

NAMESPACE
---------------
SQL AREA
TABLE/PROCEDURE

而rowcache则尚未被填充,因为字典缓存来源于自举对象(bootstrap$)和字典基表

SQL> select parameter,count,gets from v$rowcache where count!=0;
no rows selected

另开一个terminal窗口,并执行对LOCAL进程8326的gdb breakpoint调试

[oracle@rh2 ~]$ gdb $ORACLE_HOME/bin/oracle 8326
GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-23.el5)
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
...
Reading symbols from /s01/db_1/bin/oracle...(no debugging symbols found)...done.
Attaching to program: /s01/db_1/bin/oracle, process 8326
Reading symbols from /s01/db_1/lib/libskgxp10.so...(no debugging symbols found)...done.
Loaded symbols for /s01/db_1/lib/libskgxp10.so
Reading symbols from /s01/db_1/lib/libhasgen10.so...(no debugging symbols found)...done.
Loaded symbols for /s01/db_1/lib/libhasgen10.so
Reading symbols from /s01/db_1/lib/libskgxn2.so...(no debugging symbols found)...done.
Loaded symbols for /s01/db_1/lib/libskgxn2.so
Reading symbols from /s01/db_1/lib/libocr10.so...(no debugging symbols found)...done.
Loaded symbols for /s01/db_1/lib/libocr10.so
Reading symbols from /s01/db_1/lib/libocrb10.so...(no debugging symbols found)...done.
Loaded symbols for /s01/db_1/lib/libocrb10.so
Reading symbols from /s01/db_1/lib/libocrutl10.so...(no debugging symbols found)...done.
Loaded symbols for /s01/db_1/lib/libocrutl10.so
Reading symbols from /s01/db_1/lib/libjox10.so...(no debugging symbols found)...done.
Loaded symbols for /s01/db_1/lib/libjox10.so
Reading symbols from /s01/db_1/lib/libclsra10.so...(no debugging symbols found)...done.
Loaded symbols for /s01/db_1/lib/libclsra10.so
Reading symbols from /s01/db_1/lib/libdbcfg10.so...(no debugging symbols found)...done.
Loaded symbols for /s01/db_1/lib/libdbcfg10.so
Reading symbols from /s01/db_1/lib/libnnz10.so...(no debugging symbols found)...done.
Loaded symbols for /s01/db_1/lib/libnnz10.so
Reading symbols from /usr/lib64/libaio.so.1...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libaio.so.1
Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libm.so.6
Reading symbols from /lib64/libpthread.so.0...(no debugging symbols found)...done.
[Thread debugging using libthread_db enabled]
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/libnsl.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libnsl.so.1
Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /lib64/libnss_files.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libnss_files.so.2
0x0000003181a0d8e0 in __read_nocancel () from /lib64/libpthread.so.0

输入断点kcrf_commit_force和kqlobjlod
(gdb) break kcrf_commit_force
Breakpoint 1 at 0x2724b6c

(gdb) break kqlobjlod
Breakpoint 2 at 0x1ac5e8c

在之前的terminal中执行数据库打开操作,因为breakpoint的关系这个open操作会hang住,
这时我们通过观察告警日志来了解恢复进度

SQL> alter database open;                              --这里会hang住

在gdb下输入continue,

(gdb) c
Continuing.

Breakpoint 1, 0x0000000002724b6c in kcrf_commit_force ()

观察告警日志可以发现redo application已经完成,但还未进入cache recovery阶段

alter database open
Tue Jun 14 19:14:33 2011
Beginning crash recovery of 1 threads
parallel recovery started with 2 processes
Tue Jun 14 19:14:33 2011
Started redo scan
Tue Jun 14 19:14:33 2011
Completed redo scan
39 redo blocks read, 74 data blocks need recovery
Tue Jun 14 19:14:33 2011
Started redo application at
Thread 1: logseq 1006, block 1155
Tue Jun 14 19:14:33 2011
Recovery of Online Redo Log: Thread 1 Group 1 Seq 1006 Reading mem 0
Mem# 0: /flashcard/oradata/G10R2/onlinelog/o1_mf_1_6v34jnkn_.log
Mem# 1: /s01/flash_recovery_area/G10R2/onlinelog/o1_mf_1_6v34jnst_.log
Tue Jun 14 19:14:33 2011
Completed redo application
Tue Jun 14 19:14:33 2011
Completed crash recovery at
Thread 1: logseq 1006, block 1194, scn 17200193
74 data blocks read, 74 data blocks written, 39 redo blocks read
Tue Jun 14 19:14:33 2011
LGWR: STARTING ARCH PROCESSES
ARC0: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
ARC0 started with pid=17, OS id=8656
Tue Jun 14 19:14:33 2011
Thread 1 advanced to log sequence 1007 (thread open)
Thread 1 opened at log sequence 1007
Current log# 2 seq# 1007 mem# 0: /flashcard/oradata/G10R2/onlinelog/o1_mf_2_6v34jokt_.log
Current log# 2 seq# 1007 mem# 1: /s01/flash_recovery_area/G10R2/onlinelog/o1_mf_2_6v34jotq_.log
Successful open of redo thread 1
Tue Jun 14 19:14:33 2011
ARC0: Becoming the 'no FAL' ARCH
ARC0: Becoming the 'no SRL' ARCH
ARC0: Becoming the heartbeat ARCH
db_recovery_file_dest_size of 204800 MB is 6.81% used. This is a
user-specified limit on the amount of space that will be used by this
database for recovery-related files, and does not reflect the amount of
space available in the underlying filesystem or ASM diskgroup.
Tue Jun 14 19:14:37 2011
Incremental checkpoint up to RBA [0x3ef.3.0], current log tail at RBA [0x3ef.3.0]

且此时rowcache仍未被填充

SQL> select parameter,count,gets from v$rowcache where count!=0;
no rows selected

在gdb界面下再次执行continue 2次

(gdb) c
Continuing.

Breakpoint 1, 0x0000000002724b6c in kcrf_commit_force ()
(gdb) c
Continuing.

Breakpoint 2, 0x0000000001ac5e8c in kqlobjlod ()

观察告警日志可以发现已开始cache recovery,但也卡陷在cache recovery上,这保证我们的演示不受骚扰

Tue Jun 14 19:16:44 2011
SMON: enabling cache recovery

此时rowcache中出现唯一的一个dc_objects对象

select parameter,count,gets from v$rowcache where count!=0;

PARAMETER                             COUNT       GETS
-------------------------------- ---------- ----------
dc_objects                                1          1

这个对象是什么呢?也许你已经猜到了,我们做一个rowcache dump来看一下:

SQL> ALTER SESSION SET EVENTS 'immediate trace name row_cache level 10';

================row_cache trace===================

BUCKET 43170:
row cache parent object: address=0x92326060 cid=8(dc_objects)
hash=f3d1a8a1 typ=11 transaction=(nil) flags=00000001
own=0x92326130[0x9230f628,0x9230f628] wat=0x92326140[0x92326140,0x92326140] mode=S
status=EMPTY/-/-/-/-/-/-/-/-
set=0, complete=FALSE
data=
00000000 4f42000a 5453544f 24504152 00000000 00000000 00000000 00000000
00000000 00000001 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000
BUCKET 43170 total object count=1

可以看到上述dc_objects尚未完成加载(status=EMPTY & complete=FALSE ),那么这是一个什么object呢?

4f42000a 5453544f 24504152 => 转换为文本即:OB TSTO$PAR也就是BOOTSTRAP$

换而言之在cache recovery时第一个恢复的字典缓存对象是BOOTSTRAP$,这并不出乎我们的意料。

启动实例的LOCAL进程的等待事件为instance state change,这是常规情况下我们观察不到得

SQL> select event,p1text,p1 from v$session where wait_class!='Idle';

EVENT                                    P1TEXT                                           P1
---------------------------------------- ---------------------------------------- ----------
instance state change                    layer                                             2

在gdb界面下再次continue,将载入更多的rowcache对象

(gdb) c
Continuing.

Breakpoint 2, 0x0000000001ac5e8c in kqlobjlod ()

BUCKET 37:
row cache parent object: address=0x916cd980 cid=3(dc_rollback_segments)
hash=5fed2a24 typ=9 transaction=(nil) flags=000000a6
own=0x916cda50[0x916cda50,0x916cda50] wat=0x916cda60[0x916cda60,0x916cda60] mode=N
status=VALID/INSERT/-/FIXED/-/-/-/-/-
data=
00000000 00000000 00000001 00000009 59530006 4d455453 00000000 00000000
00000000 00000000 00000000 00000000 00000003 00000000 00000000 00000000
00000000 00000000 00000000 00000000
BUCKET 37 total object count=1

595300064d455453 -> SYSTEM 属于dc_rollback_segments 也就是著名的system回滚段

BUCKET 55556:
row cache parent object: address=0x916d8cd0 cid=8(dc_objects)
hash=ce89d903 typ=11 transaction=(nil) flags=00000001
own=0x916d8da0[0x9230f628,0x9230f628] wat=0x916d8db0[0x916d8db0,0x916d8db0] mode=S
status=EMPTY/-/-/-/-/-/-/-/-
set=0, complete=FALSE
data=
00000000 5f430006 234a424f 00000000 00000000 00000000 00000000 00000000
00000000 00000005 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000
BUCKET 55556 total object count=1

5f430006 234a424f -> C_OBJ# 是著名的bootstrap对象之一,可以从$ORACLE_HOME/rdbms/admin/sql.bsq中找到它

create rollback segment SYSTEM tablespace SYSTEM
storage (initial 50K next 50K)
/
create cluster c_obj# (obj# number)
pctfree 5 size 800                           /* don't waste too much space */
/* A table of 32 cols, 2 index, 2 col per index requires about 2K.
* A table of 10 cols, 2 index, 2 col per index requires about 750.
*/
storage (initial 130K next 200k maxextents unlimited pctincrease 0)
/* avoid space management during IOR I */
/

我们还可以通过v$rowcache_parent视图来了解dictionary cache的情况

SQL> col cache_name for a20
SQL> col keystr for a31
SQL> set linesize 200
SQL> select address,cache_name,existent,lock_mode,saddr,substr(key,1,30) keystr from v$rowcache_parent;

ADDRESS          CACHE_NAME           E  LOCK_MODE SADDR            KEYSTR
---------------- -------------------- - ---------- ---------------- -------------------------------
00000000916CCE20 dc_tablespaces       N          0 00               000000000000000000000000000000
00000000916CD980 dc_rollback_segments Y          0 00               000000000000000000000000000000
0000000092326060 dc_objects           Y          0 00               000000000A00424F4F545354524150
00000000916D8CD0 dc_objects           N          3 000000009BD91328 000000000600435F4F424A23000000
00000000916DA830 dc_object_ids        Y          0 00               380000000000000000000000000000

可以看到持有row cache lock的会话是'000000009BD91328',
且该dc_objects对象还处于non-existent状态,
换而言之真正装载rowcache的是启动实例的LOCAL服务进程

SQL> select sid,program,event,p1,p2,p3 from v$session where saddr='000000009BD91328';

SID PROGRAM                                          EVENT                                    P1   P2 P3
----- ------------------------------------------------ ---------------------------------------- -- ---- --
3294 sqlplus@rh2.oracle.com (TNS V1-V3)               db file scattered read                    1  378  3

该进程正在等待db file scattered read,fileid->1,block-378,这些块属于BOOTSTRAP$表

BOOTSTRAP$对象已从rowcache被载入到library cache中

SQL> select kglhdadr,kglnaobj from x$kglob where kglobtyp=2 and kglnaobj not like 'X$%';

KGLHDADR             KGLNAOBJ
-------------------- --------------------
0000000092326990     BOOTSTRAP$

SQL> select owner||'.'||Name from v$db_object_cache where type='TABLE' and name not like 'X$%';

OWNER||'.'||NAME
--------------------------------------------------------------------------------
SYS.BOOTSTRAP$

初步总结:

  1. 在数据库正式open前需要恢复字典缓存,这个步骤被称为cache recovery,其实是row cache recovery。与官方文档中描述的”cache recovery”不同,row cache recovery应当是Oracle Internal的叫法。
  2. 实际执行row cache recovery的不是SMON进程,而是启动实例的服务进程

Applying online patch on 11gr2

在Oracle 11g中提出了online patch(也叫hot patch)的特性;Hot patching允许我们在实例始终在线的情况下安装,启用或禁用一个修复补丁或者诊断补丁。针对7*24在线的业务系统,hot patch为我们提供了一条既能避免当机时间而又可以实施补丁的途径。在Oracle 11g中我们可以使用Opatch命令行工具针对任意数据库实施在线补丁(前提是该补丁是一个hot patch)。一般来说在线补丁(hot patches)只能是那些代码修改范围小且复杂度很低的补丁,举例来说它们往往是一些诊断补丁(diagnostic patches)或者小bug的修复(small bug fixes)。值得注意的是hot patching将需要消耗额外的内存,决定其消耗内存数量的因素是:1.补丁本身的大小,2.实例中的进程总数;举例来说某个补丁的大小正好为一个OS page的大小(一般为4kB),那么当实例中运行的进程总数为1000时,则该hot patching所额外消耗的内存总数为4kB*1000=4MB。

hot patches与常规Conventional patches对比具有可在线实施和安装快的特性,如下图:
online patching

在实际生产环境中,相信没有多少朋友实施过hot patching,一来国内目前还没有普及11g的使用,二来hot patching的数量在所有interim patch中只占极少数;一直以来都想写这样一篇关于hot patching的博文,唯一妨碍我写作的问题是在11.2.0.1下找不到可实施的online interim patch;以MOS->patches&upgrade目前的分类我们很难找出某个base release下可用的hot patch,当然这并不妨碍补丁专栏的使用。为了这个令人郁闷的问题,我特意去提交了一个Service Request,得到的回复:

I have tried to find the patches which support online patching on 11.2.0.1 version,
but I also can not find them because there are too many patches and there is no catalog for the patches
which support online patching, and I can only check the patch readme to confirm whether that patch supports online patching.

I found one patch which supports online patching, but this patch is for 11.2.0.2 version.
The patch no. is 10188727.

Sorry for the inconvenience brought to you. Hope the above update can help you.
If the above patch is not what you want, then please update the SR and I will continue for your issue.

这其中提到的patch 10188727,可以从Note<RDBMS Online Patching Aka Hot Patching [ID 761111.1]>中找到,另外一个可找到的hot patch是11.1.0.6上的6198642<DUMMY PATCH FOR TESTING DB11 PATCHING>,不过很可惜该补丁只有Linux x86一个平台版本的。所以我不得不先将11.2.0.1的测试库升级到了11.2.0.2上:

SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
PL/SQL Release 11.2.0.2.0 - Production
CORE    11.2.0.2.0      Production
TNS for Linux: Version 11.2.0.2.0 - Production
NLSRTL Version 11.2.0.2.0 - Production

[maclean@rh2 OPatch]$ ps -ef|grep pmon|grep -v grep
maclean  22481     1  0 19:19 ?        00:00:00 ora_pmon_PROD

[maclean@rh2 OPatch]$ pmap -d 22481
22481:   ora_pmon_PROD
Address           Kbytes Mode  Offset           Device    Mapping
0000000000400000  180232 r-x-- 0000000000000000 008:00002 oracle
000000000b602000    1820 rwx-- 000000000b002000 008:00002 oracle
000000000b7c9000     300 rwx-- 000000000b7c9000 000:00000   [ anon ]
000000000dbef000     436 rwx-- 000000000dbef000 000:00000   [ anon ]
0000000060000000 2050048 rwxs- 0000000000000000 000:00009   [ shmid=0x550001 ]
0000003e09a00000     112 r-x-- 0000000000000000 008:00001 ld-2.5.so
0000003e09c1b000       4 r-x-- 000000000001b000 008:00001 ld-2.5.so
0000003e09c1c000       4 rwx-- 000000000001c000 008:00001 ld-2.5.so
0000003e09e00000    1336 r-x-- 0000000000000000 008:00001 libc-2.5.so
0000003e09f4e000    2044 ----- 000000000014e000 008:00001 libc-2.5.so
0000003e0a14d000      16 r-x-- 000000000014d000 008:00001 libc-2.5.so
0000003e0a151000       4 rwx-- 0000000000151000 008:00001 libc-2.5.so
0000003e0a152000      20 rwx-- 0000003e0a152000 000:00000   [ anon ]
0000003e0a200000     520 r-x-- 0000000000000000 008:00001 libm-2.5.so
0000003e0a282000    2044 ----- 0000000000082000 008:00001 libm-2.5.so
0000003e0a481000       4 r-x-- 0000000000081000 008:00001 libm-2.5.so
0000003e0a482000       4 rwx-- 0000000000082000 008:00001 libm-2.5.so
0000003e0a600000       8 r-x-- 0000000000000000 008:00001 libdl-2.5.so
0000003e0a602000    2048 ----- 0000000000002000 008:00001 libdl-2.5.so
0000003e0a802000       4 r-x-- 0000000000002000 008:00001 libdl-2.5.so
0000003e0a803000       4 rwx-- 0000000000003000 008:00001 libdl-2.5.so
0000003e0aa00000      88 r-x-- 0000000000000000 008:00001 libpthread-2.5.so
0000003e0aa16000    2044 ----- 0000000000016000 008:00001 libpthread-2.5.so
0000003e0ac15000       4 r-x-- 0000000000015000 008:00001 libpthread-2.5.so
0000003e0ac16000       4 rwx-- 0000000000016000 008:00001 libpthread-2.5.so
0000003e0ac17000      16 rwx-- 0000003e0ac17000 000:00000   [ anon ]
0000003e0ae00000      28 r-x-- 0000000000000000 008:00001 librt-2.5.so
0000003e0ae07000    2048 ----- 0000000000007000 008:00001 librt-2.5.so
0000003e0b007000       4 r-x-- 0000000000007000 008:00001 librt-2.5.so
0000003e0b008000       4 rwx-- 0000000000008000 008:00001 librt-2.5.so
0000003e0da00000      84 r-x-- 0000000000000000 008:00001 libnsl-2.5.so
0000003e0da15000    2044 ----- 0000000000015000 008:00001 libnsl-2.5.so
0000003e0dc14000       4 r-x-- 0000000000014000 008:00001 libnsl-2.5.so
0000003e0dc15000       4 rwx-- 0000000000015000 008:00001 libnsl-2.5.so
0000003e0dc16000       8 rwx-- 0000003e0dc16000 000:00000   [ anon ]
00002abec920e000       8 rwx-- 00002abec920e000 000:00000   [ anon ]
00002abec9210000       4 r-x-- 0000000000000000 008:00002 libodmd11.so
00002abec9211000    1024 ----- 0000000000001000 008:00002 libodmd11.so
00002abec9311000       4 rwx-- 0000000000001000 008:00002 libodmd11.so
00002abec9312000     360 r-x-- 0000000000000000 008:00002 libcell11.so
00002abec936c000    1020 ----- 000000000005a000 008:00002 libcell11.so
00002abec946b000      36 rwx-- 0000000000059000 008:00002 libcell11.so
00002abec9474000       4 rwx-- 00002abec9474000 000:00000   [ anon ]
00002abec9475000     848 r-x-- 0000000000000000 008:00002 libskgxp11.so
00002abec9549000    1024 ----- 00000000000d4000 008:00002 libskgxp11.so
00002abec9649000       8 rwx-- 00000000000d4000 008:00002 libskgxp11.so
00002abec9665000       4 rwx-- 00002abec9665000 000:00000   [ anon ]
00002abec9666000    2580 r-x-- 0000000000000000 008:00002 libnnz11.so
00002abec98eb000    1020 ----- 0000000000285000 008:00002 libnnz11.so
00002abec99ea000     264 rwx-- 0000000000284000 008:00002 libnnz11.so
00002abec9a2c000       8 rwx-- 00002abec9a2c000 000:00000   [ anon ]
00002abec9a2e000      96 r-x-- 0000000000000000 008:00002 libclsra11.so
00002abec9a46000    1020 ----- 0000000000018000 008:00002 libclsra11.so
00002abec9b45000       4 rwx-- 0000000000017000 008:00002 libclsra11.so
00002abec9b46000       4 rwx-- 00002abec9b46000 000:00000   [ anon ]
00002abec9b47000     136 r-x-- 0000000000000000 008:00002 libdbcfg11.so
00002abec9b69000    1020 ----- 0000000000022000 008:00002 libdbcfg11.so
00002abec9c68000       8 rwx-- 0000000000021000 008:00002 libdbcfg11.so
00002abec9c6a000    6832 r-x-- 0000000000000000 008:00002 libhasgen11.so
00002abeca316000    1020 ----- 00000000006ac000 008:00002 libhasgen11.so
00002abeca415000     136 rwx-- 00000000006ab000 008:00002 libhasgen11.so
00002abeca437000      24 rwx-- 00002abeca437000 000:00000   [ anon ]
00002abeca43d000       8 r-x-- 0000000000000000 008:00002 libskgxn2.so
00002abeca43f000    1020 ----- 0000000000002000 008:00002 libskgxn2.so
00002abeca53e000       4 rwx-- 0000000000001000 008:00002 libskgxn2.so
00002abeca53f000       4 rwx-- 00002abeca53f000 000:00000   [ anon ]
00002abeca540000     656 r-x-- 0000000000000000 008:00002 libocr11.so
00002abeca5e4000    1020 ----- 00000000000a4000 008:00002 libocr11.so
00002abeca6e3000      12 rwx-- 00000000000a3000 008:00002 libocr11.so
00002abeca6e6000     628 r-x-- 0000000000000000 008:00002 libocrb11.so
00002abeca783000    1024 ----- 000000000009d000 008:00002 libocrb11.so
00002abeca883000       8 rwx-- 000000000009d000 008:00002 libocrb11.so
00002abeca885000      44 r-x-- 0000000000000000 008:00002 libocrutl11.so
00002abeca890000    1020 ----- 000000000000b000 008:00002 libocrutl11.so
00002abeca98f000       4 rwx-- 000000000000a000 008:00002 libocrutl11.so
00002abeca990000       4 rwx-- 00002abeca990000 000:00000   [ anon ]
00002abeca991000       4 r-x-- 0000000000000000 008:00001 libaio.so.1.0.1
00002abeca992000    2044 ----- 0000000000001000 008:00001 libaio.so.1.0.1
00002abecab91000       4 rwx-- 0000000000000000 008:00001 libaio.so.1.0.1
00002abecab92000      16 rwx-- 00002abecab92000 000:00000   [ anon ]
00002abecab96000      20 r-x-- 0000000000000000 008:00001 libnuma.so.1
00002abecab9b000    2044 ----- 0000000000005000 008:00001 libnuma.so.1
00002abecad9a000       4 rwx-- 0000000000004000 008:00001 libnuma.so.1
00002abecad9b000    1280 rwx-- 00002abecad9b000 000:00000   [ anon ]
00002abecaef5000      40 r-x-- 0000000000000000 008:00001 libnss_files-2.5.so
00002abecaeff000    2044 ----- 000000000000a000 008:00001 libnss_files-2.5.so
00002abecb0fe000       4 r-x-- 0000000000009000 008:00001 libnss_files-2.5.so
00002abecb0ff000       4 rwx-- 000000000000a000 008:00001 libnss_files-2.5.so
00002abecb100000    1700 rwx-- 00002abecb100000 000:00000   [ anon ]
00002abecb2a9000      28 rwx-- 0000000000000000 000:00011 zero
00002abecb2b0000      64 rwx-- 0000000000000000 000:00011 zero
00002abecb2c0000      64 rwx-- 0000000000000000 000:00011 zero
00002abecb2d0000      64 rwx-- 0000000000000000 000:00011 zero
00002abecb2e0000      64 rwx-- 0000000000000000 000:00011 zero
00002abecb2f0000      64 rwx-- 0000000000000000 000:00011 zero
00002abecb300000     164 rwx-- 0000000000057000 000:00011 zero
00002abecb329000       8 rwx-- 00002abecb329000 000:00000   [ anon ]
00002abecb32b000       4 rwxs- 0000000000000000 008:00002 hc_PROD.dat
00002abecb32c000      40 r-x-- 0000000000000000 008:00002 libnque11.so
00002abecb336000    1020 ----- 000000000000a000 008:00002 libnque11.so
00002abecb435000       4 rwx-- 0000000000009000 008:00002 libnque11.so
00002abecb436000    1048 rwx-- 00002abecb436000 000:00000   [ anon ]
00007fff3342d000      84 rwx-- 00007ffffffea000 000:00000   [ stack ]
ffffffffff600000    8192 ----- 0000000000000000 000:00000   [ anon ]
mapped: 2291488K    writeable/private: 7840K    shared: 2050052K

[maclean@rh2 ~]$  cd $ORACLE_HOME/OPatch
[maclean@rh2 OPatch]$ unzip p10188727_112020_Linux-x86-64.zip
Archive:  p10188727_112020_Linux-x86-64.zip
   creating: 10188727/
   creating: 10188727/files/
   creating: 10188727/files/lib/
   creating: 10188727/files/lib/libserver11.a/
  inflating: 10188727/files/lib/libserver11.a/kkopq.o
   creating: 10188727/etc/
   creating: 10188727/etc/config/
  inflating: 10188727/etc/config/inventory.xml
  inflating: 10188727/etc/config/actions.xml
  inflating: 10188727/etc/config/deploy.xml
   creating: 10188727/etc/xml/
  inflating: 10188727/etc/xml/GenericActions.xml
  inflating: 10188727/etc/xml/ShiphomeDirectoryStructure.xml
  inflating: 10188727/README.txt
   creating: 10188727/online/
   creating: 10188727/online/files/
   creating: 10188727/online/files/hpatch/
  inflating: 10188727/online/files/hpatch/bug10188727.pch
   creating: 10188727/online/etc/
   creating: 10188727/online/etc/config/
  inflating: 10188727/online/etc/config/inventory.xml
  inflating: 10188727/online/etc/config/actions.xml
  inflating: 10188727/online/etc/config/deploy.xml
   creating: 10188727/online/etc/xml/
  inflating: 10188727/online/etc/xml/GenericActions.xml
  inflating: 10188727/online/etc/xml/ShiphomeDirectoryStructure.xml 

[maclean@rh2 OPatch]$ opatch query 10188727 -all
Invoking OPatch 11.2.0.1.1

Oracle Interim Patch Installer version 11.2.0.1.1
Copyright (c) 2009, Oracle Corporation.  All rights reserved.

Oracle Home       : /s01/product/11.2.0/dbhome_2
Central Inventory : /s01/oraInventory
   from           : /etc/oraInst.loc
OPatch version    : 11.2.0.1.1
OUI version       : 11.2.0.2.0
OUI location      : /s01/product/11.2.0/dbhome_2/oui
Log file location : /s01/product/11.2.0/dbhome_2/cfgtoollogs/opatch/opatch2011-02-17_20-05-21PM.log

Patch history file: /s01/product/11.2.0/dbhome_2/cfgtoollogs/opatch/opatch_history.txt

--------------------------------------------------------------------------------
 Patch created on 2 Dec 2010, 01:44:13 hrs PST8PDT
 Need to shutdown Oracle instances: true
 Patch is roll-backable: true
 Patch is a "Patchset Update": false
 Patch is a rolling patch: true
 Patch has sql related actions: false
 Patch is an online patch: false
 Patch is a portal patch: false
 Patch is an "auto-enabled" patch: false

 List of platforms supported:
   226: Linux x86-64

 List of bugs to be fixed:
   10188727: AFTER UPGRADING TO 11.2.0.2 SOME SQLS FAIL WITH ORA-7445 [KKEIDC()+180] ERROR

 This patch is a "singleton" patch.

 This patch belongs to the "db" product family 

 List of executables affected:
   ORACLE_HOME/bin/oracle

 List of optional components:
   oracle.rdbms:  11.2.0.2.0

 List of optional actions:
   Update /s01/product/11.2.0/dbhome_2/lib/libserver11.a with /kkopq.o
   cd /s01/product/11.2.0/dbhome_2/rdbms/lib
     ; make -f ins_rdbms.mk ioracle ORACLE_HOME=/s01/product/11.2.0/dbhome_2

  Possible XML representation of the patch:

     10188727

--------------------------------------------------------------------------------
OPatch succeeded.

[maclean@rh2 OPatch]$ ./opatch query -is_online_patch 10188727
Invoking OPatch 11.2.0.1.1

Oracle Interim Patch Installer version 11.2.0.1.1
Copyright (c) 2009, Oracle Corporation.  All rights reserved.

Oracle Home       : /s01/product/11.2.0/dbhome_2
Central Inventory : /s01/oraInventory
   from           : /etc/oraInst.loc
OPatch version    : 11.2.0.1.1
OUI version       : 11.2.0.2.0
OUI location      : /s01/product/11.2.0/dbhome_2/oui
Log file location : /s01/product/11.2.0/dbhome_2/cfgtoollogs/opatch/opatch2011-02-17_19-45-33PM.log

Patch history file: /s01/product/11.2.0/dbhome_2/cfgtoollogs/opatch/opatch_history.txt

--------------------------------------------------------------------------------
 Patch is an online patch: false

OPatch succeeded.

/* 虽然Opatch返回"online patch: false",但实际上这是一个online patch,
    造成以上问题的原因可能是Opatch版本低 */

[maclean@rh2 OPatch]$ tree 10188727
10188727
|-- README.txt
|-- etc
|   |-- config
|   |   |-- actions.xml
|   |   |-- deploy.xml
|   |   `-- inventory.xml
|   `-- xml
|       |-- GenericActions.xml
|       `-- ShiphomeDirectoryStructure.xml
|-- files
|   `-- lib
|       `-- libserver11.a
|           `-- kkopq.o
`-- online
    |-- etc
    |   |-- config
    |   |   |-- actions.xml
    |   |   |-- deploy.xml
    |   |   `-- inventory.xml
    |   `-- xml
    |       |-- GenericActions.xml
    |       `-- ShiphomeDirectoryStructure.xml
    `-- files
        `-- hpatch
            `-- bug10188727.pch

/* 可以从以上目录结构中看到包含了online子目录,我们可以直接观察其inventory.xml信息文件 */

[maclean@rh2 OPatch]$ cat 10188727/online/etc/config/inventory.xml |grep instance
 <instance_shutdown>false</instance_shutdown>
 <instance_shutdown_message></instance_shutdown_message>

/* 以上instance_shutdown为false说明其可以作为online patch实施 */

[maclean@rh2 OPatch]$ cd 10188727

/* opatch online patching的具体语法如下 */
opatch apply online -connectString  <SID>:<USERNAME>:<PASSWORD>:<NODE1>, \
<SID2>:<USERNAME>:<PASSWORD>:<NODE2>,... 

[maclean@rh2 10188727]$ opatch apply online -connectString PROD:sys:password
Invoking OPatch 11.2.0.1.1

Oracle Interim Patch Installer version 11.2.0.1.1
Copyright (c) 2009, Oracle Corporation.  All rights reserved.

Oracle Home       : /s01/product/11.2.0/dbhome_2
Central Inventory : /s01/oraInventory
   from           : /etc/oraInst.loc
OPatch version    : 11.2.0.1.1
OUI version       : 11.2.0.2.0
OUI location      : /s01/product/11.2.0/dbhome_2/oui
Log file location : /s01/product/11.2.0/dbhome_2/cfgtoollogs/opatch/opatch2011-02-17_19-49-44PM.log

Patch history file: /s01/product/11.2.0/dbhome_2/cfgtoollogs/opatch/opatch_history.txt

The patch should be applied/rolled back in '-all_nodes' mode only.
Converting the RAC mode to '-all_nodes' mode.
ApplySession applying interim patch '10188727' to OH '/s01/product/11.2.0/dbhome_2'

Running prerequisite checks...

OPatch detected non-cluster Oracle Home from the inventory and will patch the local system only.

Backing up files and inventory (not for auto-rollback) for the Oracle Home
Backing up files affected by the patch '10188727' for restore. This might take a while...
Backing up files affected by the patch '10188727' for rollback. This might take a while...

Patching component oracle.rdbms, 11.2.0.2.0...
The patch will be installed on active database instances.
Installing and enabling the online patch 'bug10188727.pch', on database 'PROD'.

ApplySession adding interim patch '10188727' to inventory

Verifying the update...
Inventory check OK: Patch ID 10188727 is registered in Oracle Home inventory with proper meta-data.
Files check OK: Files from Patch ID 10188727 are present in Oracle Home.
OPatch succeeded.

[maclean@rh2 OPatch]$ opatch lsinventory -detail|tail -21
Interim patches (1) :

Patch (online) 10188727: applied on Thu Feb 17 19:49:52 CST 2011
Unique Patch ID:  13202318
   Created on 2 Dec 2010, 01:44:15 hrs PST8PDT
   Bugs fixed:
     10188727
   Files Touched:
     bug10188727.pch --> ORACLE_HOME/hpatch/bug10188727.pch
   Instances Patched:
     PROD
   Patch Location in Inventory:
     /s01/product/11.2.0/dbhome_2/inventory/oneoffs/10188727
   Patch Location in Storage area:
     /s01/product/11.2.0/dbhome_2/.patch_storage/10188727_Dec_2_2010_01_44_15

告警日志alert.log中的信息:
Patch file bug10188727.pch is out of sync with oracle binary; performing fixup
Patch file bug10188727.pch has been synced with oracle binary
Patch bug10188727.pch Installed - Update #1
Patch bug10188727.pch Enabled - Update #2

[maclean@rh2 OPatch]$ ps -ef|grep pmon|grep -v grep
maclean  22481     1  0 19:19 ?        00:00:00 ora_pmon_PROD

[maclean@rh2 OPatch]$ pmap -d 22481|tail -10
00002abecb435000       4 rwx-- 0000000000009000 008:00002 libnque11.so
00002abecb436000    1048 rwx-- 00002abecb436000 000:00000   [ anon ]
00002abecb53c000       8 r-x-- 000000000c64e000 008:00002 oracle
00002abecb53e000    5052 r-x-- 00000000000bd000 008:00002 oracle
00002abecba2d000     140 r-x-- 0000000000000000 008:00002 bug10188727.so
00002abecba50000    1024 ----- 0000000000023000 008:00002 bug10188727.so
00002abecbb50000       8 rwx-- 0000000000023000 008:00002 bug10188727.so
00007fff3342d000      84 rwx-- 00007ffffffea000 000:00000   [ stack ]
ffffffffff600000    8192 ----- 0000000000000000 000:00000   [ anon ]
mapped: 2297720K    writeable/private: 7848K    shared: 2050052K

/* 再次观察pmon进程的内存信息,可以看到pmap输出中多出了2个小的oracle正文镜像和
    名为bug10188727.so的共享库文件,而该后台进程的private memory由原来的7840k上升到7848k,
    实际增幅为8k */

/* 此外可以通过oradebug命令将该online patch禁用,虽然并不推荐这样做 */

SQL> oradebug patch disable  bug10188727.pch;
Statement processed.

SQL>  oradebug patch disable bug10188727.pch;
Patch file already disabled

[maclean@rh2 hpatch]$ pmap -d 22481|tail -8
00002abecb53c000       8 r-x-- 000000000c64e000 008:00002 oracle
00002abecb53e000    5052 r-x-- 00000000000bd000 008:00002 oracle
00002abecba2d000     140 r-x-- 0000000000000000 008:00002 bug10188727.so
00002abecba50000    1024 ----- 0000000000023000 008:00002 bug10188727.so
00002abecbb50000       8 rwx-- 0000000000023000 008:00002 bug10188727.so
00007fff3342d000      84 rwx-- 00007ffffffea000 000:00000   [ stack ]
ffffffffff600000    8192 ----- 0000000000000000 000:00000   [ anon ]
mapped: 2297720K    writeable/private: 7848K    shared: 2050052K

/* 但disable掉online patch并不会导致在线补丁额外消耗的内存被回收 */

/* 当然我们还可以很方便地启用它 */

SQL> oradebug patch enable bug10188727.pch;
Statement processed.
SQL>  oradebug patch enable bug10188727.pch;
Patch file already enabled

[maclean@rh2 ~]$ cd $ORACLE_HOME/hpatch

[maclean@rh2 hpatch]$ ls -l
total 368
-rw-r--r-- 1 maclean oinstall 177874 Feb 17 19:49 bug10188727.pch
-rwx------ 1 maclean oinstall      1 Feb 17 19:49 bug10188727.pchPROD.fixup
-rwx------ 1 maclean oinstall 176850 Feb 17 19:49 bug10188727.so
-rw------- 1 maclean oinstall    712 Feb 17 20:13 orapatchPROD.cfg

/* 注意不要在实例启动时删除以上hpatch目录及目录下任何文件,这可能导致instance出现意外 */

[maclean@rh2 hpatch]$ cd $ORACLE_HOME/OPatch

/* 我们还能够将online patch rollback回滚掉,如以下语法 */
opatch rollback -id <patchID> -connectString  <SID>:<USERNAME>:<PASSWORD>:<NODE1>, \
<SID2>:<USERNAME>:<PASSWORD>:<NODE2>,  ...

[maclean@rh2 OPatch]$ opatch rollback -id 10188727 -connectString PROD:sys:password -invPtrLoc /s01/product/11.2.0/dbhome_2/oraInst.loc
Invoking OPatch 11.2.0.1.1

Oracle Interim Patch Installer version 11.2.0.1.1
Copyright (c) 2009, Oracle Corporation.  All rights reserved.

Oracle Home       : /s01/product/11.2.0/dbhome_2
Central Inventory : /s01/oraInventory
   from           : /s01/product/11.2.0/dbhome_2/oraInst.loc
OPatch version    : 11.2.0.1.1
OUI version       : 11.2.0.2.0
OUI location      : /s01/product/11.2.0/dbhome_2/oui
Log file location : /s01/product/11.2.0/dbhome_2/cfgtoollogs/opatch/opatch2011-02-17_20-18-00PM.log

Patch history file: /s01/product/11.2.0/dbhome_2/cfgtoollogs/opatch/opatch_history.txt

RollbackSession rolling back interim patch '10188727' from OH '/s01/product/11.2.0/dbhome_2'

The patch should be applied/rolled back in '-all_nodes' mode only.
Converting the RAC mode to '-all_nodes' mode.

Running prerequisite checks...

OPatch detected non-cluster Oracle Home from the inventory and will patch the local system only.

Backing up files affected by the patch '10188727' for restore. This might take a while...

Patching component oracle.rdbms, 11.2.0.2.0...
The patch will be removed from active database instances.
Disabling and removing online patch 'bug10188727.pch', on database 'PROD'

RollbackSession removing interim patch '10188727' from inventory

OPatch succeeded.

告警日志alert.log中的remove信息:
Patch bug10188727.pch Disabled - Update #5
Patch bug10188727.pch Removed - Update #6
Thu Feb 17 20:18:07 2011
Online patch bug10188727.pch has been disabled
Online patch bug10188727.pch has been removed

[maclean@rh2 trace]$ pmap -d 22481|tail -8
00002abecb53c000       8 r-x-- 000000000c64e000 008:00002 oracle
00002abecb53e000    5052 r-x-- 00000000000bd000 008:00002 oracle
00002abecba2d000     140 r-x-- 0000000000000000 008:00002 bug10188727.so
00002abecba50000    1024 ----- 0000000000023000 008:00002 bug10188727.so
00002abecbb50000       8 rwx-- 0000000000023000 008:00002 bug10188727.so
00007fff3342d000      84 rwx-- 00007ffffffea000 000:00000   [ stack ]
ffffffffff600000    8192 ----- 0000000000000000 000:00000   [ anon ]
mapped: 2297720K    writeable/private: 7848K    shared: 2050052K

/* 显然rollback回滚掉该online interim patch也不足以回收内存,唯一的方法是重启实例 */

SQL> startup force;

[maclean@rh2 trace]$ ps -ef|grep pmon|grep -v grep
maclean  25563     1  0 20:22 ?        00:00:00 ora_pmon_PROD

[maclean@rh2 trace]$ pmap -d 25563|tail -8
00002aaf2f324000       4 rwxs- 0000000000000000 008:00002 hc_PROD.dat
00002aaf2f325000      40 r-x-- 0000000000000000 008:00002 libnque11.so
00002aaf2f32f000    1020 ----- 000000000000a000 008:00002 libnque11.so
00002aaf2f42e000       4 rwx-- 0000000000009000 008:00002 libnque11.so
00002aaf2f42f000    1048 rwx-- 00002aaf2f42f000 000:00000   [ anon ]
00007fffdfa84000      84 rwx-- 00007ffffffea000 000:00000   [ stack ]
ffffffffff600000    8192 ----- 0000000000000000 000:00000   [ anon ]
mapped: 2291488K    writeable/private: 7840K    shared: 2050052K

/* That's ok!
    ash to ash, dust to dust! */

References:<RDBMS Online Patching Aka Hot Patching [ID 761111.1]>

沪ICP备14014813号-2

沪公网安备 31010802001379号