More About ORACLE_AFFINITY, ORACLE_PRIORITY, and Windows Process Scheduling

Using ORACLE_AFFINITY

Under certain conditions, it may be desirable to reserve a CPU for a particular thread or set of threads. Good candidates are usually single threads whose performance degrades under higher load, or threads that adversely affect performance on all CPUs in the system.

 

When running RAC, the CPU utilization of the LMS processes increases at a faster pace than the number of lock, CR, and current block requests. The throughput and scalability of LMS are crucial to RAC performance, since current and CR server latencies, as well as lock request latencies, are directly correlated with how fast and efficiently LMS can process requests. Tests on Solaris with Oracle8i have shown that binding LMD0 to a separate CPU gives nearly optimal performance. On Windows NT, the LMS threads can likewise be bound to a dedicated processor.

 

Example:

 

HKEY_LOCAL_MACHINE->SOFTWARE->ORACLE->

 

  • ORACLE_AFFINITY="LMS0:8;LMS1:8;DEF:7;USER:7"

On a 4-CPU machine, this indicates that the LMS threads run on CPU 3, and all other Oracle threads run on CPUs 0, 1, and 2.

 

  • ORACLE_AFFINITY="LMS0:1;DEF:14"

LMS0 runs on CPU 0; all other threads run on CPUs 1, 2, and 3.
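The affinity values above are CPU bitmasks: bit n of the value selects CPU n. A minimal Python sketch (illustrative only, not Oracle code) decodes the masks used in the two registry examples:

```python
def mask_to_cpus(mask: int) -> list[int]:
    """Decode an affinity bitmask into the list of CPU numbers it selects.

    Bit n of the mask corresponds to CPU n, so 8 (binary 1000) selects
    CPU 3, while 7 (binary 0111) selects CPUs 0, 1 and 2.
    """
    return [cpu for cpu in range(mask.bit_length()) if mask & (1 << cpu)]

# The registry examples above:
print(mask_to_cpus(8))    # LMS0/LMS1 -> CPU 3
print(mask_to_cpus(7))    # DEF/USER  -> CPUs 0, 1, 2
print(mask_to_cpus(14))   # DEF in the second example -> CPUs 1, 2, 3
```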

 

Using ORACLE_PRIORITY

 

As with ORACLE_AFFINITY, the priority of an Oracle thread can be modified. There are six thread priorities:

 

  • lowest
  • below_normal
  • normal
  • above_normal
  • highest
  • time_critical

 

Since the system can dynamically adjust thread priorities within a process in the idle, normal, or high priority class, using ORACLE_PRIORITY may not have a large impact. Also, when CPU resources are already scarce, running a single thread at increased priority will not raise throughput any further. This was observed when running the LMD thread with priority set to HIGHEST in an Oracle8i environment.

 

More information can be found in the "Oracle Database 10g Architecture on Windows" white paper on OTN and in the Windows-specific 10g documentation.

Global Cache CR Requested But Current Block Received

This article is a companion piece to the earlier "MINSCN and Cache Fusion Read Consistency"; both grew out of the same question thread.

Let us reproduce the scenario the original poster observed:

 

SQL> select * from V$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
PL/SQL Release 11.2.0.3.0 - Production
CORE    11.2.0.3.0      Production
TNS for Linux: Version 11.2.0.3.0 - Production
NLSRTL Version 11.2.0.3.0 - Production

SQL> select count(*) from gv$instance;

  COUNT(*)
----------
         2

SQL> select * from global_name;

GLOBAL_NAME
--------------------------------------------------------------------------------
www.askmac.cn

 

 

 

In an 11gR2 two-node RAC environment, we will get a data block's status changed to XG. Assume this Xcurrent block is currently held on instance 2; we then repeatedly query the block from instance 1 and observe the results:

 

SQL> select  * from test;

        ID
----------
         1
         2

SQL> select dbms_rowid.rowid_block_number(rowid),dbms_rowid.rowid_relative_fno(rowid) from test;

DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID) DBMS_ROWID.ROWID_RELATIVE_FNO(ROWID)
------------------------------------ ------------------------------------
                               89233                                    1
                               89233                                    1

SQL> alter system flush buffer_cache;

System altered.

INSTANCE 1 Session A:

SQL>  update test set id=id+1 where id=1;

1 row updated.

INSTANCE 1 Session B:

SQL> select state,cr_scn_bas from x$bh where file#=1 and dbablk=89233 and state!=0;

     STATE CR_SCN_BAS
---------- ----------
         1          0
         3    1755287

SQL> oradebug setmypid;
Statement processed.
SQL> oradebug dump gc_elements 255;
Statement processed.
SQL> oradebug tracefile_name;
/s01/orabase/diag/rdbms/vprod/VPROD1/trace/VPROD1_ora_19111.trc

GLOBAL CACHE ELEMENT DUMP (address: 0xa4ff3080):
  id1: 0x15c91 id2: 0x1 pkey: OBJ#76896 block: (1/89233)
  lock: X rls: 0x0 acq: 0x0 latch: 3
  flags: 0x20 fair: 0 recovery: 0 fpin: 'kdswh11: kdst_fetch'
  bscn: 0x0.146e20 bctx: (nil) write: 0 scan: 0x0
  lcp: (nil) lnk: [NULL] lch: [0xa9f6a6f8,0xa9f6a6f8]
  seq: 32 hist: 58 145:0 118 66 144:0 192 352 197 48 121 113 424 180 58
  LIST OF BUFFERS LINKED TO THIS GLOBAL CACHE ELEMENT:
    flg: 0x02000001 lflg: 0x1 state: XCURRENT tsn: 0 tsh: 2
      addr: 0xa9f6a5c8 obj: 76896 cls: DATA bscn: 0x0.1ac898
BH (0xa9f6a5c8) file#: 1 rdba: 0x00415c91 (1/89233) class: 1 ba: 0xa9e56000
  set: 5 pool: 3 bsz: 8192 bsi: 0 sflg: 3 pwc: 0,15
  dbwrid: 0 obj: 76896 objn: 76896 tsn: 0 afn: 1 hint: f
  hash: [0x91f4e970,0xbae9d5b8] lru: [0x91f58848,0xa9f6a828]
  lru-flags: debug_dump
  obj-flags: object_ckpt_list
  ckptq: [0x9df6d1d8,0xa9f6a740] fileq: [0xa2ece670,0xbdf4ed68] objq: [0xb4964e00,0xb4964e00] objaq: [0xb4964de0,0xb4964de0]
  st: XCURRENT md: NULL fpin: 'kdswh11: kdst_fetch' tch: 2 le: 0xa4ff3080
  flags: buffer_dirty redo_since_read
  LRBA: [0x19.5671.0] LSCN: [0x0.1ac898] HSCN: [0x0.1ac898] HSUB: [1]
  buffer tsn: 0 rdba: 0x00415c91 (1/89233)
  scn: 0x0000.001ac898 seq: 0x01 flg: 0x00 tail: 0xc8980601
  frmt: 0x02 chkval: 0x0000 type: 0x06=trans data

 

 

As you can see, in the GLOBAL CACHE ELEMENT DUMP for block (1/89233) the lock state is still X rather than XG. This is because the current block has only been modified within a single instance and has not been updated globally.
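The rdba: 0x00415c91 (1/89233) in the buffer header dump above can be checked by hand: in a standard Oracle relative DBA, the high 10 bits carry the relative file number and the low 22 bits the block number. A small Python sketch of that split:

```python
def decode_rdba(rdba: int) -> tuple[int, int]:
    """Split a 32-bit relative DBA into (relative file#, block#).

    The high 10 bits hold the relative file number and the low
    22 bits the block number within that file.
    """
    return rdba >> 22, rdba & ((1 << 22) - 1)

file_no, block_no = decode_rdba(0x00415C91)
print(file_no, block_no)   # matches (1/89233) in the dump
```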

 

 

Next, modify the same block on instance 2:

 

 

Instance 2 Session C:

SQL> update test set id=id+1 where id=2;

1 row updated.

Instance 2 Session D:

SQL>  select state,cr_scn_bas from x$bh where file#=1 and dbablk=89233 and state!=0;

     STATE CR_SCN_BAS
---------- ----------
         1          0
         3    1756658

SQL> oradebug setmypid;
Statement processed.
SQL> oradebug dump gc_elements 255;
Statement processed.
SQL> oradebug tracefile_name;
/s01/orabase/diag/rdbms/vprod/VPROD2/trace/VPROD2_ora_13038.trc

GLOBAL CACHE ELEMENT DUMP (address: 0x89fb25a0):
  id1: 0x15c91 id2: 0x1 pkey: OBJ#76896 block: (1/89233)
  lock: XG rls: 0x0 acq: 0x0 latch: 3
  flags: 0x20 fair: 0 recovery: 0 fpin: 'kduwh01: kdusru'
  bscn: 0x0.1acdf3 bctx: (nil) write: 0 scan: 0x0
  lcp: (nil) lnk: [NULL] lch: [0x96f4cf80,0x96f4cf80]
  seq: 61 hist: 324 21 143:0 19 16 352 329 144:6 14 7 352 197
  LIST OF BUFFERS LINKED TO THIS GLOBAL CACHE ELEMENT:
    flg: 0x0a000001 state: XCURRENT tsn: 0 tsh: 1
      addr: 0x96f4ce50 obj: 76896 cls: DATA bscn: 0x0.1acdf6
BH (0x96f4ce50) file#: 1 rdba: 0x00415c91 (1/89233) class: 1 ba: 0x96bd4000
  set: 5 pool: 3 bsz: 8192 bsi: 0 sflg: 2 pwc: 0,15
  dbwrid: 0 obj: 76896 objn: 76896 tsn: 0 afn: 1 hint: f
  hash: [0x96ee1fe8,0xbae9d5b8] lru: [0x96f4d0b0,0x96f4cdc0]
  obj-flags: object_ckpt_list
  ckptq: [0xbdf519b8,0x96f4d5a8] fileq: [0xbdf519d8,0xbdf519d8] objq: [0xb4a47b90,0xb4a47b90] objaq: [0x96f4d0e8,0xb4a47b70]
  st: XCURRENT md: NULL fpin: 'kduwh01: kdusru' tch: 1 le: 0x89fb25a0
  flags: buffer_dirty redo_since_read remote_transfered
  LRBA: [0x11.9e18.0] LSCN: [0x0.1acdf6] HSCN: [0x0.1acdf6] HSUB: [1]
  buffer tsn: 0 rdba: 0x00415c91 (1/89233)
  scn: 0x0000.001acdf6 seq: 0x01 flg: 0x00 tail: 0xcdf60601
  frmt: 0x02 chkval: 0x0000 type: 0x06=trans data

 GCS CLIENT 0x89fb2618,6 resp[(nil),0x15c91.1] pkey 76896.0
   grant 2 cvt 0 mdrole 0x42 st 0x100 lst 0x20 GRANTQ rl G0
   master 1 owner 2 sid 0 remote[(nil),0] hist 0x94121c601163423c
   history 0x3c.0x4.0xd.0xb.0x1.0xc.0x7.0x9.0x14.0x1.
   cflag 0x0 sender 1 flags 0x0 replay# 0 abast (nil).x0.1 dbmap (nil)
   disk: 0x0000.00000000 write request: 0x0000.00000000
   pi scn: 0x0000.00000000 sq[(nil),(nil)]
   msgseq 0x1 updseq 0x0 reqids[6,0,0] infop (nil) lockseq x2b8
   pkey 76896.0
   hv 93 [stat 0x0, 1->1, wm 32768, RMno 0, reminc 18, dom 0]
   kjga st 0x4, step 0.0.0, cinc 20, rmno 6, flags 0x0
   lb 0, hb 0, myb 15250, drmb 15250, apifrz 0

 

 

After being modified on instance 2, the GLOBAL CACHE ELEMENT lock for block (1/89233) converts to lock: XG.

Besides analyzing XCUR Cache Fusion through a GC_ELEMENTS dump, you can also query the X$ views directly, mainly X$LE, X$KJBR, and X$KJBL; more information on these three X$ views can easily be found on my blog:

 

 

INSTANCE 2 Session D:

SELECT *
  FROM x$le
 WHERE le_addr IN (SELECT le_addr
                     FROM x$bh
                    WHERE obj IN (SELECT data_object_id
                                    FROM dba_objects
                                   WHERE owner = 'SYS'
                                     AND object_name = 'TEST')
                      AND class = 1
                      AND state != 3);

ADDR                   INDX    INST_ID LE_ADDR              LE_ID1     LE_ID2
---------------- ---------- ---------- ---------------- ---------- ----------
    LE_RLS     LE_ACQ   LE_FLAGS    LE_MODE   LE_WRITE   LE_LOCAL LE_RECOVERY
---------- ---------- ---------- ---------- ---------- ---------- -----------
   LE_BLKS    LE_TIME LE_KJBL
---------- ---------- ----------------
00007F94CA14CF60       7003          2 0000000089FB25A0      89233          1
         0          0         32          2          0          1           0
         1          0 0000000089FB2618

 

 

The PCM resource name is composed of [ID1][ID2],[BL] and so on, where ID1 and ID2 are computed from the block number and file number. Here we refer directly to the id1: 0x15c91 id2: 0x1 pkey: OBJ#76896 block: (1/89233) information in the GC_ELEMENTS dump above, so kjblname and kjbrname should begin with "[0x15c91][0x1],[BL]":
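As a quick sanity check: in this dump id1 is simply the block number in hex (0x15c91 = 89233) and id2 the relative file number. A tiny Python sketch (illustrative, based only on the values in this dump) that builds the search prefix used in the LIKE queries:

```python
def bl_resource_prefix(file_no: int, block_no: int) -> str:
    """Build the '[id1][id2],[BL]' prefix seen in kjblname/kjbrname.

    For the BL resource in this dump, id1 is the block number and
    id2 the relative file number, both rendered in hex.
    """
    return f"[{hex(block_no)}][{hex(file_no)}],[BL]"

print(bl_resource_prefix(1, 89233))  # -> [0x15c91][0x1],[BL]
```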

 

 

INSTANCE 2 Session D:

SQL> set linesize 80 pagesize 1400
SQL> SELECT *
  2    FROM x$kjbl l
  3   WHERE l.kjblname LIKE '%[0x15c91][0x1],[BL]%';

ADDR                   INDX    INST_ID KJBLLOCKP        KJBLGRANT KJBLREQUE
---------------- ---------- ---------- ---------------- --------- ---------
  KJBLROLE KJBLRESP         KJBLNAME
---------- ---------------- ------------------------------
KJBLNAME2                       KJBLQUEUE
------------------------------ ----------
KJBLLOCKST                                                       KJBLWRITING
---------------------------------------------------------------- -----------
KJBLREQWRITE  KJBLOWNER KJBLMASTER KJBLBLOCKED KJBLBLOCKER    KJBLSID KJBLRDOMID
------------ ---------- ---------- ----------- ----------- ---------- ----------
  KJBLPKEY
----------
00007F94CA22A288        451          2 0000000089FB2618 KJUSEREX  KJUSERNL
         0 00               [0x15c91][0x1],[BL][ext 0x0,0x
89233,1,BL                              0
GRANTED                                                                    0
           0          1          0           0           0          0          0
     76896

SQL> SELECT r.* FROM x$kjbr r WHERE r.kjbrname LIKE '%[0x15c91][0x1],[BL]%';

no rows selected

Instance 1 session B:

SQL>  SELECT r.* FROM x$kjbr r WHERE r.kjbrname LIKE '%[0x15c91][0x1],[BL]%';

ADDR                   INDX    INST_ID KJBRRESP         KJBRGRANT KJBRNCVL
---------------- ---------- ---------- ---------------- --------- ---------
  KJBRROLE KJBRNAME                       KJBRMASTER KJBRGRANTQ
---------- ------------------------------ ---------- ----------------
KJBRCVTQ         KJBRWRITER          KJBRSID KJBRRDOMID   KJBRPKEY
---------------- ---------------- ---------- ---------- ----------
00007F801ACA68F8       1355          1 00000000B5A62AE0 KJUSEREX  KJUSERNL
         0 [0x15c91][0x1],[BL][ext 0x0,0x          0 00000000B48BB330
00               00                        0          0      76896

 

 

Next we query block (1/89233) on instance 1. This should cause instance 2 to build a CR block and ship it to instance 1. To study the details, we enable verbose RAC tracing on the foreground process of instance 1 and on the LMS process of instance 2:

 

Instance 2:

[oracle@vrh2 ~]$ ps -ef|grep ora_lms|grep -v grep
oracle   23364     1  0 Apr29 ?        00:33:15 ora_lms0_VPROD2

SQL> oradebug setospid 23364
Oracle pid: 13, Unix process pid: 23364, image: oracle@vrh2.oracle.com (LMS0)

SQL> oradebug event 10046 trace name context forever,level 8:10708 trace name context forever,level 103: trace[rac.*] disk high;
Statement processed.

SQL> oradebug tracefile_name
/s01/orabase/diag/rdbms/vprod/VPROD2/trace/VPROD2_lms0_23364.trc

Instance 1 session B :

SQL> select state,cr_scn_bas from x$bh where file#=1 and dbablk=89233 and state!=0;

     STATE CR_SCN_BAS
---------- ----------
         3    1756658
         3    1756661
         3    1755287

Instance 1 session A :

SQL> alter session set events '10046 trace name context forever,level 8:10708 trace name context forever,level 103: trace[rac.*] disk high';

Session altered.

SQL> select * from test;

        ID
----------
         2
         2

SQL> select state,cr_scn_bas from x$bh where file#=1 and dbablk=89233 and state!=0;

     STATE CR_SCN_BAS
---------- ----------
         3    1761520

 

The information in X$BH shows that, as expected, instance 1 received the built CR block. Analyzing the trace files further:

 

Instance 1 foreground Process:

PARSING IN CURSOR #140336527348792 len=18 dep=0 uid=0 oct=3 lid=0 tim=1335939136125254 hv=1689401402 ad='b1a4c828' sqlid='c99yw1xkb4f1u'
select * from test
END OF STMT
PARSE #140336527348792:c=2999,e=2860,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,plh=1357081020,tim=1335939136125253
EXEC #140336527348792:c=0,e=40,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=1357081020,tim=1335939136125373
WAIT #140336527348792: nam='SQL*Net message to client' ela= 6 driver id=1650815232 #bytes=1 p3=0 obj#=0 tim=1335939136125420

*** 2012-05-02 02:12:16.125
kclscrs: req=0 block=1/89233
2012-05-02 02:12:16.125574 : kjbcro[0x15c91.1 76896.0][4]

*** 2012-05-02 02:12:16.125
kclscrs: req=0 typ=nowait-abort

*** 2012-05-02 02:12:16.125
kclscrs: bid=1:3:1:0:f:1e:0:0:10:0:0:0:1:2:4:1:20:0:0:0:c3:49:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:4:3:2:1:2:0:1c:0:4d:26:a3:52:0:0:0:0:c7:c:ca:62:c3:49:0:0:0:0:1:0:14:8e:47:76:1:2:dc:5:a9:fe:17:75:0:0:0:0:0:0:0:0:0:0:0:0:99:ed:0:0:0:0:0:0:10:0:0:0
2012-05-02 02:12:16.125718 : kjbcro[0x15c91.1 76896.0][4]
2012-05-02 02:12:16.125751 : GSIPC:GMBQ: buff 0xba0ee018, queue 0xbb79a7b8, pool 0x60013fa0, freeq 0, nxt 0xbb79a7b8, prv 0xbb79a7b8
2012-05-02 02:12:16.125780 : kjbsentscn[0x0.1ae0f0][to 2]
2012-05-02 02:12:16.125806 : GSIPC:SENDM: send msg 0xba0ee088 dest x20001 seq 177740 type 36 tkts xff0000 mlen x1680198
2012-05-02 02:12:16.125918 : kjbmscr(0x15c91.1)reqid=0x8(req 0xa4ff30f8)(rinst 1)hldr 2(infosz 200)(lseq x2b8)
2012-05-02 02:12:16.126959 : GSIPC:KSXPCB: msg 0xba0ee088 status 30, type 36, dest 2, rcvr 1

*** 2012-05-02 02:12:16.127
kclwcrs: wait=0 tm=1233

*** 2012-05-02 02:12:16.127
kclwcrs: got 1 blocks from ksxprcv
WAIT #140336527348792: nam='gc cr block 2-way' ela= 1233 p1=1 p2=89233 p3=1 obj#=76896 tim=1335939136127199
2012-05-02 02:12:16.127272 : kjbcrcomplete[0x15c91.1 76896.0][0]
2012-05-02 02:12:16.127309 : kjbrcvdscn[0x0.1ae0f0][from 2][idx 2012-05-02 02:12:16.127329 : kjbrcvdscn[no bscn <= rscn 0x0.1ae0f0][from 2]

 

 

The foreground process, via kjbcro[0x15c91.1 76896.0][4] and kjbsentscn[0x0.1ae0f0][to 2], requested the SCN=0x1ae0f0=1761520 version of block (1/89233) from instance 2, entered the 'gc cr block 2-way' wait, and then successfully received the requested CR block.
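RAC traces print SCNs as wrap.base pairs; the full SCN is wrap * 2^32 + base. A one-line Python sketch (illustrative) confirms that 0x0.1ae0f0 is the 1761520 seen in x$bh:

```python
def scn_from_wrap_base(wrap: int, base: int) -> int:
    """Combine the wrap.base pair printed in RAC traces into one SCN."""
    return (wrap << 32) | base

# kjbsentscn[0x0.1ae0f0] from the foreground trace:
print(scn_from_wrap_base(0x0, 0x1AE0F0))  # 1761520, the CR_SCN_BAS seen in x$bh
```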

 

Instance 2 LMS TRACE

 

2012-05-02 02:12:15.634057 : GSIPC:RCVD: ksxp msg 0x7f16e1598588 sndr 1 seq 0.177740 type 36 tkts 0
2012-05-02 02:12:15.634094 : GSIPC:RCVD: watq msg 0x7f16e1598588 sndr 1, seq 177740, type 36, tkts 0
2012-05-02 02:12:15.634108 : GSIPC:TKT: collect msg 0x7f16e1598588 from 1 for rcvr -1, tickets 0
2012-05-02 02:12:15.634162 : kjbrcvdscn[0x0.1ae0f0][from 1][idx 2012-05-02 02:12:15.634186 : kjbrcvdscn[no bscn1, wm 32768, RMno 0, reminc 18, dom 0]
   kjga st 0x4, step 0.0.0, cinc 20, rmno 6, flags 0x0
   lb 0, hb 0, myb 15250, drmb 15250, apifrz 0
 GCS CLIENT END
2012-05-02 02:12:15.635211 : kjbdowncvt[0x15c91.1 76896.0][1][options x0]
2012-05-02 02:12:15.635230 : GSIPC:AMBUF: rcv buff 0x7f16e1c56420, pool rcvbuf, rqlen 1103
2012-05-02 02:12:15.635308 : GSIPC:GPBMSG: new bmsg 0x7f16e1c56490 mb 0x7f16e1c56420 msg 0x7f16e1c564b0 mlen 152 dest x101 flushsz -1
2012-05-02 02:12:15.635334 : kjbmslset(0x15c91.1)) seq 0x4 reqid=0x6 (shadow 0xb48bb330.xb)(rsn 2)(mas@1)
2012-05-02 02:12:15.635355 : GSIPC:SPBMSG: send bmsg 0x7f16e1c56490 blen 184 msg 0x7f16e1c564b0 mtype 33 attr|dest x30101 bsz|fsz x1ffff
2012-05-02 02:12:15.635377 : GSIPC:SNDQ: enq msg 0x7f16e1c56490, type 65521 seq 118669, inst 1, receiver 1, queued 1

*** 2012-05-02 02:12:15.635
kclccctx: cleanup copy 0x7f16e1d94798
2012-05-02 02:12:15.635479 : [kjmpmsgi:compl][type 36][msg 0x7f16e1598588][seq 177740.0][qtime 0][ptime 1257]
2012-05-02 02:12:15.635511 : GSIPC:BSEND: flushing sndq 0xb491dd28, id 1, dcx 0xbc516778, inst 1, rcvr 1  qlen 0 1
2012-05-02 02:12:15.635536 : GSIPC:BSEND: no batch1 msg 0x7f16e1c56490 type 65521 len 184 dest (1:1)
2012-05-02 02:12:15.635557 : kjbsentscn[0x0.1ae0f1][to 1]
2012-05-02 02:12:15.635578 : GSIPC:SENDM: send msg 0x7f16e1c56490 dest x10001 seq 118669 type 65521 tkts x10002 mlen xb800e8
WAIT #0: nam='gcs remote message' ela= 180 waittime=1 poll=0 event=0 obj#=0 tim=1335939135635819
2012-05-02 02:12:15.635853 : GSIPC:RCVD: ksxp msg 0x7f16e167e0b0 sndr 1 seq 0.177741 type 32 tkts 0
2012-05-02 02:12:15.635875 : GSIPC:RCVD: watq msg 0x7f16e167e0b0 sndr 1, seq 177741, type 32, tkts 0
2012-05-02 02:12:15.636012 : GSIPC:TKT: collect msg 0x7f16e167e0b0 from 1 for rcvr -1, tickets 0
2012-05-02 02:12:15.636040 : kjbrcvdscn[0x0.1ae0f1][from 1][idx 2012-05-02 02:12:15.636060 : kjbrcvdscn[no bscn <= rscn 0x0.1ae0f1][from 1]
2012-05-02 02:12:15.636082 : GSIPC:TKT: dest (1:1) rtkt not acked 1  unassigned bufs 0  tkts 0  newbufs 0
2012-05-02 02:12:15.636102 : GSIPC:TKT: remove ctx dest (1:1)
2012-05-02 02:12:15.636125 : [kjmxmpm][type 32][seq 0.177741][msg 0x7f16e167e0b0][from 1]
2012-05-02 02:12:15.636146 : kjbmpocr(0xb0.6)seq 0x1,reqid=0x23a,(client 0x9fff7b58,0x1)(from 1)(lseq xdf0)

 

On instance 2, the LMS process received a gcs remote message over GSIPC requesting block 1/89233 at SCN=[0x0.1ae0f0] and entered BAST kjbmpbast(0x15c91.1). Because block 1/89233 had already been written to disk, the fairness algorithm was triggered (in 11.2.0.3 the default _fairness_threshold is 2), and the current block underwent a KCL: F156: fairness downconvert, from Xcurrent down to Scurrent:
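The fairness mechanism can be pictured with a toy model (a hypothetical simplification for illustration, not the actual kernel logic): the holder counts the CR requests it serves and, once the count passes _fairness_threshold, downconverts X to S and starts shipping the current block instead:

```python
class FairnessCounter:
    """Toy model of the holder-side fairness downconvert (hypothetical)."""

    def __init__(self, threshold: int = 2):
        self.threshold = threshold   # models _fairness_threshold
        self.cr_served = 0
        self.mode = "X"

    def serve_cr_request(self) -> str:
        if self.mode == "S":
            return "ship current"    # already downconverted
        self.cr_served += 1
        if self.cr_served > self.threshold:
            self.mode = "S"          # fairness downconvert X -> S
            return "ship current"
        return "build CR"

holder = FairnessCounter(threshold=2)
print([holder.serve_cr_request() for _ in range(4)])
```

With the default threshold of 2, the third CR request against the same held block triggers the downconvert, matching the Xcurrent-to-Scurrent transition observed in x$bh above.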

 

Instance 2:
SQL> select state,cr_scn_bas from x$bh where file#=1 and dbablk=89233 and state!=0;

STATE CR_SCN_BAS
---------- ----------
2 0
3 1756658

 

Afterwards, the instance 2 LMS process enqueued the CR block onto the send queue: kjbmslset(0x15c91.1), GSIPC:SNDQ: enq msg 0x7f16e1c56490.

 

Next we run the query against block (1/89233) on instance 1 again and see what happens:

 

Instance 2:

SQL> select CURRENT_RESULTS,LIGHT_WORKS from v$cr_block_server;

CURRENT_RESULTS LIGHT_WORKS
--------------- -----------
          29273         437

Instance 1 session A:

SQL>
SQL> select * from test;

        ID
----------
         2
         2

SQL> select state,cr_scn_bas from x$bh where file#=1 and dbablk=89233 and state!=0;

     STATE CR_SCN_BAS
---------- ----------
         3    1761942
         3    1761932
         1          0
         3    1761520

Instance 2:

SQL> select CURRENT_RESULTS,LIGHT_WORKS from v$cr_block_server;

CURRENT_RESULTS LIGHT_WORKS
--------------- -----------
          29274         437

select * from test
END OF STMT
PARSE #140336529675592:c=0,e=337,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=1357081020,tim=1335939668940051
EXEC #140336529675592:c=0,e=96,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=1357081020,tim=1335939668940204
WAIT #140336529675592: nam='SQL*Net message to client' ela= 5 driver id=1650815232 #bytes=1 p3=0 obj#=0 tim=1335939668940348

*** 2012-05-02 02:21:08.940
kclscrs: req=0 block=1/89233
2012-05-02 02:21:08.940676 : kjbcro[0x15c91.1 76896.0][5]

*** 2012-05-02 02:21:08.940
kclscrs: req=0 typ=nowait-abort

*** 2012-05-02 02:21:08.940
kclscrs: bid=1:3:1:0:f:21:0:0:10:0:0:0:1:2:4:1:20:0:0:0:c3:49:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:4:3:2:1:2:0:1f:0:4d:26:a3:52:0:0:0:0:c7:c:ca:62:c3:49:0:0:0:0:1:0:17:8e:47:76:1:2:dc:5:a9:fe:17:75:0:0:0:0:0:0:0:0:0:0:0:0:99:ed:0:0:0:0:0:0:10:0:0:0
2012-05-02 02:21:08.940799 : kjbcro[0x15c91.1 76896.0][5]
2012-05-02 02:21:08.940833 : GSIPC:GMBQ: buff 0xba0ee018, queue 0xbb79a7b8, pool 0x60013fa0, freeq 0, nxt 0xbb79a7b8, prv 0xbb79a7b8
2012-05-02 02:21:08.940859 : kjbsentscn[0x0.1ae28c][to 2]
2012-05-02 02:21:08.940870 : GSIPC:SENDM: send msg 0xba0ee088 dest x20001 seq 177810 type 36 tkts xff0000 mlen x1680198
2012-05-02 02:21:08.940976 : kjbmscr(0x15c91.1)reqid=0xa(req 0xa4ff30f8)(rinst 1)hldr 2(infosz 200)(lseq x2b8)
2012-05-02 02:21:08.941314 : GSIPC:KSXPCB: msg 0xba0ee088 status 30, type 36, dest 2, rcvr 1

*** 2012-05-02 02:21:08.941
kclwcrs: wait=0 tm=707

*** 2012-05-02 02:21:08.941
kclwcrs: got 1 blocks from ksxprcv
2012-05-02 02:21:08.941818 : kjbassume[0x15c91.1][sender 2][mymode x1][myrole x0][srole x0][flgs x0][spiscn 0x0.0][swscn 0x0.0]
2012-05-02 02:21:08.941852 : kjbrcvdscn[0x0.1ae28d][from 2][idx 2012-05-02 02:21:08.941871 : kjbrcvdscn[no bscn

 

As before, the foreground process requested the SCN=[0x0.1ae28c]=1761932 version as a CR block, but this time what it received was an Xcurrent block with SCN=0x1ae28d=1761933. After receiving the Xcurrent block, instance 1 built the SCN=1761932 CR block its query needed by itself. Because what actually arrived was a current block, the wait event became 'gc current block 2-way'.

Note that the foreground process did not request a current block; it still used kjbcro. The root cause is that the LMS process on instance 2 chose to ship it the current block:

 

Instance 2 LMS trace:

2012-05-02 02:21:08.448743 : GSIPC:RCVD: ksxp msg 0x7f16e14a4398 sndr 1 seq 0.177810 type 36 tkts 0
2012-05-02 02:21:08.448778 : GSIPC:RCVD: watq msg 0x7f16e14a4398 sndr 1, seq 177810, type 36, tkts 0
2012-05-02 02:21:08.448798 : GSIPC:TKT: collect msg 0x7f16e14a4398 from 1 for rcvr -1, tickets 0
2012-05-02 02:21:08.448816 : kjbrcvdscn[0x0.1ae28c][from 1][idx 2012-05-02 02:21:08.448834 : kjbrcvdscn[no bscn <= rscn 0x0.1ae28c][from 1]
2012-05-02 02:21:08.448857 : GSIPC:TKT: dest (1:1) rtkt not acked 2  unassigned bufs 0  tkts 0  newbufs 0
2012-05-02 02:21:08.448875 : GSIPC:TKT: remove ctx dest (1:1)
2012-05-02 02:21:08.448970 : [kjmxmpm][type 36][seq 0.177810][msg 0x7f16e14a4398][from 1]
2012-05-02 02:21:08.448993 : kjbmpbast(0x15c91.1) reqid=0x6 (req 0xa4ff30f8)(reqinst 1)(reqid 10)(flags x0)

*** 2012-05-02 02:21:08.449
kclcrrf: req=48054 block=1/89233

*** 2012-05-02 02:21:08.449
kcl_compress_block: compressed: 6 free space: 7680
2012-05-02 02:21:08.449085 : kjbsentscn[0x0.1ae28d][to 1]
2012-05-02 02:21:08.449142 : kjbdeliver[to 1][0xa4ff30f8][10][current 1]
2012-05-02 02:21:08.449164 : kjbmssch(reqlock 0xa4ff30f8,10)(to 1)(bsz 344)
2012-05-02 02:21:08.449183 : GSIPC:AMBUF: rcv buff 0x7f16e18bcec8, pool rcvbuf, rqlen 1102

*** 2012-05-02 02:21:08.449
kclccctx: cleanup copy 0x7f16e1d94838

*** 2012-05-02 02:21:08.449
kcltouched: touch seconds 3271

*** 2012-05-02 02:21:08.449
kclgrantlk: req=48054
2012-05-02 02:21:08.449347 : [kjmpmsgi:compl][type 36][msg 0x7f16e14a4398][seq 177810.0][qtime 0][ptime 1119]
WAIT #0: nam='gcs remote message' ela= 568 waittime=1 poll=0 event=0 obj#=0 tim=1335939668449962
2012-05-02 02:21:08.450001 : GSIPC:RCVD: ksxp msg 0x7f16e1bb22a0 sndr 1 seq 0.177811 type 32 tkts 0
2012-05-02 02:21:08.450024 : GSIPC:RCVD: watq msg 0x7f16e1bb22a0 sndr 1, seq 177811, type 32, tkts 0
2012-05-02 02:21:08.450043 : GSIPC:TKT: collect msg 0x7f16e1bb22a0 from 1 for rcvr -1, tickets 0
2012-05-02 02:21:08.450060 : kjbrcvdscn[0x0.1ae28e][from 1][idx 2012-05-02 02:21:08.450078 : kjbrcvdscn[no bscn <= rscn 0x0.1ae28e][from 1]
2012-05-02 02:21:08.450097 : GSIPC:TKT: dest (1:1) rtkt not acked 3  unassigned bufs 0  tkts 0  newbufs 0
2012-05-02 02:21:08.450116 : GSIPC:TKT: remove ctx dest (1:1)
2012-05-02 02:21:08.450136 : [kjmxmpm][type 32][seq 0.177811][msg 0x7f16e1bb22a0][from 1]
2012-05-02 02:21:08.450155 : kjbmpocr(0xb0.6)seq 0x1,reqid=0x23e,(client 0x9fff7b58,0x1)(from 1)(lseq xdf4)

 

 

Why would the LMS on instance 2 take the shortcut of shipping instance 1 its own current block rather than building a CR block? Watching v$cr_block_server on instance 2 shows that the LIGHT_WORKS column did not increase after the current block transfer, so the CR server's Light Work Rule was not triggered. (The Light Work Rule is an optimization that has existed since the 8i CR server: when building the consistent read version would involve too much work for the resource holder's LMS, such as reading blocks from disk, the holder ships the existing block instead and the requestor completes the CR fabrication. The holder also maintains a fairness counter of CR requests; after the fairness threshold is reached, the holder downconverts its lock mode.)

 

So under exactly what conditions does a CR request receive a current block?

The answer: the CR server behaves differently for different block classes. For a CR request against an undo block or an undo header block, LMS always ships the current block; this helps the remote node with any block cleanout or CR version construction it may be performing.

For ordinary data blocks, the default behavior is CR requested & CR received (setting aside the Light Work Rule "shortcut" described above), unless the current block has already been downconverted from X to S lock, in which case the LMS process simply ships a current version of the block.
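The rules just described can be condensed into a small decision sketch (a simplification under the assumptions stated above, not the real CR server code):

```python
def cr_server_response(block_class: str, holder_mode: str,
                       light_work: bool = False) -> str:
    """Sketch of what the holder's LMS ships for a CR request.

    Rules as described in the text:
      * undo / undo header blocks: always ship the current block;
      * data blocks: ship a built CR copy, unless the Light Work Rule
        applies or the holder has already downconverted X -> S.
    """
    if block_class in ("undo", "undo header"):
        return "current"
    if light_work or holder_mode == "S":
        return "current"
    return "cr"

print(cr_server_response("undo", "X"))   # current
print(cr_server_response("data", "X"))   # cr
print(cr_server_response("data", "S"))   # current
```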

 

To prove this, we test again. This time we raise the hidden parameter "_fairness_threshold", which controls the downconvert, to 200. This prevents the Xcurrent block from being downgraded to Scurrent and should cause LMS to ship current versions of data blocks far less often:

 

 

 

SQL> show parameter fair

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
_fairness_threshold                  integer     200

Instance 1:

SQL>  update test set id=id+1 where id=4;

1 row updated.

Instance 2:

SQL>  update test set id=id+1 where id=2;

1 row updated.

SQL> select state,cr_scn_bas from x$bh where file#=1 and dbablk=89233 and state!=0;

     STATE CR_SCN_BAS
---------- ----------
         1          0
         3    1838166

Query repeatedly on instance 1, and verify through v$cr_block_server on instance 2:

instance 1
SQL> select * from test;

        ID
----------
        10
         3

instance 2:

SQL>  select state,cr_scn_bas from x$bh where file#=1 and dbablk=89233 and state!=0;

     STATE CR_SCN_BAS
---------- ----------
         1          0
         3    1883707
         8          0

SQL> select * from test;

        ID
----------
        10
         3

SQL>  select state,cr_scn_bas from x$bh where file#=1 and dbablk=89233 and state!=0;

     STATE CR_SCN_BAS
---------- ----------
         1          0
         3    1883707
         8          0

...................

SQL> /

     STATE CR_SCN_BAS
---------- ----------
         2          0
         3    1883707
         3    1883695

repeat cr request on Instance 1

SQL> /

     STATE CR_SCN_BAS
---------- ----------
         8          0
         3    1883707
         3    1883695

 

 

Actual testing suggests that _fairness_threshold has some upper limit: although it was set to 200, the lock was downgraded after only a few CR serves, so the CR requests for the data block still ended up receiving a current block.

MINSCN and Cache Fusion Read Consistency

This problem came from a question on Ask Maclean Home about Past Image (PI) blocks in RAC. The poster provided a demonstration, which we reproduce here in an 11.2.0.3 two-node RAC environment:

 

SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
PL/SQL Release 11.2.0.3.0 - Production
CORE    11.2.0.3.0      Production
TNS for Linux: Version 11.2.0.3.0 - Production
NLSRTL Version 11.2.0.3.0 - Production

SQL> select * from global_name;

GLOBAL_NAME
--------------------------------------------------------------------------------
www.askmac.cn

SQL> drop table test purge;

Table dropped.

SQL> alter system flush buffer_cache;

System altered.

SQL> create table test(id number);

insert into test values(1);

insert into test values(2);

commit;

/* Use rowid to locate the data block holding the only two rows of table TEST */

select dbms_rowid.rowid_block_number(rowid),dbms_rowid.rowid_relative_fno(rowid) from test;

DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID) DBMS_ROWID.ROWID_RELATIVE_FNO(ROWID)
------------------------------------ ------------------------------------
                               89233                                    1
                               89233                                    1 

SQL> alter system flush buffer_cache;

System altered.

Instance 1, Session A executes an UPDATE:

SQL> update test set id=id+1 where id=1;

1 row updated.

Instance 1, Session B queries the X$BH buffer header view to examine the state of the relevant buffers:

SQL> select state,cr_scn_bas from x$bh where file#=1 and dbablk=89233 and state!=0;

     STATE CR_SCN_BAS
---------- ----------
         1          0
         3    1227595

 

The STATE column of the X$BH view indicates the buffer state; the possible states are:

 

     STATE  NAME              DESCRIPTION
         0  KCBBHFREE         buffer free
         1  KCBBHEXLCUR       buffer current (and if DFS locked X)
         2  KCBBHSHRCUR       buffer current (and if DFS locked S)
         3  KCBBHCR           buffer consistent read
         4  KCBBHREADING      being read
         5  KCBBHMRECOVERY    media recovery (current & special)
         6  KCBBHIRECOVERY    instance recovery (somewhat special)
 

 

This demonstration uses: state=1 Xcurrent, state=2 Scurrent, and state=3 CR.
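For quick reference when reading the x$bh output in this article, the state codes can be kept in a small lookup table; state 8, which appears in the demos here but is absent from the list above, is commonly reported as PI (past image):

```python
# Mapping of x$bh.state values to buffer states, per the list above;
# 8 -> "PI" (past image) is added based on the demo output in this article.
X_BH_STATE = {
    0: "FREE",
    1: "XCUR",       # exclusive current
    2: "SCUR",       # shared current
    3: "CR",         # consistent read copy
    4: "READING",
    5: "MRECOVERY",
    6: "IRECOVERY",
    8: "PI",         # past image
}

def describe(state: int) -> str:
    return X_BH_STATE.get(state, "unknown")

print(describe(1), describe(2), describe(3), describe(8))
```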

 

Next, update another row in the same data block from instance 2. This triggers a gc current block 2-way and transfers the current block to instance 2, while instance 1's former "current block" is converted into a past image:

 

 

Instance 2 Session C 

SQL> update test set id=id+1 where id=2;

1 row updated.

Instance 2 Session D

SQL> select state,cr_scn_bas from x$bh where file#=1 and dbablk=89233 and state!=0;

     STATE CR_SCN_BAS
---------- ----------
         1          0
         3    1227641
         3    1227638

 

The STATE=1 Xcurrent block has been transferred to instance 2. Now look at the GC state on instance 1:

 

Instance 1 Session B 

SQL> select state,cr_scn_bas from x$bh where file#=1 and dbablk=89233 and state!=0;

     STATE CR_SCN_BAS
---------- ----------
         3    1227641
         3    1227638
         8          0
         3    1227595

 

 

Here is where the question arises: when the poster ran the SELECT against TEST again from session A on instance 1, the original three STATE=3 CR blocks dropped in number:

 

Instance 1, session A (the session that originally ran the UPDATE):

SQL> alter session set events '10046 trace name context forever,level 8:10708 trace name context forever,level 103: trace[rac.*] disk high';

Session altered.

SQL> select * from test;

        ID
----------
         2
         2

select state,cr_scn_bas from x$bh where file#=1 and dbablk=89233 and state!=0;

      STATE CR_SCN_BAS
---------- ----------
         3    1227716
         3    1227713
         8          0

 

 

In the original thread the poster watched the number of CR blocks through the V$BH view and was puzzled that it decreased after the SELECT. In the demonstration above, observing X$BH directly, we can see that the original three CR copies at SCN versions 1227641, 1227638, and 1227595 were replaced, after the SELECT completed, by two CR copies at different SCN versions, 1227716 and 1227713. Why does Oracle do this?

 

Fortunately, we set event 10708 and the rac.* diagnostic trace before actually running the SELECT. Let's look at the trace first:

 

PARSING IN CURSOR #140444679938584 len=337 dep=1 uid=0 oct=3 lid=0 tim=1335698913632292 hv=3345277572 ad='bc0e68c8' sqlid='baj7tjm3q9sn4'
SELECT /* OPT_DYN_SAMP */ /*+ ALL_ROWS IGNORE_WHERE_CLAUSE NO_PARALLEL(SAMPLESUB) opt_param('parallel_execution_enabled', 'false') NO_PARALLEL_INDEX(SAMPLESUB) NO_SQL_TUNE */ NVL(SUM(C1),0), NVL(SUM(C2),0) FROM (SELECT /*+ NO_PARALLEL("TEST") FULL("TEST") NO_PARALLEL_INDEX("TEST") */ 1 AS C1, 1 AS C2 FROM "SYS"."TEST" "TEST") SAMPLESUB
END OF STMT
PARSE #140444679938584:c=1000,e=27630,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=1,plh=1950795681,tim=1335698913632252
EXEC #140444679938584:c=0,e=44,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=1950795681,tim=1335698913632390

*** 2012-04-29 07:28:33.632
kclscrs: req=0 block=1/89233

*** 2012-04-29 07:28:33.632
kclscrs: bid=1:3:1:0:7:80:1:0:4:0:0:0:1:2:4:1:26:0:0:0:70:1a:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:4:3:2:1:2:0:3f:0:1c:86:2d:4:0:0:0:0:a2:3c:7c:b:70:1a:0:0:0:0:1:0:7a:f8:76:1d:1:2:dc:5:a9:fe:17:75:0:0:0:0:0:0:0:0:0:0:0:0:63:e5:0:0:0:0:0:0:10:0:0:0
2012-04-29 07:28:33.632578 : kjbcrc[0x15c91.1 76896.0][9]
2012-04-29 07:28:33.632616 : GSIPC:GMBQ: buff 0xba1e8f90, queue 0xbb79f278, pool 0x60013fa0, freeq 1, nxt 0xbb79f278, prv 0xbb79f278
2012-04-29 07:28:33.632634 : kjbmscrc(0x15c91.1)seq 0x2 reqid=0x1c(shadow 0xb4bb4458,reqid x1c)mas@2(infosz 200)(direct 1)
2012-04-29 07:28:33.632654 : kjbsentscn[0x0.12bbc1][to 2]
2012-04-29 07:28:33.632669 : GSIPC:SENDM: send msg 0xba1e9000 dest x20001 seq 24026 type 32 tkts xff0000 mlen x17001a0
2012-04-29 07:28:33.633385 : GSIPC:KSXPCB: msg 0xba1e9000 status 30, type 32, dest 2, rcvr 1

*** 2012-04-29 07:28:33.633
kclwcrs: wait=0 tm=689

*** 2012-04-29 07:28:33.633
kclwcrs: got 1 blocks from ksxprcv
WAIT #140444679938584: nam='gc cr block 2-way' ela= 689 p1=1 p2=89233 p3=1 obj#=76896 tim=1335698913633418
2012-04-29 07:28:33.633490 : kjbcrcomplete[0x15c91.1 76896.0][0]
2012-04-29 07:28:33.633510 : kjbrcvdscn[0x0.12bbc1][from 2][idx 2012-04-29 07:28:33.633527 : kjbrcvdscn[no bscn <= rscn 0x0.12bbc1][from 2]

*** 2012-04-29 07:28:33.633
kclwcrs: req=0 typ=cr(2) wtyp=2hop tm=689

 

 

The trace shows that, because statistics had never been gathered on TEST, the query triggered dynamic sampling, which itself issued a CR read request against TEST and actually produced one 'gc cr block 2-way' wait:

2012-04-29 07:28:33.632654 : kjbsentscn[0x0.12bbc1][to 2]

12bbc1 = 1227713, which corresponds to one of the CR blocks seen in X$BH above. kjbsentscn[0x0.12bbc1][to 2] can be read as: a CR request (obj#=76896) for DBA=0x15c91.1 (object 76896) at SCN=0x12bbc1=1227713 was sent to Instance 2.

The kjbrcvdscn function then confirmed [no bscn <= rscn 0x0.12bbc1][from 2], i.e. there is no better best version than the received SCN version 0x12bbc1.
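The hex-to-decimal SCN conversion used above can be checked quickly; a minimal sketch, assuming the usual Oracle convention that an SCN is printed as a wrap.base hex pair (the values come from the kjbsentscn trace lines quoted in this section):

```python
# SCNs in the kjbsentscn[...] traces are printed as wrap.base hex pairs,
# e.g. [0x0.12bbc1]; the full SCN is wrap * 2^32 + base, and with wrap = 0
# the base converts directly to the decimal SCN seen in X$BH.CR_SCN_BAS.
def scn_to_decimal(trace_scn: str) -> int:
    wrap_hex, base_hex = trace_scn.split(".")
    return (int(wrap_hex, 16) << 32) + int(base_hex, 16)

print(scn_to_decimal("0x0.12bbc1"))  # 1227713, the first CR version requested
print(scn_to_decimal("0x0.12bbc4"))  # 1227716, the version served for the user SELECT
```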

CR Server Arch

CR Server in Oracle9i Release 2

There are three instances involved: the requestor instance, the lock master instance, and the current block owner instance. The lock is granted if one of the following is true:

  • The resource held mode is NULL.
  • The resource held mode is S and there is no holder of an S lock on the master node.

Otherwise, the master forwards the CR request to the holder node. If the lock is global, the node to forward the CR request to is chosen as follows:

  • If there is a past image (PI) at the lock master instance, and the PI SCN is greater than the snap SCN, then the master node is the chosen node.
  • Otherwise, choose the PI with the smallest SCN among those whose PI SCN is greater than the snap SCN; the owner node of that PI is the node the CR request is forwarded to. The PI with the smallest SCN is the most interesting one, because less UNDO has to be applied.
  • If there is no PI at all, choose the node that the current buffer belongs to.
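The forwarding decision above can be sketched as follows. This is an illustrative model only, not Oracle's actual code; the function signature, the `pis` list of past images, and the node names are all invented for the example:

```python
def choose_cr_server(held_mode, master_has_s_holder, master_pi_scn,
                     pis, snap_scn, current_owner, master):
    """Illustrative model of the 9iR2 CR-server forwarding decision.

    pis: list of (owner_node, pi_scn) past images known to the master.
    Returns the node that should build the CR copy, or 'GRANT' when
    the lock can simply be granted to the requestor.
    """
    # Grant directly when nothing conflicts.
    if held_mode is None:
        return "GRANT"
    if held_mode == "S" and not master_has_s_holder:
        return "GRANT"
    # A PI on the master newer than the snapshot SCN: the master serves it.
    if master_pi_scn is not None and master_pi_scn > snap_scn:
        return master
    # Otherwise pick the smallest PI SCN still covering the snapshot:
    # the smaller the PI SCN, the less undo must be applied.
    candidates = [(scn, node) for node, scn in pis if scn > snap_scn]
    if candidates:
        return min(candidates)[1]
    # No usable PI anywhere: forward to the current block owner.
    return current_owner
```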

Only after dynamic sampling completed was the user's SELECT statement actually executed:

PARSING IN CURSOR #140444682869592 len=18 dep=0 uid=0 oct=3 lid=0 tim=1335698913635874 hv=1689401402 ad='b1a188f0' sqlid='c99yw1xkb4f1u'
select * from test
END OF STMT
PARSE #140444682869592:c=4999,e=34017,p=0,cr=7,cu=0,mis=1,r=0,dep=0,og=1,plh=1357081020,tim=1335698913635870
EXEC #140444682869592:c=0,e=23,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=1357081020,tim=1335698913635939
WAIT #140444682869592: nam='SQL*Net message to client' ela= 7 driver id=1650815232 #bytes=1 p3=0 obj#=0 tim=1335698913636071

*** 2012-04-29 07:28:33.636
kclscrs: req=0 block=1/89233

*** 2012-04-29 07:28:33.636
kclscrs: bid=1:3:1:0:7:83:1:0:4:0:0:0:1:2:4:1:26:0:0:0:70:1a:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:4:3:2:1:2:0:2:0:1c:86:2d:4:0:0:0:0:a2:3c:7c:b:70:1a:0:0:0:0:1:0:7d:f8:76:1d:1:2:dc:5:a9:fe:17:75:0:0:0:0:0:0:0:0:0:0:0:0:63:e5:0:0:0:0:0:0:10:0:0:0
2012-04-29 07:28:33.636209 : kjbcrc[0x15c91.1 76896.0][9]
2012-04-29 07:28:33.636228 : GSIPC:GMBQ: buff 0xba0e5d50, queue 0xbb79f278, pool 0x60013fa0, freeq 1, nxt 0xbb79f278, prv 0xbb79f278
2012-04-29 07:28:33.636244 : kjbmscrc(0x15c91.1)seq 0x3 reqid=0x1d(shadow 0xb4bb4458,reqid x1d)mas@2(infosz 200)(direct 1)
2012-04-29 07:28:33.636252 : kjbsentscn[0x0.12bbc4][to 2]
2012-04-29 07:28:33.636358 : GSIPC:SENDM: send msg 0xba0e5dc0 dest x20001 seq 24029 type 32 tkts xff0000 mlen x17001a0
2012-04-29 07:28:33.636861 : GSIPC:KSXPCB: msg 0xba0e5dc0 status 30, type 32, dest 2, rcvr 1

*** 2012-04-29 07:28:33.637
kclwcrs: wait=0 tm=865

*** 2012-04-29 07:28:33.637
kclwcrs: got 1 blocks from ksxprcv
WAIT #140444682869592: nam='gc cr block 2-way' ela= 865 p1=1 p2=89233 p3=1 obj#=76896 tim=1335698913637294
2012-04-29 07:28:33.637356 : kjbcrcomplete[0x15c91.1 76896.0][0]
2012-04-29 07:28:33.637374 : kjbrcvdscn[0x0.12bbc4][from 2][idx 2012-04-29 07:28:33.637389 : kjbrcvdscn[no bscn <= rscn 0x0.12bbc4][from 2]

*** 2012-04-29 07:28:33.637
kclwcrs: req=0 typ=cr(2) wtyp=2hop tm=865

Similarly, "SELECT * FROM TEST" also triggered one 'gc cr block 2-way' wait:
2012-04-29 07:28:33.637374 : kjbrcvdscn[0x0.12bbc4][from 2][idx 2012-04-29 07:28:33.637389 : kjbrcvdscn[no bscn <= rscn 0x0.12bbc4][from 2]

In the end, the foreground process got the SCN=1227716 version of the CR block from the remote LMS, which again matches the SCN we queried earlier from the X$BH view.

This explains why two CR blocks with higher SCNs appeared on Instance 1, but it still does not explain why the three CR blocks with lower SCN versions that previously existed in Instance 1's buffer cache disappeared.

Let's look at the following demonstration:

SQL> alter system set "_enable_minscn_cr"=false scope=spfile;

System altered.

SQL> alter system set "_db_block_max_cr_dba"=20 scope=spfile;

System altered.

SQL> startup force;
ORA-32004: obsolete or deprecated parameter(s) specified for RDBMS instance
ORACLE instance started.

Total System Global Area 1570009088 bytes
Fixed Size                  2228704 bytes
Variable Size             989859360 bytes
Database Buffers          570425344 bytes
Redo Buffers                7495680 bytes
Database mounted.
Database opened.

With "_enable_minscn_cr"=false and "_db_block_max_cr_dba"=20 set as above and all RAC instances restarted, we repeat the earlier demonstration:

Each update of the data block in Session C on Instance 2 is matched by one query on Instance 1, so that Instance 1 repeatedly issues CR requests:

SQL> update test set id=id+1 where id=2;              -- Instance 2

1 row updated.

SQL> select * From test;                         -- Instance 1

        ID
----------
         1
         2

Below are the X$BH records on Instance 1:

select state,cr_scn_bas from x$bh where file#=1 and dbablk=89233 and state!=0;

    STATE CR_SCN_BAS
---------- ----------
         3    1273080
         3    1273071
         3    1273041
         3    1273039
         8          0

SQL>  update test set id=id+1 where id=3;

1 row updated.

SQL> select * From test;

        ID
----------
         1
         2

SQL> select state,cr_scn_bas from x$bh where file#=1 and dbablk=89233 and state!=0;

     STATE CR_SCN_BAS
---------- ----------
         3    1273091
         3    1273080
         3    1273071
         3    1273041
         3    1273039
         8          0

...................

SQL> select state,cr_scn_bas from x$bh where file#=1 and dbablk=89233 and state!=0;

     STATE CR_SCN_BAS
---------- ----------
         3    1273793
         3    1273782
         3    1273780
         3    1273769
         3    1273734
         3    1273715
         3    1273691
         3    1273679
         3    1273670
         3    1273643
         3    1273635
         3    1273623
         3    1273106
         3    1273091
         3    1273080
         3    1273071
         3    1273041
         3    1273039
         3    1273033

19 rows selected.

SQL> select state,cr_scn_bas from x$bh where file#=1 and dbablk=89233 and state!=0;

     STATE CR_SCN_BAS
---------- ----------
         3    1274993

As the demonstration shows, with "_enable_minscn_cr" (enable/disable minscn optimization for CR) set to false and "_db_block_max_cr_dba" (maximum allowed number of CR buffers per dba) set to 20, Instance 1 at its peak cached as many as 19 CR versions of the same data block.

"_enable_minscn_cr" is a hidden parameter new in 11g that controls whether Oracle computes the minimum SCN (MINSCN) for CR blocks. When a foreground process receives a newer (higher-SCN) CR version of a data block, it may purge the older, lower-SCN CR copies of that block from the cache buffer chains; the goal is to limit the number of CR versions of the same data block kept in the buffer cache. Note that, for both statement-level and transaction-level consistency, the required snapshot SCN is always the current SCN at the time the statement or transaction started, so keeping some older CR copies may benefit long-running queries or cursors. However, the total number of CR versions of one data block in an instance's buffer cache is bounded by the hidden parameter "_db_block_max_cr_dba": set to 20 in our demonstration, it allowed up to 19 CR versions of the block to be cached at once. The default value of "_db_block_max_cr_dba" is 6, i.e. an instance's buffer cache normally holds no more than 6 CR versions of the same data block at the same time.
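The interplay of the two parameters can be modeled with a toy per-block cache. This is purely illustrative: the class, its eager purge-on-receive policy, and the eviction rule are invented for the sketch; only the two limits mirror the parameters described above:

```python
class CrVersionCache:
    """Toy model of CR version caching for one data block in one instance.

    max_cr_dba mirrors _db_block_max_cr_dba (default 6);
    minscn_optimization mirrors _enable_minscn_cr (default TRUE).
    """
    def __init__(self, max_cr_dba=6, minscn_optimization=True):
        self.max_cr_dba = max_cr_dba
        self.minscn_optimization = minscn_optimization
        self.cr_scns = []  # cached SCN versions of the block

    def receive_cr(self, scn):
        if self.minscn_optimization:
            # Receiving a newer version may purge the older ones.
            self.cr_scns = [s for s in self.cr_scns if s >= scn]
        self.cr_scns.append(scn)
        # Never keep more versions than the per-dba cap allows.
        self.cr_scns = sorted(self.cr_scns, reverse=True)[: self.max_cr_dba]

# With the demonstration settings, versions accumulate but stay capped at 20:
cache = CrVersionCache(max_cr_dba=20, minscn_optimization=False)
for scn in range(1273033, 1273033 + 30):
    cache.receive_cr(scn)
print(len(cache.cr_scns))  # 20
```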

There is a good reason for introducing "_enable_minscn_cr" to optimize the MINSCN of CR blocks: even after the older CR versions have been aged out, a foreground process can still, whenever needed, issue a CR request and have the holder instance's LMS build a best CR version for it, so this is nothing we need to worry about.

Script: Collecting RAC Diagnostic Information

The following checklist can be used to gather RAC diagnostic information. Not every output or log has to be read; during diagnosis, review the relevant items selectively. For remote assistance, have the user collect and package everything, which saves several rounds of back-and-forth:

   - Cluster startup status
   - hostname
   - uptime 
   - vmstat iostat nmon OS stats 
   - /etc/sysctl.conf /etc/system  vmo -a no -a 
   - /etc/inittab
   - /etc/init.d/init.crs
   - /etc/init.d/init.crsd
   - /etc/init.d/init.cssd
   - /etc/init.d/init.evmd
   - olsnodes
   - crsctl check css
   - crsctl check crs status
   - crsctl check crsd status
   - crsctl check cssd status
   - crsctl check evmd status
   - crsctl query css votedisk
   - crsctl get css priority
   - crsctl get css diagwait
   - crsctl get css disktimeout
   - crsctl get css misscount
   - crsctl get css reboottime
   - crsctl query crs softwareversion
   - crsctl query crs activeversion
   - opatch lsinventory -detail 
   - /var/opt/oracle/ocr.loc or /etc/oracle/ocr.loc
   - cluvfy comp sys -n <hostname> -p crs -verbose
   - cluvfy comp sys -n <hostname> -p database -verbose
   - cluvfy stage -post crsinst -n <hostname> -verbose
   - cluvfy comp software -verbose -n all
   - Clusterware version
   - $ORA_CRS_HOME installed product information
   - $ORA_CRS_HOME applied patch information
   - crs_stat -u
   - crs_stat -p
   - $ORA_CRS_HOME/log/<hostname>/alert<hostname>.log
   - $ORA_CRS_HOME/crs/log/*.log
   - $ORA_CRS_HOME/log/<hostname>/crsd/*
   - $ORA_CRS_HOME/css/log/*.log
   - $ORA_CRS_HOME/css/init/*.log
   - $ORA_CRS_HOME/log/<hostname>/cssd/*
   - $ORA_CRS_HOME/racg/dump/*
   - $ORACLE_HOME/racg/dump/*
   - $ORA_CRS_HOME/log/<hostname>/racg/*
   - $ORACLE_HOME/log/<hostname>/racg/*
   - $ORA_CRS_HOME/evm/log/*.log
   - $ORA_CRS_HOME/evm/init/*.log
   - $ORA_CRS_HOME/log/<hostname>/evmd/*
   - $ORA_CRS_HOME/log/<hostname>/client/*
   - $ORACLE_HOME/log/<hostname>/client/*
   - $ORA_CRS_HOME/log/<hostname>/ohasd/ohasd.log
   - $ORA_CRS_HOME/log/<hostname>/gpnpd/gpnpd.log
   - /etc/oracle/oprocd/*oprocd.log or /var/opt/oracle/oprocd/*oprocd.log
   - /etc/oracle/lastgasp/*.lgl or /var/opt/oracle/lastgasp/*.lgl
   - debug resource
  # cd $ORA_CRS_HOME/bin
  # crsctl debug log res <resname:level>
  # crsctl debug log res <vip resource name>:5
   - ocrdump
   - ocrcheck
   - ocrconfig -showbackup
   - /var/opt/oracle/srvConfig.loc or /etc/srvConfig.loc
   - srvctl config nodeapps -n <hostname> -a -l -g -s
   - srvctl config
   - srvctl config database -d <dbname> -a
   - srvctl config scan_listener
   - srvctl status scan_listener
   - srvctl config scan
   - srvctl status scan
   - srvctl config srvpool
   - srvctl status srvpool -a
   - srvctl config listener
   - srvctl status listener
   - srvctl config listener -n <hostname>
   - oifcfg iflist
   - oifcfg getif
   - interconnect information: x$skgxpia or oradebug ipc
   - IPC information
   - messages or syslog OS logs
      IBM: /bin/errpt -a > messages.out
      Linux: /var/log/messages
      Solaris: /var/adm/messages

   - netstat -rn
   - ifconfig -a
   - cat /etc/hosts
   - netstat -s
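A minimal collector in the same spirit as the checklist above can be sketched as follows. The helper, its output directory, and the sample command list are all placeholders for illustration; a real collection would cover the full checklist and run as the appropriate OS user on every node:

```python
import glob
import os
import subprocess
import tarfile

def collect(commands, log_globs, out_tar="/tmp/rac_diag.tar.gz"):
    """Run diagnostic commands, capture their output, and bundle
    the captures plus any matching log files into one archive."""
    os.makedirs("/tmp/rac_diag", exist_ok=True)
    for i, cmd in enumerate(commands):
        out = subprocess.run(cmd, shell=True, capture_output=True, text=True)
        with open(f"/tmp/rac_diag/cmd_{i}.out", "w") as f:
            f.write(f"$ {cmd}\n{out.stdout}{out.stderr}")
    with tarfile.open(out_tar, "w:gz") as tar:
        tar.add("/tmp/rac_diag")          # all captured command output
        for pattern in log_globs:          # plus any requested log files
            for path in glob.glob(pattern):
                tar.add(path)
    return out_tar

# Placeholder subset of the checklist:
archive = collect(["hostname", "uptime"], ["/etc/hosts"])
print("bundle:", archive)
```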

Script: Collecting VIP Resource Diagnostic Information

The following action plan can be used to collect diagnostic information for the VIP resource, or any other CRS resource, in Oracle RAC:

action plan:

./runcluvfy.sh stage -post crsinst -n all  -verbose

./runcluvfy.sh stage -pre crsinst -n all  -verbose

or


cluvfy stage -post crsinst -n all -verbose

cluvfy stage -pre  crsinst -n all -verbose

1. Please upload the following logs from both nodes:
$CRS_HOME/log/<nodename>/*.log
$CRS_HOME/log/<nodename>/crsd/*.log
$CRS_HOME/log/<nodename>/cssd/*.log
$CRS_HOME/log/<nodename>/racg/*.log
$CRS_HOME/log/<nodename>/client/*.log
$CRS_HOME/log/<nodename>/evmd/*.log

/etc/oracle/oprocd/*.log.* or /var/opt/oracle/oprocd/*.log.* (If have)

$crs_stat -t
$crsctl check crs
$crsctl check boot

2. 

Please consult your sysadmin and make sure that the gateway is pingable all the time 

1- test the gw on every node
consult your sysadmin to create a crontab unix shell script to ping the
gateway of your public interface every 2 seconds for example and the result is to be
spooled in /tmp/test_gw_<nodename>.log

ping your gateway  and upload the ping log 

2- increase the tracing level of the vip resource  
  as root user
  # cd $ORA_CRS_HOME/bin
  # crsctl debug log res <resname:level>
  # crsctl debug log res <vip resource name>:5

3- restart the clusterware

4- execute this test on both nodes at the same time
   $ script /tmp/testvip_<nodename>.log
   $ cd $ORA_CRS_HOME/bin
   $ hostname
   $ date
   $ cat /etc/hosts
   $ ifconfig -a
   $ oifcfg getif
   $ netstat -rn

   $ oifcfg iflist
   $ srvctl config nodeapps -n <nodename> -a -g -s -l               (repeat it for all nodes)

   $ crs_stat -t
   $ exit

5- reset the tracing level of the vip resource
  as root user
  # cd $ORA_CRS_HOME/bin
  # crsctl debug log res <resname:level>
  # crsctl debug log res <vip resource name>:1

Upon the next occurrence, please upload the following information from all nodes:

  a-  /tmp/test_gw_<nodename>.log
  b- /tmp/testvip_<nodename>.log
  c- the crsd log
  d- the resource racg logs
     $ORA_CRS_HOME/log/<nodename>/racg/vip*

  e- the racgvip script from
     $ORA_CRS_HOME/bin/racgvip
  f- RDA from all the nodes
    Note 314422.1 Remote Diagnostic Agent (RDA) 4.0 – Overview

   g- the o/s message file
      IBM: /bin/errpt -a > messages.out
      Linux: /var/log/messages
      Solaris: /var/adm/messages
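The gateway ping test from step 1- above can be sketched as a small script. This is illustrative only; the gateway address, log path, count, and interval are placeholders, and the real test would be a crontab shell script set up by the sysadmin:

```python
import datetime
import subprocess
import time

def ping_gateway(gateway, logfile, count=3, interval=2):
    """Ping the public-interface gateway every `interval` seconds,
    appending a timestamped OK/FAIL line to the log file."""
    with open(logfile, "a") as log:
        for _ in range(count):
            try:
                rc = subprocess.run(["ping", "-c", "1", "-W", "1", gateway],
                                    capture_output=True).returncode
            except FileNotFoundError:
                rc = 1  # ping utility unavailable on this host
            stamp = datetime.datetime.now().isoformat(timespec="seconds")
            log.write(f"{stamp} gw={gateway} {'OK' if rc == 0 else 'FAIL'}\n")
            time.sleep(interval)

# Placeholder gateway and node name:
ping_gateway("127.0.0.1", "/tmp/test_gw_node1.log", count=2, interval=1)
print(open("/tmp/test_gw_node1.log").read())
```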

3. CRS Diagnostics

Note 330358.1 - CRS Diagnostic Collection Guide; please use diagcollection.pl (all .gz files, especially crsData_$HOST.tar.gz, need to be uploaded):

diagcollection.pl --collect 

Please make sure to include *ALL* requested files from *ALL* nodes in a single zip and upload it; missing any will delay or prevent identifying the root cause.

Note 330358.1 - CRS 10gR2/ 11gR1/ 11gR2 Diagnostic Collection Guide       
Note 298895.1 - Modifying the default gateway address used by the Oracle 10g VIP
Note 399213.1 - VIP Going Offline Intermittantly - Slow Response from Default Gateway
Note 401783.1 - Changes in Oracle Clusterware after applying 10.2.0.3 Patchset

Lifecycle of a RAC CRS Resource

The lifecycle of a CRS resource can be illustrated step by step through the series of crs_* resource management commands:

  • crs_profile : create/edit the resource's attributes
  • crs_register : add (register) the resource
  • crs_start : start the resource
  • crs_stat : show the status of the resource
  • crs_relocate : move the resource
  • crs_stop : stop the resource
  • crs_unregister : remove (unregister) the resource
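The lifecycle above can be driven from a wrapper; the helper below only assembles the command lines in order, as a sketch. The resource name and profile arguments are placeholders, and a real script would execute each command and check its exit status:

```python
def crs_lifecycle_commands(resource, profile_args=""):
    """Return the crs_* commands, in lifecycle order, for one resource."""
    return [
        f"crs_profile -create {resource} {profile_args}".strip(),  # define profile
        f"crs_register {resource}",    # add the resource to the OCR
        f"crs_start {resource}",       # bring it online
        f"crs_stat {resource}",        # check its status
        f"crs_relocate {resource}",    # move it to another node
        f"crs_stop {resource}",        # take it offline
        f"crs_unregister {resource}",  # remove it
    ]

for cmd in crs_lifecycle_commands("myapp.vip", "-t application"):
    print(cmd)
```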

How to Verify the Time Zone of 11.2 RAC Grid Infrastructure

This section describes how to change the GI time zone setting after the 11.2 Grid Infrastructure installation has completed.

Once the OS default time zone has been changed, make sure of the following two points:

1. For 11.2.0.1, ensure the TZ shell environment variable is set correctly for the root, grid, and oracle users.
2. For 11.2.0.2 and above, confirm that the setting in $GRID_HOME/crs/install/s_crsconfig_<nodename>_env.txt reflects the correct time zone.

For example:

echo $TZ
TZ=US/Pacific

grep TZ s_crsconfig__env.txt 
TZ=US/Pacific

An incorrect time zone setting, or extraneous characters in it, may prevent RAC Grid Infrastructure from starting normally.

Ensuring the two points above keeps GI able to start normally. Startup failures caused by an incorrect time zone usually occur when the time zone is changed last, after the OS and GI have already been installed. If the OS time does not match the most recent timestamps in logs such as ohasd.log and ocssd.log, the mismatch is often caused by this time zone problem.
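One way to spot the mismatch described above is to compare the OS time zone with the one recorded for GI. A minimal sketch, assuming the TZ=... line format of the 11.2.0.2+ s_crsconfig_<nodename>_env.txt file shown earlier; the sample file written here is fabricated for the example:

```python
import os
import re

def read_gi_tz(env_file):
    """Extract the TZ= value from an s_crsconfig_<nodename>_env.txt file."""
    with open(env_file) as f:
        for line in f:
            m = re.match(r"TZ=(\S+)", line.strip())
            if m:
                return m.group(1)
    return None

# Fabricated sample standing in for
# $GRID_HOME/crs/install/s_crsconfig_<nodename>_env.txt
with open("/tmp/s_crsconfig_sample_env.txt", "w") as f:
    f.write("NLS_LANG=AMERICAN_AMERICA.AL32UTF8\nTZ=US/Pacific\n")

gi_tz = read_gi_tz("/tmp/s_crsconfig_sample_env.txt")
os_tz = os.environ.get("TZ")
print("GI TZ:", gi_tz, "| OS TZ:", os_tz)
if os_tz and gi_tz != os_tz:
    print("Time zone mismatch: GI may fail to start cleanly")
```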

Before 11.2, the CRS time zone setting could be verified with init.cssd diag.

The default time zone configuration for each OS is shown below:

Linux

To change: /usr/sbin/timeconfig

To display current setting:

cat /etc/sysconfig/clock
ZONE="America/Los_Angeles"
UTC=true
ARC=false
To find out all valid setting: ls -l /usr/share/zoneinfo

Anything that appears in this directory is valid to use, for example, CST6CDT and America/Chicago.

Note: the "Zone" field in /etc/sysconfig/clock could be different than what's in /usr/share/zoneinfo in OL6.3/RHEL6.3, the one from /usr/share/zoneinfo should be used in $GRID_HOME/crs/install/s_crsconfig_<nodename>_env.txt

hp-ux

To display current setting:

cat /etc/default/tz
PST8PDT
To change: set_parms timezone

To find out all valid setting: ls -l /usr/lib/tztab

Solaris

To display current setting:

grep TZ /etc/TIMEZONE
TZ=US/Pacific
To change, modify TIMEZONE, also run "rtc -z US/Pacific; rtc -c"

To find out all valid settings: ls -l /usr/share/lib/zoneinfo

AIX

To display current setting:

grep TZ /etc/environment
TZ=GMT

Dependencies Between the 11gR2 RAC VIP and Network Resources, and Frequent Failover

11gR2 introduced the Oracle Network resource type of CRS resource to monitor the public network:

[root@vrh1 ~]# crsctl status resource ora.net1.network -f
NAME=ora.net1.network
TYPE=ora.network.type
STATE=ONLINE
TARGET=ONLINE
ACL=owner:root:rwx,pgrp:root:r-x,other::r--,group:oinstall:r-x,user:grid:r-x
ACTION_FAILURE_TEMPLATE=
ACTION_SCRIPT=
AGENT_FILENAME=%CRS_HOME%/bin/orarootagent%CRS_EXE_SUFFIX%
ALIAS_NAME=
AUTO_START=restore
CHECK_INTERVAL=1
CREATION_SEED=1
DEFAULT_TEMPLATE=
DEGREE=1
DESCRIPTION=Oracle Network resource
ENABLED=1
ID=ora.net1.network
LOAD=1
LOGGING_LEVEL=1
NLS_LANG=
NOT_RESTARTING_TEMPLATE=
OFFLINE_CHECK_INTERVAL=60
PROFILE_CHANGE_TEMPLATE=
RESTART_ATTEMPTS=5
SCRIPT_TIMEOUT=60
START_DEPENDENCIES=
START_TIMEOUT=0
STATE_CHANGE_TEMPLATE=
STOP_DEPENDENCIES=
STOP_TIMEOUT=0
TYPE_VERSION=2.2
UPTIME_THRESHOLD=1d
USR_ORA_AUTO=static
USR_ORA_ENV=
USR_ORA_IF=eth0
USR_ORA_NETMASK=255.255.255.0
USR_ORA_SUBNET=192.168.1.0
VERSION=11.2.0.3.0

[root@vrh1 ~]# crsctl status resource ora.vrh1.vip -f        
NAME=ora.vrh1.vip
TYPE=ora.cluster_vip_net1.type
STATE=ONLINE
TARGET=ONLINE
ACL=owner:root:rwx,pgrp:root:r-x,other::r--,group:oinstall:r-x,user:grid:r-x
ACTION_FAILURE_TEMPLATE=
ACTION_SCRIPT=
ACTIVE_PLACEMENT=1
AGENT_FILENAME=%CRS_HOME%/bin/orarootagent%CRS_EXE_SUFFIX%
AUTO_START=restore
CARDINALITY=1
CARDINALITY_ID=0
CHECK_INTERVAL=1
CHECK_TIMEOUT=30
CREATION_SEED=34700
DEFAULT_TEMPLATE=PROPERTY(RESOURCE_CLASS=vip)
DEGREE=1
DESCRIPTION=Oracle VIP resource
ENABLED=1
FAILOVER_DELAY=0
FAILURE_INTERVAL=0
FAILURE_THRESHOLD=0
GEN_USR_ORA_STATIC_VIP=
GEN_USR_ORA_VIP=
HOSTING_MEMBERS=vrh1
ID=ora.vrh1.vip
LOAD=1
LOGGING_LEVEL=2
NLS_LANG=
NOT_RESTARTING_TEMPLATE=
OFFLINE_CHECK_INTERVAL=0
PLACEMENT=favored
PROFILE_CHANGE_TEMPLATE=
RESTART_ATTEMPTS=0
SCRIPT_TIMEOUT=60
SERVER_POOLS=*
START_DEPENDENCIES=hard(ora.net1.network) pullup(ora.net1.network)
START_TIMEOUT=0
STATE_CHANGE_TEMPLATE=
STOP_DEPENDENCIES=hard(ora.net1.network)
STOP_TIMEOUT=0
TYPE_VERSION=2.1
UPTIME_THRESHOLD=1h
USR_ORA_ENV=
USR_ORA_VIP=vrh1-vip
VERSION=11.2.0.3.0

[root@vrh1 ~]# crsctl status res ora.LISTENER.lsnr -f
NAME=ora.LISTENER.lsnr
TYPE=ora.listener.type
STATE=ONLINE
TARGET=ONLINE
ACL=owner:grid:rwx,pgrp:oinstall:rwx,other::r--
ACTION_FAILURE_TEMPLATE=
ACTION_SCRIPT=%CRS_HOME%/bin/racgwrap%CRS_SCRIPT_SUFFIX%
AGENT_FILENAME=%CRS_HOME%/bin/oraagent%CRS_EXE_SUFFIX%
ALIAS_NAME=ora.%CRS_CSS_NODENAME_LOWER_CASE%.LISTENER_%CRS_CSS_NODENAME_UPPER_CASE%.lsnr
AUTO_START=restore
CHECK_INTERVAL=60
CHECK_TIMEOUT=30
CREATION_SEED=35089
DEFAULT_TEMPLATE=PROPERTY(RESOURCE_CLASS=listener) PROPERTY(LISTENER_NAME=PARSE(%NAME%, ., 2))
DEGREE=1
DESCRIPTION=Oracle Listener resource
ENABLED=1
ENDPOINTS=TCP:1521
ID=ora.LISTENER.lsnr
LOAD=1
LOGGING_LEVEL=5
NLS_LANG=
NOT_RESTARTING_TEMPLATE=
OFFLINE_CHECK_INTERVAL=0
ORACLE_HOME=%CRS_HOME%
PORT=1521
PROFILE_CHANGE_TEMPLATE=
RESTART_ATTEMPTS=5
SCRIPT_TIMEOUT=60
START_DEPENDENCIES=hard(type:ora.cluster_vip_net1.type) pullup(type:ora.cluster_vip_net1.type)
START_TIMEOUT=180
STATE_CHANGE_TEMPLATE=
STOP_DEPENDENCIES=hard(intermediate:type:ora.cluster_vip_net1.type)
STOP_TIMEOUT=0
TYPE_VERSION=1.2
UPTIME_THRESHOLD=1d
USR_ORA_ENV=ORACLE_BASE=/g01/orabase
USR_ORA_OPI=false
VERSION=11.2.0.3.0

As shown above, in 11.2 RAC the listener resource depends on the VIP, and the VIP resource depends on ora.net1.network. Consequently, a brief public network outage (a so-called network hiccup) takes the ora.net1.network resource OFFLINE, which in turn causes the VIP resource on that node to fail over and the listener to go OFFLINE.

Moreover, because the default CHECK_INTERVAL of ora.net1.network in 11.2 is 1, the network resource is monitored every second, which makes it very sensitive: any brief public network outage, whether caused by network hardware or by high host load, can take the network dependency OFFLINE and thereby affect the VIP and the listener.
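The dependency chain above (listener depends on VIP, VIP depends on network) can be modeled in a few lines. This is illustrative only: the resource names match the dumps shown earlier, but the cascade function is a simplification of what the CRS policy engine actually does (hard type dependencies collapsed to named resources, no restart attempts):

```python
# Start dependencies, child -> hard parent, as in the resource dumps above:
HARD_DEPS = {
    "ora.LISTENER.lsnr": "ora.vrh1.vip",
    "ora.vrh1.vip": "ora.net1.network",
    "ora.net1.network": None,
}

def cascade_offline(failed, state):
    """Take `failed` OFFLINE and propagate to everything hard-depending on it."""
    state[failed] = "OFFLINE"
    for res, parent in HARD_DEPS.items():
        if parent == failed and state.get(res) == "ONLINE":
            cascade_offline(res, state)

state = {r: "ONLINE" for r in HARD_DEPS}
cascade_offline("ora.net1.network", state)  # a one-second network hiccup
print(state)  # all three resources end up OFFLINE
```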

Let's see this in practice:

[root@vrh1 orarootagent_root]# pwd
/g01/11.2.0/grid/log/vrh1/agent/crsd/orarootagent_root

2012-12-21 02:04:13.600: [ora.vrh1.vip][2281658688][ENTER] {1:62611:2} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:04:13.600: [ora.vrh1.vip][2281658688][ENTER] {1:62611:2} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:04:14.596: [ora.vrh1.vip][2281658688][ENTER] {1:62611:2} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:04:14.596: [ora.vrh1.vip][2281658688][ENTER] {1:62611:2} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:04:14.596: [ora.vrh1.vip][2281658688][ENTER] {1:62611:2} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:04:14.597: [ora.vrh1.vip][2281658688][ENTER] {1:62611:2} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:04:15.602: [ora.vrh1.vip][1266022720][ENTER] {1:62611:2} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:04:15.603: [ora.vrh1.vip][1266022720][ENTER] {1:62611:2} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:04:15.603: [ora.vrh1.vip][1266022720][ENTER] {1:62611:2} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:04:15.603: [ora.vrh1.vip][1266022720][ENTER] {1:62611:2} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:04:16.598: [ora.vrh1.vip][2281658688][ENTER] {1:62611:2} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:04:16.598: [ora.vrh1.vip][2281658688][ENTER] {1:62611:2} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:04:16.598: [ora.vrh1.vip][2281658688][ENTER] {1:62611:2} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:04:16.598: [ora.vrh1.vip][2281658688][ENTER] {1:62611:2} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:04:17.594: [ora.vrh1.vip][2281658688][ENTER] {1:62611:2} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network

[root@vrh1 orarootagent_root]# lsof |grep orarootagent_root.log
orarootag  3835      root    4w      REG              253,0   9109896              4717997 /g01/11.2.0/grid/log/vrh1/agent/ohasd/orarootagent_root/orarootagent_root.log
orarootag 23559      root    4w      REG              253,0   3166410              4723518 /g01/11.2.0/grid/log/vrh1/agent/crsd/orarootagent_root/orarootagent_root.log

[root@vrh1 orarootagent_root]# ps -ef|egrep "3835|23559"|grep -v grep
root      3835     1  0 Oct23 ?        06:57:18 /g01/11.2.0/grid/bin/orarootagent.bin
root     23559     1  0 Nov20 ?        04:06:24 /g01/11.2.0/grid/bin/orarootagent.bin

As shown above, the orarootagent.bin process heartbeats the VIP's dependency ora.net1.network every second. We now manually take the network down and bring it back up within 3 seconds, and observe the result:

2012-12-21 02:09:05.781: [    AGFW][1400256832] {1:62611:2} ora.net1.network vrh1 1 state changed from: ONLINE to: OFFLINE
2012-12-21 02:09:05.781: [    AGFW][1400256832] {1:62611:2} Switching online monitor to offline one
2012-12-21 02:09:05.781: [    AGFW][1400256832] {1:62611:2} Started implicit monitor for [ora.net1.network vrh1 1] interval=60000 delay=60000
2012-12-21 02:09:05.781: [    AGFW][1400256832] {0:1:53734} Generating new Tint for unplanned state change. Original Tint: {1:62611:2}
2012-12-21 02:09:05.781: [    AGFW][1400256832] {0:1:53734} Agent sending message to PE: RESOURCE_STATUS[Proxy] ID 20481:24047944
2012-12-21 02:09:05.781: [ora.vrh1.vip][1333139776][ENTER] {1:62611:2} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:09:05.782: [ora.vrh1.vip][1333139776][ENTER] {1:62611:2} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:09:05.782: [ora.vrh1.vip][1333139776] {1:62611:2} [check] VipAgent::checkIp returned false
2012-12-21 02:09:05.782: [ora.vrh1.vip][1333139776][ENTER] {1:62611:2} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:09:05.782: [ora.vrh1.vip][1333139776][ENTER] {1:62611:2} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:09:05.790: [    AGFW][1400256832] {0:1:53734} Agent received the message: RESOURCE_PROBE[ora.vrh1.vip 1 1] ID 4097:417234
2012-12-21 02:09:05.791: [    AGFW][1400256832] {0:1:53734} Preparing CHECK command for: ora.vrh1.vip 1 1
2012-12-21 02:09:05.876: [    AGFW][1400256832] {0:1:53734} Agent received the message: RESOURCE_START[ora.net1.network vrh1 1] ID 4098:417237
2012-12-21 02:09:05.876: [    AGFW][1400256832] {0:1:53734} Preparing START command for: ora.net1.network vrh1 1
2012-12-21 02:09:05.876: [    AGFW][1400256832] {0:1:53734} ora.net1.network vrh1 1 state changed from: OFFLINE to: STARTING
2012-12-21 02:09:05.878: [    AGFW][1400256832] {0:1:53734} Agent received the message: RESOURCE_PROBE[ora.scan1.vip 1 1] ID 4097:417241
2012-12-21 02:09:05.878: [    AGFW][1400256832] {0:1:53734} Preparing CHECK command for: ora.scan1.vip 1 1
2012-12-21 02:09:05.878: [ora.net1.network][1266022720] {0:1:53734} [start] (:CLSN00107:) clsn_agent::start {
2012-12-21 02:09:05.878: [ora.net1.network][1266022720] {0:1:53734} [start] NetworkAgent::init enter {
2012-12-21 02:09:05.878: [ora.scan1.vip][2281658688] {0:1:53734} [check] VipAgent::checkIp returned false
2012-12-21 02:09:05.878: [    AGFW][1400256832] {0:1:53734} ora.scan1.vip 1 1 state changed from: ONLINE to: OFFLINE
2012-12-21 02:09:05.878: [    AGFW][1400256832] {0:1:53735} Generating new Tint for unplanned state change. Original Tint: {0:1:53734}
2012-12-21 02:09:05.878: [    AGFW][1400256832] {0:1:53735} Agent sending message to PE: RESOURCE_STATUS[Proxy] ID 20481:24047956
2012-12-21 02:09:05.878: [    AGFW][1400256832] {0:1:53735} Agent sending last reply for: RESOURCE_PROBE[ora.scan1.vip 1 1] ID 4097:417241
2012-12-21 02:09:06.056: [ora.net1.network][1266022720] {0:1:53734} [start] Checking if eth0 Interface is fine
2012-12-21 02:09:06.176: [ora.net1.network][1266022720] {0:1:53734} [start] synchStatic {
2012-12-21 02:09:06.176: [ora.net1.network][1266022720] {0:1:53734} [start] synchStatic }
2012-12-21 02:09:06.177: [ora.net1.network][1266022720] {0:1:53734} [start] unsynchStatic {
2012-12-21 02:09:06.177: [ora.net1.network][1266022720] {0:1:53734} [start] unsynchStatic }
2012-12-21 02:09:06.347: [ USRTHRD][1333139776] {1:62611:2} Thread:[SendFail2SrvThread:]start {
2012-12-21 02:09:06.348: [ USRTHRD][1333139776] {1:62611:2} Thread:[SendFail2SrvThread:]start }
2012-12-21 02:09:06.348: [    AGFW][1400256832] {1:62611:2} ora.vrh1.vip 1 1 state changed from: ONLINE to: OFFLINE
2012-12-21 02:09:06.348: [ora.vrh1.vip][1333139776][ENTER] {0:1:53734} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:09:06.348: [    AGFW][1400256832] {0:1:53736} Generating new Tint for unplanned state change. Original Tint: {1:62611:2}
2012-12-21 02:09:06.348: [    AGFW][1400256832] {0:1:53736} Agent sending message to PE: RESOURCE_STATUS[Proxy] ID 20481:24047960
2012-12-21 02:09:06.349: [ora.vrh1.vip][1333139776][ENTER] {0:1:53734} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:09:06.349: [ora.vrh1.vip][1333139776] {0:1:53734} [check] VipAgent::checkIp returned false
2012-12-21 02:09:06.349: [ora.vrh1.vip][1333139776][ENTER] {0:1:53734} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:09:06.349: [ora.vrh1.vip][1333139776][ENTER] {0:1:53734} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:09:06.350: [    AGFW][1400256832] {0:1:53734} Agent sending last reply for: RESOURCE_PROBE[ora.vrh1.vip 1 1] ID 4097:417234
2012-12-21 02:09:06.395: [   AGENT][1266022720] {0:1:53734} UserErrorException: Locale is 
2012-12-21 02:09:06.445: [ USRTHRD][1467373888] {1:62611:2} VipAgent::sendFail2Srv {
2012-12-21 02:09:06.535: [ USRTHRD][1467373888] {1:62611:2} VipAgent::sendFail2Srv }
2012-12-21 02:09:06.535: [ USRTHRD][1467373888] {1:62611:2} Thread:[SendFail2SrvThread:]isRunning is reset to false here
2012-12-21 02:09:06.687: [ora.net1.network][1266022720] {0:1:53734} [start] CRS-5008: Invalid attribute value: eth0 for the network interface

2012-12-21 02:09:06.687: [ora.net1.network][1266022720] {0:1:53734} [start] NetworkAgent::init exit }
2012-12-21 02:09:06.687: [ USRTHRD][1266022720] {0:1:53734} Thread:[VipRelocate:]stop {
2012-12-21 02:09:06.687: [ USRTHRD][1266022720] {0:1:53734} Thread:[VipRelocate:]stop }
2012-12-21 02:09:06.725: [ora.net1.network][1266022720] {0:1:53734} [start] Ocr Context init default level 45001104
2012-12-21 02:09:06.726: [ora.net1.network][1266022720] {0:1:53734} [start] Ocr Context init default level 45001104
2012-12-21 02:09:06.836: [ora.net1.network][1266022720] {0:1:53734} [start] (:CLSN00107:) clsn_agent::start }
2012-12-21 02:09:06.836: [    AGFW][1266022720] {0:1:53734} Command: start for resource: ora.net1.network vrh1 1 completed with status: SUCCESS
2012-12-21 02:09:06.836: [ora.net1.network][1266022720] {0:1:53734} [check] NetworkAgent::init enter {
2012-12-21 02:09:06.836: [    AGFW][1400256832] {0:1:53734} Agent sending reply for: RESOURCE_START[ora.net1.network vrh1 1] ID 4098:417237
2012-12-21 02:09:06.836: [ora.net1.network][1266022720] {0:1:53734} [check] Checking if eth0 Interface is fine
2012-12-21 02:09:06.837: [   AGENT][1266022720] {0:1:53734} UserErrorException: Locale is 
2012-12-21 02:09:06.837: [ora.net1.network][1266022720] {0:1:53734} [check] CRS-5008: Invalid attribute value: eth0 for the network interface

2012-12-21 02:09:06.837: [ora.net1.network][1266022720] {0:1:53734} [check] NetworkAgent::init exit }
2012-12-21 02:09:07.184: [ora.net1.network][1266022720] {0:1:53734} [check] NetInterface::scheckNetInterface returned 0 
2012-12-21 02:09:07.184: [ora.net1.network][1266022720] {0:1:53734} [check] NetworkAgent::checkInterface returned false
2012-12-21 02:09:07.185: [    AGFW][1400256832] {0:1:53734} ora.net1.network vrh1 1 state changed from: STARTING to: OFFLINE
2012-12-21 02:09:07.185: [    AGFW][1400256832] {0:1:53734} Switching online monitor to offline one
2012-12-21 02:09:07.185: [    AGFW][1400256832] {0:1:53734} Started implicit monitor for [ora.net1.network vrh1 1] interval=60000 delay=60000
2012-12-21 02:09:07.185: [    AGFW][1400256832] {0:1:53734} Agent sending last reply for: RESOURCE_START[ora.net1.network vrh1 1] ID 4098:417237

As shown above, within 1 second of the public network going down, the orarootagent process detected it and tried to restart ora.net1.network; after the restart failed, the VIP was set OFFLINE and the listener followed. Later, once the public network recovered, the network, VIP, and listener resources came back online automatically one after another:

2012-12-21 02:10:07.196: [ora.net1.network][1333139776] {0:1:53734} [check] NetworkAgent::init enter {
2012-12-21 02:10:07.196: [ora.net1.network][1333139776] {0:1:53734} [check] Checking if eth0 Interface is fine
2012-12-21 02:10:07.217: [ora.net1.network][1333139776] {0:1:53734} [check] ifname=eth0
2012-12-21 02:10:07.217: [ora.net1.network][1333139776] {0:1:53734} [check] subnetmask=255.255.255.0
2012-12-21 02:10:07.217: [ora.net1.network][1333139776] {0:1:53734} [check] subnetnumber=192.168.1.0
2012-12-21 02:10:07.217: [ora.net1.network][1333139776] {0:1:53734} [check] Interface Name = eth0
2012-12-21 02:10:07.217: [ora.net1.network][1333139776] {0:1:53734} [check] Primary IP = 192.168.1.161
2012-12-21 02:10:07.217: [ora.net1.network][1333139776] {0:1:53734} [check] NetworkAgent::init exit }
2012-12-21 02:10:07.218: [ USRTHRD][1333139776] {0:1:53734} Thread:[VipRelocate:]start {
2012-12-21 02:10:07.218: [ USRTHRD][1333139776] {0:1:53734} Thread:[VipRelocate:]start }
2012-12-21 02:10:07.220: [    AGFW][1400256832] {0:1:53734} ora.net1.network vrh1 1 state changed from: OFFLINE to: ONLINE
2012-12-21 02:10:07.220: [    AGFW][1400256832] {0:1:53734} Switching offline monitor to online one
2012-12-21 02:10:07.220: [    AGFW][1400256832] {0:1:53734} Started implicit monitor for [ora.net1.network vrh1 1] interval=1000 delay=1000
2012-12-21 02:10:07.220: [    AGFW][1400256832] {0:1:53739} Generating new Tint for unplanned state change. Original Tint: {0:1:53734}
2012-12-21 02:10:07.220: [    AGFW][1400256832] {0:1:53739} Agent sending message to PE: RESOURCE_STATUS[Proxy] ID 20481:24047975
2012-12-21 02:10:07.244: [ USRTHRD][1266022720] {0:1:53734} Relocating Node Vip Resource
2012-12-21 02:10:07.245: [ USRTHRD][1266022720] {0:1:53734} synchStatic {
2012-12-21 02:10:07.245: [ USRTHRD][1266022720] {0:1:53734} synchStatic }
2012-12-21 02:10:07.270: [    GIPC][1266022720] gipcCheckInitialization: possible incompatible non-threaded init from [clsss.c : 5326], original from [clsnOraRootAgent.
cpp : 208]
2012-12-21 02:10:07.278: [ USRTHRD][1266022720] {0:1:53734} unsynchStatic {
2012-12-21 02:10:07.279: [ USRTHRD][1266022720] {0:1:53734} unsynchStatic }
2012-12-21 02:10:07.310: [ USRTHRD][1266022720] {0:1:53734} Thread:[VipRelocate:]isRunning is reset to false here
2012-12-21 02:10:07.343: [    AGFW][1400256832] {1:62611:962} Agent received the message: RESOURCE_START[ora.vrh1.vip 1 1] ID 4098:417329
2012-12-21 02:10:07.343: [    AGFW][1400256832] {1:62611:962} Preparing START command for: ora.vrh1.vip 1 1
2012-12-21 02:10:07.343: [    AGFW][1400256832] {1:62611:962} ora.vrh1.vip 1 1 state changed from: OFFLINE to: STARTING
2012-12-21 02:10:07.344: [ora.vrh1.vip][1333139776] {1:62611:962} [start] (:CLSN00107:) clsn_agent::start {
2012-12-21 02:10:07.344: [ USRTHRD][1333139776] {1:62611:962} Thread:[SendFail2SrvThread:]stop {
2012-12-21 02:10:07.344: [ USRTHRD][1333139776] {1:62611:962} Thread:[SendFail2SrvThread:]stop }
2012-12-21 02:10:07.344: [ USRTHRD][1333139776] {1:62611:962} Thread:[StartorphanSrvThread:]stop {
2012-12-21 02:10:07.344: [ USRTHRD][1333139776] {1:62611:962} Thread:[StartorphanSrvThread:]stop }
2012-12-21 02:10:07.344: [ora.vrh1.vip][1333139776][ENTER] {1:62611:962} [start] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:10:07.344: [ora.vrh1.vip][1333139776] {1:62611:962} [start] InterfaceName = eth0
2012-12-21 02:10:07.344: [ora.vrh1.vip][1333139776] {1:62611:962} [start] NetworkMask = 255.255.255.0
2012-12-21 02:10:07.344: [ora.vrh1.vip][1333139776] {1:62611:962} [start] NetworkAuto = static
2012-12-21 02:10:07.344: [ora.vrh1.vip][1333139776] {1:62611:962} [start] Checking if eth0 Interface is fine
2012-12-21 02:10:07.349: [ora.vrh1.vip][1333139776] {1:62611:962} [start] ifname=eth0

2012-12-21 02:10:07.349: [ora.vrh1.vip][1333139776] {1:62611:962} [start] subnetmask=255.255.255.0
2012-12-21 02:10:07.349: [ora.vrh1.vip][1333139776] {1:62611:962} [start] subnetnumber=192.168.1.0
2012-12-21 02:10:07.349: [ora.vrh1.vip][1333139776] {1:62611:962} [start] InterfaceName = eth0
2012-12-21 02:10:07.350: [ora.vrh1.vip][1333139776] {1:62611:962} [start] HostName vrh1-vip translated to 192.168.1.162
2012-12-21 02:10:07.350: [ora.vrh1.vip][1333139776][ENTER] {1:62611:962} [start] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:10:07.350: [ora.vrh1.vip][1333139776] {1:62611:962} [start] Interface Name = eth0
2012-12-21 02:10:07.350: [ora.vrh1.vip][1333139776] {1:62611:962} [start] Ip Address = 192.168.1.162
2012-12-21 02:10:07.350: [ora.vrh1.vip][1333139776] {1:62611:962} [start] Utils::getCrsHome crsHome /g01/11.2.0/grid
2012-12-21 02:10:07.351: [ora.vrh1.vip][1333139776] {1:62611:962} [start] Ocr Context init default level -1677304432
2012-12-21 02:10:07.526: [ default][1333139776]ICMP Ping from 192.168.1.161 to 192.168.1.162
2012-12-21 02:10:08.529: [ora.vrh1.vip][1333139776] {1:62611:962} [start] Trying to plumb  192.168.1.162 on eth0
2012-12-21 02:10:08.529: [ora.vrh1.vip][1333139776] {1:62611:962} [start] VipActions::startIp {
2012-12-21 02:10:08.529: [ora.vrh1.vip][1333139776] {1:62611:962} [start] Adding 192.168.1.162 on eth0:1
2012-12-21 02:10:08.529: [ora.vrh1.vip][1333139776] {1:62611:962} [start] VipActions::startIp }
2012-12-21 02:10:08.529: [ora.vrh1.vip][1333139776] {1:62611:962} [start] Flushing neighbours ARP Cache
2012-12-21 02:10:08.529: [ora.vrh1.vip][1333139776] {1:62611:962} [start]  Arp::sCreateSocket {
2012-12-21 02:10:08.532: [ora.vrh1.vip][1333139776] {1:62611:962} [start]  Arp::sCreateSocket }
2012-12-21 02:10:08.532: [ora.vrh1.vip][1333139776] {1:62611:962} [start]  Arp::sFlushArpCache {
2012-12-21 02:10:08.532: [ora.vrh1.vip][1333139776] {1:62611:962} [start] Arp::sSend:  sending type 1
2012-12-21 02:10:08.532: [ora.vrh1.vip][1333139776] {1:62611:962} [start]  Arp::sFlushArpCache }
2012-12-21 02:10:08.532: [ora.vrh1.vip][1333139776] {1:62611:962} [start] Succeed to bind udp://192.168.1.162:0
2012-12-21 02:10:08.532: [ora.vrh1.vip][1333139776] {1:62611:962} [start] Succeed to bind udp://192.168.1.162:0
2012-12-21 02:10:08.533: [ora.vrh1.vip][1333139776] {1:62611:962} [start] (:CLSN00107:) clsn_agent::start }
2012-12-21 02:10:08.533: [    AGFW][1333139776] {1:62611:962} Command: start for resource: ora.vrh1.vip 1 1 completed with status: SUCCESS
2012-12-21 02:10:08.534: [ora.vrh1.vip][1333139776][ENTER] {1:62611:962} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:10:08.534: [ora.vrh1.vip][1333139776][ENTER] {1:62611:962} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:10:08.534: [    AGFW][1400256832] {1:62611:962} Agent sending reply for: RESOURCE_START[ora.vrh1.vip 1 1] ID 4098:417329
2012-12-21 02:10:08.534: [ora.vrh1.vip][1333139776][ENTER] {1:62611:962} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:10:08.534: [ora.vrh1.vip][1333139776][ENTER] {1:62611:962} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:10:08.535: [    AGFW][1400256832] {1:62611:962} ora.vrh1.vip 1 1 state changed from: STARTING to: ONLINE
2012-12-21 02:10:08.535: [    AGFW][1400256832] {1:62611:962} Started implicit monitor for [ora.vrh1.vip 1 1] interval=1000 delay=1000
2012-12-21 02:10:08.535: [    AGFW][1400256832] {1:62611:962} Agent sending last reply for: RESOURCE_START[ora.vrh1.vip 1 1] ID 4098:417329
2012-12-21 02:10:09.529: [ora.vrh1.vip][1333139776][ENTER] {1:62611:962} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:10:09.529: [ora.vrh1.vip][1333139776][ENTER] {1:62611:962} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:10:09.529: [ora.vrh1.vip][1333139776][ENTER] {1:62611:962} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:10:09.529: [ora.vrh1.vip][1333139776][ENTER] {1:62611:962} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:10:09.529: [ USRTHRD][1333139776] {1:62611:962} Thread:[StartorphanSrvThread:]start {
2012-12-21 02:10:09.529: [ USRTHRD][1333139776] {1:62611:962} Thread:[StartorphanSrvThread:]start }
2012-12-21 02:10:09.530: [ USRTHRD][1500932416] {1:62611:962} VipAgent::startOrphanSrv {
2012-12-21 02:10:09.555: [ USRTHRD][1500932416] {1:62611:962} VipAgent::startOrphanSrv }
2012-12-21 02:10:09.555: [ USRTHRD][1500932416] {1:62611:962} Thread:[StartorphanSrvThread:]isRunning is reset to false here
2012-12-21 02:10:10.532: [ora.vrh1.vip][1333139776][ENTER] {1:62611:962} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network

 

 

How to avoid unnecessary VIP and listener restarts caused by an over-sensitive network resource:

Apply the fix for bug 12680491, which is included in 11.2.0.2 GI PSU4, 11.2.0.3 GI PSU3, 11.2.0.3 Windows Patch 7, and 11.2.0.4 and above (it supersedes the earlier fix for bug 12378938), then modify the stop dependency attribute of the VIP resources, for example:

 

 

# $GRID_HOME/bin/crsctl modify res ora.scan.vip -attr "STOP_DEPENDENCIES=hard(intermediate:ora.net.network)"
For example:

# /ocw/grid/bin/crsctl modify res ora.scan1.vip -attr "STOP_DEPENDENCIES=hard(intermediate:ora.net1.network)"
# /ocw/grid/bin/crsctl modify res ora.scan2.vip -attr "STOP_DEPENDENCIES=hard(intermediate:ora.net1.network)"
# /ocw/grid/bin/crsctl modify res ora.scan3.vip -attr "STOP_DEPENDENCIES=hard(intermediate:ora.net1.network)"
# /ocw/grid/bin/srvctl stop scan -f
$ /ocw/grid/bin/srvctl start scan_listener
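To confirm the change took effect, the resource profile can be checked for the intermediate modifier. The sketch below parses sample output in the format `crsctl stat res <name> -p` prints; the heredoc is a hypothetical stand-in for real crsctl output, with resource names taken from the example above:

```shell
#!/bin/sh
# Hypothetical profile text, mimicking what 'crsctl stat res ora.scan1.vip -p' prints;
# on a real cluster you would pipe crsctl output instead of this heredoc.
cat > /tmp/scan1_vip_profile.txt <<'EOF'
NAME=ora.scan1.vip
TYPE=ora.scan_vip.type
START_DEPENDENCIES=hard(ora.net1.network) pullup(ora.net1.network)
STOP_DEPENDENCIES=hard(intermediate:ora.net1.network)
EOF
# Print 1 if the intermediate modifier is present on the stop dependency, 0 otherwise
grep '^STOP_DEPENDENCIES=' /tmp/scan1_vip_profile.txt | grep -c 'intermediate:ora.net1.network'
```

Without the intermediate modifier the stop dependency fires as soon as ora.net1.network leaves ONLINE; with it, a brief INTERMEDIATE excursion no longer stops the VIP.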

 

 

 

MORE INFO:

 

 

After a check times out, the 11gR2 Grid Infrastructure network resource (usually ora.net1.network) goes to INTERMEDIATE state, then returns to ONLINE very shortly afterwards. This note does not discuss the cause of the check timeout, but the most common cause is a public network hiccup.

Once the network resource goes into INTERMEDIATE state, it may trigger the VIP/service to fail over and the listener to go offline due to resource dependencies, which can cause unnecessary connectivity issues for that period of time. After the network resource comes back online, the affected resources (listener, service or instance) may not come back online.

$GRID_HOME/log//crsd/crsd.log
2011-06-12 07:12:31.261: [ AGFW][10796] {0:1:2881} Received state change for ora.net1.network racnode1 1 [old state = ONLINE, new state = UNKNOWN]
2011-06-12 07:12:31.261: [ AGFW][10796] {0:1:2881} Received state LABEL change for ora.net1.network racnode1 1 [old label = , new label = CHECK TIMED OUT]
..
2011-06-12 07:12:31.297: [ CRSPE][12081] {0:1:2881} RI [ora.net1.network racnode1 1] new external state [INTERMEDIATE] old value: [ONLINE] on racnode1 label = [CHECK TIMED OUT]
..
2011-06-12 07:12:31.981: [ AGFW][10796] {0:1:2882} Received state change for ora.net1.network racnode1 1 [old state = UNKNOWN, new state = ONLINE]
..
2011-06-12 07:12:32.307: [ CRSPE][12081] {0:1:2881} RI [ora.LISTENER.lsnr racnode1 1] new internal state: [STOPPING] old value: [STABLE]
2011-06-12 07:12:32.308: [ CRSPE][12081] {0:1:2881} CRS-2673: Attempting to stop 'ora.LISTENER.lsnr' on 'racnode1'
$GRID_HOME/log//agent/crsd/orarootagent_root/orarootagent_root.log
2011-06-12 07:12:08.965: [ AGFW][2070] {1:27767:2} Created alert : (:CRSAGF00113:) : Aborting the command: check for resource: ora.net1.network racnode1 1
2011-06-12 07:12:08.966: [ora.net1.network][2070] {1:27767:2} [check] clsn_agent::abort {
..
2011-06-12 07:12:31.257: [ AGFW][2070] {1:27767:2} Command: check for resource: ora.net1.network racnode1 1 completed with status: TIMEDOUT
2011-06-12 07:12:31.258: [ AGFW][2314] {1:27767:2} ora.net1.network racnode1 1 state changed from: ONLINE to: UNKNOWN
2011-06-12 07:12:31.258: [ AGFW][2314] {1:27767:2} ora.net1.network racnode1 1 would be continued to monitored!
2011-06-12 07:12:31.258: [ AGFW][2314] {1:27767:2} ora.net1.network racnode1 1 state details has changed from: to: CHECK TIMED OUT
..
2011-06-12 07:12:31.923: [ora.net1.network][2314][F-ALGO] {1:27767:2} CHECK initiated by timer for: ora.net1.network racnode1 1
..
2011-06-12 07:12:31.973: [ora.net1.network][8502][F-ALGO] {1:27767:2} [check] Command check for resource: ora.net1.network racnode1 1 completed with status ONLINE
2011-06-12 07:12:31.978: [ AGFW][2314] {1:27767:2} ora.net1.network racnode1 1 state changed from: UNKNOWN to: ONLINE
$GRID_HOME/log//agent/crsd/oraagent_/oraagent_.log
2011-06-12 07:12:32.335: [ AGFW][2314] {0:1:2881} Agent received the message: RESOURCE_STOP[ora.LISTENER.lsnr racnode1 1] ID 4099:14792
2011-06-12 07:12:32.335: [ AGFW][2314] {0:1:2881} Preparing STOP command for: ora.LISTENER.lsnr racnode1 1
2011-06-12 07:12:32.335: [ AGFW][2314] {0:1:2881} ora.LISTENER.lsnr racnode1 1 state changed from: ONLINE to: STOPPING
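A quick way to see how often this happened is to scan crsd.log for the INTERMEDIATE transition carrying the CHECK TIMED OUT label. A minimal sketch; the heredoc reproduces two of the log lines above for illustration (the real file path appears at the start of the excerpt):

```shell
#!/bin/sh
# Two sample lines copied from the crsd.log excerpt above (illustration only)
cat > /tmp/crsd_sample.log <<'EOF'
2011-06-12 07:12:31.297: [ CRSPE][12081] {0:1:2881} RI [ora.net1.network racnode1 1] new external state [INTERMEDIATE] old value: [ONLINE] on racnode1 label = [CHECK TIMED OUT]
2011-06-12 07:12:32.307: [ CRSPE][12081] {0:1:2881} RI [ora.LISTENER.lsnr racnode1 1] new internal state: [STOPPING] old value: [STABLE]
EOF
# Count network-resource drops to INTERMEDIATE caused by a check timeout
grep -c 'ora.net1.network.*INTERMEDIATE.*CHECK TIMED OUT' /tmp/crsd_sample.log
```

A count that climbs over time points at repeated public-network hiccups rather than a one-off event.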

Cause

This is caused by bug 12680491

Related issues: bug 12378938, bug 11857122, bug 12428737

Solution

The fix for bug 12680491 adds the intermediate modifier to the stop dependency between the network resource and the VIP to avoid unnecessary resource state changes. It is included in 11.2.0.2 GI PSU4, 11.2.0.3 GI PSU3, 11.2.0.3 Windows Patch 7, and 11.2.0.4 and above. This fix is recommended instead of the fix for bug 12378938, as it avoids the issue in the first place.

Once the patch for this bug is applied, the following needs to be executed to change the dependency for all VIPs:

# $GRID_HOME/bin/crsctl modify res ora..vip -attr "STOP_DEPENDENCIES=hard(intermediate:ora..network)"

For example:

# /ocw/grid/bin/crsctl modify res ora.racnode1.vip -attr "STOP_DEPENDENCIES=hard(intermediate:ora.net1.network)"

Once the attribute is changed, a restart of nodeapps/VIP is needed for the change to take effect.

 

Know more about RAC GES STATISTICS

GES (Global Enqueue Services) is the global enqueue lock service in RAC. The V$GES_STATISTICS dynamic performance view aggregates the main GES statistics. To better understand them, the reference below explains what each statistic means:

V$GES_STATISTICS Reference (10.2.0.1.0)

0 messages sent directly
Incremented when any process successfully sends a message to a remote instance without being blocked and without flow control.

1 messages flow controlled
Incremented when any process could not send a message directly because there were not enough tickets available.

2 messages sent indirectly
Incremented when any process is asked to flow-control the message (i.e. a process tried to send a message indirectly, even if a ticket was available). This can also be incremented when a previous message to the same target node had failed or a GCS/GES operation is being frozen.

3 messages received logical
When LMS receives a GCS/GES message from a remote or local client, this statistic is incremented.

61 messages received actual
When LMS receives a message from a remote instance, this is incremented. A single actual message can contain multiple logical messages. Note that when remote messages are queued, because they are flow controlled or they are indirectly / intentionally queued, the LMS process tries to send them in batch instead of flushing them individually.

4 flow control messages sent
5 flow control messages received
Messages flow-controlled due to a lack of tickets.

6 gcs msgs received
7 gcs msgs process time(ms)
8 ges msgs received
9 ges msgs process time(ms)
When LMS receives a message related to either GCS (Global Cache Service) or GES (Global Enqueue Service) activity, the corresponding 'received' statistic is incremented. After a GCS/GES message is processed, the process (typically LMD or LMS) updates the following statistics:

  • gcs msgs received
  • gcs msgs process time(ms)
  • ges msgs received
  • ges msgs process time(ms)
 


10 msgs causing lmd to send msgs
11 lmd msg send time(ms)
65 msgs causing lms to send msgs
66 lms msg send time(ms)
Incremented when LMD/LMS processes a GCS/GES message and this causes LMD/LMS to send one or more messages. For example, if LMS receives a message and, as part of processing it, sends four more messages, the statistic is incremented by 1, not 4. To get the exact number of messages sent by LMS, the session statistics 'gcs messages sent' and 'ges messages sent' need to be retrieved for the session running LMS (from V$SESSTAT).

12 gcs side channel msgs actual
13 gcs side channel msgs logical
'side channel msgs logical' indicates the number of blocks shipped from this node to other nodes; 'side channel msgs actual' indicates the actual number of messages sent to other nodes. When CR blocks or current blocks are sent to a remote node, the sender also sends a separate reliable message to the requestor, because the CR or current block being shipped could be lost. For example, a node sends 100 CR blocks to another node (logical number of messages); the sender node may then send a single message saying 'I have sent 100 blocks' (actual number of messages). The init.ora parameter '_side_channel_batch_size' defines the number of side channel messages to be sent in a single message. With reliable IPC technology such as RSM and HyperFabric, side channel messages are not needed and this value should be 0. With non-reliable IPC technology like UDP, it should be increased.

 

14 gcs pings refused
Incremented when the master node sends a BAST to a holder node, and the holder node is not able to service the BAST for some reason (typically because the block is not present or the ping queue is full).

15 gcs writes refused
Same as above, except that this is for writes. In RAC, if blocks are globally dirty, the writes are mediated by the GCS.

16 gcs error msgs
Certain race conditions in the GCS cause this statistic to be updated. Resolving the race usually involves sending some extra messages in the form of error messages.

17 gcs out-of-order msgs
With direct sends, it is possible for two messages sent from the same instance to be received out of order at the master node. This statistic is updated whenever that happens.

18 gcs immediate (null) converts
Incremented when a NULL lock can be granted immediately.

19 gcs immediate cr (null) converts
Incremented when a NULL lock for a CR request can be granted immediately.

20 gcs immediate (compatible) converts
Incremented when a shared lock can be granted immediately.

21 gcs immediate cr (compatible) converts
Incremented when a shared lock for a CR request can be granted immediately.

22 gcs blocked converts
Incremented when the lock cannot be granted immediately. The lock is at the head of the convert queue.

23 gcs queued converts
Incremented when the lock cannot be granted immediately and there is a conflicting lock in the convert queue ahead of this lock.

24 gcs blocked cr converts
Incremented when a CR request cannot be granted a lock because the lock is already being converted or the lock is in exclusive mode.

25 gcs compatible basts
Number of BASTs sent to the holder node of a compatible lock.

26 gcs compatible cr basts (local)
A CR request can be granted a lock, and a BAST is sent to the holder node. The lock is in the local role.

60 gcs compatible cr basts (global)
Incremented when the lock request is compatible but we cannot read from disk because the block is globally dirty.

27 gcs cr basts to PIs
A CR request is sent to an instance that has a PI buffer satisfying the CR request.

28 dynamically allocated gcs resources
29 dynamically allocated gcs shadows
Number of GCS resources / shadows dynamically allocated after instance startup. These should not increase at all. _gcs_resources and _gcs_shadows can be used to change the default number of these resources to avoid dynamic allocation, but any increase should be treated as a bug (the default should be enough, or it could be a memory leak).

30 gcs recovery claim msgs
Number of recovery claim messages processed by this instance.

31 gcs indirect ast
The AST is sent to LMS instead of the foreground process.

32 gcs dbwr write request msgs
33 gcs dbwr flush pi msgs
34 gcs lms write request msgs
35 gcs lms flush pi msgs
36 gcs write notification msgs
Messages related to flushing dirty XCUR / PI buffers. To flush PI buffers, the master node is asked to write the most recent copy of the block in the global cache ('write request msgs'). Once the most recent copy of the block in the global cache is written to disk, PI buffers in the global cache can be purged ('flush pi msgs'), and a 'write notification' message is sent to the master node.

37 gcs retry convert request
A convert request had to be retried due to race conditions.

38 gcs regular cr
CR for data blocks.

39 gcs undo cr
CR for undo blocks.

40 gcs assume no cvt
An assume was processed when the convert queue is empty.

41 gcs assume cvt
An assume was processed when the convert queue is non-empty.

42 broadcast msgs on commit(actual)
MCPD=0, number of messages sent to update the SCN.

43 broadcast msgs on commit(logical)
Same as 42, but logical (because the update may have been piggybacked).

44 broadcast msgs on commit(wasted)
An update-SCN message is sent, but it is potentially wasted because the receiver may have already updated the SCN.

45 acks for commit broadcast(actual)
46 acks for commit broadcast(logical)
Same as 42 and 43, except that they apply to the receiving instance.

47 false posts waiting for scn acks
We posted LGWR because we thought the MCPD broadcast had completed, but it had not.

48 gcs forward cr to pinged instance
A CR request is forwarded to the instance that is currently converting the GCS resource.

49 gcs cr serve without current lock
A CR block is served by an instance that does not have the current lock.

50 msgs sent queued
51 msgs sent queue time (ms)
Number of logical messages sent through the send queue, and their queuing time. 'msgs sent queue time (ms)' / 'msgs sent queued' gives the average message queuing time for flow-controlled or indirectly sent messages. Note: this is calculated at the 'kjct' layer (the GCS/GES communication layer).
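The ratio just described can be computed directly; the sketch below uses hypothetical snapshot values, not figures from a live instance:

```shell
#!/bin/sh
# Hypothetical snapshot values from V$GES_STATISTICS
msgs_sent_queued=30000
msgs_sent_queue_time_ms=4500
# Average time a flow-controlled or indirectly sent message sat in the send queue
awk -v t="$msgs_sent_queue_time_ms" -v n="$msgs_sent_queued" \
    'BEGIN { printf "avg send-queue time: %.2f ms\n", t / n }'
```

A rising average here suggests interconnect send-side backlog rather than slow message processing.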

 

52 msgs sent queued on ksxp
53 msgs sent queue time on ksxp (ms)
Number of messages queued, and their queuing time, at the ksxp layer. This includes all types of GCS/GES messages sent by any Oracle process (foreground and background). Note: 'msgs sent queued' is a statistic from the kjct layer, where it is known whether messages are sent directly or indirectly.

54 msgs received queue time (ms)
55 msgs received queued
Elapsed time from when a message is actually received until it is processed, and the number of messages received (logical). The ratio 'msgs received queue time (ms)' / 'msgs received queued' gives the average queuing time between message arrival and the start of processing.

56 implicit batch messages sent
57 implicit batch messages received
Number of messages sent/received that are batched implicitly. Note: messages that are queued because of flow control or indirect sends can be batched.

58 gcs refuse xid
Number of lock requests received but refused by this instance because an index split is in progress (new in Oracle9i Release 2).

59 gcs ast xid
Number of lock requests canceled because of an index split.

62 process batch messages sent
63 process batch messages received
Number of messages sent/received in batch. When LMS receives multiple remote messages at a time, it processes all of them first and then sends the results in batch, instead of sending the result for each request individually.

64 messages sent pbatched
The number of messages sent through process batching. This is the logical number, whereas 'process batch messages sent' is the actual number of messages sent out. Process batching in 10g is used for multi-block reads, newing, receiver direct sends (LMD0, LMS*, LMON) and fusion writes (DBW*).

67 global posts requested
AQ requested that a global post be delivered to another instance.

68 global posts dropped
A post was dropped because there was no buffer space.

69 global posts queued
A post was queued to be sent to another instance.

70 global posts sent
A post was actually sent to another instance.

71 global posts queue time
Time difference between enqueuing and sending the post.

72 messages sent not implicit batched
The number of indirectly sent messages that did not get any batching from the send proxies, for various reasons; for example, the message is big enough on its own, or is defined as a non-batch type.

73 messages queue sent actual
Actual number of messages sent indirectly by send proxies.

74 messages queue sent logical
Logical number of messages sent indirectly by send proxies, including the number of embedded messages batched either through process batching or batching done in the send proxies.

 

V$GES_STATISTICS is in fact populated from the X$KJISFT internal view.

Know more about RAC statistics and wait event

The main RAC statistics and wait events are listed below:

 

1. Statistics:

1.1 V$SYSSTAT, V$SESSTAT (join to V$STATNAME)

  • gc cr blocks served
  • gc cr block build time
  • gc cr block flush time
  • gc cr block send time
  • gc current blocks served
  • gc current block pin time
  • gc current block flush time
  • gc current block send time
  • gc cr blocks received
  • gc cr block receive time
  • gc current blocks received
  • gc current block receive time
  • gc blocks lost
  • gc claim blocks lost
  • gc blocks corrupt
  • global enqueue gets sync
  • global enqueue gets async
  • global enqueue get time
  • global enqueue releases
  • gcs messages sent
  • ges messages sent
  • global enqueue CPU used by this session
  • gc CPU used by this session
  • IPC CPU used by this session
  • global undo segment hints helped (10.2)
  • global undo segment hints were stale (10.2)
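From the V$SYSSTAT counters above, per-block averages can be derived; for instance, the average CR block receive time is 'gc cr block receive time' over 'gc cr blocks received'. V$SYSSTAT time statistics are kept in centiseconds, hence the factor of 10 to express the result in milliseconds. A sketch with hypothetical values:

```shell
#!/bin/sh
# Hypothetical V$SYSSTAT values (the time statistic is in centiseconds)
gc_cr_block_receive_time_cs=1200
gc_cr_blocks_received=3000
# Average CR block receive time in ms = 10 * receive time (cs) / blocks received
awk -v t="$gc_cr_block_receive_time_cs" -v n="$gc_cr_blocks_received" \
    'BEGIN { printf "avg gc cr block receive time: %.2f ms\n", 10 * t / n }'
```

The same pattern applies to the current-block pair ('gc current block receive time' / 'gc current blocks received').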

1.2 V$SYSMETRIC

  • GC CR Block Received Per Second
  • GC CR Block Received Per Txn
  • GC Current Block Received Per Second
  • GC Current Block Received Per Txn
  • Global Cache Average CR Get Time
  • Global Cache Average Current Get Time
  • Global Cache Blocks Corrupted
  • Global Cache Blocks Lost

1.3 V$SEGMENT_STATISTICS, V$SEGSTAT

  • gc buffer busy
  • gc cr blocks received
  • gc current blocks received

1.4 DBA_HIST_SEG_STAT

  • GC_CR_BLOCKS_SERVED_TOTAL
  • GC_CR_BLOCKS_SERVED_DELTA
  • GC_CU_BLOCKS_SERVED_TOTAL
  • GC_CU_BLOCKS_SERVED_DELTA

1.5 V$GES_STATISTICS

  • acks for commit broadcast(actual)
  • acks for commit broadcast(logical)
  • broadcast msgs on commit(actual)
  • broadcast msgs on commit(logical)
  • broadcast msgs on commit(wasted)
  • dynamically allocated gcs resources
  • dynamically allocated gcs shadows
  • false posts waiting for scn acks
  • flow control messages received
  • flow control messages sent
  • gcs assume cvt
  • gcs assume no cvt
  • gcs ast xid
  • gcs blocked converts
  • gcs blocked cr converts
  • gcs compatible basts
  • gcs compatible cr basts (global)
  • gcs compatible cr basts (local)
  • gcs cr basts to PIs
  • gcs cr serve without current lock
  • gcs dbwr flush pi msgs
  • gcs dbwr write request msgs
  • gcs error msgs
  • gcs forward cr to pinged instance
  • gcs immediate (compatible) converts
  • gcs immediate (null) converts
  • gcs immediate cr (compatible) converts
  • gcs immediate cr (null) converts
  • gcs indirect ast
  • gcs lms flush pi msgs
  • gcs lms write request msgs
  • gcs msgs process time(ms)
  • gcs msgs received
  • gcs out-of-order msgs
  • gcs pings refused
  • gcs queued converts
  • gcs recovery claim msgs
  • gcs refuse xid
  • gcs regular cr
  • gcs retry convert request
  • gcs side channel msgs actual
  • gcs side channel msgs logical
  • gcs undo cr
  • gcs write notification msgs
  • gcs writes refused
  • ges msgs process time(ms)
  • ges msgs received
  • global posts dropped
  • global posts queue time
  • global posts queued
  • global posts requested
  • global posts sent
  • implicit batch messages received
  • implicit batch messages sent
  • lmd msg send time(ms)
  • lms(s) msg send time(ms)
  • messages flow controlled
  • messages queue sent actual
  • messages queue sent logical
  • messages received actual
  • messages received logical
  • messages sent directly
  • messages sent indirectly
  • messages sent not implicit batched
  • messages sent pbatched
  • msgs causing lmd to send msgs
  • msgs causing lms(s) to send msgs
  • msgs received queue time (ms)
  • msgs received queued
  • msgs sent queue time (ms)
  • msgs sent queue time on ksxp (ms)
  • msgs sent queued
  • msgs sent queued on ksxp
  • process batch messages received
  • process batch messages sent

2. Wait Events:

In 10g, RAC wait events can be divided into three categories. The main wait events are listed below, including some undocumented ones.

2.1. Real-time only:

These wait events are only defined while the process is waiting; after the wait is over, they are reclassified according to the outcome of the global cache operation. They should appear only in the following views: V$SESSION_WAIT, V$ACTIVE_SESSION_HISTORY, V$EVENT_HISTOGRAM.

 

  • gc cr request
  • gc current request

2.2. Historical only:

These events represent the outcome of a GC request (fixup events). They can appear in the following views: V$SESSION_EVENT, V$SYSTEM_EVENT, DBA_HIST_SYSTEM_EVENT, V$EVENTMETRIC.

  • gc cr block 2-way
  • gc cr block 3-way
  • gc cr block busy
  • gc cr block congested
  • gc cr block lost
  • gc cr block unknown
  • gc cr grant 2-way
  • gc cr grant busy
  • gc cr grant congested
  • gc cr grant unknown
  • gc current block 2-way
  • gc current block 3-way
  • gc current block busy
  • gc current block congested
  • gc current block lost
  • gc current block unknown
  • gc current grant 2-way
  • gc current grant busy
  • gc current grant congested
  • gc current grant unknown

2.3. Other events:

The remaining wait events may appear in any of the views listed above: V$SESSION_WAIT, V$ACTIVE_SESSION_HISTORY, V$EVENT_HISTOGRAM, V$SESSION_EVENT, V$SYSTEM_EVENT, DBA_HIST_SYSTEM_EVENT, V$EVENTMETRIC.

  • LMON global data update
  • cr request retry
  • gc assume
  • gc block recovery request
  • gc buffer busy
  • gc claim
  • gc cr cancel
  • gc cr disk read
  • gc cr disk request
  • gc cr failure
  • gc cr multi block request
  • gc current cancel
  • gc current multi block request
  • gc current retry
  • gc current split
  • gc domain validation
  • gc freelist
  • gc prepare
  • gc quiesce wait
  • gc recovery free
  • gc recovery quiesce
  • gc remaster
  • gcs ddet enter server mode
  • gcs domain validation
  • gcs drm freeze begin
  • gcs drm freeze in enter server mode
  • gcs enter server mode
  • gcs log flush sync
  • gcs remastering wait for read latch
  • gcs remastering wait for write latch
  • gcs remote message
  • gcs resource directory to be unfrozen
  • gcs to be enabled
  • ges LMD suspend for testing event
  • ges LMD to inherit communication channels
  • ges LMD to shutdown
  • ges LMON for send queues
  • ges LMON to get to FTDONE
  • ges LMON to join CGS group
  • ges cached resource cleanup
  • ges cancel
  • ges cgs registration
  • ges enter server mode
  • ges generic event
  • ges global resource directory to be frozen
  • ges inquiry response
  • ges lmd and pmon to attach
  • ges lmd/lmses to freeze in rcfg - mrcvr
  • ges lmd/lmses to unfreeze in rcfg - mrcvr
  • ges master to get established for SCN op
  • ges performance test completion
  • ges pmon to exit
  • ges process with outstanding i/o
  • ges reconfiguration to start
  • ges remote message
  • ges resource cleanout during enqueue open
  • ges resource cleanout during enqueue open-cvt
  • ges resource directory to be unfrozen
  • ges retry query node
  • ges reusing os pid
  • ges user error
  • ges wait for lmon to be ready
  • ges1 LMON to wake up LMD - mrcvr
  • ges2 LMON to wake up LMD - mrcvr
  • ges2 LMON to wake up lms - mrcvr 2
  • ges2 LMON to wake up lms - mrcvr 3
  • ges2 proc latch in rm latch get 1
  • ges2 proc latch in rm latch get 2
  • global cache busy
  • global enqueue expand wait
  • latch: KCL gc element parent latch
  • latch: gcs resource hash
  • latch: ges resource hash list
