ORA-07445 [kghalp+0500] [SIGSEGV] error

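I was not on site today (problems seem especially likely whenever nobody is there). Around 10 a.m. I got a call saying that the crm11 instance was reporting 7445 errors. I intended to connect over the web VPN to take a look, only to discover that Windows 7 does not support this VPN (more precisely, neither IE8 nor Firefox does), so I had no choice but to fall back to dial-up.
The alert log contained a large number of ORA-07445 records: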

Fri Mar 26 09:24:53 2010
Errors in file /oravl01/oracle/admin/CRMDB1/udump/crmdb11_ora_6754320.trc:
ORA-07445: exception encountered: core dump [kghalp+0500] [SIGSEGV] [Invalid permissions for mapped object] [0x00000003B] [] []
Fri Mar 26 09:24:55 2010
Trace dumping is performing id=[cdmp_20100326092455]
Fri Mar 26 09:31:16 2010
Errors in file /oravl01/oracle/admin/CRMDB1/udump/crmdb11_ora_2994552.trc:
ORA-07445: exception encountered: core dump [kghalp+0500] [SIGSEGV] [Invalid permissions for mapped object] [0x00000003B] [] []
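
When the alert log holds many such entries, a quick tally by failing function helps confirm that they all share one signature. The following is a minimal Python sketch, not part of the original investigation; the regular expression is an assumption based only on the line format visible in the excerpt above, and `summarize_7445` is a hypothetical helper name.

```python
import re
from collections import Counter

# Assumed format, taken from the alert log excerpt above:
# ORA-07445: exception encountered: core dump [func+offset] [signal] ...
ORA_7445 = re.compile(
    r"ORA-0?7445: exception encountered: core dump \[([^\]]*)\] \[([^\]]*)\]"
)


def summarize_7445(alert_text):
    """Count ORA-07445 entries per (function+offset, signal) pair."""
    counts = Counter()
    for line in alert_text.splitlines():
        m = ORA_7445.search(line)
        if m:
            counts[(m.group(1), m.group(2))] += 1
    return counts


sample = """\
Fri Mar 26 09:24:53 2010
Errors in file /oravl01/oracle/admin/CRMDB1/udump/crmdb11_ora_6754320.trc:
ORA-07445: exception encountered: core dump [kghalp+0500] [SIGSEGV] [Invalid permissions for mapped object] [0x00000003B] [] []
Fri Mar 26 09:31:16 2010
Errors in file /oravl01/oracle/admin/CRMDB1/udump/crmdb11_ora_2994552.trc:
ORA-07445: exception encountered: core dump [kghalp+0500] [SIGSEGV] [Invalid permissions for mapped object] [0x00000003B] [] []
"""

for (func, signal), n in summarize_7445(sample).items():
    print(func, signal, n)
```

In practice the same function could be fed the full text of the alert log instead of the two-entry sample.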

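My first impression on seeing kghalp is that it is one of the functions Oracle uses for heap management.
Let's guess at the literal meaning: k -> kernel, g -> generic, h -> heap, a -> allocation, p -> point.
Now let's look at the call stack at the time of the failure: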

Exception signal: 11 (SIGSEGV), code: 51 (Invalid permissions for mapped object), addr: 0x3b, PC: [0x1000973e0, kghalp+0500]
Registers:
iar: 00000001000973e0, msr: a00000000000d0b2
 lr: 00000001013a6df8,  cr: 0000000022292484
r00: 0000000000000010, r01: 0ffffffffffcb160, r02: 000000011022a9c0,
r03: 0000000000000002, r04: 0000000000000000, r05: 0000000000000100,
r06: 0000000000000001, r07: 0000000000000000, r08: 0000000000000000,
r09: 0000000000000000, r10: 00000000101b60d8, r11: 0000000000000004,
r12: 0000000024592484, r13: 000000011026bfe0, r14: 0000000000000000,
r15: 0000000000009000, r16: 0000000110195b2c, r17: 0000000000000000,
r18: 0000000000000001, r19: 0000000000000000, r20: 0000000000001000,
r21: 0000000000000000, r22: 0000000000000100, r23: 0000000000000001,
r24: 0000000000000000, r25: 0000000000000000, r26: 0000000000000001,
r27: 0000000104c7fd44, r28: 0000000000000000, r29: 0000000000000100,
r30: 0000000000000000, r31: 0000000110195a58,
*** 2010-03-26 09:57:28.679
ksedmp: internal or fatal error
ORA-07445: exception encountered: core dump [kghalp+0500] [SIGSEGV] [Invalid permissions for mapped object] [0x00000003B] [] []
Current SQL statement for this session:
INSERT INTO AUDIT_DDL_LOG (DDL_TIME, SESSION_ID, OS_USER, IP_ADDRESS, TERMINAL, HOST, USER_NAME, DDL_TYPE, OBJECT_TYPE, OWNER, OBJECT_NAME, SQL_TEXT) VALUES (SYSDATE, SYS_CONTEXT('USERENV','SESSIONID'), SYS_CONTEXT('USERENV','OS_USER'), SYS_CONTEXT('USERENV','IP_ADDRESS'), SYS_CONTEXT('USERENV','TERMINAL'), SYS_CONTEXT('USERENV','HOST'), ORA_LOGIN_USER, ORA_SYSEVENT, ORA_DICT_OBJ_TYPE, ORA_DICT_OBJ_OWNER, ORA_DICT_OBJ_NAME, :B1 )
----- PL/SQL Call Stack -----
  object      line  object
  handle    number  name
70000043da500d0        10  anonymous block
----- Call Stack Trace -----
calling              call     entry                argument values in hex
location             type     point                (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedst+001c          bl       ksedst1              000000000 ? 104A54EED ?
ksedmp+0290          bl       ksedst               104A54870 ?
ssexhd+03e0          bl       ksedmp               300001D15 ?
000044C0             ?        00000000
parchk+01f4          bl       kghalp               000000000 ?
                                                   2842288200000001 ?
                                                   000000000 ? 000000000 ?
                                                   000001040 ? 110195B2C ?
ptmak+0168           bl       parchk               FFFFFFFFFFCB560 ?
                                                   FFFFFFFFFFCB430 ?
                                                   FFFFFFFFFFCB430 ?
pdybF00_Init+0244    bl       ptmak                10008049C ? 000000000 ?
                                                   FFFFFFFFFFCB4F0 ? 07FFFFFFF ?
pdy1F79_Init+00c8    bl       pdybF00_Init         110BEB1D0 ?
pdy1F01_Driver+0048  bl       pdy1F79_Init         FFFFFFFFFFCBC40 ?
pdli_new_cog+00f0    bl       pdy1F01_Driver       FFFFFFFFFFCBCE0 ? 000000000 ?
pdlifu+0264          bl       pdli_new_cog         1013885F4 ? FFFFFFFFFFCCB00 ?
                                                   7000004383E7680 ?
phpcog+0010          bl       pdlifu               FFFFFFFFFFCD958 ?
                                                   7000004383E7680 ? 104C95048 ?
phpcmp+0f80          bl       phpcog               FFFFFFFFFFCC4F0 ? 000000000 ?
pcicms2+02d4         bl       phpcmp               FFFFFFFFFFCD958 ?


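The topmost function at the point of failure, kghalp, was called by parchk, which looks like a package check function (just a guess). To sum up: parchk called kghalp to allocate memory on its behalf, but ended up with an illegal low address [0x00000003B], a range normally occupied by the text segment. This clearly looks like a bug.
Let's search support.oracle.com with the keywords 7445, kghalp and sigsegv (quite often there is no need for the ORA-600/7445 lookup tool).
Bug 8244533 stands out immediately: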

Bug 8244533: ORA-07445 [KGHALP] ERRORS COMPILING PACKAGE WITH DEBUG
    STACK TRACE:
    ------------
       ksedst <- ksedmp <- ssexhd <- 000044BC <- parchk        <- ptmak <-
    pdybF00_Init <- pdy1F79_Init <- pdy1F01_Driver <- pdli_new_cog         <-
    pdlifu <- phpcog <- phpcmp <- pcicms2 <- pcicms          <- kkxcms <- kkxswcm
    <- kkxmpbms <- kkxmesu <- xtypls           <- qctopls <- qctcopn <- qctcopn

    Exception signal: 11 (SIGSEGV), code: 51 (Invalid permissions for mapped
    object),
    addr: 0x3b, PC: [0x1000973e0, kghalp+0500]
    Registers:
    iar: 00000001000973e0, msr: a00000000000d0b2
    lr: 000000010139ffb8,  cr: 00000000222a2484
    r00: 0000000000000010, r01: 0ffffffffffe2980, r02: 00000001101e5ab8,
    r03: 0000000000000002, r04: 0000000000000000, r05: 0000000000000100,
    r06: 0000000000000001, r07: 0000000000000000, r08: 0000000000000000,
    r09: 0000000000000000, r10: 0000000010171200, r11: 0000000000000004,
    r12: 00000000245a2484, r13: 000000011021fbc0, r14: 0000000000000000,
    r15: 0000000000009000, r16: 0000000110150c54, r17: 0000000000000000,
    r18: 0000000000000001, r19: 0000000000000000, r20: 0000000000001000,
    r21: 0000000000000000, r22: 0000000000000100, r23: 0000000000000001,
    r24: 0000000000000000, r25: 0000000000000000, r26: 0000000000000001,
    r27: 0000000104c5983c, r28: 0000000000000000, r29: 0000000000000100,
    r30: 0000000000000000, r31: 0000000110150b80,
    *** 16:37:14.603
    ksedmp: internal or fatal error
    ORA-7445: exception encountered: core dump [kghalp+0500] [SIGSEGV]
    [Invalid permissions for mapped object] [0x00000003B] [] []
    Current SQL statement for this session:
    select dummy from dual where  ora_dict_obj_type = 'TABLE'
    ----- Call Stack Trace -----
    ptmak pdybF00_Init pdy1F79_Init pdy1F01_Driver pdli_new_cog pdlifu
    phpcog phpcmp pcicms2 pcicms kkxcms kkxswcm kkxmpbms kkxmesu xtypls

    To Filer.
    Based on this call stack this would appear a likely match for
    bug 6951953 Abstract: ORA-7445 [PTMAK] IMPORTING PACKAGE COMPILED DEBUG.
    This bug is fixed on 10.2.0.5 and there is a 10.2.0.4 patch available for IBM AIX Based Systems (64-bit).
    It maybe worth while to have the customer apply the patch to see if it resolves the issue.
    Also the uploaded files included test.sql is this a reproducable testcase?
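
Matching a trace against a bug note comes down to comparing the two call stacks frame by frame. The following is a minimal Python sketch of that comparison, assuming the two text layouts shown above (the column-style "Call Stack Trace" and the "a <- b <- c" form in the bug note); all three function names are hypothetical helpers, not Oracle tooling.

```python
def frames_from_trace(stack_lines):
    """Function names from the 'calling location' column of a Call Stack Trace."""
    frames = []
    for line in stack_lines:
        cols = line.split()
        # Rows such as 'parchk+01f4 bl kghalp ...' name a frame;
        # argument-only rows and raw addresses contain no '+'.
        if cols and "+" in cols[0]:
            frames.append(cols[0].split("+")[0])
    return frames


def frames_from_bug(stack_text):
    """Function names from an 'a <- b <- c' style stack in a bug note."""
    parts = [p.strip() for p in stack_text.replace("\n", " ").split("<-")]
    return [p for p in parts if p and not p[0].isdigit()]  # drop raw addresses


def matching_depth(a, b):
    """Number of leading frames the two stacks have in common."""
    depth = 0
    for x, y in zip(a, b):
        if x != y:
            break
        depth += 1
    return depth


# A shortened copy of the rows from our trace file.
trace_rows = [
    "ksedst+001c          bl       ksedst1              000000000 ? 104A54EED ?",
    "ksedmp+0290          bl       ksedst               104A54870 ?",
    "ssexhd+03e0          bl       ksedmp               300001D15 ?",
    "000044C0             ?        00000000",
    "parchk+01f4          bl       kghalp               000000000 ?",
    "                                                   2842288200000001 ?",
    "ptmak+0168           bl       parchk               FFFFFFFFFFCB560 ?",
]
# A shortened copy of the stack quoted in the bug note.
bug_stack = "ksedst <- ksedmp <- ssexhd <- 000044BC <- parchk <- ptmak <- pdybF00_Init"

mine = frames_from_trace(trace_rows)
theirs = frames_from_bug(bug_stack)
print(mine)
print(matching_depth(mine, theirs))
```

A long run of identical leading frames, as here, is what makes the bug a plausible match; it is supporting evidence rather than proof.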

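This bug appears to occur only on IBM AIX on POWER Systems (64-bit), with some probability, when a package is compiled in DEBUG mode.
Now that we know the likely trigger, we can dig further. Next, let's look at the SO (state object) records in the errorstack trace.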

      SO: 70000043d217668, type: 53, owner: 70000048cee2238, flag: INIT/-/-/0x00
      LIBRARY OBJECT LOCK: lock=70000043d217668 handle=700000446261588 mode=N
      call pin=0 session pin=0 hpc=0000 hlc=0000
      htl=70000043d2176e8[70000042b52b368,70000042bb9a808] htb=70000044929b460 ssga=70000044929ad68
      user=70000048cee2238 session=70000048eb33010 count=1 flags=[0000] savepoint=0x4bac1488
      LIBRARY OBJECT HANDLE: handle=700000446261588 mtx=7000004462616b8(1) cdp=1
      name=ALTER TRIGGER "SHUCRM3O"."TRI_PRODUCT_INSTANCE_RELATED" COMPILE DEBUG REUSE SETTINGS
      hash=164e6a8942406cee159f8943a1a3c85e timestamp=03-26-2010 09:52:12
      namespace=CRSR flags=RON/KGHP/TIM/PN0/SML/KST/DBN/MTX/[120100d0]
      kkkk-dddd-llll=0000-0001-0001 lock=N pin=0 latch#=16 hpc=0002 hlc=0002
      lwt=700000446261630[700000446261630,700000446261630] ltm=700000446261640[700000446261640,700000446261640]
      pwt=7000004462615f8[7000004462615f8,7000004462615f8] ptm=700000446261608[700000446261608,700000446261608]
      ref=700000446261660[700000446261660,700000446261660] lnd=700000446261678[700000446261678,700000446261678]
        LIBRARY OBJECT: object=70000045adbc1e8
        type=CRSR flags=EXS[0001] pflags=[0000] status=VALD load=0
        CHILDREN: size=16
        child#    table reference   handle
             5 70000041776f5c0 70000045ae44720 70000042bfa3a20
        DATA BLOCKS:
        data#     heap  pointer    status pins change whr
            0 70000043d9fed20 70000045adbc300 I/P/A/-/-    0 NONE   00

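There is indeed a statement compiling an object in debug mode, although the object is a trigger rather than a package; it looks as though any object that can be compiled in debug mode may trigger this problem.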
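
Finding such statements by eye in a large state-object dump is tedious. The following is a minimal Python sketch that pulls them out, assuming the `name=` line layout seen in the LIBRARY OBJECT HANDLE section above; `debug_compiles` is a hypothetical helper name.

```python
import re

NAME_LINE = re.compile(r"^\s*name=(.*)$")
COMPILE_DEBUG = re.compile(r"\bCOMPILE\s+DEBUG\b", re.IGNORECASE)


def debug_compiles(dump_text):
    """Return library object names whose text contains COMPILE DEBUG."""
    hits = []
    for line in dump_text.splitlines():
        m = NAME_LINE.match(line)
        if m and COMPILE_DEBUG.search(m.group(1)):
            hits.append(m.group(1).strip())
    return hits


# Three lines copied from the dump above.
dump_sample = """\
      LIBRARY OBJECT HANDLE: handle=700000446261588 mtx=7000004462616b8(1) cdp=1
      name=ALTER TRIGGER "SHUCRM3O"."TRI_PRODUCT_INSTANCE_RELATED" COMPILE DEBUG REUSE SETTINGS
      hash=164e6a8942406cee159f8943a1a3c85e timestamp=03-26-2010 09:52:12
"""

for stmt in debug_compiles(dump_sample):
    print(stmt)
```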
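At this point the problem is fairly clear: the application side recompiled an object (a trigger in this case) in DEBUG mode and hit Oracle bug 8244533, which crashed the corresponding server process. It turned out to be a false alarm. Afterwards, using the machine and user information in the trace, we tracked down the application-side person who made the change and gave them a talking-to.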

Testing and analysis of latency caused by index leaf block splits in Oracle

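On 10.2.0.4 with none of the relevant one-off patches applied, I ran index split tests against tablespaces using ASSM and MSSM respectively. The conclusions are:

- In 10gR2, MSSM does not prevent the transaction timeouts caused by index splits.

- The one-off patch for 10.2.0.4 currently exists only for Linux; we could request it and test it specifically (with no patch in hand, its effect is still unknown).

- Coalescing the index (alter index coalesce) is currently the most practical solution.

- Testing shows that the problem still exists in the latest release, 11gR2.

The detailed test procedure follows: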

1.    Index block splits under automatic segment space management (ASSM)

SQL> drop tablespace idx1 including contents and datafiles;

Tablespace dropped.

SQL> create tablespace idx1 datafile '?/dbs/idx1.dbf' size 500M

2  segment space management AUTO

3  extent management local uniform size 10M;

-- create a tablespace with automatic segment space management

Tablespace created.

SQL> create table idx1(a number) tablespace idx1;

Table created.

SQL> create index idx1_idx on idx1 (a) tablespace idx1 pctfree 0;

Index created.         -- create the test table and index

SQL> insert into idx1 select rownum from all_objects, all_objects where rownum <= 250000;           -- insert 250,000 rows

250000 rows created.

SQL> commit;

Commit complete.

SQL>create table idx2 tablespace idx1 as select * from idx1 where 1=2;

Table created.

insert into idx2

select * from idx1 where rowid in

(select rid from

(select rid, rownum rn from

(select rowid rid from idx1 where a between 10127 and 243625 order by a)                    -- sample the later part of the key range: keep one row in every 250

)

where mod(rn, 250) = 0

)

/

933 rows created.

SQL> commit;

Commit complete.
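
The row counts in this step can be sanity-checked with a little arithmetic. This is a minimal Python sketch, assuming (as the setup implies) that column a holds every integer from 1 to 250000 exactly once; the variable names are illustrative only.

```python
lo, hi = 10127, 243625      # bounds of the BETWEEN predicate
step = 250                  # mod(rn, 250) = 0 keeps every 250th row

rows_in_range = hi - lo + 1         # BETWEEN is inclusive on both ends
sampled_rows = rows_in_range // step  # rn values 250, 500, ... that fit in the range

print(rows_in_range, sampled_rows)
```

The two values agree with the "233499 rows deleted" and "933 rows created" messages in the transcript.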

SQL> analyze index idx1_idx validate structure;          -- analyze the original index

select blocks,lf_blks,del_lf_rows from index_stats;

Index analyzed.

SQL>

BLOCKS    LF_BLKS DEL_LF_ROWS

---------- ---------- -----------

1280        499           0               -- 499 leaf blocks before the delete

SQL> delete from idx1 where a between 10127 and 243625;                             -- mass delete

commit;

233499 rows deleted.

SQL> SQL>

Commit complete.

SQL> analyze index idx1_idx validate structure;

select blocks,lf_blks,del_lf_rows from index_stats;

Index analyzed.

SQL>

BLOCKS    LF_BLKS DEL_LF_ROWS

---------- ---------- -----------

1280        499      233499            -- the leaf block count is unchanged after the delete

SQL> insert into idx1 select * from idx2;                   -- the empty blocks are no longer empty, but each holds only one or two entries, so they are still 75-100% free

commit;

933 rows created.

Commit complete.
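
Why does re-inserting the 933 sampled rows leave each leaf block almost, but not quite, empty? Here is a rough back-of-the-envelope estimate in Python, assuming the 499 leaf blocks were evenly full before the delete (plausible with pctfree 0 and ascending inserts, but still an assumption); the variable names are illustrative only.

```python
total_rows = 250_000   # rows originally indexed
leaf_blocks = 499      # LF_BLKS reported by index_stats
sample_step = 250      # idx2 keeps every 250th deleted key

entries_per_leaf = total_rows / leaf_blocks            # about 501 keys in a full leaf
reinserted_per_leaf = entries_per_leaf / sample_step   # about 2 keys return to each leaf
fill_ratio = reinserted_per_leaf / entries_per_leaf    # fraction of a leaf in use afterwards

print(round(entries_per_leaf), round(reinserted_per_leaf), f"{fill_ratio:.2%}")
```

Under that assumption each emptied leaf gets back roughly two live keys: it is no longer empty, yet almost all of its space is free.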

SQL> insert into idx1 select 250000+rownum from all_objects where rownum <= 126;          -- set up the precondition for a leaf block split

SQL> select ss.value,sy.name from v$sesstat ss ,v$sysstat sy where ss.statistic#=sy.statistic# and name like '%split%'  and sid=(select distinct sid from v$mystat);

VALUE NAME

---------- ----------------------------------------------------------------

997 leaf node splits

997 leaf node 90-10 splits

0 branch node splits

0 queue splits                 -- leaf block split counts for the current session so far

SQL> insert into idx1 values (251000);                                        -- this insert does cause a leaf block split

1 row created.

SQL> commit;

Commit complete.

SQL> select ss.value,sy.name from v$sesstat ss ,v$sysstat sy where ss.statistic#=sy.statistic# and name like '%split%'  and sid=(select distinct sid from v$mystat);

VALUE NAME

---------- ----------------------------------------------------------------

998 leaf node splits

998 leaf node 90-10 splits

0 branch node splits

0 queue splits         -- one more leaf block split than in the previous query
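
The before-and-after comparison used throughout these tests is a plain difference of session statistics. This is a minimal Python sketch of that bookkeeping, with the values typed in from the two query outputs above; `stat_delta` is a hypothetical helper, not an Oracle API.

```python
def stat_delta(before, after):
    """Return only the statistics whose value changed, with the change."""
    return {
        name: after[name] - before.get(name, 0)
        for name in after
        if after[name] != before.get(name, 0)
    }


before = {
    "leaf node splits": 997,
    "leaf node 90-10 splits": 997,
    "branch node splits": 0,
    "queue splits": 0,
}
after = {
    "leaf node splits": 998,
    "leaf node 90-10 splits": 998,
    "branch node splits": 0,
    "queue splits": 0,
}

print(stat_delta(before, after))
```

A delta of one in both "leaf node splits" and "leaf node 90-10 splits" confirms that the single-row insert triggered exactly one 90-10 split.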

SQL> set linesize 200 pagesize 1500;

SQL> select  executions, buffer_gets, disk_reads, cpu_time, elapsed_time, rows_processed, sql_text from v$sql

2  where sql_text like ‘%insert%idx1%’ and sql_text not like ‘%v$sql%’;

EXECUTIONS BUFFER_GETS DISK_READS   CPU_TIME ELAPSED_TIME ROWS_PROCESSED

———- ———– ———- ———- ———— ————–

SQL_TEXT

——————————————————————————————————————————————————————————————————–

1        1603          0     271601       271601            933

insert into idx2 select * from idx1 where rowid in (select rid from (select rid, rownum rn from (select rowid rid from idx1 where a between 10127 and 243625 order by a) ) where mod(rn, 250) = 0 )

1         156          0      82803        82803            126

insert into idx1 select 250000+rownum from all_objects where rownum <= 126

1         177          0       3728         3728              1

insert into idx1 values (251000)     -- had to read the blocks that are not really empty: many more buffer gets

1        1409          0      40293        40293            933

insert into idx1 select * from idx2

1      240842          0    3478341      3478341         250000

SQL> insert into idx1 values (251001);                                  -- an insert that does not split

1 row created.

SQL> commit;

Commit complete.

SQL> select  executions, buffer_gets, disk_reads, cpu_time, elapsed_time, rows_processed, sql_text from v$sql

2  where sql_text like ‘%insert%idx1%’ and sql_text not like ‘%v$sql%’;

EXECUTIONS BUFFER_GETS DISK_READS   CPU_TIME ELAPSED_TIME ROWS_PROCESSED

———- ———– ———- ———- ———— ————–

SQL_TEXT

——————————————————————————————————————————————————————————————————–

1        1603          0     271601       271601            933

insert into idx2 select * from idx1 where rowid in (select rid from (select rid, rownum rn from (select rowid rid from idx1 where a between 10127 and 243625 order by a) ) where mod(rn, 250) = 0 )

1         156          0      82803        82803            126

insert into idx1 select 250000+rownum from all_objects where rownum <= 126

1           9          0       1640         1640              1

insert into idx1 values (251001)     -- non-splitting insert: only a few buffer gets

1         177          0       3728         3728              1

insert into idx1 values (251000)

1        1409          0      40293        40293            933

insert into idx1 select * from idx2

1      240842          0    3478341      3478341         250000

insert into idx1 select rownum from all_objects, all_objects where rownum <= 250000

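As Demo 1 shows, under ASSM a mass delete followed by inserts leaves many blocks that are 75%-100% free yet not completely empty. A later leaf block split then forces the inserting foreground process to scan a large number of these "empty" blocks. If the blocks are not in the buffer cache (so physical reads are needed), or they need delayed block cleanout, the scan slows down further and the split takes longer. In the end other inserts are blocked behind the split and wait on enq: TX - index contention.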

2.  Index block splits under manual segment space management (MSSM)

SQL> drop tablespace idx1 including contents and datafiles;

Tablespace dropped.

SQL> create tablespace idx1 datafile '?/dbs/idx1.dbf' size 500M

2  segment space management MANUAL                                      -- the MSSM case

3  extent management local uniform size 10M;

Tablespace created.

SQL> create table idx1(a number) tablespace idx1;

Table created.

SQL> create index idx1_idx on idx1 (a) tablespace idx1 pctfree 0;

Index created.

SQL> insert into idx1 select rownum from all_objects, all_objects where rownum <= 250000;

250000 rows created.

SQL> commit;

Commit complete.

SQL> create table idx2 tablespace idx1 as select * from idx1 where 1=2;

Table created.

SQL> insert into idx2

select * from idx1 where rowid in

(select rid from

(select rid, rownum rn from

(select rowid rid from idx1 where a between 10127 and 243625 order by a)

)

where mod(rn, 250) = 0

)

/

933 rows created.

SQL> commit;

Commit complete.

SQL> analyze index idx1_idx validate structure;

select blocks,lf_blks,del_lf_rows from index_stats;

Index analyzed.

SQL>

BLOCKS    LF_BLKS DEL_LF_ROWS

---------- ---------- -----------

1280        499           0

SQL> delete from idx1 where a between 10127 and 243625;

233499 rows deleted.

SQL> commit;

Commit complete.

SQL> insert into idx1 select * from idx2;

commit;

933 rows created.

SQL> SQL>

Commit complete.

SQL> SQL> insert into idx1 select 250000+rownum from all_objects where rownum <= 126;

commit;

126 rows created.

SQL> SQL>

Commit complete.

SQL>

SQL> select ss.value,sy.name from v$sesstat ss ,v$sysstat sy where ss.statistic#=sy.statistic# and name like '%split%'  and sid=(select distinct sid from v$mystat);

VALUE NAME

---------- ----------------------------------------------------------------

1496 leaf node splits

1496 leaf node 90-10 splits

0 branch node splits

0 queue splits

SQL> insert into idx1 values (251000);                                  -- this insert does split

1 row created.

SQL> commit;

Commit complete.

SQL> select ss.value,sy.name from v$sesstat ss ,v$sysstat sy where ss.statistic#=sy.statistic# and name like '%split%'  and sid=(select distinct sid from v$mystat);

VALUE NAME

---------- ----------------------------------------------------------------

1497 leaf node splits

1497 leaf node 90-10 splits

0 branch node splits

0 queue splits

Up to this point the behavior is exactly the same as under ASSM.

SQL> select  executions, buffer_gets, disk_reads, cpu_time, elapsed_time, rows_processed, sql_text from v$sql

2  where sql_text like ‘%insert%idx1%’ and sql_text not like ‘%v$sql%’;

EXECUTIONS BUFFER_GETS DISK_READS   CPU_TIME ELAPSED_TIME ROWS_PROCESSED

———- ———– ———- ———- ———— ————–

SQL_TEXT

——————————————————————————————————————————————————————————————————–

1        1553          0     283301       283301            933

insert into idx2 select * from idx1 where rowid in (select rid from (select rid, rownum rn from (select rowid rid from idx1 where a between 10127 and 243625 order by a) ) where mod(rn, 250) = 0 )

1         153          0      78465        78465            126

insert into idx1 select 250000+rownum from all_objects where rownum <= 126

1         963          0      10422        10422              1              -- far more "empty" blocks read than under ASSM

insert into idx1 values (251000)

1         984          0      35615        35615            933

insert into idx1 select * from idx2

1      238579          0    3468326      3469984         250000

insert into idx1 select rownum from all_objects, all_objects where rownum <= 250000

SQL> insert into idx1 values (251001);

1 row created.

SQL> commit;

Commit complete.

SQL> select  executions, buffer_gets, disk_reads, cpu_time, elapsed_time, rows_processed, sql_text from v$sql

2  where sql_text like ‘%insert%idx1%’ and sql_text not like ‘%v$sql%’;

EXECUTIONS BUFFER_GETS DISK_READS   CPU_TIME ELAPSED_TIME ROWS_PROCESSED

———- ———– ———- ———- ———— ————–

SQL_TEXT

——————————————————————————————————————————————————————————————————–

1        1553          0     283301       283301            933

insert into idx2 select * from idx1 where rowid in (select rid from (select rid, rownum rn from (select rowid rid from idx1 where a between 10127 and 243625 order by a) ) where mod(rn, 250) = 0 )

1         153          0      78465        78465            126

insert into idx1 select 250000+rownum from all_objects where rownum <= 126

1           7 0       1476         1476              1

insert into idx1 values (251001)    -- the non-splitting case matches ASSM

1         963 0      10422        10422              1

insert into idx1 values (251000)

1         984          0      35615        35615            933

insert into idx1 select * from idx2

1      238579          0    3468326      3469984         250000

insert into idx1 select rownum from all_objects, all_objects where rownum <= 250000

6 rows selected.

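As Demo 2 shows, a leaf block split under MSSM read even more "empty" blocks than under ASSM (963 buffer gets versus 177). MSSM does not solve the problem that a leaf block split after a mass delete has to scan many not-quite-empty blocks; it may actually be worse. In theory, an MSSM freelist can only point to blocks that have not yet reached PCTFREE, or that once reached PCTFREE and whose used space later fell below PCTUSED after deletes (doc: A free list is a list of free data blocks that usually includes blocks existing in a number of different extents within the segment. Free lists are composed of blocks in which free space has not yet reached PCTFREE or used space has shrunk below PCTUSED.). In other words, there will be even more "empty" blocks under MSSM.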

3.  Index block splits after coalesce under automatic segment space management (ASSM)

SQL> drop tablespace idx1 including contents and datafiles;

Tablespace dropped.

SQL> create tablespace idx1 datafile '?/dbs/idx1.dbf' size 500M

2  segment space management AUTO                                       -- ASSM with coalesce

3  extent management local uniform size 10M;

Tablespace created.

SQL> create table idx1(a number) tablespace idx1;

Table created.

SQL> create index idx1_idx on idx1 (a) tablespace idx1 pctfree 0;

Index created.

SQL> insert into idx1 select rownum from all_objects, all_objects where rownum <= 250000;

250000 rows created.

SQL> commit;

Commit complete.

SQL> create table idx2 tablespace idx1 as select * from idx1 where 1=2;

Table created.

SQL> insert into idx2

select * from idx1 where rowid in

(select rid from

(select rid, rownum rn from

(select rowid rid from idx1 where a between 10127 and 243625 order by a)

)

where mod(rn, 250) = 0

)

/

933 rows created.

SQL> commit;

Commit complete.

SQL> analyze index idx1_idx validate structure;

select blocks,lf_blks,del_lf_rows from index_stats;

Index analyzed.

SQL>

BLOCKS    LF_BLKS DEL_LF_ROWS

---------- ---------- -----------

1280        499           0

SQL> delete from idx1 where a between 10127 and 243625;

commit;

233499 rows deleted.

SQL> SQL>

Commit complete.

SQL> alter index idx1_idx coalesce;

Index altered.

SQL> analyze index idx1_idx validate structure;

select blocks,lf_blks,del_lf_rows from index_stats;

Index analyzed.

SQL>

BLOCKS    LF_BLKS DEL_LF_ROWS

---------- ---------- -----------

1280         33           0    -- coalesce merged the leaf blocks

SQL> insert into idx1 select * from idx2;

933 rows created.

SQL> SQL> commit;

Commit complete.

SQL>

SQL> insert into idx1 select 250000+rownum from all_objects where rownum <= 126;

commit;

126 rows created.

SQL> SQL>

Commit complete.

SQL> select ss.value,sy.name from v$sesstat ss ,v$sysstat sy where ss.statistic#=sy.statistic# and name like '%split%'  and sid=(select distinct sid from v$mystat);

VALUE NAME

---------- ----------------------------------------------------------------

1999 leaf node splits

1995 leaf node 90-10 splits

0 branch node splits

0 queue splits

SQL> insert into idx1 values (251000);                                       -- this insert does split

1 row created.

SQL> commit;

Commit complete.

SQL> select ss.value,sy.name from v$sesstat ss ,v$sysstat sy where ss.statistic#=sy.statistic# and name like '%split%'  and sid=(select distinct sid from v$mystat);

VALUE NAME

---------- ----------------------------------------------------------------

2000 leaf node splits

1996 leaf node 90-10 splits

0 branch node splits

0 queue splits

SQL> select  executions, buffer_gets, disk_reads, cpu_time, elapsed_time, rows_processed, sql_text from v$sql

2  where sql_text like ‘%insert%idx1%’ and sql_text not like ‘%v$sql%’;

EXECUTIONS BUFFER_GETS DISK_READS   CPU_TIME ELAPSED_TIME ROWS_PROCESSED

———- ———– ———- ———- ———— ————–

SQL_TEXT

——————————————————————————————————————————————————————————————————–

1        1603          0     268924       268924            933

insert into idx2 select * from idx1 where rowid in (select rid from (select rid, rownum rn from (select rowid rid from idx1 where a between 10127 and 243625 order by a) ) where mod(rn, 250) = 0 )

1         156          0      78349        78349            126

insert into idx1 select 250000+rownum from all_objects where rownum <= 126

1          23          0       2218         2218              1                             -- only a few buffer gets

insert into idx1 values (251000)

1         191          0      15596        15596            933

insert into idx1 select * from idx2

1      240852          0    3206130      3206130         250000

insert into idx1 select rownum from all_objects, all_objects where rownum <= 250000

SQL> insert into idx1 values (251001);

1 row created.

SQL> commit;

Commit complete.

SQL> select  executions, buffer_gets, disk_reads, cpu_time, elapsed_time, rows_processed, sql_text from v$sql

2  where sql_text like ‘%insert%idx1%’ and sql_text not like ‘%v$sql%’;

EXECUTIONS BUFFER_GETS DISK_READS   CPU_TIME ELAPSED_TIME ROWS_PROCESSED

———- ———– ———- ———- ———— ————–

SQL_TEXT

——————————————————————————————————————————————————————————————————–

1        1603          0     268924       268924            933

insert into idx2 select * from idx1 where rowid in (select rid from (select rid, rownum rn from (select rowid rid from idx1 where a between 10127 and 243625 order by a) ) where mod(rn, 250) = 0 )

1         156          0      78349        78349            126

insert into idx1 select 250000+rownum from all_objects where rownum <= 126

1           9 0       1574         1574              1

insert into idx1 values (251001)

1          23 0       2218         2218              1

insert into idx1 values (251000)

1         191          0      15596        15596            933

insert into idx1 select * from idx2

1      240852          0    3206130      3206130         250000

insert into idx1 select rownum from all_objects, all_objects where rownum <= 250000

6 rows selected.

As Demo 3 shows, running coalesce after the delete moves the many empty blocks out of the index structure, so the subsequent leaf block split reads only the few blocks it actually needs.

4.  Index block splits after coalesce under manual segment space management (MSSM)

SQL> drop tablespace idx1 including contents and datafiles;

Tablespace dropped.

SQL> create tablespace idx1 datafile '?/dbs/idx1.dbf' size 500M

2  segment space management MANUAL                               -- MSSM with coalesce

3  extent management local uniform size 10M;

Tablespace created.

SQL> create table idx1(a number) tablespace idx1;

Table created.

SQL> create index idx1_idx on idx1 (a) tablespace idx1 pctfree 0;

Index created.

SQL> insert into idx1 select rownum from all_objects, all_objects where rownum <= 250000;

250000 rows created.

SQL> commit;

Commit complete.

SQL> create table idx2 tablespace idx1 as select * from idx1 where 1=2;

Table created.

SQL> insert into idx2

select * from idx1 where rowid in

(select rid from

(select rid, rownum rn from

(select rowid rid from idx1 where a between 10127 and 243625 order by a)

)

where mod(rn, 250) = 0

)

/

933 rows created.

SQL> commit;

Commit complete.

SQL> analyze index idx1_idx validate structure;

select blocks,lf_blks,del_lf_rows from index_stats;

Index analyzed.

SQL>

BLOCKS    LF_BLKS DEL_LF_ROWS

---------- ---------- -----------

1280        499           0

SQL> delete from idx1 where a between 10127 and 243625;

commit;

233499 rows deleted.

SQL> SQL>

Commit complete.

SQL> analyze index idx1_idx validate structure;

select blocks,lf_blks,del_lf_rows from index_stats;

Index analyzed.

SQL>

BLOCKS    LF_BLKS DEL_LF_ROWS

---------- ---------- -----------

1280        499      233499

SQL> alter index idx1_idx coalesce;

Index altered.

SQL> analyze index idx1_idx validate structure;

select blocks,lf_blks,del_lf_rows from index_stats;

Index analyzed.

SQL>

BLOCKS    LF_BLKS DEL_LF_ROWS

---------- ---------- -----------

1280         33           0

SQL> insert into idx1 select * from idx2;

933 rows created.

SQL> SQL> commit;

Commit complete.

SQL>

SQL> insert into idx1 select 250000+rownum from all_objects where rownum <= 126;

commit;

126 rows created.

SQL> SQL>

Commit complete.

SQL> select ss.value,sy.name from v$sesstat ss ,v$sysstat sy where ss.statistic#=sy.statistic# and name like '%split%'  and sid=(select distinct sid from v$mystat);

VALUE NAME

---------- ----------------------------------------------------------------

2502 leaf node splits

2494 leaf node 90-10 splits

0 branch node splits

0 queue splits

SQL> insert into idx1 values (251000);                       -- this insert does split

1 row created.

SQL> commit;

Commit complete.

SQL> select ss.value,sy.name from v$sesstat ss ,v$sysstat sy where ss.statistic#=sy.statistic# and name like '%split%'  and sid=(select distinct sid from v$mystat);

VALUE NAME

---------- ----------------------------------------------------------------

2503 leaf node splits

2495 leaf node 90-10 splits

0 branch node splits

0 queue splits

SQL> select  executions, buffer_gets, disk_reads, cpu_time, elapsed_time, rows_processed, sql_text from v$sql

2  where sql_text like ‘%insert%idx1%’ and sql_text not like ‘%v$sql%’;

EXECUTIONS BUFFER_GETS DISK_READS   CPU_TIME ELAPSED_TIME ROWS_PROCESSED

———- ———– ———- ———- ———— ————–

SQL_TEXT

——————————————————————————————————————————————————————————————————–

1        1553          0     281059       281059            933

insert into idx2 select * from idx1 where rowid in (select rid from (select rid, rownum rn from (select rowid rid from idx1 where a between 10127 and 243625 order by a) ) where mod(rn, 250) = 0 )

1         153          0      77817        77817            126

insert into idx1 select 250000+rownum from all_objects where rownum <= 126

1          19          0       2010         2010              1                       -- only a few buffer gets

insert into idx1 values (251000)

1         126          0      15364        15364            933

insert into idx1 select * from idx2

1      238644          0    3229737      3230569         250000

insert into idx1 select rownum from all_objects, all_objects where rownum <= 250000

SQL> insert into idx1 values (251001);

1 row created.

SQL> commit;

Commit complete.

SQL> select  executions, buffer_gets, disk_reads, cpu_time, elapsed_time, rows_processed, sql_text from v$sql

2  where sql_text like ‘%insert%idx1%’ and sql_text not like ‘%v$sql%’;

EXECUTIONS BUFFER_GETS DISK_READS   CPU_TIME ELAPSED_TIME ROWS_PROCESSED

———- ———– ———- ———- ———— ————–

SQL_TEXT

——————————————————————————————————————————————————————————————————–

1        1553          0     281059       281059            933

insert into idx2 select * from idx1 where rowid in (select rid from (select rid, rownum rn from (select rowid rid from idx1 where a between 10127 and 243625 order by a) ) where mod(rn, 250) = 0 )

1         153          0      77817        77817            126

insert into idx1 select 250000+rownum from all_objects where rownum <= 126

1          7 0       1460         1460              1

insert into idx1 values (251001)

1          19 0       2010         2010              1

insert into idx1 values (251000)

1         126          0      15364        15364            933

insert into idx1 select * from idx2

1      238644          0    3229737      3230569         250000

insert into idx1 select rownum from all_objects, all_objects where rownum <= 250000

6 rows selected.

As Demo 4 shows, coalesce under MSSM behaves much the same as under ASSM; the coalesce operation effectively resolves the problem.
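
The four demos are easiest to compare side by side. This is a minimal Python sketch that tabulates the buffer gets of the splitting insert (251000) against the non-splitting insert (251001); the numbers are copied from the v$sql output of each demo, and the dictionary layout is purely illustrative.

```python
# (segment management, coalesced?) -> (buffer gets with split, buffer gets without split)
results = {
    ("ASSM", False): (177, 9),
    ("MSSM", False): (963, 7),
    ("ASSM", True): (23, 9),
    ("MSSM", True): (19, 7),
}

# How many times more buffer gets the splitting insert needed.
ratios = {key: split / plain for key, (split, plain) in results.items()}

for (mode, coalesced), ratio in ratios.items():
    split_gets, plain_gets = results[(mode, coalesced)]
    label = "coalesced" if coalesced else "not coalesced"
    print(f"{mode} {label:13} split={split_gets:4} plain={plain_gets:2} ratio={ratio:6.1f}")
```

Without coalesce the split costs roughly 20 times (ASSM) to well over 100 times (MSSM) the buffer gets of a plain insert; after coalesce the ratio drops below 3 in both modes.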

5.  Locking impact of the coalesce operation

SQL> create table coal (t1 int);

Table created.

SQL> create index pk_t1 on coal(t1);

Index created.

SQL> begin

2    for i in 1..3000 loop

3      insert into coal values(i);

4      commit;

5      end loop;

6      end;

7  /

PL/SQL procedure successfully completed.

SQL> delete coal where t1>500;

2500 rows deleted.

SQL> commit;

Commit complete.

SQL> analyze index pk_t1 validate structure;

Index analyzed.    -- note: analyze ... validate structure blocks all DML

SQL> select blocks,lf_blks,del_lf_rows from index_stats;

BLOCKS    LF_BLKS DEL_LF_ROWS

---------- ---------- -----------

8          6        2500          -- state after the delete

Now open another session and start a DML operation:

SQL> update coal set t1=t1+1;

500 rows updated.

Back in the original session:

SQL> alter index pk_T1 coalesce;             -- coalesce is not blocked

Index altered.

Commit in the other session so that validate structure can run:

SQL> analyze index pk_t1 validate structure;

Index analyzed.

SQL> select blocks,lf_blks,del_lf_rows from index_stats;

BLOCKS    LF_BLKS DEL_LF_ROWS

---------- ---------- -----------

8          3         500

Clearly, coalesce did not touch the blocks involved in the DML operation.

With no DML in progress:

SQL> truncate table coal;

Table truncated.

SQL> begin

2    for i in 1..3000 loop

3      insert into coal values(i);

4      commit;

5      end loop;

6      end;

7  /

PL/SQL procedure successfully completed.

SQL> analyze index pk_t1 validate structure;

Index analyzed.

SQL> select blocks,lf_blks,del_lf_rows from index_stats;

BLOCKS    LF_BLKS DEL_LF_ROWS

---------- ---------- -----------

8          6           0

SQL> delete coal where t1>500;

2500 rows deleted.

SQL> commit;

Commit complete.

SQL> analyze index pk_t1 validate structure;

Index analyzed.

SQL> select blocks,lf_blks,del_lf_rows from index_stats;

BLOCKS    LF_BLKS DEL_LF_ROWS

---------- ---------- -----------

8          6        2500

SQL> alter index pk_t1 coalesce;

Index altered.

SQL> analyze index pk_t1 validate structure;

Index analyzed.

SQL> select blocks,lf_blks,del_lf_rows from index_stats;

BLOCKS    LF_BLKS DEL_LF_ROWS

———- ———- ———–

8          1           0

没有dml时,coalesce 操作涉及了所有块

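As demo 5 shows, coalesce skips the blocks involved in active DML. Because a coalesce runs only briefly, few index blocks are likely to carry open transactions during that window. Coalesce also does not reduce the height of the index.

The attachment describes the rebuild and coalesce index operations in detail: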

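6.  Summary of the coalesce operation

Advantages:

-  It is a fast operation with minimal impact on overall performance (not performance sensitive).

-  It does not lock the table, and it skips index blocks that have active transactions.

-  It effectively resolves the current problem.

-  It does not reduce index height, so it does not set up another root split later.

Disadvantages:

-  It has to be run periodically against individual objects; it cannot solve the problem globally once and for all.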

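7.  Technical exchange about the related patch on Linux 10.2.0.4

The Metalink note for bug 8286901 describes a user who hit the same problem and filed an SR. Oracle Support supplied a one-off patch at the time, but applying it did not solve the user's problem.

The original text of the note follows: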

It is similar to bug8286901, but after applied patch8286901, still see enq tx
contentiona with high “failed probes on index block reclamation”

Issue encountered by customer and Oracle developer (Stefan Pommerenk).

He describes is thus:

"Space search performed by the index splitter can't find space in neighboring blocks, and then instead of allocating new space, we go and continue to search for space elsewhere, which manifests itself in block reads from disk, block cleanouts, and subsequent blocks written due to aggressive MTTR setting."

"To clarify: the cleanouts are not the problem per se. The culprit seems to be that the space search performed by the index splitter can't find space in neighboring blocks, and then instead of allocating new space, we go and continue to search for space elsewhere, which manifests itself in block reads from disk, block cleanouts, and subsequent blocks written due to aggressive MTTR setting. This action has caused other sessions to get blocked on TX enqueue contention, blocked on the splitting session. Advice was to set 10224 trace event for the splitter for a short time only in order to get diagnostics as to why the space search rejected most blocks.

A secondary symptom are the bitmap level 1 block updates, which may or may not be related to the space search; I've not seen them before, maybe because I didn't really pay attention :P , but the symptoms seen in the ASH trace indicate it's the same problem. Someone in space mgmt has to look at it to confirm it is the same problem."

I exchanged private e-mails with this user. His reply:

I still have a case open with Oracle. I believe that this is a bug in the Oracle code. The problem is that it has been difficult to create a reproducible test case for Oracle support. My specific issue was basically put on hold pending the results of another customer’s service request that appeared to have had the same issue, (9034788). Unfortunately they couldn’t reproduce the issue in that case either.

I believe that there is a correlation between the enq TX – index contention wait event and a spike in the number of ‘failed probes on index block reclamation. I have specifically asked Oracle to explain why there is a spike in the ‘failed probes on index block reclamation’ during the same time frame as the enq TX index contention wait event, but they have not answered my question.

I was hoping that some investigation by Oracle Support into the failed probes metric might get someone on the right track to discovering the bug. That hasn’t happened though.

Hi ,

Thanks for your sharing .  The bug (or specific ktsp behave) is fatal in response time sensitive  OLTP env.

I would like to ask my customer to coalesce those index where massive deleted regularly.

Thanks for your help again!

Yes, I saw that. I have applied patch 8286901 and set the event for version 10.2.0.4, but the problem still occurs periodically. And as I mentioned before, we see a correlation between enq TX waits and the failed probes on index block reclamation. Which is why I still think that it is a bug. I agree that trying to rebuild or coalesce the indexes are simply attempts to workaround the issue and not solve the root cause.

Early on when I started on this issue I did do some index dumps and could clearly see that we had lots of blocks with only 1 or 2 records after our mass delete jobs. I have provided Oracle Support with this information as well as oradump files while the problem is occurring, but they don’t seem to be able to find anything wrong so far.

If you are interested in seeing if you are experiencing a high ‘failed probes on index block reclamation’ event run the query below.

select SS.snap_id,
SS.stat_name,
TO_CHAR(S.BEGIN_INTERVAL_TIME, 'DAY') DAY,
S.BEGIN_INTERVAL_TIME,
S.END_INTERVAL_TIME,
SS.value,
SS.value - LAG(SS.VALUE, 1, ss.value) OVER (ORDER BY SS.SNAP_ID) AS DIFF
from DBA_HIST_SYSSTAT SS,
DBA_HIST_SNAPSHOT S
where S.SNAP_ID = SS.SNAP_ID
AND SS.stat_NAME = 'failed probes on index block reclamation'
ORDER BY SS.SNAP_ID ;
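
The DIFF column in the query above turns a cumulative counter into a per-snapshot delta. The sketch below mimics that LAG arithmetic in Python so the spike the user describes is easy to picture. The snapshot IDs and counter values are hypothetical sample data, not figures from the case.

```python
def lag_diff(values):
    """Mimic value - LAG(value, 1, value) OVER (ORDER BY snap_id).

    The LAG default is the row's own value, so the first diff is 0.
    """
    diffs = []
    previous = None
    for v in values:
        diffs.append(0 if previous is None else v - previous)
        previous = v
    return diffs


# Hypothetical cumulative values of the statistic, ordered by snap_id.
snapshots = [(101, 1200), (102, 1250), (103, 4900), (104, 4910)]

for (snap_id, value), diff in zip(snapshots, lag_diff([v for _, v in snapshots])):
    print(snap_id, value, diff)
```

A large DIFF for one snapshot (3650 for snapshot 103 in this made-up data) is the kind of spike to line up against the enq TX index contention waits in the same interval.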

8.  Testing on 11gR2

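I tested on the latest 11gR2 and the problem can still be reproduced: with a larger data volume, a single-row insert caused 6675 buffer gets (see the figure).

We can guess that the one-off patch Oracle provided puts an upper limit on the number of "empty blocks" that a leaf block split will scan, so the scan still happens as long as that limit is not reached, and that Oracle will not fold the patch into its mainstream public releases. I tried to trigger the split problem with nothing cached: the split caused roughly 4000 blocks of physical reads, yet the operation still completed in 0.12 seconds (0.02 seconds with a warm cache, see the figure). That test used an ordinary ATA disk with a read rate of about 100 MB/s (Timing buffered disk reads:  306 MB in  3.00 seconds = 101.93 MB/sec). In the ASH view for January 21, session 260, which triggered the split, was in a single-block read wait (db file sequential read) and had already waited 43950 us, about 44 ms. That is far from the roughly 10 ms rule of thumb for healthy I/O, so we can be confident that I/O performance is also an important reason why the leaf block split delay was so visible.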
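
A back-of-envelope check of these numbers, as a rough sketch only. It assumes (my assumption, not something the measurements prove) that the roughly 4000 physical reads of a split are issued one after another as single-block reads, so the total time is simply reads times latency.

```python
PHYSICAL_READS = 4000            # blocks read by one uncached split (test above)
TEST_ELAPSED_S = 0.12            # elapsed time of that split on the test disk
ASH_WAIT_S = 43950 / 1_000_000   # one db file sequential read wait seen in ASH
HEALTHY_WAIT_S = 0.010           # rule-of-thumb latency for healthy I/O


def serialized_read_time(blocks, seconds_per_read):
    """Total time if every block needs its own serialized single-block read."""
    return blocks * seconds_per_read


# Average time per block implied by the test run.
per_read_in_test = TEST_ELAPSED_S / PHYSICAL_READS

print(f"implied per-read time in the test: {per_read_in_test * 1e6:.0f} us")
print(f"4000 reads at 10 ms each: {serialized_read_time(PHYSICAL_READS, HEALTHY_WAIT_S):.1f} s")
print(f"4000 reads at 44 ms each: {serialized_read_time(PHYSICAL_READS, ASH_WAIT_S):.1f} s")
```

Under that assumption the same split costs tens of seconds or more once each read takes milliseconds, which is consistent with the point above that slow I/O makes the split delay far more visible.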

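Conclusions

To sum up the options discussed earlier: MSSM cannot prevent the transaction timeouts caused by index splits; not deleting data is infeasible for many objects; the 10.2.0.4 one-off patch currently exists only for Linux, so we could request it and test it (its effect is unknown because we do not have the patch yet). Coalescing the indexes is currently the most practical solution available and has no side effects.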

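A simple script for recording high-load processes

Oracle 10g introduced the v$osstat view, which makes it easy for a DBA to see host load, and the OEM web console shows which processes were loading the host over a period of time. If OEM is not enabled for the database, though, there is no way to look back at the high-load server processes of a past period. The following script can help.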

-- Table for recording high-load processes: CPU usage plus the SQL being run.
-- The SQL is not fully accurate, because it is looked up only after the spid has been captured.
CREATE TABLE "SYS"."HIGHLOAD_HISTORY"
(
"SAMPLE_TIME" DATE,
"SPID"     NUMBER(10,0),
"LOAD"     VARCHAR2(7 BYTE),
"SID"      VARCHAR2(30 BYTE),
"USERNAME" VARCHAR2(40 BYTE),
"MACHINE"  VARCHAR2(64 BYTE),
"PROGRAM"  VARCHAR2(48 BYTE),
"SQL_ID"   VARCHAR2(13 BYTE),
"SQL_FULLTEXT" CLOB,
"INST_ID" NUMBER(2,0),
"STATUS"  VARCHAR2(8 BYTE)
);
# Run this directly from the shell. The load value is wrapped in single quotes (q) so that it is a valid VARCHAR2 literal.
ps aux|grep $ORACLE_SID|awk -v q="'" '{ if($3>=0.3) print "insert into highload_history select sysdate rec_time," $2 "," q $3 "%" q ",ss.sid,ss.username,ss.machine,ss.program,ss.sql_id,(select sql_fulltext from v$sqlarea sq where sq.sql_id=ss.sql_id),(select instance_number from v$instance),ss.status from v$session ss,v$process pr where pr.addr=ss.paddr and pr.spid=" $2 ";"}' | sqlplus / as sysdba
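
The filtering step of the one-liner can be sketched in Python to make the logic easier to follow: read `ps aux` style rows, keep those whose %CPU is at or above the threshold, and return the PID and load. The function name and the sample rows are hypothetical; only the 0.3 threshold and the column positions come from the script above.

```python
def high_load_pids(ps_output, threshold=0.3):
    """Return (pid, cpu) pairs for `ps aux` rows whose %CPU >= threshold."""
    rows = []
    for line in ps_output.splitlines():
        # ps aux prints 11 columns; the last one (COMMAND) may contain spaces.
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue
        try:
            pid, cpu = int(fields[1]), float(fields[2])
        except ValueError:
            continue  # header row or malformed line
        if cpu >= threshold:
            rows.append((pid, cpu))
    return rows


# Hypothetical sample output, for illustration only.
SAMPLE = """USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
oracle 16996 12.5 1.0 100 50 ? Ss 21:55 0:03 oraclewebmoney (LOCAL=NO)
oracle 17001 0.1 1.0 100 50 ? Ss 21:55 0:00 ora_pmon_webmoney
"""

print(high_load_pids(SAMPLE))
```

Each returned pair corresponds to one generated insert statement in the shell version.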

A Java program that imports a text file and generates XLS output

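Recently at work I needed to import data from a text file and generate an xls file. That did not seem hard: some time ago I had written a Java program that runs on a server, pulls data from an Oracle database and generates xls files. So I took that program, modified it, and was done quickly. Because this one is for users to run on the client side, I wrote a very plain GUI for it in Swing.
In actual test runs, however, I found a big problem: memory. The original program first loads all the data to be imported into a Vector. On the server it handled 300,000-plus rows with ease, but on Win XP it failed (my machine has 512 MB of RAM). My test text file had 100,000 rows; after loading them into the Vector, writing the data to an xls file through the POI library overflowed the JVM heap. I raised the JVM's maximum heap manually with -Xmx, but that did not help either. Only after cutting the test data down to 25,000 rows did the program generate the xls file successfully. So the final program works like this: it first writes the imported text data out to temp files, where the first line of each temp file holds the column names and the data starts on the second line, at most 25,000 rows per file. Each temp file then produces one xls file. In testing the performance was quite good. Java's performance is actually decent these days, except in environments with very strict timing requirements.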
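
The chunking idea can be sketched as follows. This is not the original Java code, only a minimal Python illustration of the approach described above: repeat the header line at the top of every chunk and cap each chunk at a fixed number of data rows. Treating the 25,000 limit as data rows excluding the header is my assumption.

```python
from typing import Iterable, Iterator, List


def chunk_rows(lines: Iterable[str], max_rows: int = 25000) -> Iterator[List[str]]:
    """Yield [header, row, row, ...] chunks with at most max_rows data rows each."""
    it = iter(lines)
    header = next(it, None)
    if header is None:
        return  # empty input: nothing to emit
    chunk = [header]
    for row in it:
        chunk.append(row)
        if len(chunk) - 1 == max_rows:
            yield chunk
            chunk = [header]  # start the next chunk with the same header
    if len(chunk) > 1:
        yield chunk


# Tiny demonstration with a limit of 3 rows per chunk.
sample = ["id|name"] + [f"{i}|row{i}" for i in range(1, 8)]
for part in chunk_rows(sample, max_rows=3):
    print(len(part) - 1, "data rows, header:", part[0])
```

Because each chunk is produced lazily, only one chunk has to sit in memory at a time, which is the point of the temp-file design.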

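For the release, since the tool is meant for users without IT expertise, I used exe4j to package it into a single exe file for convenience. Users only need JRE 5.0 or later installed. (I have not tested 6.0; I am simply assuming it works.)

Usage: click the select button and choose the file to import. The generated xls files are placed in the same directory as the imported file; if any data was skipped, a log file is written to the program's directory.

Data file format:
The first line of the data must be the header, i.e. the table's column names; the remaining lines are the data itself, with fields separated by |. See the downloadable test sample.

Download the software:
TextConverter.zip

Download the test data:
test2.txt

Screenshot: software interface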

Differences between optimizer modes (all_rows & first_rows_n)

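The FIRST_ROWS optimizer mode aims at retrieving the first row of the result set as fast as possible. It is most effective when users of an OLTP system fetch single records. It is not the ideal choice for batch-intensive environments, where a query usually has to retrieve many rows. The FIRST_ROWS hint generally forces the use of certain indexes that might not be used in the default (ALL_ROWS) environment. FIRST_ROWS is ignored in UPDATE and DELETE statements, because every row found by these DML operations is updated or deleted. It is likewise replaced by ALL_ROWS when the statement uses grouping or set constructs such as GROUP BY, DISTINCT, INTERSECT, MINUS and UNION, because these have to retrieve all rows. When the statement has an ORDER BY clause and an index scan can deliver the required order, the optimizer avoids the extra sort. When an index scan is available and the index is on the inner table, the optimizer prefers a NESTED LOOPS join to a SORT MERGE join.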

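10g also offers FIRST_ROWS_N, a variant of FIRST_ROWS that specifies how many rows should be returned as fast as possible. As an optimizer_mode setting, N can be 1, 10, 100 or 1000. FIRST_ROWS_N is a fully cost-based approach and is sensitive to the value of N: if N is small, the optimizer produces plans with nested loops and index lookups; if N is large, it may produce plans built from hash joins and full table scans (similar to ALL_ROWS). FIRST_ROWS and FIRST_ROWS_N also differ in that FIRST_ROWS retains some rule-based code, whereas FIRST_ROWS_N computes costs purely from statistics, as the Oracle documentation describes.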

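The ALL_ROWS optimizer mode directs the query to retrieve all rows as fast as possible (best throughput). It is the better choice when users work in an environment with heavy batch reporting.

During an actual hard parse, FIRST_ROWS_N first computes the cost of each execution plan the ALL_ROWS way, then substitutes the N rows we want for the full candidate row count (CARD) in the cost calculation to arrive at the plan costs under FIRST_ROWS_N.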

create table test as select  * from dba_objects;

create table testa as select * from test;

alter session set events '10053 trace name context forever,level 1';    -- use event 10053 to trace the cost calculation

alter session set optimizer_mode=all_rows;

select test.owner from test,testa where test.object_id=testa.object_id;

alter session set events '10053 trace name context off';

Below is how the best join method is selected in ALL_ROWS mode:

NL Join

Outer table: Card: 9622.00  Cost: 35.37  Resp: 35.37  Degree: 1  Bytes: 7

Inner table: TESTA  Alias: TESTA

Access Path: TableScan

NL Join:  Cost: 318924.52  Resp: 318924.52  Degree: 0

Cost_io: 315358.00  Cost_cpu: 27736509932

Resp_io: 315358.00  Resp_cpu: 27736509932

Access Path: index (index (FFS))

Index: INDA_ID

resc_io: 5.69  resc_cpu: 1304190

ix_sel: 0.0000e+00  ix_sel_with_filters: 1

Inner table: TESTA  Alias: TESTA

Access Path: index (FFS)

NL Join:  Cost: 56375.98  Resp: 56375.98  Degree: 0

Cost_io: 54762.00  Cost_cpu: 12551800804

Resp_io: 54762.00  Resp_cpu: 12551800804

Access Path: index (AllEqJoinGuess)

Index: INDA_ID

resc_io: 1.00  resc_cpu: 8171

ix_sel: 1.0393e-04  ix_sel_with_filters: 1.0393e-04

NL Join: Cost: 9667.48  Resp: 9667.48  Degree: 1

Cost_io: 9657.00  Cost_cpu: 81507910

Resp_io: 9657.00  Resp_cpu: 81507910

Best NL cost: 9667.48

resc: 9667.48 resc_io: 9657.00 resc_cpu: 81507910

resp: 9667.48 resp_io: 9657.00 resp_cpu: 81507910

Join Card:  9622.00 = outer (9622.00) * inner (9622.00) * sel (1.0393e-04)

Join Card - Rounded: 9622 Computed: 9622.00

SM Join

Outer table:

resc: 35.37  card 9622.00  bytes: 7  deg: 1  resp: 35.37

Inner table: TESTA  Alias: TESTA

resc: 7.17  card: 9622.00  bytes: 3  deg: 1  resp: 7.17

using dmeth: 2  #groups: 1

SORT resource      Sort statistics

Sort width:          70 Area size:      131072 Max Area size:    12582912

Degree:               1

Blocks to Sort:      17 Row size:           14 Total Rows:           9622

Initial runs:         2 Merge passes:        1 IO Cost / pass:         10

Total IO sort cost: 27      Total CPU sort cost: 13931876

Total Temp space used: 254000

SM join: Resc: 203.62  Resp: 203.62  [multiMatchCost=0.00]

HA Join

Outer table:

resc: 35.37  card 9622.00  bytes: 7  deg: 1  resp: 35.37

Inner table: TESTA  Alias: TESTA

resc: 7.17  card: 9622.00  bytes: 3  deg: 1  resp: 7.17

using dmeth: 2  #groups: 1

Cost per ptn: 0.81  #ptns: 1

hash_area: 124 (max=3072)   Hash join: Resc: 43.35  Resp: 43.35  [multiMatchCost=0.00]

HA Join (swap)

Outer table:

resc: 7.17  card 9622.00  bytes: 3  deg: 1  resp: 7.17

Inner table: TEST  Alias: TEST

resc: 35.37  card: 9622.00  bytes: 7  deg: 1  resp: 35.37

using dmeth: 2  #groups: 1

Cost per ptn: 0.81  #ptns: 1

hash_area: 124 (max=3072)   Hash join: Resc: 43.35  Resp: 43.35  [multiMatchCost=0.00]

HA cost: 43.35

resc: 43.35 resc_io: 42.00 resc_cpu: 10480460

resp: 43.35 resp_io: 42.00 resp_cpu: 10480460

Best:: JoinMethod: Hash

Cost: 43.35  Degree: 1  Resp: 43.35  Card: 9622.00  Bytes: 10

***********************

Best so far: Table#: 0  cost: 35.3706  card: 9622.0000  bytes: 67354

Table#: 1  cost: 43.3476  card: 9622.0000  bytes: 96220

Both tables in the join have 9622 candidate rows, and the actual result set is also 9622 rows.

Now let us look at the trace under FIRST_ROWS_10:

alter session set events '10053 trace name context forever,level 1';

alter session set optimizer_mode=first_rows_10;

select test.owner from test,testa where test.object_id=testa.object_id;

alter session set events '10053 trace name context off';

Now joining: TEST[TEST]#0

***************

NL Join

Outer table: Card: 11.00  Cost: 2.00  Resp: 2.00  Degree: 1  Bytes: 3

Inner table: TEST  Alias: TEST

Access Path: TableScan

NL Join:  Cost: 368.08  Resp: 368.08  Degree: 0

Cost_io: 364.00  Cost_cpu: 31713898

Resp_io: 364.00  Resp_cpu: 31713898

Access Path: index (AllEqJoinGuess)

Index: IND_ID

resc_io: 2.00  resc_cpu: 15503

ix_sel: 1.0393e-04  ix_sel_with_filters: 1.0393e-04

NL Join (ordered): Cost: 24.02  Resp: 24.02  Degree: 1

Cost_io: 24.00  Cost_cpu: 178973

Resp_io: 24.00  Resp_cpu: 178973

Best NL cost: 24.02

resc: 24.02 resc_io: 24.00 resc_cpu: 178973

resp: 24.02 resp_io: 24.00 resp_cpu: 178973

Join Card:  11.00 = outer (11.00) * inner (9622.00) * sel (1.0393e-04)

Join Card - Rounded: 11 Computed: 11.00

SM Join

Outer table:

resc: 7.17  card 9622.00  bytes: 3  deg: 1  resp: 7.17

Inner table: TEST  Alias: TEST

resc: 35.37  card: 9622.00  bytes: 7  deg: 1  resp: 35.37

using dmeth: 2  #groups: 1

SORT resource      Sort statistics

Sort width:          70 Area size:      131072 Max Area size:    12582912

Degree:               1

Blocks to Sort:      22 Row size:           18 Total Rows:           9622

Initial runs:         2 Merge passes:        1 IO Cost / pass:         14

Total IO sort cost: 36      Total CPU sort cost: 14055006

Total Temp space used: 320000

SORT resource      Sort statistics

Sort width:          70 Area size:      131072 Max Area size:    12582912

Degree:               1

Blocks to Sort:      17 Row size:           14 Total Rows:           9622

Initial runs:         2 Merge passes:        1 IO Cost / pass:         10

Total IO sort cost: 27      Total CPU sort cost: 13931876

Total Temp space used: 254000

SM join: Resc: 109.14  Resp: 109.14  [multiMatchCost=0.00]

SM cost: 109.14

resc: 109.14 resc_io: 105.00 resc_cpu: 32173386

resp: 109.14 resp_io: 105.00 resp_cpu: 32173386

SM Join (with index on outer)

Access Path: index (FullScan)

Index: IND_ID

resc_io: 167.00  resc_cpu: 5134300

ix_sel: 1  ix_sel_with_filters: 1

Cost: 167.66  Resp: 167.66  Degree: 1

Outer table:

resc: 167.66  card 11.00  bytes: 7  deg: 1  resp: 167.66

Inner table: TESTA  Alias: TESTA

resc: 7.17  card: 9622.00  bytes: 3  deg: 1  resp: 7.17

using dmeth: 2  #groups: 1

SORT resource      Sort statistics

Sort width:          70 Area size:      131072 Max Area size:    12582912

Degree:               1

Blocks to Sort:      17 Row size:           14 Total Rows:           9622

Initial runs:         2 Merge passes:        1 IO Cost / pass:         10

Total IO sort cost: 27      Total CPU sort cost: 13931876

Total Temp space used: 254000

SM join: Resc: 203.62  Resp: 203.62  [multiMatchCost=0.00]

HA Join

Outer table:

resc: 35.37  card 9622.00  bytes: 7  deg: 1  resp: 35.37

Inner table: TESTA  Alias: TESTA

resc: 7.17  card: 9622.00  bytes: 3  deg: 1  resp: 7.17

using dmeth: 2  #groups: 1

Cost per ptn: 0.81  #ptns: 1

hash_area: 124 (max=3072)   Hash join: Resc: 43.35  Resp: 43.35  [multiMatchCost=0.00]

HA Join (swap)

Outer table:

resc: 7.17  card 9622.00  bytes: 3  deg: 1  resp: 7.17

Inner table: TEST  Alias: TEST

resc: 2.00  card: 11.00  bytes: 7  deg: 1  resp: 2.00

using dmeth: 2  #groups: 1

Cost per ptn: 0.69  #ptns: 1

hash_area: 124 (max=3072)   Hash join: Resc: 9.85  Resp: 9.85  [multiMatchCost=0.00]

HA cost: 9.85

resc: 9.85 resc_io: 9.00 resc_cpu: 6646477

resp: 9.85 resp_io: 9.00 resp_cpu: 6646477

First K Rows: copy A one plan, tab=TESTA

Best:: JoinMethod: Hash

Cost: 9.85  Degree: 1  Resp: 9.85  Card: 9622.00  Bytes: 17

***********************

Best so far: Table#: 0  cost: 2.0012  card: 11.0000  bytes: 77

Table#: 1  cost: 9.8546  card: 9622.0000  bytes: 163574

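This time the calculation substitutes 11 (that is, 10+1), the number of rows the user wants returned first. The optimizer sets the candidate cardinality (Card) of a join input to reach its optimization goal, and the corresponding costs all drop.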
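
The "Join Card" lines in the traces follow the formula printed there: outer cardinality times inner cardinality times join selectivity. The sketch below reproduces that arithmetic for the ALL_ROWS and FIRST_ROWS_10 traces. Taking the selectivity as exactly 1/9622 is my assumption; the trace prints it rounded as 1.0393e-04.

```python
def join_cardinality(outer_card, inner_card, selectivity):
    """Join Card = outer * inner * sel, as printed in the 10053 trace."""
    return outer_card * inner_card * selectivity


INNER_CARD = 9622.0
SEL = 1.0 / 9622.0   # assumption: printed in the trace as 1.0393e-04

# Outer cardinalities taken from the two traces above.
for mode, outer in [("ALL_ROWS", 9622.0), ("FIRST_ROWS_10", 11.0)]:
    card = join_cardinality(outer, INNER_CARD, SEL)
    print(f"{mode}: Join Card = {card:.2f}")
```

Only the outer cardinality changes between the two modes, which is how the smaller row target propagates into every join cost that depends on it.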

Below is the FIRST_ROWS_1000 case:

alter session set events '10053 trace name context forever,level 1';

alter session set optimizer_mode=first_rows_1000;

select test.owner from test,testa where test.object_id=testa.object_id;

alter session set events '10053 trace name context off';

NL Join

Outer table: Card: 1000.00  Cost: 5.04  Resp: 5.04  Degree: 1  Bytes: 7

Inner table: TESTA  Alias: TESTA

Access Path: TableScan

NL Join:  Cost: 33147.66  Resp: 33147.66  Degree: 0

Cost_io: 32777.00  Cost_cpu: 2882616819

Resp_io: 32777.00  Resp_cpu: 2882616819

Access Path: index (index (FFS))

Index: INDA_ID

resc_io: 5.69  resc_cpu: 1304190

ix_sel: 0.0000e+00  ix_sel_with_filters: 1

Inner table: TESTA  Alias: TESTA

Access Path: index (FFS)

NL Join:  Cost: 5861.74  Resp: 5861.74  Degree: 0

Cost_io: 5694.00  Cost_cpu: 1304492819

Resp_io: 5694.00  Resp_cpu: 1304492819

Access Path: index (AllEqJoinGuess)

Index: INDA_ID

resc_io: 1.00  resc_cpu: 8171

ix_sel: 1.0393e-04  ix_sel_with_filters: 1.0393e-04

NL Join: Cost: 1006.09  Resp: 1006.09  Degree: 1

Cost_io: 1005.00  Cost_cpu: 8474019

Resp_io: 1005.00  Resp_cpu: 8474019

Best NL cost: 1006.09

resc: 1006.09 resc_io: 1005.00 resc_cpu: 8474019

resp: 1006.09 resp_io: 1005.00 resp_cpu: 8474019

Join Card:  1000.00 = outer (1000.00) * inner (9622.00) * sel (1.0393e-04)

Join Card - Rounded: 1000 Computed: 1000.00

SM Join

Outer table:

resc: 35.37  card 9622.00  bytes: 7  deg: 1  resp: 35.37

Inner table: TESTA  Alias: TESTA

resc: 7.17  card: 9622.00  bytes: 3  deg: 1  resp: 7.17

using dmeth: 2  #groups: 1

SORT resource      Sort statistics

Sort width:          70 Area size:      131072 Max Area size:    12582912

Degree:               1

Blocks to Sort:      22 Row size:           18 Total Rows:           9622

Initial runs:         2 Merge passes:        1 IO Cost / pass:         14

Total IO sort cost: 36      Total CPU sort cost: 14055006

Total Temp space used: 320000

SORT resource      Sort statistics

Sort width:          70 Area size:      131072 Max Area size:    12582912

Degree:               1

Blocks to Sort:      17 Row size:           14 Total Rows:           9622

Initial runs:         2 Merge passes:        1 IO Cost / pass:         10

Total IO sort cost: 27      Total CPU sort cost: 13931876

Total Temp space used: 254000

SM join: Resc: 109.14  Resp: 109.14  [multiMatchCost=0.00]

SM cost: 109.14

resc: 109.14 resc_io: 105.00 resc_cpu: 32173386

resp: 109.14 resp_io: 105.00 resp_cpu: 32173386

SM Join (with index on outer)

Access Path: index (FullScan)

Index: IND_ID

resc_io: 167.00  resc_cpu: 5134300

ix_sel: 1  ix_sel_with_filters: 1

Cost: 167.66  Resp: 167.66  Degree: 1

Outer table:

resc: 167.66  card 1000.00  bytes: 7  deg: 1  resp: 167.66

Inner table: TESTA  Alias: TESTA

resc: 7.17  card: 9622.00  bytes: 3  deg: 1  resp: 7.17

using dmeth: 2  #groups: 1

SORT resource      Sort statistics

Sort width:          70 Area size:      131072 Max Area size:    12582912

Degree:               1

Blocks to Sort:      17 Row size:           14 Total Rows:           9622

Initial runs:         2 Merge passes:        1 IO Cost / pass:         10

Total IO sort cost: 27      Total CPU sort cost: 13931876

Total Temp space used: 254000

SM join: Resc: 203.62  Resp: 203.62  [multiMatchCost=0.00]

HA Join

Outer table:

resc: 35.37  card 9622.00  bytes: 7  deg: 1  resp: 35.37

Inner table: TESTA  Alias: TESTA

resc: 7.17  card: 9622.00  bytes: 3  deg: 1  resp: 7.17

using dmeth: 2  #groups: 1

Cost per ptn: 0.81  #ptns: 1

hash_area: 124 (max=3072)   Hash join: Resc: 43.35  Resp: 43.35  [multiMatchCost=0.00]

HA Join (swap)

Outer table:

resc: 7.17  card 9622.00  bytes: 3  deg: 1  resp: 7.17

Inner table: TEST  Alias: TEST

resc: 5.04  card: 1000.00  bytes: 7  deg: 1  resp: 5.04

using dmeth: 2  #groups: 1

Cost per ptn: 0.70  #ptns: 1

hash_area: 124 (max=3072)   Hash join: Resc: 12.91  Resp: 12.91  [multiMatchCost=0.00]

HA cost: 12.91

resc: 12.91 resc_io: 12.00 resc_cpu: 7038524

resp: 12.91 resp_io: 12.00 resp_cpu: 7038524

First K Rows: copy A one plan, tab=TESTA

Best:: JoinMethod: Hash

Cost: 12.91  Degree: 1  Resp: 12.91  Card: 9622.00  Bytes: 17

***********************

Best so far: Table#: 0  cost: 5.0389  card: 1000.0000  bytes: 7000

Table#: 1  cost: 12.9051  card: 9622.0000  bytes: 163574

Here 1000 is substituted as the candidate row count of one of the join inputs.
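
To compare the three runs side by side, the best cost per join method can be collected from the traces and the cheapest picked, which is what the "Best:: JoinMethod" line reports. The numbers below are copied from the traces above; the dictionary layout and function name are only an illustrative sketch.

```python
# Best cost per join method, copied from the three 10053 traces above.
COSTS = {
    "ALL_ROWS":        {"NL": 9667.48, "SM": 203.62, "HA": 43.35},
    "FIRST_ROWS_10":   {"NL": 24.02,   "SM": 109.14, "HA": 9.85},
    "FIRST_ROWS_1000": {"NL": 1006.09, "SM": 109.14, "HA": 12.91},
}


def cheapest_method(costs):
    """Return the join method with the lowest cost."""
    return min(costs, key=costs.get)


for mode, costs in COSTS.items():
    print(mode, "->", cheapest_method(costs), costs[cheapest_method(costs)])
```

Hash join wins in all three runs of this test, but the nested loops cost falls from 9667.48 to 24.02 as the row target shrinks, which shows how strongly the substituted cardinality moves the NL estimate.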

MOS has a well-known paper, "MIGRATING TO THE COST-BASED OPTIMIZER", that describes the differences between the RBO and the CBO in detail:
http://askmac.cn/wp-content/uploads/resource/40178_rbo_rip.doc

How to use gdb to trace the Oracle system state (systemstate)

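When an Oracle system hangs and no login method works (including sqlplus -prelim / as sysdba), we can use the gdb debugger to take a system dump of Oracle and use the dump to determine the specific cause of the hang. If we simply kill the process, we lose the evidence and gain nothing that helps prevent such hangs in the future.

To use an external tool such as gdb, we need the process ID of an Oracle process that belongs to the instance.

We usually list the Oracle processes with: ps -ef|grep <SID>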

[oracle@rh2 ~]$ ps -ef|grep oraclewebmoney
oracle   16996 16995  0 21:55 ?        00:00:00 oraclewebmoney (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
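
Picking the PID out of that listing can also be scripted. The sketch below parses `ps -ef` style output and returns the PIDs of processes whose command starts with oracle<SID>. The function name is hypothetical, and the sample line is the one shown above.

```python
def server_pids(ps_ef_output, sid):
    """Return PIDs of processes whose command starts with 'oracle<SID>'."""
    pids = []
    for line in ps_ef_output.splitlines():
        # ps -ef columns: UID PID PPID C STIME TTY TIME CMD (CMD may contain spaces)
        fields = line.split(None, 7)
        if len(fields) == 8 and fields[1].isdigit() and fields[7].startswith("oracle" + sid):
            pids.append(int(fields[1]))
    return pids


SAMPLE = (
    "oracle   16996 16995  0 21:55 ?        00:00:00 "
    "oraclewebmoney (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))"
)

print(server_pids(SAMPLE, "webmoney"))
```

Matching on the command column rather than the whole line keeps the grep process itself out of the result.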

Then start gdb, passing it the location of the oracle binary in the Oracle home and the process ID:

[oracle@rh2 udump]$ gdb $ORACLE_HOME/bin/oracle  16996
GNU gdb Red Hat Linux (6.3.0.0-1.159.el4rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type “show copying” to see the conditions.
There is absolutely no warranty for GDB.  Type “show warranty” for details.
This GDB was configured as “x86_64-redhat-linux-gnu”…
(no debugging symbols found)
Using host libthread_db library “/lib64/tls/libthread_db.so.1″.

Attaching to program: /u01/oracle/product/10.2.0/db_1/bin/oracle, process 14594
Reading symbols from /u01/oracle/product/10.2.0/db_1/lib/libskgxp10.so…(no debugging symbols found)…done.
Loaded symbols for /u01/oracle/product/10.2.0/db_1/lib/libskgxp10.so
Reading symbols from /u01/oracle/product/10.2.0/db_1/lib/libhasgen10.so…done.
Loaded symbols for /u01/oracle/product/10.2.0/db_1/lib/libhasgen10.so
Reading symbols from /u01/oracle/product/10.2.0/db_1/lib/libskgxn2.so…done.
Loaded symbols for /u01/oracle/product/10.2.0/db_1/lib/libskgxn2.so
Reading symbols from /u01/oracle/product/10.2.0/db_1/lib/libocr10.so…done.
Loaded symbols for /u01/oracle/product/10.2.0/db_1/lib/libocr10.so
Reading symbols from /u01/oracle/product/10.2.0/db_1/lib/libocrb10.so…done.
Loaded symbols for /u01/oracle/product/10.2.0/db_1/lib/libocrb10.so
Reading symbols from /u01/oracle/product/10.2.0/db_1/lib/libocrutl10.so…done.
Loaded symbols for /u01/oracle/product/10.2.0/db_1/lib/libocrutl10.so
Reading symbols from /u01/oracle/product/10.2.0/db_1/lib/libjox10.so…

At the gdb prompt, enter print ksudss(10), that is:

(gdb) print ksudss(10)

A trace file named <SID>_ora_<pid> is then written to the udump directory; by analyzing this trace we can find the main cause of the hang.

A sample trace file follows:

System name:    Linux
Node name:      rh2
Release:        2.6.9-78.ELsmp
Version:        #1 SMP Wed Jul 9 15:46:26 EDT 2008
Machine:        x86_64
Instance name: webmoney
Redo thread mounted by this instance: 1
Oracle process number: 15
Unix process pid: 16996, image: oracle@rh2 (TNS V1-V3)

*** 2009-09-07 21:57:14.100
*** SERVICE NAME:(SYS$USERS) 2009-09-07 21:57:14.100
*** SESSION ID:(528.2041) 2009-09-07 21:57:14.100
===================================================
SYSTEM STATE
————
System global information:
processes: base 0x91637c30, size 500, cleanup 0x9167a2e0
allocation: free sessions 0x91779840, free calls (nil)
control alloc errors: 0 (process), 0 (session), 0 (call)
PMON latch cleanup depth: 0
seconds since PMON’s last scan for dead processes: 45
system statistics:

http://askmac.cn/wp-content/uploads/resource/gdb.pdf

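Oracle Database 11g Release 2 released

The long-awaited 11g R2 has finally been unveiled. As the vanguard, the Linux platform will undoubtedly let the largest number of experts and users test the new version as widely as possible.

Oracle even set up a dedicated domain for it, www.oracledatabase11g.com. A dedicated domain for a new product is not common practice, and the domain's PageRank and Alexa rank are currently both very low. It may simply be a trend, much as Blizzard's games each have their own domain; we can guess that oracledatabase12[X].com and oracledatabase13[X].com will appear in the future. Below is the Flash banner from the 11g home page.

In "Oracle announces availability of Oracle Database 11g Release 2", Oracle rolls out its usual sales pitch and lists the following advantages: more mature grid computing that helps enterprises cut costs, lower storage costs, less useless redundancy, and automatic tuning and scalability (a more powerful AWR and automatic memory tuning).

Those advantages are of course what we hope for, but reality is often harsh. As 11g matures and sees wider use, DBAs will have to master its new features and learn the different "quirks" of the new version. I expect to get the chance to experience them one by one.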

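Book recommendation: "Secrets of the Oracle Database"

The little secrets of the Oracle database; the author is Norbert Debes.

Content-wise it is not the most internals-heavy book; in other words, it is still useful for day-to-day administration, at least for experts.

My blog carries my translation of several sections of his explanation of AUDIT_SYSLOG_LEVEL, and I will keep working on it.

The book has been out for quite a while, but there is no sign of a Chinese edition. Books on specialized DBA topics do not sell in large numbers and have always been a niche.

The content is valuable, especially the in-depth study of several parameters and the experiments that use Perl.

Cover:

[Cover image: Secrets of the Oracle Database]

Download link: Oracle Secrets.

For study purposes only.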

Introduction to RAC Dynamic Resource Management (DRM)

The following text is excerpted from Metalink doc 390483.1:

Subject:  DRM – Dynamic Resource management
Doc ID:  390483.1  Type:  BULLETIN
Modified Date :  13-JAN-2009  Status:  PUBLISHED

In this Document
Purpose
Scope and Application
DRM – Dynamic Resource management
DRM – Dynamic Resource Mastering
References

Applies to:
Oracle Server – Enterprise Edition – Version: 10.1.0.2 to 11.1.0
Oracle Server – Standard Edition – Version: 10.1.0.2 to 11.1.0
Information in this document applies to any platform.
Oracle Real application Clusters
Purpose

To describe the concept of DRM (Dynamic Resource Mastering)
Scope and Application

This note in intended for experienced Real application cluster DBA’s
DRM – Dynamic Resource management
DRM – Dynamic Resource Mastering

When using Real application Clusters (RAC), Each instance has its own SGA and buffer cache. RAC will ensure that these block changes are co -ordinated to maximize performance and to ensure data intergrity. Each copy of the buffer also called as a cache resource has a master which is one of the nodes of the cluster.

In database releases before 10g (10.1.0.2) once a cache resource is mastered on an instance, a re-mastering or a change in the master would take place only during a reconfiguration that would happen automatically during both normal operations like instance startup or instance shutdown or abnormal events like Node eviction by Cluster Manager. So if Node B is the master of a cache resource, this resource will remain mastered on Node B until reconfiguration.

10g  introduces a concept of resource remastering via DRM. With DRM a resource can be re-mastered on another node say from Node B to Node A if it is found that the cache resource is accessed more frequently from Node A. A reconfiguration is no longer the only reason for a resource to be re-mastered.

In 10gR1 DRM is driven by affinity of files and in 10gR2 it is based on objects.

Sample LMD trace file during a DRM operation

Begin DRM(202) - transfer pkey 4294951314 to 0 oscan 1.1
*** 2006-08-01 17:34:54.645
Begin DRM(202) - transfer pkey 4294951315 to 0 oscan 1.1
*** 2006-08-01 17:34:54.646
Begin DRM(202) - transfer pkey 4294951316 to 0 oscan 1.1
*** 2006-08-01 17:34:54.646
Begin DRM(202) - transfer pkey 4294951317 to 0 oscan 1.1

DRM attributes are intentionally undocumented since they may change depending on the version. These attributes should not be changed without discussing with Support.
@DRM is driven by the following
@ 1.) _gc_affinity_time = Time in minutes at which statistics will be evaluated (default = 10 mins)
@ 2.) _gc_affinity_limit = # of times a node accesses a file/object (default = 50)
@ 3.) _gc_affinity_minimum = minimum # of times per minute a file/object is accessed before affinity kicks in
@ (default = 600 per minute per cpu )

It is important to note that

  1. Two instance will not start a DRM operation at the same time however lmd,lms,lmon processes from all instances collectively take part in the DRM operation.
  2. Normal activity on the database is not affected due to DRM. This means users continue insert/update/delete operations without any interruptions. Also DRM operations complete very quickly.

@

Disable DRM
Generally DRM should not be disabled unless Oracle Support/Development has suggested turning it off due to some known issues.
@To disable DRM, set
@_gc_affinity_time=0                                 # Only if DB version is 10.1 or 10.2
@_gc_undo_affinity=FALSE                       # Only if Db version is 10.2
@_gc_policy_time=FALSE                         # Only if DB version is 11.1 or higher
@_gc_affinity_time has been renamed to _gc_policy_time in 11g
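
As an aside to the quoted note, the "Begin DRM" lines in the sample LMD trace can be extracted with a small script when checking how often remastering happens. The sketch below only relies on the line format shown in the sample. The names drm_id, pkey and target are my own labels for the three numbers, not terms taken from the note.

```python
import re

# Matches e.g. "Begin DRM(202) - transfer pkey 4294951314 to 0 oscan 1.1"
DRM_LINE = re.compile(r"Begin DRM\((\d+)\) - transfer pkey (\d+) to (\d+)")


def parse_drm(trace):
    """Return (drm_id, pkey, target) tuples, one per 'Begin DRM' line."""
    return [tuple(int(g) for g in m.groups()) for m in DRM_LINE.finditer(trace)]


SAMPLE = """Begin DRM(202) - transfer pkey 4294951314 to 0 oscan 1.1
*** 2006-08-01 17:34:54.645
Begin DRM(202) - transfer pkey 4294951315 to 0 oscan 1.1
"""

for drm_id, pkey, target in parse_drm(SAMPLE):
    print(drm_id, pkey, target)
```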

The cluster_interconnects parameter in detail

The following text is excerpted from a Metalink doc:

This note attempts to clarify the cluster_interconnects parameter and the
platforms on which the implementation has been made. A brief explanation on
the workings of the parameter has also been presented in this note.
This is also one of the most frequently questions related to cluster and RAC
installations on most sites and forms a part of the prerequisite as well.

ORACLE 9I RAC – Parameter CLUSTER_INTERCONNECTS
———————————————–

FREQUENTLY ASKED QUESTIONS
————————–
November 2002

CONTENTS
——–
1.  What is the parameter CLUSTER_INTERCONNECTS for ?
2.  Is the parameter CLUSTER_INTERCONNECTS available for all platforms ?
3.  How is the Interconnect recognized on Linux ?
4.  Where could I find more information on this parameter ?
5.  How to detect which interconnect is used ?
6.  Cluster_Interconnects is mentioned in the 9i RAC administration
    guide as a Solaris specific parameter, is this the only platform
    where this parameter is available ?
7.  Are there any side effects for this parameter, namely affecting normal
    operations ?
8.  Is the parameter OPS_INTERCONNECTS which was available in 8i similar
    to this parameter ?
9.  Does Cluster_interconnect allow failover from one Interconnect to another
    Interconnect ?
10. Is the size of messages limited on the Interconnect ?
11. How can you see which protocoll is being used by the instances ?
12. Can the parameter CLUSTER_INTERCONNECTS be changed dynamically during runtime ?

 
QUESTIONS & ANSWERS
——————-
1. What is the parameter CLUSTER_INTERCONNECTS for ?

Answer
——
This parameter is used to influence the selection of the network interface
for Global Cache Service (GCS) and Global Enqueue Service (GES) processing.

This note does not compare the other elements of 8i OPS with 9i RAC
because of substantial differences in the behaviour of both architectures.
Oracle 9i RAC has certain optimizations which attempt to transfer most of
the information required via the interconnects so that the number of disk
reads are minimized. This behaviour known as Cache fusion phase 2 is summarised
in Note 139436.1
The definition of the interconnnect is a private network which
will be used to transfer the cluster traffic and Oracle Resource directory
information and blocks to satisfy queries. The technical term for that is
cache fusion.

The CLUSTER_INTERCONNECTS should be used when
– you want to override the default network selection
– bandwith of a single interconnect does not meet the bandwith requirements of
  a Real Application Cluster database

The syntax of the parameter is:

CLUSTER_INTERCONNECTS = if1:if2:…:ifn
Where if<n> is an IP address in standard dotted-decimal format, for example,
144.25.16.214. Subsequent platform implementations may specify interconnects
with different syntaxes.
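
As an aside to the quoted note, the colon-separated syntax described in answer 1 is easy to check before it goes into an init.ora file. The sketch below splits such a value and validates each part as a dotted-decimal IPv4 address with Python's standard ipaddress module. The second address in the example is a hypothetical value added for illustration.

```python
import ipaddress


def parse_cluster_interconnects(value):
    """Split a colon-separated value into validated IPv4 addresses.

    Raises ValueError (ipaddress.AddressValueError) on a malformed address.
    """
    return [ipaddress.IPv4Address(part.strip()) for part in value.split(":")]


# 144.25.16.214 is the example from the note; 144.25.16.215 is hypothetical.
addrs = parse_cluster_interconnects("144.25.16.214:144.25.16.215")
print([str(a) for a in addrs])
```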
2. Is the parameter CLUSTER_INTERCONNECTS available for all platforms ?

Answer
——

This parameter is configurable on most platforms.
This parameter can not be used on Linux.

The following Matrix shows when the parameter was introduced on which platform:

Operating System    Available since
AIX                   9.2.0
HP/UX                 9.0.1
HP Tru64              9.0.1
HP OPenVMS            9.0.1
Sun Solaris           9.0.1

References
———-
Bug <2119403> ORACLE9I RAC ADMINISTRATION SAYS CLUSTER_INTERCONNECTS IS SOLARIS ONLY.
Bug <2359300> ENHANCE CLUSTER_INTERCONNECTS TO WORK WITH 9I RAC ON IBM
3.  How is the Interconnect recognized on Linux ?

Answer
——
Since Oracle9i 9.2.0.8 CLUSTER_INTECONNETCS can be used to change the interconnect.
A patch is also available for 9.2.0.7 under Patch 4751660.
Before 9.2.0.8 the Oracle implementation for the interface selection reads the ‘private hostname’
in the cmcfg.ora file and uses the corresponding ip-address for the interconnect.
If no private hostname is available the public hostname will be used.
4.  Where could I find information on this parameter ?

Answer
——

The parameter is documented in the following books:
Oracle9i Database Reference Release 2 (9.2)
Oracle9i Release 1 (9.0.1) New Features in Oracle9i Database Reference –
                   What’s New in Oracle9i Database Reference?
Oracle9i Real Application Clusters Administration Release 2 (9.2)
Oracle9i Real Application Clusters Deployment and Performance Release 2 (9.2)

Also port specific documentation may contain information about the usage of
the cluster_interconnects parameter.

Documentation can be viewed on
    http://tahiti.oracle.com
    http://otn.oracle.com/documentation/content.html
References:
———–
Note 162725.1: OPS/RAC VMS: Using alternate TCP Interconnects on 8i OPS
               and 9i RAC on OpenVMS

Note 151051.1: Init.ora Parameter “CLUSTER_INTERCONNECTS” Reference Note

5. How to detect which interconnect is used ?
    The following commands show which interconnect is used for UDP or TCP:
    sqlplus> connect / as sysdba
             oradebug setmypid
             oradebug ipc
             exit

    The corresponding trace can be found in the user_dump_dest directory and for
    example contains the following information in the last couple of lines:

           SKGXPCTX: 0x32911a8 ctx
           admno 0x12f7150d admport:
           SSKGXPT 0x3291db8 flags SSKGXPT_READPENDING     info for network 0
                 socket no 9     IP 172.16.193.1         UDP 43307
                 sflags SSKGXPT_WRITESSKGXPT_UP
                 info for network 1
                 socket no 0     IP 0.0.0.0      UDP 0
                 sflags SSKGXPT_DOWN
           context timestamp 0x1ca5
                 no ports
   Please note that on some platforms and versions (Oracle9i 9.2.0.1 on Windows)
   you might see an ORA-70 when the command oradebug ipc has not been
   implemented.

   When other protocols such as LLT, HMP or RDG are used, the trace file will not
   reveal an IP address.
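
   To pull the address out of such a trace without reading it by eye, a small Python sketch along the following lines could be used. It assumes the "socket no ... IP ... UDP ..." layout shown in the sample above; that a TCP interconnect would be printed in the same layout with "TCP" in place of "UDP" is an assumption, and the function name interconnects_in_use is hypothetical.

```python
import re

# Matches lines such as "socket no 9     IP 172.16.193.1         UDP 43307".
# Accepting "TCP" in the same position is an assumption, not taken from the note.
SOCKET_LINE = re.compile(
    r"socket no\s+(\d+)\s+IP\s+(\d{1,3}(?:\.\d{1,3}){3})\s+(UDP|TCP)\s+(\d+)"
)


def interconnects_in_use(trace_text):
    """Return (ip, protocol, port) for every network slot with a real address."""
    found = []
    for match in SOCKET_LINE.finditer(trace_text):
        _socket_no, ip, proto, port = match.groups()
        # Unused network slots are reported as 0.0.0.0 and are skipped.
        if ip != "0.0.0.0":
            found.append((ip, proto, int(port)))
    return found


# Excerpt of the sample trace from the note.
sample = """\
SSKGXPT 0x3291db8 flags SSKGXPT_READPENDING     info for network 0
      socket no 9     IP 172.16.193.1         UDP 43307
      sflags SSKGXPT_WRITESSKGXPT_UP
      info for network 1
      socket no 0     IP 0.0.0.0      UDP 0
      sflags SSKGXPT_DOWN
"""
print(interconnects_in_use(sample))
```

   For protocols such as LLT, HMP or RDG the function would simply return an empty list, since no IP address appears in the trace.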
6.  Cluster_Interconnects is mentioned in the 9i RAC administration
    guide as a Solaris specific parameter, is this the only platform
    where this parameter is available ?

Answer
—– 

The statement that this parameter works on Solaris only is incorrect. Please
see the answer to question 2 for the complete list of platforms.

References:
———–
Bug <2119403> ORACLE9I RAC ADMINISTRATION SAYS CLUSTER_INTERCONNECTS IS SOLARIS ONLY.
7.  Are there any side effects for this parameter, namely affecting normal
    operations ?

Answer
—–
When you set CLUSTER_INTERCONNECTS in cluster configurations, the
interconnect high availability features are not available. In other words,
an interconnect failure that is normally unnoticeable would instead cause
an Oracle cluster failure as Oracle still attempts to access the network
interface which has gone down. Using this parameter you are explicitly
specifying the interface or list of interfaces to be used.
 

8.  Is the parameter OPS_INTERCONNECTS which was available in 8i similar
    to this parameter ?

Answer
——
Yes, the parameter OPS_INTERCONNECTS was used to influence the network selection
for the Oracle 8i Parallel Server.

Reference
———
Note <120650.1> Init.ora Parameter “OPS_INTERCONNECTS” Reference Note
9.  Does CLUSTER_INTERCONNECTS allow failover from one interconnect to another
    interconnect ?

Answer
——
Failover capability is not implemented at the Oracle level. In general this
functionality is delivered by the hardware and/or the operating system software.
For platform details please see the Oracle platform specific documentation
and the operating system documentation.
10. Is the size of messages limited on the Interconnect ?

Answer
——
The message size depends on the protocol and platform.
UDP: In Oracle9i Release 2 (9.2.0.1) message size for UDP was limited to 32K.
     Oracle9i 9.2.0.2 allows larger UDP message sizes depending on the
     platform. To increase throughput on an interconnect you have to adjust
     UDP kernel parameters.
TCP: There is no need to set the message size for TCP.
RDG: The recommendations for RDG are documented in
        Oracle9i Administrator’s Reference – Part No. A97297-01
References
———-
Bug <2475236> RAC multiblock read performance issue using UDP IPC
11. How can you see which protocol is being used by the instances ?

Answer
——
Please see the alert file(s) of your RAC instances. During startup you’ll
   find a message in the alert file that shows the protocol being used.

      Wed Oct 30 05:28:55 2002
      cluster interconnect IPC version:Oracle UDP/IP with Sun RSM disabled
      IPC Vendor 1 proto 2 Version 1.0
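
Scanning a long alert file for this message can be scripted. The Python sketch below assumes the startup message always begins with "cluster interconnect IPC version:" as in the sample above; the function name ipc_protocols is hypothetical.

```python
import re

# Assumes the alert file wording shown in the note's sample.
IPC_LINE = re.compile(r"cluster interconnect IPC version:\s*(.+)")


def ipc_protocols(alert_text):
    """Return the protocol description from every matching startup message."""
    return [m.group(1).strip() for m in IPC_LINE.finditer(alert_text)]


# Sample lines copied from the note.
sample_alert = """\
Wed Oct 30 05:28:55 2002
cluster interconnect IPC version:Oracle UDP/IP with Sun RSM disabled
IPC Vendor 1 proto 2 Version 1.0
"""
print(ipc_protocols(sample_alert))
```

One entry is returned per instance startup recorded in the file, so the last element reflects the most recent startup.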
12. Can the parameter CLUSTER_INTERCONNECTS be changed dynamically during runtime ?

Answer
——
    No. CLUSTER_INTERCONNECTS is a static parameter and can only be set in the
    spfile or pfile (init.ora).
