ORA-00600 [4400][48]错误一例

5月26日某客户告警日志中出现的ORA-00600 [4400],[48]错误记录,并产生了trace文件:,

*** SERVICE NAME:(ETL) 2010-05-26 16:45:45.930
*** SESSION ID:(262.12024) 2010-05-26 16:45:45.930
*** 2010-05-26 16:45:45.930
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [4400], [48], [], [], [], [], [], []
ORA-10387: parallel query server interrupt (normal)
----- Call Stack Trace -----
calling              call     entry                argument values in hex
location             type     point                (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedst+001c          bl       ksedst1              0FFFFFFFF ? 000000020 ?
ksedmp+0290          bl       ksedst               1048DFFC0 ?
ksfdmp+0018          bl       03F5B014
kgeriv+0108          bl       _ptrgl
kgeasi+0118          bl       kgeriv               10496FEC8 ? 000000002 ?
                                                   700000010008000 ? 110000AD0 ?
                                                   110190290 ?
ktcddt+012c          bl       kgeasi               110190110 ? 110450040 ?
                                                   113000001130 ? 200000002 ?
                                                   100000001 ? 000000000 ?
                                                   000000030 ? 000000013 ?
ktcsod+0384          bl       ktcddt               0000003C0 ? 000000018 ?
                                                   1048D4808 ?
kssdch_stage+0758    bl       _ptrgl
....................................................
----- End of Call Stack Trace -----
******************* Dumping process map ****************
377144 : ora_p001_ETL
100000000      93710K  read/exec         oracle
1100007ef       2372K  read/write        oracle
9fffffff0000000        44K  read/exec         /usr/ccs/bin/usla64
9fffffff000b30a         0K  read/write        /usr/ccs/bin/usla64
900000000375880       484K  read/exec         /usr/lib/liblvm.a[shr_64.o]
9001000a0121ac8       135K  read/write        /usr/lib/liblvm.a[shr_64.o]
900000000426180        74K  read/exec         /usr/lib/libcfg.a[shr_64.o]
9001000a00f5ed0        26K  read/write        /usr/lib/libcfg.a[shr_64.o]
90000000034e280         2K  read/exec         /usr/lib/libcrypt.a[shr_64.o]
9001000a00d4760         0K  read/write        /usr/lib/libcrypt.a[shr_64.o]
90000000069a7b0         4K  read/exec         /usr/lib/libc.a[aio_64.o]
9001000a022a340         0K  read/write        /usr/lib/libc.a[aio_64.o]
90000000035f300        85K  read/exec         /usr/lib/libodm.a[shr_64.o]
9001000a00d5c08        35K  read/write        /usr/lib/libodm.a[shr_64.o]
9000000003ef080        83K  read/exec         /usr/lib/libperfstat.a[shr_64.o]
9001000a01d0818         9K  read/write        /usr/lib/libperfstat.a[shr_64.o]
900000000358000         0K  read/exec         /usr/lib/libdl.a[shr_64.o]
9001000a0219000         0K  read/write        /usr/lib/libdl.a[shr_64.o]
9000000007ff100      8588K  read/exec         /oracle/product/10.2.0/lib/libjox10.a[shr.o]
8001000a0000038       585K  read/write        /oracle/product/10.2.0/lib/libjox10.a[shr.o]
9000000004a0000       228K  read/exec         /usr/lib/libpthreads.a[shr_xpg5_64.o]
9001000a0144000       558K  read/write        /usr/lib/libpthreads.a[shr_xpg5_64.o]
900000000045500      2966K  read/exec         /usr/lib/libc.a[shr_64.o]
9001000a0000788       844K  read/write        /usr/lib/libc.a[shr_64.o]
   Total      110843K
******************* End of process map dump ****************
===================================================
PROCESS STATE
-------------
Process global information:
     process: 7000000cf48bf98, call: 7000000b40ebdf0, xact: 7000000cdece7a8, curses: 7000000cf666540, usrses: 7000000cf666540
  ----------------------------------------
  SO: 7000000cf48bf98, type: 2, owner: 0, flag: INIT/-/-/0x00
  (process) Oracle pid=43, calls cur/top: 7000000b40ebdf0/7000000b40ebdf0, flag: (0) -
            int error: 0, call error: 0, sess error: 0, txn error 0
  (post info) last post received: 115 0 4
              last post received-location: kslpsr
              last process to post me: 7000000cf479c38 1 6
              last post sent: 0 0 251
              last post sent-location: kxfpqr: QC
              last process posted by me: 7000000cf48a7f8 10 0
    (latch info) wait_event=0 bits=0
    Process Group: DEFAULT, pseudo proc: 7000000cf50bd50
    O/S info: user: oracle, term: UNKNOWN, ospid: 377144
    OSD pid info: Unix process pid: 377144, image: oracle@etl_a (P001)
   SO: 7000000cf666540, type: 4, owner: 7000000cf48bf98, flag: INIT/-/-/0x00
    (session) sid: 262 trans: 7000000cdece7a8, creator: 7000000cf48bf98, flag: (c0100041) USR/- BSY/-/-/DEL/-/-
              DID: 0001-0028-000FFB1E, short-term DID: 0001-002B-00151DA1
              txn branch: 7000000cdf412c8
              oct: 0, prv: 0, sql: 0, psql: 0, user: 26/ETL
    O/S info: user: bo, term: pts/0, ospid: 377144, machine: etl_a
              program: oracle@etl_a (P001)
    last wait for 'SQL*Net message from dblink' blocking sess=0x0 seq=45 wait_time=476 seconds since wait started=0
                driver id=0, #bytes=1, =0
    Dumping Session Wait History
     for 'SQL*Net message from dblink' count=1 wait_time=476
                driver id=0, #bytes=1, =0
     for 'SQL*Net message to dblink' count=1 wait_time=1
                driver id=0, #bytes=1, =0
     for 'log file sync' count=1 wait_time=7483
                buffer#=ca9, =0, =0
     for 'log file sync' count=1 wait_time=358
                buffer#=ca9, =0, =0
     for 'PX Deq: Execution Msg' count=1 wait_time=15527
                sleeptime/senderid=1001ffff, passes=1, =0
     for 'log file sync' count=1 wait_time=11855
                buffer#=c57, =0, =0
     for 'PX Deq: Execution Msg' count=1 wait_time=277142
                sleeptime/senderid=1001ffff, passes=1, =0
     for 'direct path write' count=1 wait_time=25
                file number=5, first dba=103ca, block cnt=3
     for 'direct path write' count=1 wait_time=6
                file number=4, first dba=1abe1, block cnt=8
     for 'row cache lock' count=1 wait_time=3307
                cache id=5, mode=0, request=5
    temporary object counter: 0
      ----------------------------------------
      Virtual Thread:
      kgskvt: 7000000c0e86b58, sess: 7000000cf666540, vc: 0, proc: 7000000cf48bf98
      consumer group cur: OTHER_GROUPS (upd? 0), mapped: DEFAULT_CONSUMER_GROUP, orig:
      vt_state: 0x200, vt_flags: 0x30, blkrun: 0
      is_assigned: 1, in_sched: 0 (0)
      vt_active: 0 (pending: 0)
      used quanta: 0 (cg: 0)
      cpu start time: 0, quantum status: 0x0
      quantum checks to skip: 0, check thresh: 0
      idle time: 0, active time: 0 (cg: 0)
      cpu yields: 0 (cg: 0), waits: 0 (cg: 0), wait time: 0 (cg: 0)
      queued time outs: 0, time: 0 (cur 0, cg 0)
      calls aborted: 0, num est exec limit hit: 0
      undo current: 0k max: 0k
      ----------------------------------------
      UOL used : 0 locks(used=0, free=2)
      KGX Atomic Operation Log 7000000bdcf0568
       Mutex 0(0, 0) idn 0 oper NONE
       Cursor Parent uid 262 efd 9 whr 11 slp 0
       oper=NONE pt1=7000000c69d0878 pt2=7000000bef37090 pt3=7000000b47119c8
       pt4=0 u41=1 stt=0
      KGX Atomic Operation Log 7000000bdcf05b0
       Mutex 7000000bdc2d028(0, 2) idn 0 oper NONE
       Cursor Stat uid 262 efd 12 whr 1 slp 0
       oper=NONE pt1=7000000bdc2cef8 pt2=0 pt3=0
       pt4=0 u41=0 stt=0
      KGX Atomic Operation Log 7000000bdcf05f8
       Mutex 0(0, 0) idn 0 oper NONE
       Library Cache uid 262 efd 0 whr 0 slp 0
      ----------------------------------------
      SO: 7000000cf7e3ec8, type: 5, owner: 7000000cf666540, flag: INIT/-/-/0x00
      (enqueue) DX-00000010-00000000	DID: 0001-002B-00151DA9
      lv: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  res_flag: 0x2
      res: 0x7000000cf8bd0b8, mode: X, lock_flag: 0x0
      own: 0x7000000cf666540, sess: 0x7000000cf666540, proc: 0x7000000cf48bf98, prv: 0x7000000cf8bd0c8

从trace文件名就可以看出出错的是某个并行子进程,其最近调用堆栈为ktcsod->ktcddt->kgeasi(报错),ktcsod与ktcddt均为Oracle中内核事务控制模块函数 (Kernel Transaction Control),又该并行子进程最近等待为”SQL*Net message from dblink”事件,共等待476s(wait_time=476 seconds)。trace显示该并行会话包含分布式事务锁:

      SO: 7000000cf7e3ec8, type: 5, owner: 7000000cf666540, flag: INIT/-/-/0x00
      (enqueue) DX-00000010-00000000	DID: 0001-002B-00151DA9
      lv: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  res_flag: 0x2
      res: 0x7000000cf8bd0b8, mode: X, lock_flag: 0x0
      own: 0x7000000cf666540, sess: 0x7000000cf666540, proc: 0x7000000cf48bf98, prv: 0x7000000cf8bd0c8

即DX(Distributed transaction entry)锁;可以猜测该并行会话在等待远程事务被清理,但等待超时,故报错。

查询metalink可以发现Bug 5223587与该错误一致,该Bug的特征为:
1. 出现ORA-600[4400],且调用栈为ktcddt ,说明涉及到了分布式事务
2. trace显示当时无打开的游标
3. 可以看到当时的等待为’SQL*Net message from dblink’事件
Metalink文档[ID  444108.1]描 述该错误会在涉及远程数据库的并行DML操作中出现,但该错误仅会在清理阶段出现,故实际无影响(It is not critical as such, since it is happening during the cleanup of the operation.)。

该Bug可以通过实施Patch 5223587修复, 目前该补丁只有对应基础版本10.2.0.2和10.2.0.3的版本,该补丁在 10.2.0.3 上有AIX power(64 bit)和Solaris Sparc(64 bit)的版本,10.2.0.2上只有linux-64 与HP-UX的版本。因为该Bug一般不会产生不良影响,故实际上不建议应用补丁。

Comments

  1. admin says

    Applies to:
    Oracle Server – Enterprise Edition – Version: 9.2.0.1 to 10.2.0.3 – Release: 9.2 to 10.2
    Information in this document applies to any platform.
    ***Checked for currency 30-Jul-2010***
    Symptoms

    ORA-00600: internal error code, arguments: [4400], [48], [], [], [], [], [], []
    ORA-10387: parallel query server interrupt (normal)

    This occurs while executing distributed query in parallel

    The stack trace includes:
    ksedmp kgeriv kgeasi ktcddt ktcsod kssdch_stage
    ktcbod kssdch_stage ksuxds ….

    The ORA-600 [4400] may also be accompanied or replaced by an ORA-600 [KXFPG1SG INV].
    The call stack may be similar to:
    kxfpg1sg kxfpgsg kxfrAllocSlaves kxfrialo kxfralo qerpx_rowsrc_start qerpxStart selexe ctcdrv opiexe
    Cause
    This is unpublished bug: 5223587 “ORA-00600 [4400], [48] AND ORA-10387”

    Details: ORA-600 [4400] occurs in one or more query slaves
    Solution
    Fixed releases: 10.2.0.4, 11g

    Please check My Oracle Support to see if an interim patch exists for your platform

    To check availability and obtain a patch from My Oracle Support:
    a) Click on Patches & Updates Folder
    b) Click on Patch Search.
    c) Enter patch number: 5223587
    d) Select your Platform
    e) Click Search

  2. admin says

    Hdr: 5704068 10.1.0.5.0 RDBMS 10.1.0.5.0 PAR EXECUTION PRODID-5 PORTID-23 ORA-600 5223587
    Abstract: ORA-600 [KXFPG1SG:INV] DURING SLAVE CREATION

    *** 12/07/06 02:28 pm ***
    TAR:
    —-
    6004392.992

    PROBLEM:
    ——–
    ORA-600 [kxfpg1sg:INV] happens to different kind of SQLs running in PARALLEL
    So far, it happened to insert /*+ append */, CTAS, SELECTs …

    DIAGNOSTIC ANALYSIS:
    ——————–
    We actually have a lot of ORA-600 [4400] before we hit ORa-600 [kxfpg1sg:INV]

    Fri Dec 1 03:17:12 2006
    Errors in file
    /vol101/oraprod/granite/admin/bdump/granite_p007_29114.trc:
    ORA-600: internal error code, arguments: [4400], [48], [], [], [], [],
    [], []
    ORA-10387: parallel query server interrupt (normal)
    Fri Dec 1 03:20:47 2006
    Private_strands 198 at log switch
    ………………

    Fri Dec 1 03:23:50 2006
    Errors in file
    /vol101/oraprod/granite/admin/bdump/granite_j035_12597.trc:
    ORA-600: internal error code, arguments: [kxfpg1sg:INV],
    [0xFFFFFFFF79CC70C8], [], [], [], [],
    [], []
    Fri Dec 1 03:25:34 2006
    Completed checkpoint up to RBA [0x58aec.2.10], SCN: 16957015675

    We thought that this could be related to bug 5685731. However, the process
    that fails with OERI:[kxfpg1sg:INV] does use DB LINKs

    last wait for ‘process startup’ blocking sess=0x0 seq=2388 wait_time=654054
    type=50, process#=3aa, waited=7e1

    WORKAROUND:
    ———–
    none so far

    RELATED BUGS:
    ————-
    Bug 2797383 – ORA-600 [kxfpg1sg:INV] can occur during slave creation.

    REPRODUCIBILITY:
    —————-
    at customer site only, but not at will

    TEST CASE:
    ———-
    not available

    STACK TRACE:
    ————
    kxfpg1sg
    kxfpgsg
    kxfrAllocSlaves
    kxfrialo
    kxfralo
    qerpxStart

    SUPPORTING INFORMATION:
    ———————–

    24 HOUR CONTACT INFORMATION FOR P1 BUGS:
    —————————————-
    gathering info from customer

    DIAL-IN INFORMATION:
    ——————–
    OWC

    IMPACT DATE:
    ————
    asap

    These are executive reporting reports which we are not able to produce
    due to these errors, these reports are very crucial reports being
    looked by executives on daily basis, this is the reason we want
    to get this resolved immediately, hence we have requested to
    upgrade to sev 1

Comment

*

沪ICP备14014813号-2

沪公网安备 31010802001379号