ORA-00600 [1112] Internal Error and ROW CACHE ENQUEUE LOCK: A Case Study

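A two-node 9.2.0.6 RAC system on AIX hit the internal error ORA-00600: internal error code, arguments: [1112], [], [], [], [], [], [], [], accompanied by ROW CACHE ENQUEUE LOCK waits, which went on to trigger clusterware split-brain resolution. The detailed logs and the ass.awk output follow: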

ALERT LOG
=============
Sun Jun 19 09:06:24 2011
>>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=24
Sun Jun 19 09:06:29 2011
Errors in file /s01/admin/prod/udump/prod2_ora_1061088.trc:
ORA-00600: internal error code, arguments: [1112], [], [], [], [], [], [], []
Sun Jun 19 09:06:29 2011
Errors in file /s01/admin/prod/udump/prod2_ora_1061088.trc:
ORA-00600: internal error code, arguments: [1112], [], [], [], [], [], [], []
Sun Jun 19 09:06:30 2011
Errors in file /s01/admin/prod/udump/prod2_ora_1061088.trc:
ORA-00600: internal error code, arguments: [1112], [], [], [], [], [], [], []
Sun Jun 19 09:06:30 2011
Errors in file /s01/admin/prod/udump/prod2_ora_1061088.trc:
ORA-00600: internal error code, arguments: [1112], [], [], [], [], [], [], []
Sun Jun 19 09:06:31 2011
Errors in file /s01/admin/prod/udump/prod2_ora_1061088.trc:
ORA-00600: internal error code, arguments: [1112], [], [], [], [], [], [], []
Sun Jun 19 09:06:31 2011
Errors in file /s01/admin/prod/udump/prod2_ora_1061088.trc:
ORA-00600: internal error code, arguments: [1112], [], [], [], [], [], [], []
Sun Jun 19 09:08:06 2011
Waiting for clusterware split-brain resolution
Sun Jun 19 09:13:17 2011
ALTER SYSTEM SET event='10511 trace name context forever, level 1' SCOPE=SPFILE SID='*';
Sun Jun 19 09:14:44 2011
Trace dumping is performing id=[cdmp_20110619091444]
Sun Jun 19 09:18:05 2011
Errors in file /s01/admin/prod/bdump/prod2_lmon_422072.trc:
ORA-29740: evicted by member 1, group incarnation 9
Sun Jun 19 09:18:05 2011
LMON: terminating instance due to error 29740
Sun Jun 19 09:18:05 2011
Errors in file /s01/admin/prod/bdump/prod2_lms2_725312.trc:
ORA-29740: evicted by member , group incarnation
Sun Jun 19 09:18:05 2011
Errors in file /s01/admin/prod/bdump/prod2_lms7_1008288.trc:
ORA-29740: evicted by member , group incarnation
Instance terminated by LMON, pid = 422072
Sun Jun 19 09:21:16 2011
Starting ORACLE instance (normal)

TRACE FILE
==============
prod2_ora_1061088.trc
Oracle9i Enterprise Edition Release 9.2.0.6.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP and Oracle Data Mining options
JServer Release 9.2.0.6.0 - Production
ORACLE_HOME = /oracle/app/oracle/product/9.2
System name: AIX
Node name: tprod2
Release: 3
Version: 5
Machine: 00CE5E834C00
Instance name: prod2

*** 2011-06-19 09:06:28.931
================================
PROCESS DUMP FROM HANG ANALYZER:
================================
Current SQL statement for this session:
SELECT formatid, globalid, branchid FROM SYS.DBA_PENDING_TRANSACTIONS ORDER BY formatid, globalid, branchid
*** 2011-06-19 09:06:28.931
----- Call Stack Trace -----
calling call entry argument values in hex
location type point (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedms+00dc bl ksedst 102905E64 ?
ksdxfdmp+0200 bl _ptrgl
ksdxcb+02d8 bl _ptrgl
sspuser+0084 bl 01FD7CA8
000044C0 ? 00000000
snttread+0028 bl 00009CFC
nttrd+0118 bl snttread FFFFFFFFFFFBBB3 ?
FFFFFFFFFFFBBA8 ?
FFFFFFFFFFFB2C0 ?
nsprecv+0984 bl _ptrgl
nsrdr+01d0 bl nsprecv 000000000 ? 110299C00 ?
000000000 ?
nsdo+1818 bl nsrdr 000000000 ? 000000000 ?
nioqrc+05c4 bl nsdo 1102A8098 ? 5500000055 ?
1102DFD20 ? 1102A8200 ?
FFFFFFFFFFFC4E0 ? 000000000 ?
300000003 ?
opikndf2+06a8 bl _ptrgl
opitsk+05fc bl _ptrgl
opiino+0798 bl opitsk 000000000 ? 000000000 ?
opiodr+08e8 bl _ptrgl
opidrv+032c bl opiodr 3C00000018 ? 4101F62A0 ?
FFFFFFFFFFFF8C0 ? 0A057DC60 ?
sou2o+0028 bl opidrv 3C0C000000 ? 4A0644B50 ?
FFFFFFFFFFFF8C0 ?
main+0138 bl 01FD7B5C
__start+0098 bl main 000000000 ? 000000000 ?

Repeat 2 times
----- End of Call Stack Trace -----
*** 2011-06-19 09:06:29.111
----- Call Stack Trace -----
calling call entry argument values in hex
location type point (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedms+00dc bl ksedst 102905E64 ?
ksdxfdmp+0200 bl _ptrgl
ksdxcb+02d8 bl _ptrgl
sspuser+0084 bl 01FD7CA8
000044C0 ? 00000000
snttread+0028 bl 00009CFC
nttrd+0118 bl snttread FFFFFFFFFFFBBB3 ?
FFFFFFFFFFFBBA8 ?
FFFFFFFFFFFB2C0 ?
nsprecv+0984 bl _ptrgl
nsrdr+01d0 bl nsprecv 000000000 ? 110299C00 ?
000000000 ?
nsdo+1818 bl nsrdr 000000000 ? 000000000 ?
nioqrc+05c4 bl nsdo 1102A8098 ? 5500000055 ?
1102DFD20 ? 1102A8200 ?
FFFFFFFFFFFC4E0 ? 000000000 ?
300000003 ?
opikndf2+06a8 bl _ptrgl
opitsk+05fc bl _ptrgl
opiino+0798 bl opitsk 000000000 ? 000000000 ?
opiodr+08e8 bl _ptrgl
opidrv+032c bl opiodr 3C00000018 ? 4101F62A0 ?
FFFFFFFFFFFF8C0 ? 0A057DC60 ?
sou2o+0028 bl opidrv 3C0C000000 ? 4A0644B50 ?
FFFFFFFFFFFF8C0 ?
main+0138 bl 01FD7B5C
__start+0098 bl main 000000000 ? 000000000 ?
----- End of Call Stack Trace -----
*** 2011-06-19 09:06:29.133
----- Call Stack Trace -----
calling call entry argument values in hex
location type point (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedms+00dc bl ksedst 102905E64 ?
ksdxfdmp+0200 bl _ptrgl
ksdxcb+02d8 bl _ptrgl
sspuser+0084 bl 01FD7CA8
000044C0 ? 00000000
snttread+0028 bl 00009CFC
nttrd+0118 bl snttread FFFFFFFFFFFBBB3 ?
FFFFFFFFFFFBBA8 ?
FFFFFFFFFFFB2C0 ?
nsprecv+0984 bl _ptrgl
nsrdr+01d0 bl nsprecv 000000000 ? 110299C00 ?
000000000 ?
nsdo+1818 bl nsrdr 000000000 ? 000000000 ?
nioqrc+05c4 bl nsdo 1102A8098 ? 5500000055 ?
1102DFD20 ? 1102A8200 ?
FFFFFFFFFFFC4E0 ? 000000000 ?
300000003 ?
opikndf2+06a8 bl _ptrgl
opitsk+05fc bl _ptrgl
opiino+0798 bl opitsk 000000000 ? 000000000 ?
opiodr+08e8 bl _ptrgl
opidrv+032c bl opiodr 3C00000018 ? 4101F62A0 ?
FFFFFFFFFFFF8C0 ? 0A057DC60 ?
sou2o+0028 bl opidrv 3C0C000000 ? 4A0644B50 ?
FFFFFFFFFFFF8C0 ?
main+0138 bl 01FD7B5C
__start+0098 bl main 000000000 ? 000000000 ?
----- End of Call Stack Trace -----
*** 2011-06-19 09:06:29.162
----- Call Stack Trace -----
calling call entry argument values in hex
location type point (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedms+00dc bl ksedst 102905E64 ?
ksdxfdmp+0200 bl _ptrgl
ksdxcb+02d8 bl _ptrgl
sspuser+0084 bl 01FD7CA8
000044C0 ? 00000000
snttread+0028 bl 00009CFC
nttrd+0118 bl snttread FFFFFFFFFFFBBB3 ?
FFFFFFFFFFFBBA8 ?
FFFFFFFFFFFB2C0 ?
nsprecv+0984 bl _ptrgl
nsrdr+01d0 bl nsprecv 000000000 ? 110299C00 ?
000000000 ?
nsdo+1818 bl nsrdr 000000000 ? 000000000 ?
nioqrc+05c4 bl nsdo 1102A8098 ? 5500000055 ?
1102DFD20 ? 1102A8200 ?
FFFFFFFFFFFC4E0 ? 000000000 ?
300000003 ?
opikndf2+06a8 bl _ptrgl
opitsk+05fc bl _ptrgl
opiino+0798 bl opitsk 000000000 ? 000000000 ?
opiodr+08e8 bl _ptrgl
opidrv+032c bl opiodr 3C00000018 ? 4101F62A0 ?
FFFFFFFFFFFF8C0 ? 0A057DC60 ?
sou2o+0028 bl opidrv 3C0C000000 ? 4A0644B50 ?
FFFFFFFFFFFF8C0 ?
main+0138 bl 01FD7B5C
__start+0098 bl main 000000000 ? 000000000 ?
----- End of Call Stack Trace -----
*** 2011-06-19 09:06:29.175
----- Call Stack Trace -----
calling call entry argument values in hex
location type point (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedms+00dc bl ksedst 102905E64 ?
ksdxfdmp+0200 bl _ptrgl
ksdxcb+02d8 bl _ptrgl
sspuser+0084 bl 01FD7CA8
000044C0 ? 00000000
snttread+0028 bl 00009CFC
nttrd+0118 bl snttread FFFFFFFFFFFBBB3 ?
FFFFFFFFFFFBBA8 ?
FFFFFFFFFFFB2C0 ?
nsprecv+0984 bl _ptrgl
nsrdr+01d0 bl nsprecv 000000000 ? 110299C00 ?
000000000 ?
nsdo+1818 bl nsrdr 000000000 ? 000000000 ?
nioqrc+05c4 bl nsdo 1102A8098 ? 5500000055 ?
1102DFD20 ? 1102A8200 ?
FFFFFFFFFFFC4E0 ? 000000000 ?
300000003 ?
opikndf2+06a8 bl _ptrgl
opitsk+05fc bl _ptrgl
opiino+0798 bl opitsk 000000000 ? 000000000 ?
opiodr+08e8 bl _ptrgl
opidrv+032c bl opiodr 3C00000018 ? 4101F62A0 ?
FFFFFFFFFFFF8C0 ? 0A057DC60 ?
sou2o+0028 bl opidrv 3C0C000000 ? 4A0644B50 ?
FFFFFFFFFFFF8C0 ?
main+0138 bl 01FD7B5C
__start+0098 bl main 000000000 ? 000000000 ?
----- End of Call Stack Trace -----
*** 2011-06-19 09:06:29.192
----- Call Stack Trace -----
calling call entry argument values in hex
location type point (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedms+00dc bl ksedst 102905E64 ?
ksdxfdmp+0200 bl _ptrgl
ksdxcb+02d8 bl _ptrgl
sspuser+0084 bl 01FD7CA8
000044C0 ? 00000000
snttread+0028 bl 00009CFC
nttrd+0118 bl snttread FFFFFFFFFFFBBB3 ?
FFFFFFFFFFFBBA8 ?
FFFFFFFFFFFB2C0 ?
nsprecv+0984 bl _ptrgl
nsrdr+01d0 bl nsprecv 000000000 ? 110299C00 ?
000000000 ?
nsdo+1818 bl nsrdr 000000000 ? 000000000 ?
nioqrc+05c4 bl nsdo 1102A8098 ? 5500000055 ?
1102DFD20 ? 1102A8200 ?
FFFFFFFFFFFC4E0 ? 000000000 ?
300000003 ?
opikndf2+06a8 bl _ptrgl
opitsk+05fc bl _ptrgl
opiino+0798 bl opitsk 000000000 ? 000000000 ?
opiodr+08e8 bl _ptrgl
opidrv+032c bl opiodr 3C00000018 ? 4101F62A0 ?
FFFFFFFFFFFF8C0 ? 0A057DC60 ?
sou2o+0028 bl opidrv 3C0C000000 ? 4A0644B50 ?
FFFFFFFFFFFF8C0 ?
main+0138 bl 01FD7B5C
__start+0098 bl main 000000000 ? 000000000 ?
----- End of Call Stack Trace -----
Files currently opened by this process:
===================================================
PROCESS STATE
-------------
Process global information:
process: 700000676099520, call: 0, xact: 0, curses: 0, usrses: 700000673decd98
----------------------------------------
SO: 700000676099520, type: 2, owner: 0, flag: INIT/-/-/0x00
(process) Oracle pid=224, calls cur/top: 0/7000006c2ca3df8, flag: (0) -
int error: 0, call error: 0, sess error: 0, txn error 0
(post info) last post received: 0 0 50
last post received-location: kcbzww
last process to post me: 700000676119f00 7 0
last post sent: 0 0 21
last post sent-location: ksqrcl
last process posted by me: 700000676428258 1 0
(latch info) wait_event=0 bits=0
Process Group: DEFAULT, pseudo proc: 700000676cc19b0
O/S info: user: oracle, term: UNKNOWN, ospid: 1061088
OSD pid info: Unix process pid: 1061088, image: oracle@tprod2 (TNS V1-V3)
----------------------------------------

END OF PROCESS STATE
******************** Cursor Dump ************************
Current cursor: 2, pgadep: 0
pgactx: 7000006f8bc2d40 ctxcbk: 0 ctxqbc: 0 ctxrws: 700000716aecfd0
Explain plan:
Plan Table
--------
-------------------------------------------------------------------------------------------------------------------------
| Operation | Name | Rows | Bytes | Cost | TQ |IN-OUT| PQ Distrib |Pstart| Pstop |
-------------------------------------------------------------------------------------------------------------------------
| SELECT STATEMENT | | 0 | 0 | 0 | | | | | |
| SORT ORDER BY | | 0 | 0 | 0 | | | | | |
| VIEW | | 0 | 0 | 0 | | | | | |
| SORT UNIQUE | | 0 | 0 | 0 | | | | | |
| UNION-ALL | | 0 | 0 | 0 | | | | | |
| MINUS | | 0 | 0 | 0 | | | | | |
| SORT UNIQUE | | 0 | 0 | 0 | | | | | |
| VIEW | | 0 | 0 | 0 | | | | | |
| FIXED TABLE FULL | X$K2GTE2 | 0 | 0 | 0 | | | | | |
| SORT UNIQUE | | 0 | 0 | 0 | | | | | |
| NESTED LOOPS | | 0 | 0 | 0 | | | | | |
*** 2011-06-19 09:06:29.376
ksedmp: internal or fatal error

ORA-00600: internal error code, arguments: [1112], [], [], [], [], [], [], []
Current SQL statement for this session:
SELECT formatid, globalid, branchid FROM SYS.DBA_PENDING_TRANSACTIONS ORDER BY formatid, globalid, branchid
----- Call Stack Trace -----
calling call entry argument values in hex
location type point (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedmp+0148 bl ksedst 102905C84 ?
ksfdmp+0018 bl 01FD8148
kgeriv+0118 bl _ptrgl
kgesiv+0080 bl kgeriv 07FFFFFFC ? 800000000000000 ?
1000000000000000 ?
1800000000000000 ?
028828228 ?
ksesic0+005c bl kgesiv 7000006BE3BB328 ? 000010550 ?
7000006BE3AADD8 ? 10297D7E8 ?
FFFFFFFFFFF3A20 ?
kssadf_stage+0084 bl ksesic0 45800000458 ? 11007A2F8 ?
000000000 ? 000000000 ?
000000000 ? 70000000001DB80 ?
000000000 ? 700000703BBF040 ?
kqreqa+008c bl kssadf_stage 7000006BE3AADD8 ? 10297D7E8 ?
068A31055 ? 000006BB0 ?
000000001 ?
kqrpre1+06e4 bl kqreqa 000000001 ?
kqrpre+001c bl kqrpre1 BAC3F8E66 ? 000000001 ?
FFFFFFFFFFF4008 ? 1101F9A14 ?
1101F9A14 ? FFFFFFFFFFF4000 ?
07FFFFFFF ? 000000000 ?
kkdlobni+0058 bl kqrpre 100F29A04 ?
4222442400000000 ?
14DFD4B95 ?
166CCD19101F62A0 ?
000000002 ? 000000000 ?
FFFFFFFFFFF40C0 ?
xplObjnToName+0150 bl kkdlobni 9A0000009A ?
FFFFFFFFFFF4444 ? 000000000 ?
000000000 ?
xplPatchName+00a4 bl xplObjnToName 9AFFFF46F0 ?
FFFFFFFFFFF4444 ?
xplMakeRow+0190 bl xplPatchName 000000000 ? 000000000 ?
000000000 ?
xplFetchRow+00b4 bl _ptrgl
xplDumpRws+0604 bl xplFetchRow 1029CFB48 ? FFFFFFFFFFF4770 ?
1101F9A14 ?
curdmp+0164 bl xplDumpRws 102AE2A20 ?
ksedms+012c bl curdmp
ksdxfdmp+0200 bl _ptrgl
ksdxcb+02d8 bl _ptrgl
sspuser+0084 bl 01FD7CA8
000044C0 ? 00000000
snttread+0028 bl 00009CFC
nttrd+0118 bl snttread FFFFFFFFFFFBBB3 ?
FFFFFFFFFFFBBA8 ?
FFFFFFFFFFFB2C0 ?
nsprecv+0984 bl _ptrgl
nsrdr+01d0 bl nsprecv 000000000 ? 110299C00 ?
000000000 ?
nsdo+1818 bl nsrdr 000000000 ? 000000000 ?
nioqrc+05c4 bl

Blockers
~~~~~~~~

Above is a list of all the processes. If they are waiting for a resource
then it will be given in square brackets. Below is a summary of the
waited upon resources, together with the holder of that resource.
Notes:
~~~~~
o A process id of '???' implies that the holder was not found in the
systemstate. (The holder may have released the resource before we
dumped the state object tree of the blocking process).
o Lines with 'Enqueue conversion' below can be ignored *unless*
other sessions are waiting on that resource too. For more, see
http://dlsunuk11.uk.oracle.com/Public/TOOLS/Ass.html#enqcnv)

Resource Holder State
Latch 70000000000a4b8 115: Blocker
Latch 70000000000a4b8 210: Blocker
Latch 70000000000a4b8 270: Blocker
Latch 70000000000a4b8 406: Blocker
Latch 70000000000a4b8 614: Blocker
Latch 70000000000a4b8 626: Blocker
Latch 70000000000a4b8 882: Blocker
Latch 70000000000a4b8 1489: Blocker
Latch 70000000000a4b8 1617: Blocker
Latch 70000000000a4b8 1878: Blocker
Latch 70000000000a4b8 1916: Blocker
Latch 70000000000a4b8 1947: Blocker
Latch 70000000000a4b8 1963: Blocker
Latch 70000000000a4b8 2121: 2121: is waiting for Latch 700000675dae330
Latch 70000000000a4b8 2245: Blocker
Latch 70000000000a4b8 2351: Blocker
Latch 70000000000a4b8 2566: Blocker
Latch 70000000000a4b8 2585: Blocker
Latch 70000000000a4b8 2643: Blocker
Latch 70000000000a4b8 2773: 2773: is waiting for Latch 700000675daf3a8
Latch 70000000000a4b8 2791: Blocker
Latch 70000000000a4b8 2795: Blocker
Latch 70000000000a4b8 2966: Blocker
Latch 70000000000a4b8 2969: Blocker
Latch 700000675dadf50 ??? Blocker
Latch 700000675dadc68 ??? Blocker
Latch 700000675dadb70 ??? Blocker
Latch 7000006be3a6530 ??? Blocker
Latch 700000675dae808 ??? Blocker
Latch 700000675db0040 ??? Blocker
Latch 7000006d1d71138 ??? Blocker
Latch 700000675dad3b0 ??? Blocker
Latch 700000675dae330 ??? Blocker
Latch 7000006b2d4fd28 2211: Blocker
Latch 7000006b2d4fd28 2220: Blocker
Latch 7000006b2e5df68 2660: Blocker
Latch 7000006b2e5e3e8 2752: Blocker
Latch 7000006b2e5e3e8 2876: Blocker
Latch 7000006b2d06b28 ??? Blocker
Latch 7000006b2f9f928 ??? Blocker
Latch 7000006b2d4db68 ??? Blocker
Latch 7000006b2e5e868 ??? Blocker
Latch 7000006b2d4e6a8 ??? Blocker
Latch 7000006b2d4eb28 2434: Blocker
Latch 7000006b2d4eb28 2437: 2437: is waiting for 2434: 2437:
Latch 7000006b2d4f428 2925: Blocker
Latch 7000006b2d4f428 2948: Blocker
Latch 7000006b2d07428 ??? Blocker
Latch 7000006b2d4e588 ??? Blocker
Latch 7000006b2e5ece8 ??? Blocker
Latch 7000006b2d4efa8 ??? Blocker
Latch 7000006b2d07c08 ??? Blocker
Latch 7000006b2f9e968 ??? Blocker
Latch 700000675daf3a8 ??? Blocker
Latch 7000006b2a49f68 3198: Blocker
Latch 70000000001a968 ??? Blocker

Some of the above latches may be child latches. Please check the section
named 'Child Latch Report' below for further notes.

Blockers According to Tracefile Wait Info:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1. This may not work for 64bit platforms. See bug 2902997 for details.
2. If the blocking process is shown as 0 then that session may no longer be
present.
3. If resources are held across code layers then sometimes the tracefile wait
info will not recognise the problem.

No blockers seen.

Object Names
~~~~~~~~~~~~
Latch 70000000000a4b8 enqueues
Latch 700000675dadf50 Child enqueue hash chains
Latch 700000675dadc68 Child enqueue hash chains
Latch 700000675dadb70 Child enqueue hash chains
Latch 7000006be3a6530 Child row cache objects
Latch 700000675dae808 Child enqueue hash chains
Latch 700000675db0040 Child enqueue hash chains
Latch 7000006d1d71138 Child library cache pin
Latch 700000675dad3b0 Child enqueue hash chains
Latch 700000675dae330 Child enqueue hash chains
Latch 7000006b2d4fd28 Child cache buffers chains
Latch 7000006b2e5df68 Child cache buffers chains
Latch 7000006b2e5e3e8 Child cache buffers chains
Latch 7000006b2d06b28 Child cache buffers chains
Latch 7000006b2f9f928 Child cache buffers chains
Latch 7000006b2d4db68 Child cache buffers chains
Latch 7000006b2e5e868 Child cache buffers chains
Latch 7000006b2d4e6a8 Child cache buffers chains
Latch 7000006b2d4eb28 Child cache buffers chains
Latch 7000006b2d4f428 Child cache buffers chains
Latch 7000006b2d07428 Child cache buffers chains
Latch 7000006b2d4e588 Child cache buffers chains
Latch 7000006b2e5ece8 Child cache buffers chains
Latch 7000006b2d4efa8 Child cache buffers chains
Latch 7000006b2d07c08 Child cache buffers chains
Latch 7000006b2f9e968 Child cache buffers chains
Latch 700000675daf3a8 Child enqueue hash chains
Latch 7000006b2a49f68 Child cache buffers chains
Latch 70000000001a968 Parent transaction allocation

Child Latch Report
~~~~~~~~~~~~~~~~~~
Some processes are being blocked waiting for child latches.

At the moment this script does not detect the blocker because the
child latch address differs to the parent latch address. To manually
detect the blocker please take the following steps :
1. Determine the TYPE of latch (Eg library cache) that is involved.
2. Search the source trace file for a target of :
holding.*Parent.*library cache
(Assuming we have a child library cache and have vi-like regular expressions)

If this shows nothing then the blocker may have released the resource
before we got to dump the state object tree of the blocked process.

A list of processes that hold parent latches is given below :

No processes found.

Summary of Wait Events Seen (count>10)
~~~~~~~~~~~~~~~~~~~~~~~~~~~
No wait events seen more than 10 times
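The "Resource Holder State" table in the ass.awk output above is long enough that a per-resource tally is easier to read than the raw rows. The following is a minimal sketch, assuming each row has the shape `Latch <address> <pid>: <state>` or `Latch <address> ??? <state>` as printed above; the function name `summarize_holders` and the regular expression are illustrative and not part of ass.awk.

```python
import re
from collections import defaultdict

# A few rows copied from the "Resource Holder State" table above.
SAMPLE = """\
Latch 70000000000a4b8 115: Blocker
Latch 70000000000a4b8 2121: 2121: is waiting for Latch 700000675dae330
Latch 700000675dae330 ??? Blocker
Latch 7000006b2d4eb28 2434: Blocker
Latch 7000006b2d4eb28 2437: 2437: is waiting for 2434: 2437:
"""

# resource type, resource address, holder ("123:" or "???"), remaining state text
ROW = re.compile(r"^(\w+)\s+([0-9a-f]+)\s+(\d+:|\?\?\?)\s+(.*)$")


def summarize_holders(text):
    """Map each resource address to the list of holder pids reported for it.

    A holder printed as '???' (not found in the systemstate) is stored as None.
    """
    holders = defaultdict(list)
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if not match:
            continue  # skip headers, notes and blank lines
        _kind, address, holder, _state = match.groups()
        pid = None if holder == "???" else int(holder.rstrip(":"))
        holders[address].append(pid)
    return dict(holders)


if __name__ == "__main__":
    for address, pids in summarize_holders(SAMPLE).items():
        print(address, len(pids), pids)
```

Run against the full table above, the same tally would show that the rows for latch 70000000000a4b8 (listed under Object Names as `enqueues`) far outnumber those for any other address, which is a quick way to see where the contention is concentrated.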

 

 

Background on the ORA-00600 [1112] internal error:

ERROR:
ORA-600 [1112] [a] [b] [c] [d] [e]

VERSIONS:
versions 7.3 to 9.2

DESCRIPTION:

ORA-600 [1112] is getting raised while trying to add a
row cache enqueue to a transaction state object during
lookup of the default tablespace number during table
creation.

FUNCTIONALITY:
STATE OBJECT MANAGEMENT

IMPACT:
PROCESS FAILURE
NON CORRUPTIVE – No underlying data corruption.

Bug 2489130 – OERI:1112 can occur while dumping PROCESSSTATE informatio (Doc ID 2489130.8)
Bug 4126973: ORA-600[504] AND ORA-600[1112] OCCURED WHEN GETTING “ERRORSTACK”
Base Bug 2489130
Bug 3954753: ORA-600 [1112] AND SESSION CRASH

 

Diagnosis showed that this ORA-00600 [1112] internal error was caused by Bug 2489130, and that the direct trigger for the bug was the WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! condition:

The cause for the ORA-00600 [1112] appears due to Bug 2489130
This error can occur on dumping of process state which is what occurred here.
The primary issue is the WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK!
This then triggers a system state and process state to be dumped due to nature of the problem.
The ORA-00600 [1112] gets dumped out when process state is done.

Stack for trace very similar to Bug 2489130 and this is only known bug on 9.2 like this with a fix.

A fix for bug 2489130 is included in the 9.2.0.7 patchset.
Recommend applying 9.2.0.8 patchset to have this and other bug fixes.
This would only prevent the ORA-00600 [1112] from occurring on state dumps.
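The ordering described in this analysis (the row cache enqueue wait first, then the ORA-00600 [1112] raised during the state dump, then the eviction) can be checked directly against the alert log. Below is a minimal sketch, assuming the 9.2-style alert log layout shown at the top of this post, where a timestamp line precedes the messages it applies to. The helper names `parse_alert_log` and `first_event` are hypothetical, and the sample is an abbreviated copy of the log above.

```python
from datetime import datetime

# Abbreviated copy of the alert log excerpt shown earlier in this post.
SAMPLE = """\
Sun Jun 19 09:06:24 2011
>>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=24
Sun Jun 19 09:06:29 2011
Errors in file /s01/admin/prod/udump/prod2_ora_1061088.trc:
ORA-00600: internal error code, arguments: [1112], [], [], [], [], [], [], []
Sun Jun 19 09:08:06 2011
Waiting for clusterware split-brain resolution
Sun Jun 19 09:18:05 2011
LMON: terminating instance due to error 29740
"""

TS_FORMAT = "%a %b %d %H:%M:%S %Y"  # e.g. "Sun Jun 19 09:06:24 2011"


def parse_alert_log(text):
    """Yield (timestamp, message) pairs; a message inherits the last timestamp line seen."""
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            current = datetime.strptime(line, TS_FORMAT)
        except ValueError:
            # Not a timestamp line, so it is a message under the current timestamp.
            yield current, line


def first_event(entries, needle):
    """Return the timestamp of the first message containing `needle`, or None."""
    for timestamp, message in entries:
        if needle in message:
            return timestamp
    return None


if __name__ == "__main__":
    entries = list(parse_alert_log(SAMPLE))
    waited = first_event(entries, "ROW CACHE ENQUEUE LOCK")
    ora600 = first_event(entries, "ORA-00600")
    killed = first_event(entries, "terminating instance")
    print("wait -> ORA-00600 :", (ora600 - waited).total_seconds(), "s")
    print("wait -> eviction  :", (killed - waited).total_seconds(), "s")
```

On this excerpt the first ORA-00600 follows the row cache enqueue warning by 5 seconds, and LMON terminates the instance 701 seconds after the warning, which is consistent with the wait being the primary issue and the ORA-00600 a side effect of the dump.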

 

The fix is to tune SQL performance so that WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! no longer occurs, or at the very least to upgrade the database to the recommended 9.2.0.8 patchset.
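Whether a given instance already carries the fix can be read off its version banner. The snippet below is a small sketch of that comparison, assuming a dotted version string such as the `9.2.0.6.0` printed in the trace file header and assuming the instance is on the 9.2.0.x patchset line. The thresholds come from the support analysis quoted above; the function names are made up for illustration.

```python
# Thresholds taken from the support analysis quoted above.
FIX_FIRST_INCLUDED = (9, 2, 0, 7)  # patchset that first includes the fix for bug 2489130
RECOMMENDED = (9, 2, 0, 8)         # patchset recommended in the analysis


def parse_version(text):
    """Turn '9.2.0.6.0' into the integer tuple (9, 2, 0, 6, 0)."""
    return tuple(int(part) for part in text.strip().split("."))


def has_fix_for_2489130(version_text):
    """True if a 9.2.0.x version is at or above the first patchset containing the fix.

    Assumption: only meaningful for the 9.2.0.x line discussed in this post.
    """
    return parse_version(version_text)[:4] >= FIX_FIRST_INCLUDED


def at_recommended_patchset(version_text):
    """True if the version is at or above the recommended 9.2.0.8 patchset."""
    return parse_version(version_text)[:4] >= RECOMMENDED


if __name__ == "__main__":
    for banner in ("9.2.0.6.0", "9.2.0.7.0", "9.2.0.8.0"):
        print(banner, has_fix_for_2489130(banner), at_recommended_patchset(banner))
```

Keep in mind that, as the analysis says, the patchset only stops the ORA-00600 [1112] from being raised during state dumps; it does nothing about the row cache enqueue wait itself.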

ORA-00600

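The ORA-00600 internal error is a station that everyone passes through sooner or later while learning and using Oracle.

Many people assume, the moment they see an ORA-00600 message, that they have hit a bug in the Oracle Database software. That view is not accurate.

ORA-00600 can have many causes, including software defects (bugs), abnormal program execution, memory corruption, and data corruption.

For example, ORA-00600 [2662] (Block SCN is ahead of Current SCN) and ORA-00600 [4000] (corruption found in a rollback segment while rolling back a data block), both commonly seen during abnormal data recovery, are caused by data corruption rather than by bugs.

When analyzing an ORA-00600 internal error and narrowing down the actual fault, the most useful information in the 600 trace is the set of arguments attached to the error: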

ORA-00600 arguments fall into two kinds:

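a. Arguments whose first element is a number, such as the 2662 and 4000 mentioned above. Each number carries a different meaning, and internal errors with numeric arguments are the more common kind. In practice these numeric arguments also originate in various Oracle kernel functions, such as kddummy_blkchk and kclchkinteg_2. Because these errors are common, Oracle development encoded the corresponding internal error states as numbers for two reasons: users are generally not interested in the kernel functions of the RDBMS (though we may be), and the functions touch on many Oracle internals that Oracle prefers not to expose. The numeric codes do have explanatory comments similar to OERR text. Those comments are not included in oraus.msg, but the file itself explains that such comments are merely withheld from the public, so Oracle employees can see them: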

 Programmer's Comments
 ---------------------
 If you wish to add comments regarding a message that should not be seen by
 the public, use "// *Comment: " as follows:

   e.g.
       32769, 00000, "incompatible SQL*Net version"
        *Cause: An attempt was made to use an older version of SQL*Net that
         is incompatible with current version of ORACLE.

 

When a numerically coded internal error prints more than one argument, the later arguments usually carry real meaning. For ORA-00600 [2662], for example, they mean:

ARGUMENTS:
Arg [a] Current SCN WRAP
Arg [b] Current SCN BASE
Arg [c] dependent SCN WRAP
Arg [d] dependent SCN BASE
Arg [e] Where present this is the DBA where the dependent SCN came from.

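This makes it easier for Oracle Support to diagnose and resolve these internal errors. For numeric arguments, Metalink usually publishes the meaning of the later arguments, and because these problems are fairly common, a dedicated resolution or workaround is usually already available.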
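To make the ORA-00600 [2662] argument list concrete, here is a minimal sketch that turns arguments [a] through [e] into SCNs and a file/block pair. It rests on several assumptions that are not stated in the text above: that the arguments are printed in decimal, that an SCN is composed as wrap * 2^32 + base, and that the DBA uses the usual smallfile layout with the relative file number in the upper 10 bits and the block number in the lower 22 bits. The sample values are hypothetical and chosen only for illustration.

```python
def scn(wrap, base):
    """Combine SCN wrap and base into a single number (assumes wrap * 2**32 + base)."""
    return (wrap << 32) | base


def decode_2662(args):
    """Decode the arguments after [2662]: [a]..[d] are SCN parts, optional [e] is a DBA."""
    cur_wrap, cur_base, dep_wrap, dep_base = args[:4]
    current = scn(cur_wrap, cur_base)
    dependent = scn(dep_wrap, dep_base)
    decoded = {
        "current_scn": current,
        "dependent_scn": dependent,
        # How far the dependent (block) SCN is ahead of the current SCN.
        "gap": dependent - current,
    }
    if len(args) > 4:
        dba = args[4]
        decoded["file"] = dba >> 22        # assumed: upper 10 bits = relative file number
        decoded["block"] = dba & 0x3FFFFF  # assumed: lower 22 bits = block number
    return decoded


if __name__ == "__main__":
    # Hypothetical argument values, not taken from a real incident.
    print(decode_2662([0, 547743994, 0, 898092653, 8388617]))
```

A positive `gap` matches the description of [2662] given earlier (block SCN ahead of current SCN), and the decoded file and block tell you where the dependent SCN was read from.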

In short, for a numerically coded ORA-00600 argument we can usually find useful Notes by searching Metalink for ORA-00600 plus the first argument, or by using the <ORA-600/ORA-7445 Error Look-up Tool [ID 153788.1]> diagnostic page.

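b. Arguments in the form of a function name. Internal errors of this kind occur less often than the first kind, and Oracle development has not yet assigned them numeric codes in the relevant versions. We therefore see the full name of the kernel function where the problem occurred, and can search Metalink for ORA-600 plus the first argument to find related Notes. A function-name argument often cannot pinpoint the problem, however, because different root causes can raise different exceptions inside the same kernel function while all we see is the function name. A more precise approach is to obtain the detailed call stack at the point the function was called. Here is the call stack for an ORA-600 [KCBZ_CHECK_OBJD_TYP_1]: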

ksedst()+40
ksedmp()+168
ksfdmp()+32
kgerinv()+152
kgeasnmierr()+88
kcbassertbd3()+204
kcbz_check_objd_typ
kcbzib()+
kcbgtcr()+
ktecgsc()+168
ktecgetsh()+196
ktecgshx()+40
kteinicnt1()+648
ktssdrbm_segment()+
ktssdro_segment()+3
ktssdt_segs()+1128
ktmmon()+3500
ktmSmonMain()+64
ksbrdp()+1276
opirip()+
opidrv()+1088
sou2o()+120
opimai_real()+496
main()+240
$START$()+

 

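Note that only the ktmSmonMain -> kcbassertbd3 portion of this call stack is meaningful. The main() -> ksbrdp() frames at the start are ordinary entry functions, and the frames from kgeasnmierr (Kernel Generic Error) onward belong to Oracle's error-reporting layer; neither group helps locate the problem. Entering the useful portion into the stack call field of the Metalink <ORA-600/ORA-7445 Error Look-up Tool [ID 153788.1]> page finds the related Notes under a stricter filter.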
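Trimming a stack down to its meaningful part is mechanical once the boilerplate frames are known. The sketch below does it for the ORA-600 [KCBZ_CHECK_OBJD_TYP_1] stack shown above. The two frame sets are taken from this one example only (error-reporting frames at the top, entry frames at the bottom) and are an assumption, not a complete list; other stacks will need other names.

```python
# The call stack from the ORA-600 [KCBZ_CHECK_OBJD_TYP_1] example above.
STACK = """\
ksedst()+40
ksedmp()+168
ksfdmp()+32
kgerinv()+152
kgeasnmierr()+88
kcbassertbd3()+204
kcbz_check_objd_typ
kcbzib()+
kcbgtcr()+
ktecgsc()+168
ktecgetsh()+196
ktecgshx()+40
kteinicnt1()+648
ktssdrbm_segment()+
ktssdro_segment()+3
ktssdt_segs()+1128
ktmmon()+3500
ktmSmonMain()+64
ksbrdp()+1276
opirip()+
opidrv()+1088
sou2o()+120
opimai_real()+496
main()+240
$START$()+
"""

# Assumed from this example: error-reporting frames (top of stack) ...
ERROR_LAYER = {"ksedst", "ksedmp", "ksfdmp", "kgerinv", "kgeasnmierr"}
# ... and generic process-entry frames (bottom of stack).
ENTRY_LAYER = {"ksbrdp", "opirip", "opidrv", "sou2o", "opimai_real", "main", "$START$"}


def frame_name(line):
    """'ktecgsc()+168' -> 'ktecgsc'; names printed without '()' are returned unchanged."""
    return line.strip().split("(")[0]


def meaningful_frames(stack_text):
    """Drop error-layer frames from the top and entry frames from the bottom."""
    names = [frame_name(line) for line in stack_text.splitlines() if line.strip()]
    start = 0
    while start < len(names) and names[start] in ERROR_LAYER:
        start += 1
    end = len(names)
    while end > start and names[end - 1] in ENTRY_LAYER:
        end -= 1
    return names[start:end]


if __name__ == "__main__":
    print(" ".join(meaningful_frames(STACK)))
```

The output runs from kcbassertbd3 down to ktmSmonMain, which is the portion worth pasting into the look-up tool.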

 

 

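For ORA-00600, Oracle Support generally offers two kinds of solution: a patch that fixes the problem, or a workaround that avoids it. It is also possible that Oracle development cannot reproduce your ORA-00600 in their own environment, which means some 600 errors may have no official solution, or that Oracle Support knows a workaround that has not yet been written up in an existing Note. Both situations are rare.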

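If no usable solution can be found, or your production database has very demanding service-level requirements, filing a Service Request (SR), which some veterans still call a TAR, may be the last resort. I have to add, though, that not every problem has a solution. Can the manufacturer of your TV solve every problem that arises in use? In theory yes, but when solving a problem becomes very expensive, the manufacturer may prefer to give you a replacement set. Can your production database be swapped out so easily? That question deserves careful thought, and it is one of the paradoxes of the RDBMS market.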

Related articles on ORA-00600 Oracle internal errors by Maclean Liu:

Oracle internal error: ORA-00600 [4097] case study
Oracle internal error: ORA-00600 [15801], [1] case study
Oracle internal error: ORA-00600 [6033] case study
Oracle internal error: ORA-00600 [OSDEP_INTERNAL] case study
Oracle internal error: ORA-00600 [kgskdecrstat1] case study
Oracle internal error: ORA-00600 [kfioTranslateIO03] case study
Oracle internal error: ORA-00600 [pfri.c: pfri8: plio mismatch ] case study
Oracle internal error: ORA-00600 [2608] case study
Oracle internal error: ORA-00600 [13013] [5001] fault diagnosis case study
Oracle internal error: ORA-00600 [17175] case study
Oracle internal error: ORA-00600 [2667] case study
Oracle RAC internal error: ORA-00600 [keltnfy-ldmInit] case study
ORA-00600: INTERNAL ERROR CODE, ARGUMENTS: [729], [10992], [SPACE LEAK] Example
Manually simulating Oracle logical block corruption that raises ORA-00600 [13013] [5001]: a case study
ORA-00600 [4400] [48] error case study
ORA-00600 [KCBZPB_1], [59033077], [4], [1], [] example
ORA-00600 [qctcte1] internal error case study
ORA-00600: internal error code, arguments: [15160]
ORA-00600: internal error code, arguments: [kdsgrp1] example
Oracle internal error: ORA-00600 [25012] case study
ORA-00600 [17281], [1001] case study
ORA-00600 [kclchkinteg_2] and [kjmsm_epc] internal error case study
Oracle internal error: ORA-00600 [kccchb_3] case study
ORA-00600 [qksrcBuildRwo] internal error case study
ORA-00600 [32695], [hash aggregation can’t be done] error case study
ORA-00600 [6711] error case study
ORA-00600 [kkocxj:pjpCtx] internal error case study
ORA-00600 [kcbz_check_objd_typ_3] error case study
ORA-00600 [15570] internal error case study
ORA-00600 [3756] internal error case study
ORA-00600 [kddummy_blkchk] error case study
How to trigger ORA-00600,ORA-7445 by manual
ORA-600 [17182] error case study
Database Force open example
ORA-600 [qesmmCValStat4] case study
ORA-600 [kddummy_blkchk] [18038] case study
famous summary stack trace from Oracle Version 8.1.7.4.0 Bug Note
Oracle internal error ORA-600 [1112]
Analysis of an ORA-600 [kjbmprlst:shadow] failure on Exadata
ORA-600 quick reference guide
ORA-00600 [kglhdunp2_2] error case study
ORA-00600 [7005], [192] internal error case study
ORA-600 internal error [kqrfrpo] case study
ORA-600 [4194] error case study
ORA-00600 [kjpsod1] & ORA-44203 error case study


Oracle RAC Internal Error: ORA-00600 [kjbmprlst:shadow] Case Study

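On an 11.2.0.1 four-node RAC system on Linux x86-64, an LMS (GCS service) process hit the internal error ORA-00600 [kjbmprlst:shadow], which caused the instance on that node to terminate unexpectedly. The log is as follows: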

Fri Jul 08 02:04:43 2011
Errors in file /u01/app/oracle/diag/rdbms/PROD/PROD1/trace/PROD1_lms1_536.trc  (incident=1011732):
ORA-00600: internal error code, arguments: [kjbmprlst:shadow], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/PROD/PROD1/incident/incdir_1011732/PROD1_lms1_536_i1011732.trc
Fri Jul 08 02:04:44 2011
Trace dumping is performing id=[cdmp_20110708020444]
Errors in file /u01/app/oracle/diag/rdbms/PROD/PROD1/trace/PROD1_lms1_536.trc:
ORA-00600: internal error code, arguments: [kjbmprlst:shadow], [], [], [], [], [], [], [], [], [], [], []
Errors in file /u01/app/oracle/diag/rdbms/PROD/PROD1/trace/PROD1_lms1_536.trc:
ORA-00600: internal error code, arguments: [kjbmprlst:shadow], [], [], [], [], [], [], [], [], [], [], []
LMS1 (ospid: 536): terminating the instance due to error 484
Fri Jul 08 02:04:45 2011
opiodr aborting process unknown ospid (27387) as a result of ORA-1092
System state dump is made for local instance
System State dumped to trace file /u01/app/oracle/diag/rdbms/PROD/PROD1/trace/PROD1_diag_513.trc
Fri Jul 08 02:04:54 2011
Termination issued to instance processes. Waiting for the processes to exit
Fri Jul 08 02:04:58 2011
ORA-1092 : opitsk aborting process

This ORA-00600 [kjbmprlst:shadow] error was traced to Bug 10121589 or Bug 9458781 on 11.2.0.1:

Bug 10121589  ORA-600 [kjbmprlst:shadow] can occur in RAC
Affects:

    Product (Component)	Oracle Server (Rdbms)
    Range of versions believed to be affected 	Versions BELOW 12.1
    Versions confirmed as being affected 	

        11.2.0.1 

    Platforms affected	Generic (all / most platforms affected)

Fixed:

    This issue is fixed in	

        12.1 (Future Release)
        11.2.0.2 Bundle Patch 2 for Exadata Database
        11.2.0.1 Bundle Patch 7 for Exadata Database 

Symptoms:

Related To:

    Internal Error May Occur (ORA-600)
    ORA-600 [kjbmprlst:shadow] 

    RAC (Real Application Clusters) / OPS 

Description

    An ORA-600 [kjbmprlst:shadow] can occur if the fix for bug 9979039
    is present.

    Note:
     One off patches for 10200390 should also include this fix.

Bug 9458781  Missing close message to master leaves closed lock dangling crashing the instance with assorted Internal error

Affects:

    Product (Component)	Oracle Server (Rdbms)
    Range of versions believed to be affected 	Versions >= 11.2.0.1 but BELOW 11.2.0.2
    Versions confirmed as being affected 	

        11.2.0.1 

    Platforms affected	Generic (all / most platforms affected)

Fixed:

    This issue is fixed in	

        11.2.0.2 (Server Patch Set)
        11.2.0.1 Bundle Patch 4 for Exadata Database 

Symptoms:

Related To:

    Instance May Crash
    Internal Error May Occur (ORA-600)
    ORA-600 [KJBMPRLST:SHADOW]
    ORA-600 [KJBMOCVT:RID]
    ORA-600 [KJBRREF:PKEY]
    ORA-600 [KJBRASR:PKEY] 

    RAC (Real Application Clusters) / OPS 

Description

    A lock is closed without sending a message to the master.
    This causes closed lock dangling at the master crashing the instance with different internal errors.

    Reported internal errors so far are :
    - KJBMPRLST:SHADOW
    - KJBMOCVT:RID
    - KJBRREF:PKEY
    - KJBRASR:PKEY

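The kjbmprlst:shadow internal function manages kjbm shadow lock information (/libserver10.a/kjbm.o). The code defect is that a lock which has already been closed is not reported to the master node in time. Apart from applying the patch there is currently no verified workaround (disabling DRM appears to be ineffective):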

oradebug lkdebug (track resources, take dumps)
KCL history
KJBL history
KJL history

PCM (GCS) and non-PCM (GES) resources are kept separate and use separate code paths.
GES:
Resource table: kjr and kjrt
Lock table: kjlt
Processes: kjpt
GCS:
Resource table: kjbr
Lock table: kjbl

DLM Structures (continued)
/* PCM resource structure */
typedef struct kjbr {                                /* 68 bytes on sun4u */
  kjsolk       hash_q_kjbr;                             /* hash list : hp */
  ub4          resname_kjbr[2];	                     /* the resource name */
  kjsolk       scan_q_kjbr; /* chain to lmd scan q of grantable resources */
  kjsolk       grant_q_kjbr;                 /* list of granted resources */
  kjsolk       convert_q_kjbr;       /* list of resources being converted */
  ub4          diskscn_bas_kjbr;         /* scn(base) known to be on disk */
  ub2          diskscn_wrap_kjbr;        /* scn(wrap) known to be on disk */
  ub2          writereqscn_wrap_kjbr;    /* scn(wrap) requested for write */
  ub4          writereqscn_bas_kjbr;     /* scn(base) requested for write */
  struct kjbl *sender_kjbr;                 /* lock elected to send block */
  ub2          senderver_kjbr;                  /* version# of above lock */
  ub2          writerver_kjbr;                  /* version# of lock below */
  struct kjbl *writer_kjbr;                /* lock elected to write block */
  ub1          mode_role_kjbr; /* one of 'n', 's', 'x' && one of 'l' or 'g' */
  ub1          flags_kjbr;                        /* ignorewip, free etc. */
  ub1          rfpcount_kjbr;                      /* refuse ping counter */
  ub1          history_kjbr;                /* resource operation history */
  kxid         xid_kjbr;                          /* split transaction ID */
} kjbr ;

/* kjbl - PCM lock structure
** Clients and most of the DLM will use the KJUSER* or KJ_* modes and kscns  */

typedef struct kjbl {                                /* 52 bytes on sun4u */
  union {                     /* discriminate lock@master and lock@client */
    struct {                                           /* for lock@master */
      kgglk        state_q_kjbl;             /* link to chain to resource */
      kjbopqi     *rqinfo_kjbl;                             /* target bid */
      struct kjbr *resp_kjbl;                   /* pointer to my resource */
    } kjbllam;                                 /* KJB Lock Lock At Master */
    struct {                                           /* for lock@client */
      ub4         disk_base_kjbl;        /* disk version(base) for replay */
      ub2         disk_wrap_kjbl;        /* disk version(wrap) for replay */
      ub1         master_node_kjbl;                   /* master instance# */
      ub1         client_flag_kjbl;     /* flags specific to client locks */
      ub2         update_seq_kjbl;               /* last update to master */
    } kjbllac;                                 /* KJB Lock Lock At Client */
  } kjblmcd;                        /* KJB Lock Master Client Discrimnant */
  void  *remote_lockp_kjbl;           /* pointer to client lock or shadow */
  ub2    remote_ver_kjbl;                         /* remote lock version# */
  ub2        ver_kjbl;                                     /* my version# */
  ub2        msg_seq_kjbl;                         /* client->master seq# */
  ub2        reqid_kjbl;                         /* requestid for convert */
  ub2        creqid_kjbl; /* requestid for convert that has been cancelled */
  ub2        pi_wrap_kjbl;                     /* scn(wrap) of highest pi */
  ub4        pi_base_kjbl;                     /* scn(base) of highest pi */
  ub1        mode_role_kjbl; /* one of 'n', 's', 'x' && one of 'l' or 'g' */
  ub1        state_kjbl;       /* _L|_R|_W|_S, notify, which q, lock type */
  ub1        node_kjbl;                       /* instance lock belongs to */
  ub1        flags_kjbl;                                /* lock flag bits */
  ub2        rreqid_kjbl;                               /* save the reqid */
  ub2         write_wrap_kjbl;        /* last write request version(wrap) */
  ub4         write_base_kjbl;        /* last write request version(base) */
  ub4         history_kjbl;                     /* lock operation history */
} kjbl;

PCM DLM locks that are owned by the local instance are allocated and embedded in an LE structure.
PCM DLM locks that are owned by remote instances and mastered by the local instance are allocated in SHARED_POOL.

PCM Locks and Resources
Fields of interest in the kclle structure: kcllerls or releasing; kcllelnm or name(id1,id2);
kcllemode or held-mode; kclleacq or acquiring; kcllelck or DLM lock.

Fields of interest in the kjbr structure: resname_kjbr[2] or resource name; grant_q_kjbr or grant queue;
convert_q_kjbr or convert queue; mode_role_kjbr, which is a bitwise merge of grant mode and role,
interpreted as NULL (0x00), S (0x01), X (0x02), L0 Local (0x00), G0 Global without PI (0x08), G1 Global with PI (0x18).

The field mode_role_kjbl in kjbl is a bitwise merge of grant, request, and lock mode: 0x00 if grant NULL;
0x01 if grant S; 0x02 if grant X; 0x04 lock has been opened at master; 0x08 if global role (otherwise local);
0x10 has one or more PI; 0x20 if request CR; 0x40 if request S; 0x80 if request X.
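
The bit layout of mode_role_kjbl described above can be sketched as a small decoder. This is only an illustration built from the values listed in the text; the function and table names are hypothetical and are not part of any Oracle interface, and a grant value of 0x03 is reported as "unknown" because the text does not define it.

```python
# Hypothetical helper: decodes a mode_role_kjbl byte using the bit values
# listed in the text. Names here are illustrative, not Oracle APIs.
GRANT_MODES = {0x00: "NULL", 0x01: "S", 0x02: "X"}

FLAG_BITS = [
    (0x04, "opened at master"),
    (0x08, "global role"),
    (0x10, "has PI"),
    (0x20, "request CR"),
    (0x40, "request S"),
    (0x80, "request X"),
]


def decode_mode_role_kjbl(value):
    """Split a ub1 mode_role_kjbl value into grant mode, role, and flags."""
    if not 0 <= value <= 0xFF:
        raise ValueError("mode_role_kjbl is a ub1 (0..255)")
    # The two low bits carry the grant mode; 0x03 is not described in the text.
    grant = GRANT_MODES.get(value & 0x03, "unknown")
    # 0x08 set means global role, otherwise local.
    role = "global" if value & 0x08 else "local"
    flags = [name for bit, name in FLAG_BITS if value & bit]
    return {"grant": grant, "role": role, "flags": flags}
```

For example, `decode_mode_role_kjbl(0x1A)` yields grant `X`, role `global`, and the flags `global role` and `has PI`, which corresponds to an X lock in the G1 role.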

Someone has to keep a list of all buffers and where they are mastered.
This list is called the Global Resource Directory (GRD).
The GRD is present on all instances of the cluster.
To find out the master:
select  b.dbablk, r.kjblmaster master_node
from x$le l, x$kjbl r, x$bh b
where b.obj =
and b.le_addr = l.le_addr
and l.le_kjbl = r.kjbllockp

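Oracle Support states that this bug is fixed by 11.2.0.2 (Server Patch Set) and by 11.2.0.1 Bundle Patch 4 for Exadata Database, but there are indications that the ORA-00600[kjbmprlst:shadow] internal error can still occur on 11.2.0.2. The bug also occurs more often on RAC systems with more than 2 nodes.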

 

Oracle Internal Error: A Case of ORA-00600[kfioTranslateIO03]

On an 11.2.0.2 RAC+ASM system running on Linux x86-64, one node hit the ORA-00600[kfioTranslateIO03] internal error. The detailed logs are as follows:

=============================alert.log===============================
adrci> show alert -tail -f 

2011-05-30 20:29:12.657000 +08:00
Starting background process RSMN
RSMN started with pid=31, OS id=22084
ORACLE_BASE not set in environment. It is recommended
that ORACLE_BASE be set in the environment
Reusing ORACLE_BASE from an earlier startup = /s01/orabase
ALTER DATABASE MOUNT /* db agent *//* {0:7:3} */
This instance was first to mount
2011-05-30 20:29:15.026000 +08:00
Sweep [inc][100831]: completed
Sweep [inc2][100831]: completed
NOTE: Loaded library: System
ORA-15025: could not open disk "/dev/raw/raw1"
ORA-27041: unable to open file
Linux-x86_64 Error: 13: Permission denied
Additional information: 9
ORA-15025: could not open disk "/dev/raw/raw2"
ORA-27041: unable to open file
Linux-x86_64 Error: 13: Permission denied
Additional information: 9
ORA-15025: could not open disk "/dev/raw/raw3"
ORA-27041: unable to open file
Linux-x86_64 Error: 13: Permission denied
Additional information: 9
ORA-15025: could not open disk "/dev/raw/raw5"
ORA-27041: unable to open file
Linux-x86_64 Error: 13: Permission denied
Additional information: 9
SUCCESS: diskgroup DATA was mounted
NOTE: dependency between database PROD and diskgroup resource ora.DATA.dg is established
Errors in file /s01/orabase/diag/rdbms/prod/PROD1/trace/PROD1_ckpt_22056.trc  (incident=104831):

ORA-00600: internal error code, arguments: [kfioTranslateIO03], [], [], [], [], [], [], [], [], [], [], [] 

Incident details in: /s01/orabase/diag/rdbms/prod/PROD1/incident/incdir_104831/PROD1_ckpt_22056_i104831.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.

adrci> show problem 

ADR Home = /s01/orabase/diag/rdbms/prod/PROD1:
*************************************************************************
PROBLEM_ID           PROBLEM_KEY                                                 LAST_INCIDENT
-------------------- ----------------------------------------------------------- --------------------
2                    ORA 7445 [kghdmp_new()+1133]                                18387
3                    ORA 7445 [kghfnd()+2672]                                    20701
5                    ORA 7445 [kcldmp()+246]                                     28229
6                    ORA 7445 [kclxle()+311]                                     28230
1                    ORA 4031                                                    56918
4                    ORA 445                                                     90278
7                    ORA 600 [kfioTranslateIO03]                                 108831

adrci> show incident -mode detail -p "incident_id=108831"

ADR Home = /s01/orabase/diag/rdbms/prod/PROD1:
*************************************************************************

**********************************************************
INCIDENT INFO RECORD 1
**********************************************************
   INCIDENT_ID                   108831
   STATUS                        ready
   CREATE_TIME                   2011-05-30 20:31:55.484000 +08:00
   PROBLEM_ID                    7
   CLOSE_TIME
   FLOOD_CONTROLLED              none
   ERROR_FACILITY                ORA
   ERROR_NUMBER                  600
   ERROR_ARG1                    kfioTranslateIO03
   ERROR_ARG2
   ERROR_ARG3
   ERROR_ARG4
   ERROR_ARG5
   ERROR_ARG6
   ERROR_ARG7
   ERROR_ARG8
   ERROR_ARG9
   ERROR_ARG10
   ERROR_ARG11
   ERROR_ARG12
   SIGNALLING_COMPONENT          ASM
   SIGNALLING_SUBCOMPONENT
   SUSPECT_COMPONENT
   SUSPECT_SUBCOMPONENT
   ECID
   IMPACTS                       0
   PROBLEM_KEY                   ORA 600 [kfioTranslateIO03]
   FIRST_INCIDENT                96831
   FIRSTINC_TIME                 2011-05-30 20:24:40.372000 +08:00
   LAST_INCIDENT                 108831
   LASTINC_TIME                  2011-05-30 20:31:55.484000 +08:00
   IMPACT1                       0
   IMPACT2                       0
   IMPACT3                       0
   IMPACT4                       0
   KEY_NAME                      ProcId
   KEY_VALUE                     19.1
   KEY_NAME                      Client ProcId
   KEY_VALUE                     oracle@rh2.oracle.com.22504_139763918456544
   KEY_NAME                      SID
   KEY_VALUE                     397.1
   OWNER_ID                      1
   INCIDENT_FILE                 /s01/orabase/diag/rdbms/prod/PROD1/incident/incdir_108831/PROD1_ckpt_22504_i108831.trc
   OWNER_ID                      1
   INCIDENT_FILE                 /s01/orabase/diag/rdbms/prod/PROD1/trace/PROD1_ckpt_22504.trc
1 rows fetched

===================================trace===================================

adrci> view /s01/orabase/diag/rdbms/prod/PROD1/incident/incdir_108831/PROD1_ckpt_22504_i108831.trc

Dump continued from file: /s01/orabase/diag/rdbms/prod/PROD1/trace/PROD1_ckpt_22504.trc
ORA-00600: internal error code, arguments: [kfioTranslateIO03], [], [], [], [], [], [], [], [], [], [], []

========= Dump for incident 108831 (ORA 600 [kfioTranslateIO03]) ========
----- Beginning of Customized Incident Dump(s) -----
kfioRqSet=0x7f1d524151c0 parent=0x7fffb2642d30 gn=(64.0) cnt=0
  size=32768 vxn=0 byte offset=16384 buf offset=0
  tried[0]=0 tried[1]=0 tried[2]=0 tried[3]=0 tried[4]=0 tried[5]=0
  skipped[0]=0 skipped[1]=0 skipped[2]=0 skipped[3]=0 skipped[4]=0 skipped[5]=0
parent :
DDE: Ending a split invocation on error recording!
----- End of Customized Incident Dump(s) -----

*** 2011-05-30 20:31:55.548
dbkedDefDump(): Starting incident default dumps (flags=0x2, level=3, mask=0x0)
----- SQL Statement (None) -----
Current SQL information unavailable - no cursor.

----- Call Stack Trace -----
calling              call     entry                argument values in hex
location             type     point                (? means dubious value)
-------------------- -------- -------------------- ----------------------------
skdstdst()+36        call     kgdsdst()            000000000 ? 000000000 ?
                                                   7FFFB2634D58 ? 000000001 ?
                                                   000000001 ? 000000002 ?
ksedst1()+98         call     skdstdst()           000000000 ? 000000000 ?
                                                   7FFFB2634D58 ? 000000001 ?
                                                   000000000 ? 000000002 ?
ksedst()+34          call     ksedst1()            000000000 ? 000000001 ?
                                                   7FFFB2634D58 ? 000000001 ?
                                                   000000000 ? 000000002 ?
dbkedDefDump()+2741  call     ksedst()             000000000 ? 000000001 ?
                                                   7FFFB2634D58 ? 000000001 ?
                                                   000000000 ? 000000002 ?
ksedmp()+36          call     dbkedDefDump()       000000003 ? 000000002 ?
                                                   7FFFB2634D58 ? 000000001 ?
                                                   000000000 ? 000000002 ?
ksfdmp()+64          call     ksedmp()             000000003 ? 000000002 ?
                                                   7FFFB2634D58 ? 000000001 ?
                                                   000000000 ? 000000002 ?
dbgexPhaseII()+1764  call     ksfdmp()             000000003 ? 000000002 ?
                                                   7FFFB2634D58 ? 000000001 ?
                                                   000000000 ? 000000002 ?
dbgexExplicitEndInc  call     dbgexPhaseII()       7F1D5281F710 ? 7F1D52822500 ?
()+750                                             7FFFB2640890 ? 000000001 ?
                                                   000000000 ? 000000002 ?
dbgeEndDDEInvocatio  call     dbgexExplicitEndInc  7F1D5281F710 ? 7F1D52822500 ?
nImpl()+767                   ()                   7FFFB2640890 ? 000000001 ?
                                                   000000000 ? 000000002 ?
dbgeEndSpltInvokOnR  call     dbgeEndDDEInvocatio  7F1D5281F710 ? 7F1D52822500 ?
ec()+265                      nImpl()              7FFFB2640890 ? 000000001 ?
                                                   000000000 ? 000000002 ?
dbgePostErrorKGE()+  call     dbgeEndSpltInvokOnR  7F1D5281F710 ? 7F1D52822500 ?
248                           ec()                 7FFFB2640890 ? 000000001 ?
                                                   000000000 ? 000000002 ?
dbkePostKGE_kgsf()+  call     dbgePostErrorKGE()   000000000 ? 7F1D52830E40 ?
63                                                 000003AE9 ? 000000000 ?
                                                   100000000 ? 000000002 ?
kgeade()+351         call     dbkePostKGE_kgsf()   00B7C8EA0 ? 7F1D52830E40 ?
                                                   000003AE9 ? 000000000 ?
                                                   100000000 ? 000000002 ?
kgerelv()+135        call     kgeade()             00B7C8EA0 ? 00B7C9050 ?
                                                   7F1D52830E40 ? 000003AE9 ?
                                                   100000000 ? 000000002 ?
kserecl0()+157       call     kgerelv()            00B7C8EA0 ? 7F1D52830E40 ?
                                                   000003AE9 ? 00952980C ?
                                                   7FFFB2641C10 ? 000000000 ?
kfioErrorRecord()+7  call     kserecl0()           00B7C8EA0 ? 7F1D52830E40 ?
6                                                  000003AE9 ? 000000005 ?
                                                   7FFFB2641C60 ? 000000000 ?
kfiorq_dump()+129    call     kfioErrorRecord()    7FFFB2642D30 ? 7F1D52830E40 ?
                                                   000003AE9 ? 000000005 ?
                                                   7FFFB2641C60 ? 000000000 ?
kfioRqSetDump()+565  call     kfiorq_dump()        7FFFB2642D30 ? 7F1D52830E40 ?
                                                   000003AE9 ? 000000005 ?
                                                   7FFFB2641C60 ? 000000000 ?
kfioTranslateIO()+3  call     kfioRqSetDump()      7F1D524151C0 ? 7F1D52830E40 ?
079                                                000003AE9 ? 000000005 ?
kfioRqSetPrepare()+  call     kfioTranslateIO()    7F1D524151C0 ? 7F1D52415098 ?
1017                                               7FFFB26421D4 ? 7FFFB26421D0 ?
                                                   0D4F338B0 ? 000000000 ?
kfioSubmitIO()+2852  call     kfioRqSetPrepare()   7F1D524151C0 ? 7F1D52415098 ?
                                                   7FFFB26425D8 ? 7FFFB2642608 ?
                                                   0D4F338B0 ? 000000000 ?
kfioRequestPriv()+1  call     kfioSubmitIO()       7FFFB2642E10 ? 000000001 ?
94                                                 7FFFB26425D8 ? 7FFFB2642608 ?
                                                   0D4F338B0 ? 000000000 ?
kfioRequest()+701    call     kfioRequestPriv()    000000000 ? 000000001 ?
                                                   7FFFB2642E18 ? 000000001 ?
                                                   000000000 ? 000000000 ?
ksfd_kfioRequest()+  call     kfioRequest()        7FFFB2642E10 ? 000000001 ?
644                                                7FFFB2642E18 ? 000000001 ?
                                                   000000000 ? 7FFF00000000 ?
ksfd_osmio()+1050    call     ksfd_kfioRequest()   7FFFB2642E10 ? 000000001 ?
                                                   7FFFB2642E18 ? 000000001 ?
                                                   000000000 ? 000000000 ?
ksfd_io()+2717       call     ksfd_osmio()         000000001 ?
                                                   FFFFFFFFB2642D30 ?
                                                   FFFFFFFFB2642D30 ?
                                                   0D400A0B0 ? 000008000 ?
                                                   7FFFB2643170 ?
ksfdread()+576       call     ksfd_io()            0D400A0B0 ? 000000001 ?
                                                   7F1D52417E00 ? 000008000 ?
                                                   000000000 ? 000000703 ?
kcc_identify_file()  call     ksfdread()           0D400A0B0 ? 000000001 ?
+309                                               7F1D52417E00 ? 000008000 ?
                                                   000000000 ? 000000703 ?
kcc_identify()+225   call     kcc_identify_file()  0D400A0B0 ? 7F1D52417E00 ?
                                                   000000000 ? 060019450 ?
                                                   060019630 ? 0DAC34670 ?
kccida()+225         call     kcc_identify()       000000000 ? 7F1D52417E00 ?
                                                   060019630 ? 7FFFB26434A4 ?
                                                   000000000 ? 0DAC34670 ?
ksbabs()+771         call     kccida()             7FFFB2643B08 ? 7F1D52417E00 ?
                                                   060019630 ? 7FFFB26434A4 ?
                                                   000000000 ? 0DAC34670 ?
ksbrdp()+971         call     ksbabs()             7FFFB2643B08 ? 7F1D52417E00 ?
                                                   060019630 ? 7FFFB26434A4 ?
                                                   000000000 ? 0DAC34670 ?

adrci> view /s01/orabase/diag/rdbms/prod/PROD1/trace/PROD1_ckpt_22504.trc

NOTE: disk 4 is missing from group 1
Incident 108831 created, dump file: /s01/orabase/diag/rdbms/prod/PROD1/incident/incdir_108831/PROD1_ckpt_22504_i108831.trc
ORA-00600: internal error code, arguments: [kfioTranslateIO03], [], [], [], [], [], [], [], [], [], [], []

=========Start of 'kfiorq = [0x7fffb2642d30]' dumping =========
        Status            =  UNKWOWN
        Flags             =  READ |  SYNC
        Mirror side       = 0
        Fib               = 0xd4f338b0
        Offset            = 1
        buffer ptr        = 0x7f1d52417e00
        Rcount            = 32768
        err_kfiorq        = 15081
        Inflight disk IO  = 0
        Completed disk IO = 0
        Oracle error      = 0
        Intended zone     = 48
  ===Dump of all attached kfiodrq's===
=========End of 'kfiorq = [0x7fffb2642d30]' dumping =========

parent :
############# kfiofib = 0xd4f338b0 #################
Diskgroup Name     =
File number        = 261.747100215
File type          = 1
Flags              = 10
Blksize            = 16384
File size          = 1131 blocks
Blk one offset     = 1
Redundancy         = 17
Physical blocksz   = 512
Open name          = +DATA/prod/controlfile/current.261.747100215
Fully-qualified nm =+DATA/prod/controlfile/current.261.747100215
Mapid             = 2
Slave ID          = -1
Connection        = 0x(nil)
############################################
Error ORA-600 signaled at ksedsts()+461<-ksf_short_stack()+77<-kge_snap_callstack()+63<-kge_sigtrace_dump()+69<-kgepop()+712<-kgersel()+175<-kfioTranslateIO()+3138<-kfi
oRqSetPrepare()+1022<-kfioSubmitIO()+2857<-kfioRequestPriv()+199<-kfioRequest()+706<-ksfd_kfioRequest()+649<-ksfd_osmio()+1055<-ksfd_io()+2722<-ksfdread()+581<-kcc_iden
tify_file()+314<-kcc_identify()+230<-kccida()+230<-ksbabs()+771<-ksbrdp()+971<-opirip()+623<-opidrv()+603<-sou2o()+103<-opimai_real()+266<-ssthrdmain()+252<-main()+201<
-__libc_start_main()+244<-_start()+36
ERROR: unrecoverable error ORA-600 raised in ASM I/O path; terminating process 22504
----- Abridged Call Stack Trace -----
ksedsts()+461<-kfioRequest()+2157<-ksfd_kfioRequest()+649<-ksfd_osmio()+1055<-ksfd_io()+2722<-ksfdread()+581<-kcc_identify_file()+314<-kcc_identify()+230<-kccida()+230<
-ksbabs()+771<-ksbrdp()+971<-opirip()+623<-opidrv()+603<-sou2o()+103<-opimai_real()+266<-ssthrdmain()+252
<-main()+201<-__libc_start_main()+244<-_start()+36
----- End of Abridged Call Stack Trace -----

*** 2011-05-30 20:31:56.271
KSU: Terminating fatal process 'oracle@rh2.oracle.com (CKPT)'

adrci> ips create package
Created package 2 without any contents, correlation level typical

adrci> ips add problem 7 package 2
Added problem 7 to package 2

adrci> ips finalize package 2
Finalized package 2

adrci> ips generate package 2 in /tmp
Generated package 2 in file /tmp/IPSPKG_20110531224208_COM_1.zip, mode complete

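Diagnosis showed that the permissions on the /dev/raw/raw* device files backing the ASM diskgroup had been changed to 0600. Because these raw devices are owned by the grid user, the oracle user could no longer read or write them. Changing the device file permissions to 0660 resolved the problem.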
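
A check like the one below could catch this kind of permission drift before the instance trips over it. It is a minimal sketch, assuming the oracle user reaches the devices through group membership (the source only states that grid owns them and that 0660 fixed the problem); the function names are hypothetical.

```python
import os
import stat


def group_can_read_write(mode):
    """Return True when both the group-read and group-write bits are set."""
    needed = stat.S_IRGRP | stat.S_IWGRP
    return (mode & needed) == needed


def describe_device(path):
    """Report owner, group, and permission bits for one device file."""
    st = os.stat(path)
    perms = stat.S_IMODE(st.st_mode)  # keep only the permission bits
    return {
        "path": path,
        "mode": oct(perms),
        "uid": st.st_uid,
        "gid": st.st_gid,
        "group_rw": group_can_read_write(perms),
    }
```

With mode 0600, `group_can_read_write` returns False, which matches the failing state in this case; with 0660 it returns True. Running `describe_device` over each /dev/raw/raw* path would flag the devices that a non-owner member of the group cannot open.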

Oracle Internal Error: A Case of ORA-00600:[2667]

A 9.2.0.1 system on Power AIX hit the ORA-00600:[2667] internal error while the database was being opened. The detailed log is as follows:

 

 

Wed Mar 9 19:03:38 2011
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
RESETLOGS after incomplete recovery UNTIL CHANGE 117699122479
Resetting resetlogs activation ID 2197911857 (0x83017931)
Wed Mar 9 19:03:47 2011
LGWR: Primary database is in CLUSTER CONSISTENT mode
Assigning activation ID 2284878888 (0x88307c28)
Thread 1 opened at log sequence 1
Current log# 3 seq# 1 mem# 0: /s01/maclean/oradata/PROD/redo03.log
Successful open of redo thread 1.
Wed Mar 9 19:03:47 2011
SMON: enabling cache recovery
Wed Mar 9 19:03:47 2011
Errors in file /s01/maclean/admin/PROD/udump/PROD_ora_585914.trc:
ORA-07445: exception encountered: core dump [] [] [] [] [] []
Wed Mar 9 19:03:48 2011
Errors in file /s01/maclean/admin/PROD/bdump/PROD_pmon_602286.trc:
ORA-07445: exception encountered: core dump [] [] [] [] [] []
Wed Mar 9 19:03:50 2011
Errors in file /s01/maclean/admin/PROD/bdump/PROD_lgwr_548884.trc:
ORA-00600: internal error code, arguments: [2667], [1], [1], [3], [1739273391], [1739273391], [1739273391], [7267]
LGWR: terminating instance due to error 600

Relevant initialization parameters:
fast_start_mttr_target = 300
_allow_resetlogs_corruption= TRUE
undo_management = AUTO
undo_tablespace = UNDOTBS1

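As shown above, the critical LGWR process hit the ORA-00600:[2667] internal error a few seconds after the database was opened and then terminated the instance.
Because the current redo log file had been lost, a series of unconventional recovery operations had already been performed on this database, including setting several underscore parameters: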

Before I provide the steps to fix the ORA-00600 error, I want to tell you that this database was opened with the unsupported parameter
"_allow_resetlogs_corruption".

*************************************************************************
* By forcing open the database using this parameter, there is a strong *
* likelihood of logical corruption, possibly affecting the data *
* dictionary. Oracle does not guarantee that all of the data will be *
* accessible nor will it support a database that has been opened by *
* this method and that the database users will be allowed to continue *
* work. All this does is provide a way to get at the contents of the *
* database for extraction, usually by export. It is up to you to *
* determine the amount of lost data and to correct any logical *
* corruption issues. *
* *
*************************************************************************

2) The steps to get rid of the ora-00600 are as follows:

+ Change UNDO_MANAGEMENT=AUTO to

UNDO_MANAGEMENT=MANUAL

+ Remove or comment out UNDO_TABLESPACE and UNDO_RETENTION.

+ Add
_CORRUPTED_ROLLBACK_SEGMENTS =(comma separated list of Automatic Undo segments)

Example:

_CORRUPTED_ROLLBACK_SEGMENTS = (_SYSSMU1$, _SYSSMU2$, _SYSSMU3$, _SYSSMU4$,
_SYSSMU5$, _SYSSMU6$, _SYSSMU7$, _SYSSMU8$, _SYSSMU9$, _SYSSMU10$)

Note, sometimes the alert log will tell you what Automatic Undo segments are in use. 
Search the alert log for SYSS. If the alert log does not contain that information then use _SYSSMU1$ 
through _SYSSMU10$ as shown in the example above.

In UNIX you can issue this command to get the undo segment names:

$ strings system01.dbf | grep _SYSSMU | cut -d $ -f 1 | sort -u

From the output of the strings command above, add a $ to end of each _SYSSMU undo segment name.
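
The extraction and the "add a $" step can also be sketched in one pass. This is an illustrative helper, not part of the quoted procedure: it takes the text printed by `strings system01.dbf` and, unlike the shell pipeline, keeps only names of the form `_SYSSMU<digits>$`, so stray fragments are dropped. The function names are hypothetical.

```python
import re

# Matches undo segment names such as _SYSSMU1$ and captures the number.
_SYSSMU_RE = re.compile(r"_SYSSMU(\d+)\$")


def undo_segment_names(strings_output):
    """Return unique _SYSSMU<n>$ names found in the text, sorted by n."""
    numbers = {int(m.group(1)) for m in _SYSSMU_RE.finditer(strings_output)}
    return ["_SYSSMU{}$".format(n) for n in sorted(numbers)]


def corrupted_rollback_segments_line(names):
    """Format the names as a _CORRUPTED_ROLLBACK_SEGMENTS parameter line."""
    return "_CORRUPTED_ROLLBACK_SEGMENTS = (" + ", ".join(names) + ")"
```

Feeding the captured `strings` output to `undo_segment_names` and passing the result to `corrupted_rollback_segments_line` gives a line that can be pasted into the parameter file. The list should still be reviewed by hand before use.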

++ Startup mount the database as follows:

SQL> startup mount
SQL> recover database;
SQL> alter database open;

*._corrupted_rollback_segments= (_SYSSMU730$, _SYSSMU731$, _SYSSMU732$, _SYSSMU733$, _SYSSMU734$, 
_SYSSMU735$, _SYSSMU736$, _SYSSMU737$, _SYSSMU738$, _SYSSMU739$, _SYSSMU744$, _SYSSMU740$, _SYSSMU741$, 
_SYSSMU742$, _SYSSMU743$, _SYSSMU744$, _SYSSMU745$, _SYSSMU746$, _SYSSMU747$, _SYSSMU748$, _SYSSMU749$, _SYSSMU74t$, 
_SYSSMU75$, _SYSSMU750$, _SYSSMU751$, _SYSSMU752$, _SYSSMU753$, _SYSSMU754$, _SYSSMU755$, _SYSSMU756$, _SYSSMU757$, 
_SYSSMU758$, _SYSSMU759$, _SYSSMU76$, _SYSSMU760$, _SYSSMU761$, _SYSSMU762$, _SYSSMU763$, _SYSSMU764$, _SYSSMU765$, 
_SYSSMU766$, _SYSSMU767$, _SYSSMU768$)

However, even after applying the settings above, ORA-00600[2667] still occurred. The errpt log below shows disk array failure information:

errpt -a

---------------------------------------------------------------------------
LABEL: FCP_ARRAY_ERR6
IDENTIFIER: B9735AF4

Date/Time: Thu Mar 10 19:01:51 THAIST 2011
Sequence Number: 106800
Machine Id: 000786F9D600
Node Id: p550su
Class: H
Type: PERM
Resource Name: hdisk5
Resource Class: disk
Resource Type: array
Location: U787B.001.DNWGK9Y-P1-C4-T1-W200500A0B8484C21-L1000000000000

Description
SUBSYSTEM COMPONENT FAILURE

Probable Causes
ARRAY DASD MEDIA
POWER OR FAN COMPONENT

Failure Causes
ARRAY DASD MEDIA
POWER OR FAN COMPONENT

Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES

Detail Data
SENSE DATA
0600 0308 0000 FF00 0000 0004 0000 0000 0000 0000 0000 0000 0000 0000 7000 0600
0000 0098 0000 0000 3FC6 0600 0000 0000 0000 0000 0000 D544 0000 0000 0000 0000
0008 5000 0000 0000 0000 0000 0000 0000 0000 5347 3830 3730 3033 3339 2020 2020
2020 0660 2200 0001 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0005 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 5B8C 32C1 3033 3130 3131 2F30 3630 3533 3000 0000 0000 0000 0000 0000
0000 0000 526D 6000 F205 3703 0000 0000 0000 0000 0000 0000 0000 0000 0000 0001
0000 0000
---------------------------------------------------------------------------
LABEL: FCP_ARRAY_ERR6
IDENTIFIER: B9735AF4

Date/Time: Thu Mar 10 19:01:51 THAIST 2011
Sequence Number: 106799
Machine Id: 000786F9D600
Node Id: p550su
Class: H
Type: PERM
Resource Name: hdisk5
Resource Class: disk
Resource Type: array
Location: U787B.001.DNWGK9Y-P1-C4-T1-W200500A0B8484C21-L1000000000000

Description
SUBSYSTEM COMPONENT FAILURE

Probable Causes
ARRAY DASD MEDIA
POWER OR FAN COMPONENT

Failure Causes
ARRAY DASD MEDIA
POWER OR FAN COMPONENT

Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES

Detail Data
SENSE DATA
0600 0308 0000 FF00 0000 0004 0000 0000 0000 0000 0000 0000 0000 0000 7000 0600
0000 0098 0000 0000 3FC6 0600 0000 0000 0000 0000 0000 D524 0000 0000 0000 0000
0008 5000 0000 0000 0000 0000 0000 0000 0000 5347 3830 3730 3033 3339 2020 2020
2020 0660 2200 0001 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0005 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 5B8C 32C0 3033 3130 3131 2F30 3630 3533 3000 0000 0000 0000 0000 0000
0000 0000 526D 6000 F205 3703 0000 0000 0000 0000 0000 0000 0000 0000 0000 0001
0000 0000
---------------------------------------------------------------------------
LABEL: FCP_ARRAY_ERR10
IDENTIFIER: C86ACB7E

Date/Time: Thu Mar 10 19:01:51 THAIST 2011
Sequence Number: 106798
Machine Id: 000786F9D600
Node Id: p550su
Class: H
Type: INFO
Resource Name: dac0
Resource Class: array
Resource Type: ibm-dac-V4
Location: U787B.001.DNWGK9Y-P1-C4-T1-W200500A0B8484C21
VPD:
Manufacturer................IBM
Machine Type and Model......1814 FAStT
Part Number.................24288-00
ROS Level and ID............0916

Description
ARRAY CONFIGURATION CHANGED

Probable Causes
ARRAY CONTROLLER
CABLES AND CONNECTIONS

Failure Causes
ARRAY CONTROLLER
CABLES AND CONNECTIONS

Recommended Actions
NO ACTION NECESSARY

Detail Data
SENSE DATA
0600 0308 0000 FF00 0000 0004 0000 0000 0000 0000 0000 0000 0000 0000 7000 0600
0000 0098 0000 0000 9502 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0008 1800 0000 0000 0000 0000 0000 0000 0000 5347 3830 3730 3033 3339 2020 2020
2020 0660 2200 0001 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0005 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 5B8C 32BF 3033 3130 3131 2F30 3630 3533 3000 0000 0000 0000 0000 0000
0000 0000 526D 6000 F205 3703 0000 0000 0000 0000 0000 0000 0000 0000 0000 0001
0000 0000
---------------------------------------------------------------------------
LABEL: FCP_ARRAY_ERR6
IDENTIFIER: B9735AF4

Date/Time: Thu Mar 10 19:01:47 THAIST 2011
Sequence Number: 106797
Machine Id: 000786F9D600
Node Id: p550su
Class: H
Type: PERM
Resource Name: hdisk4
Resource Class: disk
Resource Type: array
Location: U787B.001.DNWGK9Y-P1-C4-T1-W200500A0B8484C21-L0

Description
SUBSYSTEM COMPONENT FAILURE

Probable Causes
ARRAY DASD MEDIA
POWER OR FAN COMPONENT

Failure Causes
ARRAY DASD MEDIA
POWER OR FAN COMPONENT

Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES

Detail Data
SENSE DATA
0600 0308 0000 FF00 0000 0004 0000 0000 0000 0000 0000 0000 0000 0000 7000 0600
0000 0098 0000 0000 3FC6 0600 0000 0000 0000 0000 0000 D544 0000 0000 0000 0000
0008 5000 0000 0000 0000 0000 0000 0000 0000 5347 3830 3730 3033 3339 2020 2020
2020 0660 2200 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0005 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 5B8C 3215 3033 3130 3131 2F30 3630 3532 3600 0000 0000 0000 0000 0000
0000 0000 526D 6000 F205 3703 0000 0000 0000 0000 0000 0000 0000 0000 0000 0001
0000 0000
---------------------------------------------------------------------------
LABEL: FCP_ARRAY_ERR6
IDENTIFIER: B9735AF4

Date/Time: Thu Mar 10 19:01:47 THAIST 2011
Sequence Number: 106796
Machine Id: 000786F9D600
Node Id: p550su
Class: H
Type: PERM
Resource Name: hdisk4
Resource Class: disk
Resource Type: array
Location: U787B.001.DNWGK9Y-P1-C4-T1-W200500A0B8484C21-L0

Description
SUBSYSTEM COMPONENT FAILURE

Probable Causes
ARRAY DASD MEDIA
POWER OR FAN COMPONENT

Failure Causes
ARRAY DASD MEDIA
POWER OR FAN COMPONENT

Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES

Detail Data
SENSE DATA
0600 0308 0000 FF00 0000 0004 0000 0000 0000 0000 0000 0000 0000 0000 7000 0600
0000 0098 0000 0000 3FC6 0600 0000 0000 0000 0000 0000 D524 0000 0000 0000 0000
0008 5000 0000 0000 0000 0000 0000 0000 0000 5347 3830 3730 3033 3339 2020 2020
2020 0660 2200 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0005 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 5B8C 3214 3033 3130 3131 2F30 3630 3532 3600 0000 0000 0000 0000 0000
0000 0000 526D 6000 F205 3703 0000 0000 0000 0000 0000 0000 0000 0000 0000 0001
0000 0000
---------------------------------------------------------------------------
LABEL: FCP_ARRAY_ERR10
IDENTIFIER: C86ACB7E

Date/Time: Thu Mar 10 19:01:47 THAIST 2011
Sequence Number: 106795
Machine Id: 000786F9D600
Node Id: p550su
Class: H
Type: INFO
Resource Name: dac0
Resource Class: array
Resource Type: ibm-dac-V4
Location: U787B.001.DNWGK9Y-P1-C4-T1-W200500A0B8484C21
VPD:
Manufacturer................IBM
Machine Type and Model......1814 FAStT
Part Number.................24288-00
ROS Level and ID............0916

Description
ARRAY CONFIGURATION CHANGED

Probable Causes
ARRAY CONTROLLER
CABLES AND CONNECTIONS

Failure Causes
ARRAY CONTROLLER
CABLES AND CONNECTIONS

Recommended Actions
NO ACTION NECESSARY

Detail Data
SENSE DATA
0600 0308 0000 FF00 0000 0004 0000 0000 0000 0000 0000 0000 0000 0000 7000 0600
0000 0098 0000 0000 9502 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0008 1800 0000 0000 0000 0000 0000 0000 0000 5347 3830 3730 3033 3339 2020 2020
2020 0660 2200 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0005 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 5B8C 3213 3033 3130 3131 2F30 3630 3532 3600 0000 0000 0000 0000 0000
0000 0000 526D 6000 F205 3703 0000 0000 0000 0000 0000 0000 0000 0000 0000 0001
0000 0000
---------------------------------------------------------------------------
LABEL: FCP_ARRAY_ERR4
IDENTIFIER: D5385D18

Date/Time: Thu Mar 10 18:49:23 THAIST 2011
Sequence Number: 106794
Machine Id: 000786F9D600
Node Id: p550su
Class: H
Type: TEMP
Resource Name: hdisk5
Resource Class: disk
Resource Type: array
Location: U787B.001.DNWGK9Y-P1-C4-T1-W200500A0B8484C21-L1000000000000

Description
ARRAY OPERATION ERROR

Probable Causes
ARRAY DASD MEDIA
ARRAY DASD DEVICE

Failure Causes
DASD MEDIA
DISK DRIVE

Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES

Detail Data
SENSE DATA
0A00 2800 4644 2F00 0003 0004 0000 0000 0000 0000 0007 7497 0200 0B00 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 5262 E000 F205 3703 0000 0200 0000 0000 0000 0000 0000 0002 0000 0010
0000 0000
---------------------------------------------------------------------------
LABEL: FCP_ARRAY_ERR4
IDENTIFIER: D5385D18

Date/Time: Thu Mar 10 18:49:23 THAIST 2011
Sequence Number: 106793
Machine Id: 000786F9D600
Node Id: p550su
Class: H
Type: TEMP
Resource Name: hdisk5
Resource Class: disk
Resource Type: array
Location: U787B.001.DNWGK9Y-P1-C4-T1-W200500A0B8484C21-L1000000000000

Description
ARRAY OPERATION ERROR

Probable Causes
ARRAY DASD MEDIA
ARRAY DASD DEVICE

Failure Causes
DASD MEDIA
DISK DRIVE

Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES

Detail Data
SENSE DATA
0A00 2A00 132E C948 0000 0804 0000 0000 0000 0000 0007 7497 0200 0B00 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 5262 E000 F205 3703 0000 0200 0000 0000 0000 0000 0000 0002 0000 0010
0000 0000
---------------------------------------------------------------------------
LABEL: FCP_ARRAY_ERR4
IDENTIFIER: D5385D18

Date/Time: Thu Mar 10 18:49:23 THAIST 2011
Sequence Number: 106792
Machine Id: 000786F9D600
Node Id: p550su
Class: H
Type: TEMP
Resource Name: hdisk5
Resource Class: disk
Resource Type: array
Location: U787B.001.DNWGK9Y-P1-C4-T1-W200500A0B8484C21-L1000000000000

Description
ARRAY OPERATION ERROR

Probable Causes
ARRAY DASD MEDIA
ARRAY DASD DEVICE

Failure Causes
DASD MEDIA
DISK DRIVE

Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES

Detail Data
SENSE DATA
0A00 2A00 1335 55F0 0000 0804 0000 0000 0000 0000 0007 7497 0200 0B00 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 5262 E000 F205 3703 0000 0200 0000 0000 0000 0000 0000 0002 0000 0010
0000 0000
---------------------------------------------------------------------------
LABEL: FCP_ARRAY_ERR4
IDENTIFIER: D5385D18

Date/Time: Thu Mar 10 18:49:23 THAIST 2011
Sequence Number: 106791
Machine Id: 000786F9D600
Node Id: p550su
Class: H
Type: TEMP
Resource Name: hdisk5
Resource Class: disk
Resource Type: array
Location: U787B.001.DNWGK9Y-P1-C4-T1-W200500A0B8484C21-L1000000000000

Description
ARRAY OPERATION ERROR

Probable Causes
ARRAY DASD MEDIA
ARRAY DASD DEVICE

Failure Causes
DASD MEDIA
DISK DRIVE

Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES

Detail Data
SENSE DATA
0A00 2A00 136C 0F98 0000 0804 0000 0000 0000 0000 0007 7497 0200 0B00 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 5262 E000 F205 3703 0000 0200 0000 0000 0000 0000 0000 0002 0000 0010
0000 0000
---------------------------------------------------------------------------
LABEL: FCP_ARRAY_ERR4
IDENTIFIER: D5385D18

Date/Time: Thu Mar 10 18:49:23 THAIST 2011
Sequence Number: 106790
Machine Id: 000786F9D600
Node Id: p550su
Class: H
Type: TEMP
Resource Name: hdisk5
Resource Class: disk
Resource Type: array
Location: U787B.001.DNWGK9Y-P1-C4-T1-W200500A0B8484C21-L1000000000000

Description
ARRAY OPERATION ERROR

Probable Causes
ARRAY DASD MEDIA
ARRAY DASD DEVICE

Failure Causes
DASD MEDIA
DISK DRIVE

Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES

Detail Data
SENSE DATA
0A00 2A00 13D7 29B0 0000 0804 0000 0000 0000 0000 0007 7497 0200 0B00 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 5262 E000 F205 3703 0000 0200 0000 0000 0000 0000 0000 0002 0000 0010
0000 0000
---------------------------------------------------------------------------
LABEL: FCP_ARRAY_ERR4
IDENTIFIER: D5385D18

Date/Time: Thu Mar 10 18:49:23 THAIST 2011
Sequence Number: 106789
Machine Id: 000786F9D600
Node Id: p550su
Class: H
Type: TEMP
Resource Name: hdisk5
Resource Class: disk
Resource Type: array
Location: U787B.001.DNWGK9Y-P1-C4-T1-W200500A0B8484C21-L1000000000000

Description
ARRAY OPERATION ERROR

Probable Causes
ARRAY DASD MEDIA
ARRAY DASD DEVICE

Failure Causes
DASD MEDIA
DISK DRIVE

Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES

Detail Data
SENSE DATA
0A00 2A00 1337 FD88 0000 0804 0000 0000 0000 0000 0007 7497 0200 0B00 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 5262 E000 F205 3703 0000 0200 0000 0000 0000 0000 0000 0002 0000 0010
0000 0000
---------------------------------------------------------------------------
LABEL: FCP_ARRAY_ERR4
IDENTIFIER: D5385D18

Date/Time: Thu Mar 10 18:49:23 THAIST 2011
Sequence Number: 106788
Machine Id: 000786F9D600
Node Id: p550su
Class: H
Type: TEMP
Resource Name: hdisk5
Resource Class: disk
Resource Type: array
Location: U787B.001.DNWGK9Y-P1-C4-T1-W200500A0B8484C21-L1000000000000

Description
ARRAY OPERATION ERROR

Probable Causes
ARRAY DASD MEDIA
ARRAY DASD DEVICE

Failure Causes
DASD MEDIA
DISK DRIVE

Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES

Detail Data
SENSE DATA
0A00 2A00 136C 1F08 0000 0804 0000 0000 0000 0000 0007 7497 0200 0B00 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 5262 E000 F205 3703 0000 0200 0000 0000 0000 0000 0000 0002 0000 0010
0000 0000
---------------------------------------------------------------------------
LABEL: FCP_ARRAY_ERR4
IDENTIFIER: D5385D18

Date/Time: Thu Mar 10 18:49:23 THAIST 2011
Sequence Number: 106787
Machine Id: 000786F9D600
Node Id: p550su
Class: H
Type: TEMP
Resource Name: hdisk5
Resource Class: disk
Resource Type: array
Location: U787B.001.DNWGK9Y-P1-C4-T1-W200500A0B8484C21-L1000000000000

Description
ARRAY OPERATION ERROR

Probable Causes
ARRAY DASD MEDIA
ARRAY DASD DEVICE

Failure Causes
DASD MEDIA
DISK DRIVE

Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES

Detail Data
SENSE DATA
0A00 2A00 2578 E838 0000 0804 0000 0000 0000 0000 0007 7497 0200 0B00 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 5262 E000 F205 3703 0000 0200 0000 0000 0000 0000 0000 0002 0000 0010
0000 0000
---------------------------------------------------------------------------
LABEL: FCP_ARRAY_ERR4
IDENTIFIER: D5385D18

Date/Time: Thu Mar 10 18:49:23 THAIST 2011
Sequence Number: 106786
Machine Id: 000786F9D600
Node Id: p550su
Class: H
Type: TEMP
Resource Name: hdisk5
Resource Class: disk
Resource Type: array
Location: U787B.001.DNWGK9Y-P1-C4-T1-W200500A0B8484C21-L1000000000000

Description
ARRAY OPERATION ERROR

Probable Causes
ARRAY DASD MEDIA
ARRAY DASD DEVICE

Failure Causes
DASD MEDIA
DISK DRIVE

Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES

Detail Data
SENSE DATA
0A00 2800 4644 3300 0001 0004 0000 0000 0000 0000 0007 7497 0200 0400 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 5262 E000 F205 3703 0000 0200 0000 0000 0000 0000 0000 0002 0000 0010
0000 0000
---------------------------------------------------------------------------
LABEL: FCP_ARRAY_ERR4
IDENTIFIER: D5385D18

Date/Time: Thu Mar 10 18:49:23 THAIST 2011
Sequence Number: 106785
Machine Id: 000786F9D600
Node Id: p550su
Class: H
Type: TEMP
Resource Name: hdisk5
Resource Class: disk
Resource Type: array
Location: U787B.001.DNWGK9Y-P1-C4-T1-W200500A0B8484C21-L1000000000000

Description
ARRAY OPERATION ERROR

Probable Causes
ARRAY DASD MEDIA
ARRAY DASD DEVICE

Failure Causes
DASD MEDIA
DISK DRIVE

Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES

Detail Data
SENSE DATA
0A00 2800 4643 BA00 0004 0004 0000 0000 0000 0000 0007 7497 0200 0B00 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 5262 E000 F205 3703 0000 0200 0000 0000 0000 0000 0000 0002 0000 0010
0000 0000
---------------------------------------------------------------------------
LABEL: FCP_ARRAY_ERR8
IDENTIFIER: 483C9D10

Date/Time: Thu Mar 10 18:49:23 THAIST 2011
Sequence Number: 106784
Machine Id: 000786F9D600
Node Id: p550su
Class: H
Type: INFO
Resource Name: dac0
Resource Class: array
Resource Type: ibm-dac-V4
Location: U787B.001.DNWGK9Y-P1-C4-T1-W200500A0B8484C21
VPD:
Manufacturer................IBM
Machine Type and Model......1814 FAStT
Part Number.................24288-00
ROS Level and ID............0916

Description
ARRAY ACTIVE CONTROLLER SWITCH

Probable Causes
ARRAY CONTROLLER
CABLES AND CONNECTIONS

Failure Causes
ARRAY CONTROLLER
CABLES AND CONNECTIONS

Recommended Actions
NO ACTION NECESSARY

Detail Data
SENSE DATA
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0400 00EE 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 526D 6000 F205 3703 0000 0000 0000 0000 0000 0000 0000 0002 0000 0001
0000 0000
---------------------------------------------------------------------------
LABEL: FCP_ARRAY_ERR3
IDENTIFIER: D9770360

Date/Time: Thu Mar 10 18:48:47 THAIST 2011
Sequence Number: 106783
Machine Id: 000786F9D600
Node Id: p550su
Class: H
Type: PERM
Resource Name: dac0utm
Resource Class: NONE
Resource Type: NONE
Location:

Description
ARRAY OPERATION ERROR

Probable Causes
ARRAY DASD DEVICE
STORAGE DEVICE CABLE
EB4F

Failure Causes
DISK DRIVE
DISK DRIVE ELECTRONICS
STORAGE DEVICE CABLE
ARRAY CONTROLLER

Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES

Detail Data
SENSE DATA
0A00 2800 0000 0200 0000 0104 0000 0000 0000 0000 0000 0013 0200 0B00 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 1103 8000 F205 3703 0000 0200 0000 0000 0000 0000 0040 0002 0000 0000
0000 0000
---------------------------------------------------------------------------
LABEL: FSCSI_ERR4
IDENTIFIER: 3074FEB7

Date/Time: Thu Mar 10 18:48:47 THAIST 2011
Sequence Number: 106782
Machine Id: 000786F9D600
Node Id: p550su
Class: H
Type: TEMP
Resource Name: fscsi0
Resource Class: driver
Resource Type: efscsi
Location: U787B.001.DNWGK9Y-P1-C4-T1

Description
ADAPTER ERROR

Probable Causes
ADAPTER HARDWARE OR CABLE
ADAPTER MICROCODE
FIBRE CHANNEL SWITCH OR FC-AL HUB

Failure Causes
ADAPTER
CABLES AND CONNECTIONS
DEVICE

Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES
CHECK CABLES AND THEIR CONNECTIONS
VERIFY DEVICE CONFIGURATION

Detail Data
SENSE DATA
0000 0000 0000 00B1 0000 0045 0200 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0001 0500 0000 0000
0001 0400 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 422F 0000 0812 0002 0000 0100 0000 0000 0001 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0100 2005 00A0
B848 4C21 2004 00A0 B848 4C20 0200 0000 0000 0000 0000 0000 0000 0000 0000 0000
1105 C000
---------------------------------------------------------------------------
LABEL: FCP_ARRAY_ERR9
IDENTIFIER: 8B79A4BD

Date/Time: Thu Mar 10 18:48:45 THAIST 2011
Sequence Number: 106781
Machine Id: 000786F9D600
Node Id: p550su
Class: H
Type: PERM
Resource Name: dac1
Resource Class: array
Resource Type: ibm-dac-V4
Location: U787B.001.DNWGK9Y-P1-C1-T1-W200400A0B8484C21
VPD:
Manufacturer................IBM
Machine Type and Model......1814 FAStT
Part Number.................24288-00
ROS Level and ID............0916

Description
ARRAY CONTROLLER SWITCH FAILURE

Probable Causes
ARRAY CONTROLLER
CABLES AND CONNECTIONS

Failure Causes
ARRAY CONTROLLER
CABLES AND CONNECTIONS

Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES

Detail Data
SENSE DATA
0A00 5A00 2C01 0000 0002 0004 0000 0000 0000 0000 0000 0000 0200 0B00 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 526D 6000 F205 3703 0000 0000 0000 0010 0000 0000 0040 0002 0000 0000
0000 0000
---------------------------------------------------------------------------
LABEL: FSCSI_ERR4
IDENTIFIER: 3074FEB7

Date/Time: Thu Mar 10 18:48:45 THAIST 2011
Sequence Number: 106780
Machine Id: 000786F9D600
Node Id: p550su
Class: H
Type: TEMP
Resource Name: fscsi1
Resource Class: driver
Resource Type: efscsi
Location: U787B.001.DNWGK9Y-P1-C1-T1

Description
ADAPTER ERROR

Probable Causes
ADAPTER HARDWARE OR CABLE
ADAPTER MICROCODE
FIBRE CHANNEL SWITCH OR FC-AL HUB

Failure Causes
ADAPTER
CABLES AND CONNECTIONS
DEVICE

Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES
CHECK CABLES AND THEIR CONNECTIONS
VERIFY DEVICE CONFIGURATION

Detail Data
SENSE DATA
0000 0000 0000 00B1 0000 0045 0200 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0001 0100 0000 0000
0001 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 422F 0000 1212 0002 0000 0100 0000 0000 0001 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0100 2004 00A0
B848 4C21 2004 00A0 B848 4C20 0200 0000 0000 0000 0000 0000 0000 0000 0000 0000
0FFF A000
---------------------------------------------------------------------------
LABEL: FSCSI_ERR4
IDENTIFIER: 3074FEB7

Date/Time: Thu Mar 10 18:48:43 THAIST 2011
Sequence Number: 106779
Machine Id: 000786F9D600
Node Id: p550su
Class: H
Type: TEMP
Resource Name: fscsi1
Resource Class: driver
Resource Type: efscsi
Location: U787B.001.DNWGK9Y-P1-C1-T1

Description
ADAPTER ERROR

Probable Causes
ADAPTER HARDWARE OR CABLE
ADAPTER MICROCODE
FIBRE CHANNEL SWITCH OR FC-AL HUB

Failure Causes
ADAPTER
CABLES AND CONNECTIONS
DEVICE

Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES
CHECK CABLES AND THEIR CONNECTIONS
VERIFY DEVICE CONFIGURATION

Detail Data
SENSE DATA
0000 0000 0000 00B1 0000 0045 0200 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0001 0100 0000 0000
0001 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 422F 0000 1112 0002 0000 0100 0000 0000 0001 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0100 2004 00A0
B848 4C21 2004 00A0 B848 4C20 0200 0000 0000 0000 0000 0000 0000 0000 0000 0000
0FFF A000
---------------------------------------------------------------------------
LABEL: FSCSI_ERR4
IDENTIFIER: 3074FEB7

Date/Time: Thu Mar 10 18:48:42 THAIST 2011
Sequence Number: 106778
Machine Id: 000786F9D600
Node Id: p550su
Class: H
Type: TEMP
Resource Name: fscsi0
Resource Class: driver
Resource Type: efscsi
Location: U787B.001.DNWGK9Y-P1-C4-T1

Description
ADAPTER ERROR

Probable Causes
ADAPTER HARDWARE OR CABLE
ADAPTER MICROCODE
FIBRE CHANNEL SWITCH OR FC-AL HUB

Failure Causes
ADAPTER
CABLES AND CONNECTIONS
DEVICE

Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES
CHECK CABLES AND THEIR CONNECTIONS
VERIFY DEVICE CONFIGURATION

Detail Data
SENSE DATA
0000 0000 0000 00B1 0000 0045 0200 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0001 0500 0000 0000
0001 0400 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 422F 0000 0712 0002 0000 0100 0000 0000 0001 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0100 2005 00A0
B848 4C21 2004 00A0 B848 4C20 0200 0000 0000 0000 0000 0000 0000 0000 0000 0000
1105 C000
---------------------------------------------------------------------------
LABEL: FSCSI_ERR4
IDENTIFIER: 3074FEB7

Date/Time: Thu Mar 10 18:48:41 THAIST 2011
Sequence Number: 106777
Machine Id: 000786F9D600
Node Id: p550su
Class: H
Type: TEMP
Resource Name: fscsi1
Resource Class: driver
Resource Type: efscsi
Location: U787B.001.DNWGK9Y-P1-C1-T1

Description
ADAPTER ERROR

Probable Causes
ADAPTER HARDWARE OR CABLE
ADAPTER MICROCODE
FIBRE CHANNEL SWITCH OR FC-AL HUB

Failure Causes
ADAPTER
CABLES AND CONNECTIONS
DEVICE

Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES
CHECK CABLES AND THEIR CONNECTIONS
VERIFY DEVICE CONFIGURATION

Detail Data
SENSE DATA
0000 0000 0000 00B1 0000 0045 0200 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0001 0100 0000 0000
0001 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 422F 0000 1012 0002 0000 0100 0000 0000 0001 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0100 2004 00A0
B848 4C21 2004 00A0 B848 4C20 0200 0000 0000 0000 0000 0000 0000 0000 0000 0000
0FFF A000
---------------------------------------------------------------------------
LABEL: FSCSI_ERR4
IDENTIFIER: 3074FEB7

Date/Time: Thu Mar 10 18:48:39 THAIST 2011
Sequence Number: 106776
Machine Id: 000786F9D600
Node Id: p550su
Class: H
Type: TEMP
Resource Name: fscsi1
Resource Class: driver
Resource Type: efscsi
Location: U787B.001.DNWGK9Y-P1-C1-T1

Description
ADAPTER ERROR

Probable Causes
ADAPTER HARDWARE OR CABLE
ADAPTER MICROCODE
FIBRE CHANNEL SWITCH OR FC-AL HUB

Failure Causes
ADAPTER
CABLES AND CONNECTIONS
DEVICE

Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES
CHECK CABLES AND THEIR CONNECTIONS
VERIFY DEVICE CONFIGURATION

Detail Data
SENSE DATA
0000 0000 0000 00B1 0000 0045 0200 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0001 0100 0000 0000
0001 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 422F 0000 0F12 0002 0000 0100 0000 0000 0001 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0100 2004 00A0
B848 4C21 2004 00A0 B848 4C20 0200 0000 0000 0000 0000 0000 0000 0000 0000 0000
0FFF A000
---------------------------------------------------------------------------
LABEL: FSCSI_ERR4
IDENTIFIER: 3074FEB7

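Because the LGWR crash occurred after the database had been opened, there was in fact a window during which we could still operate on the database. The case was ultimately resolved by creating a new undo tablespace to replace the old, problematic one.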

We repaired the disk controller, but ORA-600 [2667] still occurred when we tried to open the database.
After recreating the undo tablespace we were able to open the database.
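
The errpt listing above is long, and most entries repeat the same label against the same device. As a rough sketch for getting an overview, the entries can be tallied by LABEL and Resource Name. The function name errpt_tally is made up for illustration, and the code assumes the report keeps the "LABEL:" / "Resource Name:" layout shown above.

```shell
# Tally errpt -a style entries by LABEL and Resource Name.
# Reads the report on stdin and prints "count label resource" lines,
# most frequent first. errpt_tally is a hypothetical helper name.
errpt_tally() {
    awk '
        /^LABEL:/         { label = $2 }                 # remember the current entry label
        /^Resource Name:/ { count[label " " $3]++ }      # one hit per entry
        END { for (k in count) print count[k], k }
    ' | sort -k1,1nr -k2
}
```

On AIX it would be fed with something like `errpt -a | errpt_tally`; a saved copy of the report works the same way.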

Oracle RAC internal error: a case of ORA-00600 [keltnfy-ldmInit]

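A two-node 10.2.0.2 RAC system on SunOS recently hit the internal error ORA-00600: internal error code, arguments: [keltnfy-ldmInit], [46], [1], [], [], [], [], []. At the time of the error, an operator had mistakenly used the hostname command to change the host name of node 1. The ORA-00600 errors above then appeared one after another, and the operating system log showed the RAC CSS process terminating unexpectedly. The logs are as follows: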

================== OS Message=====================
Jan 10 11:15:10 cupd25k-a root: [ID 702911 user.error] Cluster Ready Services completed waiting on dependencies.
Jan 10 11:15:16 cupd25k-a root: [ID 702911 user.error] Duplicate Oracle CLSMON found. Killing and restarting it.
Jan 10 11:15:16 cupd25k-a root: [ID 702911 user.error] Oracle CSS daemon failed to start up. Check CRS logs for diagnostics.
Jan 10 11:15:16 cupd25k-a root: [ID 702911 user.error] Oracle CLSMON terminated with unexpected status 137. Respawning

/* "Duplicate Oracle CLSMON found" here presumably refers to the OCLSMON process:
"In Oracle 10.2.0.2 and above there is an additional process called OCLSOMON
which monitors the CSS daemon for hangs or scheduling issues and can reboot a
node if there is a perceived hang. OCLSOMON is spawned in init.cssd and runs
as the Oracle user."
   In short, the oclsmon process was introduced in 10.2.0.2 to monitor the CSS process;
   when a hang or an operating system scheduling problem occurs it may reboot the node,
   and it is spawned by the init.cssd script.  */

==================oclsmon.log======================
2011-01-10 11:15:11.376
unspecified member number is (1)
Member 1 group OCLSMON_ in use. Is oclsmon already up?
2011-01-10 11:15:11.479
Internal Error Information:
  Category: 8
  Operation: skgxnreg: the member number is i
  Location: skgxnreg_7
  Other:
  Dep: 1
2011-01-10 11:15:11.737
unspecified member number is (1)
Member 1 group OCLSMON_ in use. Is oclsmon already up?
2011-01-10 11:15:11.751
Internal Error Information:
  Category: 8
  Operation: skgxnreg: the member number is i
  Location: skgxnreg_7
  Other:
  Dep: 1
2011-01-10 11:15:12.006
unspecified member number is (1)
Member 1 group OCLSMON_ in use. Is oclsmon already up?
2011-01-10 11:15:12.023
Internal Error Information:
  Category: 8
  Operation: skgxnreg: the member number is i
  Location: skgxnreg_7
  Other:
  Dep: 1
2011-01-10 11:15:12.278
unspecified member number is (1)
Member 1 group OCLSMON_ in use. Is oclsmon already up?
2011-01-10 11:15:12.293
Internal Error Information:
  Category: 8
  Operation: skgxnreg: the member number is i
  Location: skgxnreg_7
  Other:
  Dep: 1

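/*  skgxn is what Oracle Clusterware uses to monitor skgxn events (that is, matters related to third-party clusterware; this site was presumably running Sun's cluster software).
    Changing the hostname appears to have caused a fatal error in Oracle CSS and started more than one OCLSMON process ("Duplicate Oracle CLSMON found"),
    ending with "Oracle CSS daemon failed to start up. Check CRS logs for diagnostics".
    With the Oracle instances already running, the unexpected termination of the CSS process on node 25k-a
    could leave LMD (Global Enqueue Service daemon) and LMON of every instance on that node unable to work properly, so the instances hang. */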

==========================alert.log====================
Errors in file /oracle/oracle/admin/BOCPCS/udump/bocpcs1_ora_12320.trc:
ORA-00600: internal error code, arguments: [keltnfy-ldmInit], [46], [1], [], [], [], [], []

=========================part of trace file===============
*** 2011-01-10 11:11:02.957
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [keltnfy-ldmInit], [46], [1], [], [], [], [], []
Current SQL information unavailable - no session.
----- Call Stack Trace -----
calling              call     entry                argument values in hex
location             type     point                (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedmp()+716         CALL     ksedst()             FFFFFFFF7FFF9D40 ?
                                                   000000000 ? 0FFFFFFFF ?
                                                   FFFFFFFF7FFF8EE8 ?
                                                   FFFFFFFF7FFFA640 ?
                                                   000000008 ?
kgerinv()+200        PTR_CALL 0000000000000000     000000002 ? 10638A1CC ?
                                                   000000001 ? 000000000 ?
                                                   10638A000 ? 10638A1CC ?
kgeasnmierr()+28     CALL     kgerinv()            106384B98 ? 000000000 ?
                                                   105D3B940 ? 000000002 ?
                                                   FFFFFFFF7FFFDFF0 ?
                                                   000001430 ?
keltnfy()+784        CALL     kgeasnmierr()        106384B98 ? 1064DCBF0 ?
                                                   105D3B940 ? 000000002 ?
                                                   000000000 ? 00000002E ?
kscnfy()+552         PTR_CALL 0000000000000000     10639B498 ? 38001E7A8 ?
                                                   1055AC5D0 ? 10639B498 ?
                                                   000102C00 ? 10638A1C0 ?
ksucrp()+2436        CALL     kscnfy()             000008000 ? 000808214 ?
                                                   100C4C220 ? 1055C6680 ?
                                                   00000000F ? 000000001 ?
opiino()+2056        CALL     ksucrp()             000106387 ? 380007608 ?
                                                   000000000 ? 000380000 ?
                                                   000106000 ? 106387618 ?
opiodr()+1488        PTR_CALL 0000000000000000     10555A000 ?
                                                   FFFFFFFF7FFFF1C8 ?
                                                   00010555A ? 000106000 ?
                                                   105C83000 ? 000000001 ?
opidrv()+828         CALL     opiodr()             106391000 ? 000000000 ?
                                                   106390DD8 ? 106390000 ?
                                                   106391BD0 ? 000106000 ?
sou2o()+80           CALL     opidrv()             106394358 ? 000000001 ?
                                                   00000003C ? 000000000 ?
                                                   00000003C ? 000106000 ?
opimai_real()+124    CALL     sou2o()              FFFFFFFF7FFFF788 ?
                                                   00000003C ? 000000004 ?
                                                   FFFFFFFF7FFFF7B0 ?
                                                   105C82000 ? 000105C82 ?
main()+152           CALL     opimai_real()        000000002 ?
                                                   FFFFFFFF7FFFF888 ?
                                                   103F1BBCC ? 10632DB10 ?
                                                   002411E44 ? 000014400 ?
_start()+380         CALL     main()               000000002 ? 000000008 ?
                                                   000000000 ?
                                                   FFFFFFFF7FFFF898 ?
                                                   FFFFFFFF7FFFF9A8 ?
                                                   FFFFFFFF7C700200 ?

/* The trace file above reports "no session":
    the keltnfy-ldmInit internal error was hit while the server process was still starting up. */
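
When an alert log repeats ORA-00600 many times, it helps to see which first argument dominates before opening individual trace files. The following is a minimal sketch under the assumption that each error line starts with the exact "ORA-00600: internal error code, arguments:" text shown in the logs above; the function name ora600_args is made up for illustration.

```shell
# Count ORA-00600 occurrences by their first argument.
# Reads an alert log (or trace file) on stdin; prints "count argument" lines.
# ora600_args is a hypothetical helper name.
ora600_args() {
    sed -n 's/^ORA-00600: internal error code, arguments: \[\([^]]*\)\].*/\1/p' |
        awk '{ n[$0]++ } END { for (a in n) print n[a], a }' |
        sort -k1,1nr -k2
}
```

It would be used as `ora600_args < alert.log`, where alert.log stands for whichever alert log file is being examined.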

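The Metalink note Startup Database Produces Ora-00600: [Keltnfy-Ldminit] [ID 336447.1]
explains that this internal error is generally caused by improper network configuration on the host. Clearly, using the hostname command
to set a host name that cannot be resolved can trigger the ORA-00600 [keltnfy-ldmInit] internal error.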

Applies to:
Oracle Server - Enterprise Edition - Version: 10.2.0.1 to 10.2.0.3 - Release: 10.2 to 10.2
Information in this document applies to any platform.
***Checked for relevance on 09-Jun-2010***
Symptoms

A startup nomount on an Oracle 10g Release 2 database produces the following exception in the alert log:

Starting up ORACLE RDBMS Version: 10.2.0.1.0.
Errors in file /opt/oracle/10.2/admin/ORCL/udump/ORCL_ora_535.trc:
ORA-00600: internal error code, arguments: [keltnfy-ldmInit], [46], [1], [], [], [], [], []
USER: terminating instance due to error 600
Instance terminated by USER, pid = 535
Cause

The problem is related to getting host information.
In this case, ldmInit()/sldmInit() is failing with error 46 : LDMERR_HOST_NOT_FOUND

The following exception may also occur :

LDMERR_SOSD_INIT         OSD init failed to be specific in these OSD failures
 LDMERR_BAD_ADDR         bad address when system call gethostname failed
 LDMERR_HOST_NOT_FOUND   gethostbyname system call fails
 LDMERR_NO_SUPPORT       when specific address type is not supported

Development has fixed two bugs so far regarding this issue

Bug:5438154 - Abstract: ORA-600[KELTNFY-LDMINIT]  STARTING THE DB
Release Notes:
ldmInit returned LDMERR_HOST_NOT_FOUND for the machine huge alias list/address list
Workaround:
reduce the alias list of the machine

Bug:5486074 - Abstract: ORA-600 [KELTNFY-LDMINIT] WHEN DNS IS NOT AVAILABLE
Release Notes:
Internal error is raised by the Server Generated Alert subsystem when it cannot determine Host Name or
Network Address. This can be caused by the DNS server being unavailable.

Solution

The fix for 5486074 will not fix any underlying error from gethostbyname(); it just changes the internal error to a warning message:

 "Warning: keltnfy call to ldmInit failed with error 46"

You will still need to fix the network config issue.  

These are the checks you can do to verify the host information:

      Check permission on /etc/hosts 

$ ls -l /etc/hosts
-rw-r--r--  2 root root 194 Oct 17  2006 /etc/hosts

      Check if /etc/hosts file is correctly configured

              ( all of this on one line ). 

Check the hostname:
$ hostname
$ ping `hostname`

Make sure you are able to ping the hostname.

      Check if /etc/nodename is correctly configured

If you have DNS set up, ping is not a tool for diagnosing DNS problems. A better tool to use is nslookup, dnsquery, or dig.

$ nslookup
$ nslookup
$ nslookup 

The forward and reverse lookup should succeed and return consistent address/info.  

 Check nsswitch.conf

$ more nsswitch.conf
hosts:      files dns
Make sure host lookup is also done through the /etc/hosts file and not just dns.  It is recommended that FILES come first before DNS.
Also, check the resolv.conf. This makes sure that the DNS is working properly.
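
Two of the checks in the note can be scripted: whether the current node name appears in /etc/hosts, and whether "files" precedes "dns" on the hosts line of nsswitch.conf. The following is a minimal sketch, not part of the Metalink note. The function names host_in_hosts_file and files_before_dns are made up for illustration, the name match is exact and case-sensitive (a simplification), and /etc/nsswitch.conf is assumed as the default location.

```shell
# Return 0 if NAME appears as a host name or alias on a non-comment
# line of the hosts file (default /etc/hosts). Hypothetical helper.
host_in_hosts_file() {
    awk -v name="$1" '
        { sub(/#.*/, "") }                                     # strip comments
        { for (i = 2; i <= NF; i++) if ($i == name) found = 1 } # skip the address field
        END { exit found ? 0 : 1 }
    ' "${2:-/etc/hosts}"
}

# Return 0 if the "hosts:" line lists "files" before "dns",
# or lists "files" without "dns" at all. Hypothetical helper.
files_before_dns() {
    awk '
        $1 == "hosts:" {
            for (i = 2; i <= NF; i++) {
                if ($i == "files" && !f) f = i
                if ($i == "dns" && !d)   d = i
            }
        }
        END { exit (f && (!d || f < d)) ? 0 : 1 }
    ' "${1:-/etc/nsswitch.conf}"
}
```

A typical call would be `host_in_hosts_file "$(uname -n)" || echo "node name missing from /etc/hosts"`. Neither function replaces the forward and reverse lookups described above; they only catch the two file-level mistakes.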

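Clearly, running the hostname command on a production host is dangerous, because you can hardly guarantee that a colleague's slap on the back will not make you mistype. Some people say the rm command should be disabled in production; the same special treatment applies to hostname. So what can we use instead of hostname to view the host name? There are many options, and here is the one I recommend: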

-bash-3.00$ oslevel -r 
5300-07

-bash-3.00$ hostname
askmac.cn

-bash-3.00$ uname -n
askmac.cn

/* uname -n fully meets your needs! */
That's great!
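
The same read-only idea carries over to scripts. This is a small sketch, assuming Python is available on the host; `read_node_name` is a made-up helper name. Both calls it uses only read the name, so a typo cannot turn the lookup into a rename.

```python
import platform
import socket


def read_node_name():
    """Read the node name through read-only library calls."""
    # platform.node() returns an empty string when the name cannot be determined.
    return platform.node() or socket.gethostname()
```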

Oracle internal error: a case of ORA-00600 [OSDEP_INTERNAL]

A 9.2.0.5 system on HP-UX raised the internal error ORA-00600: internal error code, arguments: [OSDEP_INTERNAL], [], [], [], [], [], [], [] during shutdown abort, accompanied by ORA-27302: failure occurred at: skgpwinit4 and ORA-27303: additional information: attach to invalid skgp shared ctx. The log is as follows:

/opt/oracle/product/9.2.0.5/rdbms/log/ngende_ora_7669.trc
Oracle9i Enterprise Edition Release 9.2.0.5.0 - 64bit Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.5.0 - Production
ORACLE_HOME = /opt/oracle/product/9.2.0.5
System name: HP-UX
Node name: yictngd3
Release: B.11.23
Version: U
Machine: ia64
Instance name: nGende
Redo thread mounted by this instance: 0 
Oracle process number: 0
7669

*** 2010-09-08 00:10:02.985
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [OSDEP_INTERNAL], [], [], [], [], [], [], []
ORA-27302: failure occurred at: skgpwinit4
ORA-27303: additional information: attach to invalid skgp shared ctx
Current SQL information unavailable - no session.

Call stack
--------------
ksedmp <- ksfdmp <- kgerinv <- kgerin <- kgerecoserr <- ksucrp <- ksucresg <- kpolna 
<- kpogsk <- opiodr <- ttcpip <- opitsk <- Cannot <- Cannot <- Cannot <- Cannot <- opiino 
<- opiodr <- opidrv <- sou2o <- main <- main_opd_entry

Investigation showed that this internal error is related to operating system shared memory. The relevant Notes are:

Ora-00600: Internal Error Code, Arguments: [Osdep_internal] [ID 304027.1]
Applies to:
Oracle Server - Enterprise Edition - Version: 9.2.0.2 to 10.2.0.3 - Release: 9.2 to 10.2
Information in this document applies to any platform.
***Checked for relevance on 03-NOV-2010***

Getting ORA-600 [OSDEP_INTERNAL] errors while starting up the database:

ORA-00600: internal error code, arguments: [OSDEP_INTERNAL],
[], [], [], [], [], [], []
ORA-27302: failure occurred at: skgpwreset1
ORA-27303: additional information: invalid shared ctx
ORA-27146: post/wait initialization failed
ORA-27300: OS system dependent operation:semget failed with status: 28
ORA-27301: OS failure message: No space left on device
ORA-27302: failure occurred at: sskgpsemsper
Symptoms
Getting ORA-600 [OSDEP_INTERNAL]
Accompanied by the following errors
ORA-27302: failure occurred at: skgpwreset1
ORA-27303: additional information: invalid shared ctx
ORA-27146: post/wait initialization failed
ORA-27300: OS system dependent operation:semget failed with status: 28
ORA-27301: OS failure message: No space left on device
ORA-27302: failure occurred at: sskgpsemsper

Cause
The functions named in the generated trace file point to the semaphore settings.
semmns is set too low.

Solution
set semmns 32767
Make the change persistent in the way your operating system requires, then restart the server and confirm that the new value survived the restart.
For example, on Linux in /etc/sysctl.conf:

sem = semmsl semmns semopm semmni
kernel.sem = 256 32768 100 228
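
To confirm the setting after the restart, the four values can be parsed and compared with the target. The sketch below is an illustration with hypothetical helper names; it assumes the field order semmsl, semmns, semopm, semmni shown above, and accepts either a sysctl.conf-style line or the bare four numbers.

```python
SEM_FIELDS = ("semmsl", "semmns", "semopm", "semmni")


def parse_kernel_sem(line):
    """Parse 'kernel.sem = a b c d' (or just 'a b c d') into a dict of ints."""
    value = line.split("=", 1)[1] if "=" in line else line
    fields = value.split()
    if len(fields) != len(SEM_FIELDS):
        raise ValueError(f"expected 4 values, got {len(fields)}: {line!r}")
    return dict(zip(SEM_FIELDS, map(int, fields)))


def semmns_shortfall(settings, required_semmns):
    """How many semaphores short of the target semmns is (0 when it is enough)."""
    return max(0, required_semmns - settings["semmns"])
```

A non-zero result from `semmns_shortfall(parse_kernel_sem(line), 32767)` means the persistent setting is still below the value the note recommends.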

Getting ORA-00600 [OSDEP_INTERNAL]: Internal Error While Trying To Connect / As Sysdba [ID 253885.1]

Applies to:
Oracle Server - Enterprise Edition - Version: 9.2.0.3 and later   [Release: 9.2 and later ]
HP-UX PA-RISC (64-bit)
Symptoms
Getting following error while trying to connect as sysdba using sqlplus:

SQL> conn / as sysdba
ERROR:
ORA-01041: internal error. hostdef extension doesn't exist

Alert.log shows:

ORA-00600: internal error code, arguments: [OSDEP_INTERNAL], [], [], [], [], [],[], []
ORA-27302: failure occurred at: skgpwinit4
ORA-27303: additional information: attach to invalid skgp shared ctx
Cause
- Database was shutdown using "shutdown abort" option.
- Shared memory segment was not removed even though the instance was down.
Solution
+ Check which shared memory segments are owned by the oracle owner

Use the ipcs -bm command:

% ipcs -bm

m 34034336 0xf8f18468 --rw-r----- ORACLE dba 16777216

+ Delete the 'orphan' shared memory segments:

% ipcrm -m 34034336

If there is more than one instance running on the server and you are not sure how to identify the shared
memory segments then please contact support.
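
Picking the right segment id by eye is error-prone, so the ipcs output can be filtered first. The sketch below is an assumption-laden illustration: it takes the column layout (type, id, key, mode, owner, group, size) from the single sample line above, the function names are made up, and it only builds the ipcrm command strings without running them. It cannot tell an orphaned segment from one still attached to a running instance.

```python
def shared_segments_owned_by(ipcs_output, owner):
    """Return (shmid, key, size) for each shared memory line owned by `owner`."""
    segments = []
    for line in ipcs_output.splitlines():
        fields = line.split()
        # Assumed layout of `ipcs -bm`: T ID KEY MODE OWNER GROUP SEGSZ
        if len(fields) < 7 or fields[0] != "m":
            continue
        if not (fields[1].isdigit() and fields[6].isdigit()):
            continue
        if fields[4] == owner:
            segments.append((int(fields[1]), fields[2], int(fields[6])))
    return segments


def ipcrm_commands(segments):
    """Build, but do not execute, one ipcrm command per segment."""
    return [f"ipcrm -m {shmid}" for shmid, _key, _size in segments]
```

Printing the commands for review before running any of them keeps the final decision with the DBA, which matters when more than one instance shares the server.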

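Inappropriate OS VM parameter settings can cause this problem, and on HP-UX PA-RISC a 'shutdown abort' can raise this internal error when the shared memory is not removed cleanly. Because the instance really was shut down in 'abort' mode and only the shared memory was left unreleased, it is enough to release the corresponding shared memory segments with OS commands such as ipcs followed by ipcrm; there are no other side effects.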

Oracle internal error: a case of ORA-00600 [15801], [1]

An 11.1.0.7 system on SPARC Solaris frequently raised the internal error ORA-00600: internal error code, arguments: [15801], [1], [], [], [], [], [], [], [], [], [], [] while creating indexes. The log is as follows:

Tue Aug 17 17:34:21 2010
WARNING: Oracle executable binary mismatch detected.
Binary of new process does not match binary which started instance
issue alter system set "_disable_image_check" = true to disable these messages
Tue Aug 17 17:34:21 2010
Errors in file /u04/app/oracle/diag/rdbms/orahcmu/ORAHCMU/trace/ORAHCMU_p023_22262.trc (incident=12505):
ORA-00600: internal error code, arguments: [15801], [1], [], [], [], [], [], [], [], [], [], []
Incident details in: /u04/app/oracle/diag/rdbms/orahcmu/ORAHCMU/incident/incdir_12505/ORAHCMU_p023_22262_i12505.trc
Tue Aug 17 17:34:21 2010
Errors in file /u04/app/oracle/diag/rdbms/orahcmu/ORAHCMU/trace/ORAHCMU_p021_22258.trc (incident=12489):
ORA-00600: internal error code, arguments: [15801], [1], [], [], [], [], [], [], [], [], [], []
Incident details in: /u04/app/oracle/diag/rdbms/orahcmu/ORAHCMU/incident/incdir_12489/ORAHCMU_p021_22258_i12489.trc

Errors in file /u04/app/oracle/diag/rdbms/orahcmu/ORAHCMU/trace/ORAHCMU_p015_9328.trc (incident=19909):
ORA-00600: internal error code, arguments: [15801], [1], [], [], [], [], [], [], [], [], [], []
Errors in file /u04/app/oracle/diag/rdbms/orahcmu/ORAHCMU/trace/ORAHCMU_p043_9388.trc (incident=20133):
ORA-00600: internal error code, arguments: [15801], [1], [], [], [], [], [], [], [], [], [], []
Mon Aug 23 14:43:42 2010
Errors in file /u04/app/oracle/diag/rdbms/orahcmu/ORAHCMU/trace/ORAHCMU_p087_9668.trc (incident=20485):
ORA-00600: internal error code, arguments: [15801], [1], [], [], [], [], [], [], [], [], [], []
Mon Aug 23 14:43:42 2010
Errors in file /u04/app/oracle/diag/rdbms/orahcmu/ORAHCMU/trace/ORAHCMU_p012_9322.trc (incident=19885):
ORA-00600: internal error code, arguments: [15801], [1], [], [], [], [], [], [], [], [], [], []
Incident details in: /u04/app/oracle/diag/rdbms/orahcmu/ORAHCMU/incident/incdir_19789/ORAHCMU_ora_8602_i19789.trc
Mon Aug 23 14:43:43 2010
WARNING: Oracle executable binary mismatch detected.
Binary of new process does not match binary which started instance
issue alter system set "_disable_image_check" = true to disable these messages
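
With this many repeated entries it helps to summarize the alert log before opening individual trace files. The following is a rough sketch, not an Oracle tool: the regular expressions are written against the line shapes shown in this article (with and without the `(incident=...)` suffix), and `summarize_ora600` is a hypothetical name.

```python
import re
from collections import Counter

ERR_FILE = re.compile(r"^Errors in file (\S+?)(?:\s+\(incident=(\d+)\))?:\s*$")
ORA600 = re.compile(
    r"^ORA-00600: internal error code, arguments: \[([^\]]*)\], \[([^\]]*)\]"
)


def summarize_ora600(alert_lines):
    """Count ORA-00600 by first two arguments and list (trace file, incident)."""
    counts = Counter()
    incidents = []
    current = None  # the most recent "Errors in file" line, if any
    for line in alert_lines:
        m = ERR_FILE.match(line)
        if m:
            current = m.groups()
            continue
        m = ORA600.match(line)
        if m:
            counts[(m.group(1), m.group(2))] += 1
            if current is not None:
                incidents.append(current)
            current = None
    return counts, incidents
```

The counter shows at a glance whether every entry carries the same arguments, and the incident list gives the trace files to look at first.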

Dump continued from file: /u04/app/oracle/diag/rdbms/orahcmu/ORAHCMU/trace/ORAHCMU_ora_8602.trc
ORA-00600: internal error code, arguments: [15801], [1], [], [], [], [], [], [], [], [], [], []

*** 2010-08-23 14:43:42.974
----- Current SQL Statement for this session (sql_id=00abhfx460qm9) -----
CREATE UNIQUE iNDEX PS_HM_BEN_GP_STG ON PS_HM_BEN_GP_STG (CAL_ID, GP_PAYGROUP, 
EMPLID, EMPL_RCD, HM_INCURRED_BY, HM_SUM_ASSURED) TABLESPACE PSINDEX STORAGE 
(INITIAL 40000 NEXT 100000 MAXEXTENTS UNLIMITED PCTINCREASE 0) PCTFREE 10 PARALLEL NOLOGGING

----- Call Stack Trace -----
ksedst1 ksedst dbkedDefDump dbgexPhaseII dbgexProcessError dbgePostErrorKGE kgeade kgerem
kxfpProcessError kxfpqidqr kxfpqdqr kxfxgs kxfxcp qerpxSendParse kxfpValidateSlaveGroup kxfpgsg
kxfrAllocSlaves kxfrialo kxfralo qerpx_rowsrc_start qerpxStart kdicrws kdicdrv opiexe opiosq0
kpooprx kpoal8 opiodr ttcpip opitsk opiino opiodr opidrv sou2o main



SO: 0x3bf0bbf20, type: 4, owner: 0x3bf5452d0, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
proc=0x3bf5452d0, name=session, file=ksu.h LINE:10719 ID:, pg=0
(session) sid: 217 ser: 767 trans: 0x3bc0660f8, creator: 0x3bf5452d0
flags: (0x8000041) USR/- flags_idl: (0x1) BSY/-/-/-/-/-
flags2: (0x44008) DDLT1/-
DID: , short-term DID:
txn branch: 0x0
oct: 9, prv: 0, sql: 0x3b5d14510, psql: 0x3b6d59820, user: 31/SYSADM
ksuxds FALSE at location: 0
service name: ORAHCMU
client details:
O/S info: user: Administrator, term: UJWALTPVM, ospid: 304:2892
machine: WORKGROUP\UJWALTPVM program: pside.exe
client info: ujwal,Administrator,UJWALTPVM,,pside.exe,
application name: pside.exe, hash value=2824484291
Current Wait Stack:
Not in wait; last wait ended 2.475286 sec ago
Wait State:
auto_close=0 flags=0x21 boundary=0x0/-1
Session Wait History:
0: waited for 'lient'
=c8, =1, =0
wait_id=10483 seq_num=10484 snap_id=1
wait times: snap=0.168502 sec, exc=0.168502 sec, total=0.168502 sec
wait times: max=2.000000 sec
wait counts: calls=1 os=1
occurred after 0.000903 sec of elapsed time
1: waited for ' waiting for ruleset'
=10010063, =1, =0
wait_id=10482 seq_num=10483 snap_id=1
wait times: snap=0.008580 sec, exc=0.008580 sec, total=0.008580 sec
wait times: max=2.000000 sec
wait counts: calls=1 os=1
occurred after 0.000731 sec of elapsed time
2: waited for ' waiting for ruleset'
=1001004f, =4, =0
wait_id=10481 seq_num=10482 snap_id=1
wait times: snap=0.000132 sec, exc=0.000132 sec, total=0.000132 sec
wait times: max=2.000000 sec
wait counts: calls=1 os=1
occurred after 0.000074 sec of elapsed time
3: waited for ' waiting for ruleset'
=1001004f, =3, =0
wait_id=10480 seq_num=10481 snap_id=1
wait times: snap=0.000002 sec, exc=0.000002 sec, total=0.000002 sec
wait times: max=2.000000 sec
wait counts: calls=1 os=1
occurred after 0.000065 sec of elapsed time

----- Session Cursor Dump -----
Current cursor: 1, pgadep=0

Open cursors(pls, sys, hwm, max): 3(0, 2, 64, 300)
NULL=1 SYNTAX=0 PARSE=0 BOUND=1 FETCH=0 ROW=1
Cached frame pages(total, free):
4k(14, 14), 8k(1, 1), 16k(1, 1), 32k(0, 0)

----- Current Cursor -----


----- Plan Table -----

============
Plan Table
============
----------------------------------------------------+-----------------------------------+-------------------------+
| Id | Operation | Name | Rows | Bytes | Cost | Time | TQ |IN-OUT|PQ Distrib |
----------------------------------------------------+-----------------------------------+-------------------------+
| 0 | CREATE INDEX STATEMENT | | | | 2 | | | | |
| 1 | PX COORDINATOR | | | | | | | | |
| 2 | PX SEND QC (ORDER) | :TQ10001 | 82 | 4510 | | |:Q1001| P->S |QC (ORDER) |
| 3 | INDEX BUILD UNIQUE | PS_HM_BEN_GP_STG| | | | |:Q1001| PCWP | |
| 4 | SORT CREATE INDEX | | 82 | 4510 | | |:Q1001| PCWP | |
| 5 | PX RECEIVE | | 82 | 4510 | 2 | 00:00:01 |:Q1001| PCWP | |
| 6 | PX SEND RANGE | :TQ10000 | 82 | 4510 | 2 | 00:00:01 |:Q1000| P->P |RANGE |
| 7 | PX BLOCK ITERATOR | | 82 | 4510 | 2 | 00:00:01 |:Q1000| PCWC | |
| 8 | TABLE ACCESS FULL | PS_HM_BEN_GP_STG| 82 | 4510 | 2 | 00:00:01 |:Q1000| PCWP | |
----------------------------------------------------+-----------------------------------+-------------------------+


----------------------------------------
Cursor#1(0xffffffff7ce31928) state=BOUND curiob=0xffffffff7ce57d28
curflg=4c fl2=0 par=0x0 ses=0x3bf0bbf20
----- Dump Cursor sql_id=00abhfx460qm9 xsc=0xffffffff7ce57d28 cur=0xffffffff7ce31928 -----
Dump Parent Cursor sql_id=00abhfx460qm9 phd=0x3b5d14510 plk=0x3b0bb3318
sqltxt(0x3b5d14510)=CREATE UNIQUE iNDEX PS_HM_BEN_GP_STG ON PS_HM_BEN_GP_STG 
(CAL_ID, GP_PAYGROUP, EMPLID, EMPL_RCD, HM_INCURRED_BY, HM_SUM_ASSURED) 
TABLESPACE PSINDEX STORAGE (INITIAL 40000 NEXT 100000 MAXEXTENTS UNLIMITED PCTINCREASE 0) 
PCTFREE 10 PARALLEL NOLOGGING
hash=616eaa631fc21f4c0029707748605a69
parent=0x3ae539590 maxchild=01 plk=0x3b0bb3318 ppn=n
cursor instantiation=0xffffffff7ce57d28 used=1282545779 exec_id=16777216 exec=1
child#0(0x3b5d05e10) pcs=0x3b678c128
clk=0x3b7e200d0 ci=0x3b5b204c8 pn=0x39955d2b8 ctx=0x3b86ee988
kgsccflg=0 llk[0xffffffff7ce57d30,0xffffffff7ce57d30] idx=0
xscflg=c0102276 fl2=c000400 fl3=2202008 fl4=100
Frames pfr 0xffffffff7ce67098 siz=85976 efr 0xffffffff7ce66fb8 siz=85960
Cursor frame dump
enxt: 7.0x00000168 enxt: 6.0x00008000 enxt: 5.0x00008000 enxt: 4.0x00003978
enxt: 3.0x00000490 enxt: 2.0x000000b8 enxt: 1.0x00000fa0
pnxt: 1.0x00000010
kxscphp=0xffffffff7dd80a18 siz=984 inu=312 nps=312
kxscwhp=0xffffffff7ddd2cc8 siz=8136 inu=6264 nps=3968
kxscefhp=0xffffffff7ce51468 siz=88456 inu=86128 nps=86128


FileName
----------------
ORAHCMU_ora_8602.trc

FileComment
----------------------


Oracle Support - August 27, 2010 6:13:39 PM GMT+08:00 [ODM Data Collection]
Name
--------
=== ODM Data Collection ===


Trace file /u04/app/oracle/diag/rdbms/orahcmu/ORAHCMU/trace/ORAHCMU_p012_9322.trc


*** 2010-08-23 14:43:00.472
WARNING: Oracle executable binary mismatch detected.
Binary of new process does not match binary which started instance
issue alter system set "_disable_image_check" = true to disable these messages
startup image information
iid info sz=245752512 inode=65458 ts=0x4c6df668
current process image information
iid info sz=245750720 inode=65427 ts=0x4c7204b0
set _disable_image_check = TRUE to disable this check
qksceLinearToCe error

*** 2010-08-23 14:43:42.974
*** SESSION ID:(220.111) 2010-08-23 14:43:42.974
*** CLIENT ID:(ujwal) 2010-08-23 14:43:42.974
*** SERVICE NAME:(ORAHCMU) 2010-08-23 14:43:42.974

DDE: Problem Key 'ORA 600 [15801]' was flood controlled (0x6) (incident: 19885)
ORA-00600: internal error code, arguments: [15801], [1], [], [], [], [], [], [], [], [], [], []
kxfxdss
KXFXSLAVESTATE dump [0, 0]
(pgakid: 0 oercnt: 0 oerrcd: -2224892588)
kxfxdss
no current cursor context.
kxfxdss
no cursors.
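
The two "iid info" lines in the trace above are what the binary mismatch warning compares. A short sketch of pulling them apart follows; the field meanings (size, inode, timestamp) are inferred from the labels in the trace, and the helper names are hypothetical.

```python
import re

IID = re.compile(r"iid info sz=(\d+) inode=(\d+) ts=(0x[0-9a-fA-F]+)")


def parse_image_info(trace_text):
    """Return (size, inode, ts) for each 'iid info' line, in file order."""
    return [(int(sz), int(inode), int(ts, 16))
            for sz, inode, ts in IID.findall(trace_text)]


def image_differences(startup, current):
    """Map each differing field to its (startup, current) pair."""
    names = ("size", "inode", "timestamp")
    return {n: (s, c) for n, s, c in zip(names, startup, current) if s != c}
```

For the values in this trace all three fields differ, which fits an executable that was replaced on disk after the instance had started.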

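The binary mismatch warning is known to be caused by relinking the Oracle binary while the instance is running. An SR was filed for this case; according to Metalink, ORA-600 [15801] is generally caused by a communication problem between the QC and its slave processes: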

The ORA-600 15801 is reporting a communication problem between QC and slaves related with messages sent/received.
Alert log reports several of the following error on the ASM instance:
ORA-600: internal error code, arguments: [15801], [1], [], [], [], [], [], 
[]

last wait was for 'eq: Msg Fragment' 

DIAGNOSTIC ANALYSIS:
--------------------
There were also several of the following message in the alert log:
WARNING: Oracle executable binary mismatch detected.
 Binary of new process does not match binary which started instance
issue alter system set "_disable_image_check" = true to disable these 
messages

So, I asked the customer to set the "_disable_image_check" = true 
This had no impact on the ora-600 errors as expected.

ORA-600 [15801] is signalled when a message overflow occurs between  PQ 
processes.

WORKAROUND:
-----------
none 
RELATED BUGS:
-------------
none
REPRODUCIBILITY:
----------------
intermittent but frequently - occurs at all different times of the day.
STACK TRACE:
------------
*** ID:(29.2904) 2006-07-05 15:50:57.972
qksceLinearToCe error
*** 15:50:58.233
ksedmp: internal or fatal error
ORA-600: internal error code, arguments: [15801], [1], [], [], [], [], [], 
[]
----- Call Stack Trace -----

kxfxGeter qks3tttdefReceive kxfxsui kxfxsp kxfxmai kxfprdp 

    SO: 0x67977018, type: 4, owner: 0x6793f208, flag: INIT/-/-/0x00
    (session) sid: 29 trans: (nil), creator: 0x6793f208, flag: (c0000041) 
USR/- BSY/-/-/-/-/-
              DID: 0000-0012-0000FADB, short-term DID: 0000-0000-00000000
              txn branch: (nil)
              oct: 3, prv: 0, sql: (nil), psql: (nil), user: 0/SYS
    O/S info: user: oracle, term: , ospid: 4558, machine: 
    last wait for 'eq: Msg Fragment' blocking sess=0x(nil) seq=2 
wait_time=4441 seconds since wait started=3
                ct path write=1002ffff, ct path write temp=2, Network=0
    Dumping Session Wait History
     for 'eq: Msg Fragment' count=1 wait_time=4441
                ct path write=1002ffff, ct path write temp=2, Network=0
     for 'eq: Msg Fragment' count=1 wait_time=31
                ct path write=1002ffff, ct path write temp=1, Network=0
    temporary object counter: 0

In the end, the error stopped occurring in this case after events 10235 and 10501 were set:

event = "10235 trace name context forever, level 2"  

10235, 00000, "check memory manager internal structures" 

event = "10501 trace name context forever, level 1"
  
10501, 00000, "periodically check selected heap"
// *Cause:
// *Action:
//    Level:  0x01 PGA
//            0x02 SGA
//            0x04 UGA
//            0x08 current call
//            0x10 user call
//            0x20 large allocation pool
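
The level values listed for event 10501 are all distinct powers of two, which suggests they can be combined as a bitmask. Under that assumption (the source lists the values but does not say so explicitly), a level can be decoded like this; `decode_10501_level` is a made-up name.

```python
# Heap selectors copied from the event 10501 description above.
HEAP_BITS = {
    0x01: "PGA",
    0x02: "SGA",
    0x04: "UGA",
    0x08: "current call",
    0x10: "user call",
    0x20: "large allocation pool",
}


def decode_10501_level(level):
    """List the heaps selected by a level, assuming the values are OR-able bits."""
    return [name for bit, name in sorted(HEAP_BITS.items()) if level & bit]
```

With this reading, the "level 1" used in this case selects only the PGA.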

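Oracle internal error: a case of ORA-00600 [4097]

A 10.2.0.4 system on Linux raised the internal error ORA-00600: internal error code, arguments: [4097], [], [], [], [], [], [], [] after an abnormal recovery. (The database had been opened with the hidden parameter _allow_resetlogs_corruption, ran into ORA-00600 [40xx]-type internal errors, and a new undo tablespace was then created and switched to.) SMON treats the error as non-fatal and records each occurrence in alert.log against a maximum of 100; once that limit is passed,
the instance may crash. The log is as follows: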

 


 

 

Errors in file /s01/10gdb/admin/clinica/bdump/clinica_smon_21463.trc:
ORA-00600: internal error code, arguments: [4097], [], [], [], [], [], [], []
Tue Jan  4 23:13:19 2011
Non-fatal internal error happenned while SMON was doing logging scn->time mapping.
SMON encountered 1 out of maximum 100 non-fatal internal errors.

clinica_smon_21463.trc:
Dump of buffer cache at level 4 for tsn=1, rdba=8388633
BH (0x91fdf428) file#: 2 rdba: 0x00800019 (2/25) class: 19 ba: 0x91c62000
  set: 3 blksize: 8192 bsi: 0 set-flg: 0 pwbcnt: 0
  dbwrid: 0 obj: -1 objn: 0 tsn: 1 afn: 2
  hash: [fcf7dd68,fcf7dd68] lru: [91fdf5b8,91fdf398]
  ckptq: [NULL] fileq: [NULL] objq: [f5b53d60,f5b53d60]
  use: [fa694970,fa694970] wait: [NULL]
  st: XCURRENT md: SHR tch: 0
  flags: gotten_in_current_mode
  LRBA: [0x0.0.0] HSCN: [0xffff.ffffffff] HSUB: [65535]
  buffer tsn: 1 rdba: 0x00800019 (2/25)
  scn: 0x0000.0352d07c seq: 0x01 flg: 0x00 tail: 0xd07c2601
  frmt: 0x02 chkval: 0x0000 type: 0x26=KTU SMU HEADER BLOCK

/* This dumps a data block with tsn=1, file#=2;
    its type is KTU SMU HEADER BLOCK, i.e. the header of an undo (rollback) segment
*/
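
The (2/25) shown next to the rdba follows from how a relative data block address is packed: the top 10 bits hold the relative file number and the low 22 bits hold the block number. The tail value can be cross-checked the same way. The sketch below uses made-up function names and assumes an ordinary smallfile tablespace, which matches the values in this dump.

```python
def decode_rdba(rdba):
    """Split a 32-bit rdba into (relative file#, block#)."""
    return rdba >> 22, rdba & 0x3FFFFF


def expected_tail(scn_base, block_type, seq):
    """Tail = low 16 bits of the SCN base, then the type byte, then the seq byte."""
    return ((scn_base & 0xFFFF) << 16) | ((block_type & 0xFF) << 8) | (seq & 0xFF)
```

Here `decode_rdba(8388633)` gives file 2, block 25, and `expected_tail(0x0352d07c, 0x26, 0x01)` reproduces the tail 0xd07c2601 printed in the dump, so the block header and tail agree with each other.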

Hex dump of block: st=0, typ_found=1
........................
ORA-00600: internal error code, arguments: [4097], [], [], [], [], [], [], []
Current SQL statement for this session:
insert into smon_scn_time (thread, time_mp, time_dp, scn, scn_wrp, scn_bas,  num_mappings, tim_scn_map) 
values (0, :1, :2, :3, :4, :5, :6, :7)
----- Call Stack Trace -----
calling              call     entry                argument values in hex
location             type     point                (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedst()+31          call     ksedst1()            000000000 ? 000000001 ?
                                                   7FFFF53BC160 ? 7FFFF53BC1C0 ?
                                                   7FFFF53BC100 ? 000000000 ?
ksedmp()+610         call     ksedst()             000000000 ? 000000001 ?
                                                   7FFFF53BC160 ? 7FFFF53BC1C0 ?
                                                   7FFFF53BC100 ? 000000000 ?
ksfdmp()+21          call     ksedmp()             000000003 ? 000000001 ?
                                                   7FFFF53BC160 ? 7FFFF53BC1C0 ?
                                                   7FFFF53BC100 ? 000000000 ?
kgeriv()+176         call     ksfdmp()             000000003 ? 000000001 ?
                                                   7FFFF53BC160 ? 7FFFF53BC1C0 ?
                                                   7FFFF53BC100 ? 000000000 ?
kgesiv()+119         call     kgeriv()             0068C97C0 ? 2ABDF1D42BF0 ?
                                                   000000000 ? 0F4A33EA0 ?
                                                   7FFFF53BC100 ? 000000000 ?
ksesic0()+209        call     kgesiv()             0068C97C0 ? 2ABDF1D42BF0 ?
                                                   000001001 ? 000000000 ?
                                                   7FFFF53BCEE0 ? 000000000 ?
ktugti()+3200        call     ksesic0()            000001001 ? 0068C9940 ?
                                                   000000000 ? 00000009A ?
                                                   000000010 ? 101010101010101 ?
ktsftcmove()+4149    call     ktugti()             0B73F111C ? 7FFFF53BD278 ?
                                                   7FFFF53BD280 ? 000000000 ?
                                                   7FFFF53BD27C ? 7FFFF53BD270 ?
ktsf_gsp()+1937      call     ktsftcmove()         00000000A ? 000000000 ?
                                                   000000000 ? 000000000 ?
                                                   7FFFF53BD27C ? 7FFFF53BD270 ?
kdtgsp()+512         call     ktsf_gsp()           000000000 ? 7FFFF53BF460 ?
                                                   000000024 ? 000000002 ?
                                                   7FFFF53BF460 ? 000000000 ?
kdccak()+111         call     kdtgsp()             2ABDF1D6A2D8 ? 7FFF00000000 ?
                                                   2ABDF1D68530 ? 000000002 ?
                                                   7FFFF53BF460 ? 000000000 ?
kdcgcs()+5419        call     kdccak()             2ABDF1D6A2D8 ? 000000001 ?
                                                   0F4A3BBA8 ? 000000000 ?
                                                   2ABDF1D6A370 ? 000000000 ?
kdcgsp()+1372        call     kdcgcs()             2ABDF1D6A2D8 ? 000000001 ?
                                                   0F4A3BBA8 ? 000000000 ?
                                                   2ABDF1D6A370 ? 000000000 ?
kdtInsRow()+1808     call     kdcgsp()             2ABDF1D6A2D8 ? 000000001 ?
                                                   0F4A3BBA8 ? 000000000 ?
                                                   2ABDF1D6A370 ? 000000000 ?
insrow()+342         call     kdtInsRow()          2ABDF1D6A2D8 ? 000000001 ?
                                                   0F4A3BBA8 ? 000000000 ?
                                                   2ABDF1D6A370 ? 000000000 ?
insdrv()+594         call     insrow()             2ABDF1D6A2D8 ? 7FFFF53BFCC8 ?
                                                   000000000 ? 0F4A33DE0 ?
                                                   2ABDF1D6A370 ? 000000000 ?
inscovexe()+404      call     insdrv()             2ABDF1D6A2D8 ? 7FFFF53BFCC8 ?
                                                   000000000 ? 2ABDF1D6D908 ?
                                                   2ABDF1D6A370 ? 000000000 ?
insExecStmtExecIniE  call     inscovexe()          0F4A33DE0 ? 0F4A3C230 ?
ngine()+85                                         7FFFF53C0EF0 ? 2ABDF1D69F20 ?
                                                   2ABDF1D6A370 ? 000000000 ?
insexe()+386         call     insExecStmtExecIniE  0F4A33DE0 ? 0F4A3C230 ?
                              ngine()              2ABDF1D69F20 ? 2ABDF1D69F20 ?
                                                   2ABDF1D6A370 ? 000000000 ?
opiexe()+9182        call     insexe()             0F4A333A8 ? 7FFFF53C0EF0 ?
                                                   0F4A33DE0 ? 2ABDF1D69F20 ?
                                                   2ABDF1D6A370 ? 2ABDF1D69F20 ?
opiall0()+1842       call     opiexe()             000000049 ? 000000003 ?
                                                   7FFFF53C12F8 ? 000000001 ?
..............

For this ORA-00600 [4097] internal error, Metalink Note [ID 1030620.6] describes a workaround:

An ORA-600 [4097] can be encountered through various activities that use 
rollback segments.

Solution Description: 
===================== 

The most likely cause of this is BUG 427389.  This BUG is fixed in
version 7.3.3.3.  The BUG is caused when Rollback Segments are dropped and 
recreated after a shutdown abort.  It is encountered through a very specific 
set of circumstances: 

When an instance has a rollback segment offline and the instance crashes, or 
the user does a shutdown abort, the rollback segment wrap number does not get 
updated.  If that segment is then dropped and recreated immediately after the 
instance is restarted, the wrap number could be lower than existing wrap 
numbers.  This will cause the ORA-600[4097] to occur in subsequent 
transactions using Rollback. 

To avoid encountering this bug, rollback segments should only be dropped and 
recreated after the instance has been shutdown normal and restarted.  If you 
have already encountered the bug, use the following workaround:  

   Select segment_name, segment_id from dba_rollback_segs; 

   Drop all Rollback Segments except for SYSTEM.  

   Recreate dummy (small) rollback segments with the same names in their place. 

   Then, recreate additional rollback segments you want to keep with their 
   permanent storage parameters.   

   Now drop the dummy ones. This should ensure that the segment_ids are not 
   reused. 

If you ever want to add a rollback segment you have to use the workaround steps
again.  If you do not fill the dummy slots you may see the problem re-appear.

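We can try to work around the problem by dropping the possibly damaged rollback segments that existed before the abnormal recovery. Although 10g runs with AUM (automatic undo management), this is still possible: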

SQL> alter system set "_smu_debug_mode"=4;
System altered.

/* Set the SMU debug mode to 4 so that rollback segments can be managed manually */

SQL> set heading off 

SQL> select 'drop rollback segment "'||segment_name||'";' from dba_rollback_segs where segment_name!='SYSTEM';

drop rollback segment "_SYSSMU1$";
drop rollback segment "_SYSSMU2$";
drop rollback segment "_SYSSMU3$";
drop rollback segment "_SYSSMU4$";
drop rollback segment "_SYSSMU5$";
drop rollback segment "_SYSSMU6$";
drop rollback segment "_SYSSMU7$";
drop rollback segment "_SYSSMU8$";
drop rollback segment "_SYSSMU9$";
drop rollback segment "_SYSSMU10$";
drop rollback segment "_SYSSMU11$";
drop rollback segment "_SYSSMU12$";
drop rollback segment "_SYSSMU13$";
drop rollback segment "_SYSSMU14$";
drop rollback segment "_SYSSMU15$";
drop rollback segment "_SYSSMU16$";
drop rollback segment "_SYSSMU17$";
drop rollback segment "_SYSSMU18$";
drop rollback segment "_SYSSMU19$";
drop rollback segment "_SYSSMU20$";
drop rollback segment "_SYSSMU21$";
drop rollback segment "_SYSSMU22$";
drop rollback segment "_SYSSMU23$";
drop rollback segment "_SYSSMU24$";
drop rollback segment "_SYSSMU25$";
drop rollback segment "_SYSSMU26$";
drop rollback segment "_SYSSMU27$";
drop rollback segment "_SYSSMU28$";
drop rollback segment "_SYSSMU29$";
drop rollback segment "_SYSSMU30$";

30 rows selected.

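/* Run the drop rollback segment commands above one by one.
    Note that rollback segments in the current undo tablespace can only be taken offline, not dropped;
    all we actually need to do is drop the problem rollback segments in the previous undo tablespace.
*/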

SQL> alter rollback segment "_SYSSMU30$" offline;
Rollback segment altered.

SQL> drop rollback segment "_SYSSMU30$";
drop rollback segment "_SYSSMU30$"
*
ERROR at line 1:
ORA-30025: DROP segment '_SYSSMU30$' (in undo tablespace) not allowed

SQL> alter rollback segment "_SYSSMU30$" online;
Rollback segment altered.
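
Since segments in the current undo tablespace fail with ORA-30025 anyway, the drop list can be filtered before it is run. The sketch below is an illustration: it assumes the (segment_name, tablespace_name) pairs have already been fetched from dba_rollback_segs, the function name is hypothetical, and the tablespace names used in any example call (such as UNDOTBS1 and UNDOTBS2) are placeholders, not values from this case.

```python
def drop_statements(segments, current_undo_tablespace):
    """Build drop statements for segments outside the current undo tablespace.

    segments: iterable of (segment_name, tablespace_name) pairs.
    """
    statements = []
    for name, tablespace in segments:
        # SYSTEM must stay; segments in the active undo tablespace cannot be dropped.
        if name == "SYSTEM" or tablespace == current_undo_tablespace:
            continue
        if '"' in name:
            raise ValueError(f"unexpected quote in segment name: {name!r}")
        statements.append(f'drop rollback segment "{name}";')
    return statements
```

The result is plain text to review and paste into SQL*Plus; nothing is executed by the helper itself.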

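After the problem rollback segments were dropped as above, the system stopped raising ORA-00600 [4097] and the instance returned to normal. Once the system is healthy again, the hidden parameter "_smu_debug_mode" that was set earlier should be reset.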

Oracle internal error: a case of ORA-00600 [6033]

A 9.2.0.8 system on HP-UX raised the internal error ORA-00600: internal error code, arguments: [6033], [], [], [], [], [], [], [] while executing a query. The trace shows:

*** SESSION ID:(583.18281) 2010-12-20 22:49:01.364
*** 2010-12-20 22:49:01.364
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [6033], [], [], [], [], [], [], []
Current SQL statement for this session:
SELECT INTERFACE_HEADER_ID, DOCUMENT_SUBTYPE, AGENT_ID, VENDOR_SITE_ID FROM PO_HEADERS_INTERFACE WHE
RE WF_GROUP_ID = :B1 ORDER BY INTERFACE_HEADER_ID
----- PL/SQL Call Stack -----
object line object
handle number name
c0000001067e3328 4332 package body APPS.PO_AUTOCREATE_DOC
c0000000fd267060 1 anonymous block
c000000108fe4d60 1979 package body APPS.WF_ENGINE_UTIL
c000000108fe4d60 1745 package body APPS.WF_ENGINE_UTIL
c000000108fe4d60 1099 package body APPS.WF_ENGINE_UTIL
c000000108fe4d60 560 package body APPS.WF_ENGINE_UTIL
c000000108fe4d60 1863 package body APPS.WF_ENGINE_UTIL
c000000108fe4d60 1099 package body APPS.WF_ENGINE_UTIL
c000000108fe4d60 560 package body APPS.WF_ENGINE_UTIL
c000000108fe4d60 1863 package body APPS.WF_ENGINE_UTIL
c000000108fe4d60 1099 package body APPS.WF_ENGINE_UTIL
c000000108fe4d60 560 package body APPS.WF_ENGINE_UTIL
c000000108fe4d60 1863 package body
PL/SQL call stack truncated after 1024 bytes.
----- Call Stack Trace -----
calling call entry argument values in hex
location type point (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedmp()+184 ? ksedst() C0000000CEB36420 ?
400000000147994B ?

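ORA-00600 [6033] is generally known to be related to logical corruption of an index. A Metalink Note recommends running analyze table validate structure cascade after the error appears, to verify that the table and its indexes are consistent with each other.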

ORA-600 [6033] "null value retrieved from index leaf lookup" [ID 45795.1]
Modified 03-JUN-2010 Type REFERENCE Status PUBLISHED
Note: For additional ORA-600 related information please read Note:146580.1

PURPOSE:
This article represents a partially published OERI note.
It has been published because the ORA-600 error has been
reported in at least one confirmed bug.
Therefore, the SUGGESTIONS section of this article may help
in terms of identifying the cause of the error.
This specific ORA-600 error may be considered for full publication
at a later date. If/when fully published, additional information
will be available here on the nature of this error.
SUGGESTIONS:
Run the ANALYZE command on any tables and indexes in the
trace file:
Example: ANALYZE TABLE <table_name> VALIDATE STRUCTURE CASCADE;
Rebuild any corrupted indexes.
Index corruption.
Known Bugs

NB  Bug      Fixed              Description
    6401576  9.2.0.8.P22        OERI[ktbair1] / ORA-600 [6101] index corruption possible
    5845232  9.2.0.8.P06        Block corruption / errors from concurrent dequeue operations
    2718937  9.2.0.4, 10.1.0.2  OERI:6033 from SELECT on IOT with COMPRESSED PRIMARY KEY
    1573283  8.1.7.2, 9.0.1.0   OERI:6033 from ALTER INDEX .. REBUILD ONLINE PARAMETERS ('OPTIMIZE FULL')

Certain index operations can lead to block corruption / memory corruption with varying symptoms such as ORA-600 [6033], ORA-600 [6101], ORA-600 [ktbair1], ORA-600 [kcbzpb_1], ORA-600 [4519] and ORA-600 [kcoapl_blkchk] if DB_BLOCK_CHECKING is enabled.

Concurrent dequeue operations can lead to block corruption / memory corruption with varying symptoms such as ORA-600 [6033], ORA-600 [6101] and ORA-600 [kcoapl_blkchk] if DB_BLOCK_CHECKING is enabled. Note: This issue was previously fixed under bug 5559640, but that fix had a serious problem which could lead to SGA memory corruption. This fix supersedes the fix for bug 5559640. The problem with patch 5559640 is alerted in Note:414109.1. This fix is superseded by the fix for bug 6401576.

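If validating the indexes with analyze table validate structure cascade finds a problem, further trace files are generated. Logical index corruption of this kind can usually be fixed by dropping and recreating the index.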
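
When several tables appear in a trace file, the validation statements can be generated rather than typed. This is a minimal sketch with a hypothetical helper name; it only accepts plain unquoted identifiers, and the optional owner is something you supply yourself, since the trace above does not name the table's schema.

```python
import re

_IDENTIFIER = re.compile(r"^[A-Za-z][A-Za-z0-9_$#]*$")


def validate_structure_statement(table, owner=None):
    """Build one ANALYZE ... VALIDATE STRUCTURE CASCADE statement as text."""
    parts = [owner, table] if owner else [table]
    for part in parts:
        if not _IDENTIFIER.match(part):
            raise ValueError(f"not a plain Oracle identifier: {part!r}")
    return f"ANALYZE TABLE {'.'.join(parts)} VALIDATE STRUCTURE CASCADE;"
```

For the query in this case, `validate_structure_statement("PO_HEADERS_INTERFACE")` yields the statement to run against the table named in the failing SELECT.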
