[Oracle Data Recovery] A Case of ORA-00600 [6711]

On a 10.2.0.4 system running on Linux, the alert log showed frequent ORA-00600 [6711] internal errors:

 

Wed Sep  1 21:24:30 2010
Errors in file /s01/10gdb/admin/YOUYUS/bdump/youyus_smon_5622.trc:
ORA-00600: internal error code, arguments: [6711], [4256248], [1], [4256242], [0], [], [], []
Wed Sep  1 21:24:31 2010
Non-fatal internal error happenned while SMON was doing logging scn->time mapping.

 

 

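MOS has a very brief Note on internal error 6711. It states that the error most likely means that some cluster-type data dictionary tables are potentially corrupted, but it does not even explain what the error's arguments mean.
We can make an educated guess, though. Since the error is corruption-related, the arguments can hardly be anything other than obj#, file# and block#. The numbers 4256248 and 4256242 look very much like data block addresses (DBAs). Treated as DBAs (upper 10 bits for the relative file number, lower 22 bits for the block number), they point to blocks 61944 and 61938 of datafile 1. Let's see which object these blocks belong to: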

SQL> set linesize 200;
SQL> select segment_name, segment_type
  2    from dba_extents
  3   where relative_fno = 1
  4     and (61938 between block_id and block_id + blocks - 1 or
  5         61944 between block_id and block_id + blocks - 1);

SEGMENT_NAME                                                                      SEGMENT_TYPE
--------------------------------------------------------------------------------- ------------------
SMON_SCN_TO_TIME                                                                  CLUSTER
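
As a side note, the arithmetic used above to turn the two error arguments into file and block numbers can be sketched in a few lines of Python. This is a minimal illustration, not Oracle code: the function name `decode_dba` is made up here, and it assumes the usual smallfile layout (10 bits of relative file number, 22 bits of block number). Inside the database, `DBMS_UTILITY.DATA_BLOCK_ADDRESS_FILE` and `DBMS_UTILITY.DATA_BLOCK_ADDRESS_BLOCK` do the same job.

```python
def decode_dba(dba):
    """Split a 32-bit data block address into (relative file#, block#).

    Assumes a smallfile tablespace: upper 10 bits = file#, lower 22 bits = block#.
    """
    file_no = dba >> 22          # upper 10 bits
    block_no = dba & 0x3FFFFF    # lower 22 bits
    return file_no, block_no


# The two arguments reported with ORA-00600 [6711] in this case
for arg in (4256248, 4256242):
    file_no, block_no = decode_dba(arg)
    print(f"dba {arg}: file# {file_no}, block# {block_no}")
```

Both values decode to file 1, which is why the query filters on relative_fno = 1.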

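As expected, the query returns a cluster: SMON_SCN_TO_TIME is the base cluster of the SMON_SCN_TIME table, which records the timestamps that correspond to SCNs in the database. Looking directly at sql.bsq, the script used to create the data dictionary, tells us more about their structure: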

cat $ORACLE_HOME/rdbms/admin/sql.bsq|grep -A 24 "create cluster smon_scn_to_time"
create cluster smon_scn_to_time (
  thread number                         /* thread, compatibility */
)
/
create index smon_scn_to_time_idx on cluster smon_scn_to_time
/
create table smon_scn_time (
  thread number,                         /* thread, compatibility */
  time_mp number,                        /* time this recent scn represents */
  time_dp date,                          /* time as date, compatibility */
  scn_wrp number,                        /* scn.wrp, compatibility */
  scn_bas number,                        /* scn.bas, compatibility */
  num_mappings number,
  tim_scn_map raw(1200),
  scn number default 0,                  /* scn */
  orig_thread number default 0           /* for downgrade */
) cluster smon_scn_to_time (thread)
/

create unique index smon_scn_time_tim_idx on smon_scn_time(time_mp)
/

create unique index smon_scn_time_scn_idx on smon_scn_time(scn)
/

The script shows that several indexes are defined on this cluster and its table, so we need to validate all of these objects:

SQL> analyze table SMON_SCN_TIME validate structure;
Table analyzed.

SQL> analyze table SMON_SCN_TIME validate structure cascade;
Table analyzed.

SQL> analyze cluster SMON_SCN_TO_TIME validate structure;
Cluster analyzed.

SQL> analyze cluster SMON_SCN_TO_TIME validate structure cascade;
analyze cluster SMON_SCN_TO_TIME validate structure cascade
*
ERROR at line 1:
ORA-01499: table/index cross reference failure - see trace file

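The problem is now clear. Validating the table with cascade (which also covers the table's own indexes) succeeds, and so does validating the cluster without cascade; only the cascade validation of the cluster fails. That points to the cluster index smon_scn_to_time_idx, which most likely has a logical corruption. Fortunately only an index is affected, so once the culprit is found the fix is much easier: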

SQL> alter index smon_scn_to_time_idx rebuild ;

Index altered.

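/* When an index is corrupted, a plain rebuild is often ineffective: while the rebuild was running, ORA-00600 [6711] showed up in the alert log again! */

/* We need to drop the problematic index completely and create it again. */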

SQL> drop index smon_scn_to_time_idx ;

Index dropped.

SQL> create index smon_scn_to_time_idx on cluster smon_scn_to_time;

Index created.

/* With that the problem is solved; no more errors appear in the alert log. */

/* That's great! */

[Oracle Data Recovery] A Case of ORA-600 [4194]

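The ORA-600 [4194] internal error is generally raised by a mismatch between a redo record and an undo record. When Oracle validates the undo record number, it compares the undo record number carried in the redo change with the one in the rollback segment; if the two differ, error 4194 is raised. Of its arguments [a] and [b], a is the maximum undo record number in the undo block and b is the undo record number recorded in the redo log. The error can be caused by corruption of either the rollback segment or the redo log file.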

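The root cause of ORA-00600 [4194] is an inconsistency between redo records and rollback/undo records. When Oracle validates an undo record, the corresponding change has to be applied against the maximum undo record of the undo block; if that check fails, ORA-00600 [4194] is reported.

Unlike ORA-600 [2662] or ORA-600 [4000], this error does not necessarily prevent the database from opening, because it rarely occurs during the roll-forward phase. It is much more likely to be triggered once the database is open and SMON starts transaction recovery or rollback segment maintenance.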

 

Meaning of the two arguments of ORA-600 [4194]:

Arg [a] Maximum Undo record number in Undo block
Arg [b] Undo record number from Redo block
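
To apply these two definitions to a line from the alert log, a small parser is enough. The following Python sketch is only an illustration: the function name `parse_ora600` and the label tuple are assumptions made here, and the sample line is the one reported in this case.

```python
import re

# Labels for Arg [a] and Arg [b] of ORA-600 [4194], as listed above
ARG_LABELS_4194 = (
    "maximum undo record number in undo block",
    "undo record number from redo block",
)


def parse_ora600(line):
    """Return the non-empty bracketed arguments of an ORA-00600 line as strings."""
    if "ORA-00600" not in line:
        return []
    _, _, tail = line.partition("arguments:")
    return [arg for arg in re.findall(r"\[([^\]]*)\]", tail) if arg]


line = "ORA-00600: internal error code, arguments: [4194], [53], [41], [], [], [], [], []"
args = parse_ora600(line)
if args and args[0] == "4194":
    for label, value in zip(ARG_LABELS_4194, args[1:]):
        print(f"{label}: {value}")
```

For the sample line, the undo block's maximum record number is 53 while the redo record refers to record 41, which is the mismatch the error reports.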

 

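This ORA-600 [4194] error is raised in the Oracle kernel layers that run from the cache layer down to transaction undo handling. Its possible impact is process failure or rollback segment corruption.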

 

Possible bugs include:

Bug      Fixed in                         Description
8240762  10.2.0.5, 11.1.0.7.10, 11.2.0.1  Undo corruptions with ORA-600 [4193]/ORA-600 [4194] or ORA-600 [4137] / SMON may spin to recover transaction
3210520  9.2.0.5, 10.1.0.2                OERI[kjccqmg:esm] / OERI[4194] / corruption possible in RAC
792610   8.0.6.0, 8.1.6.0                 Rollback segment corruption

 

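For undo records that belong to non-bootstrap objects, the method below can resolve the problem. If the affected objects are bootstrap (system) objects, a manual repair with bbed may be needed; if you cannot handle that yourself, the ASKMACLEAN database recovery team can help.

Let's look at the actual error records: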

 

 

 

Thu Aug 26 18:58:50 2010
Errors in file /s01/10gdb/admin/YOUYUS/bdump/youyus_smon_6587.trc:
ORA-01595: error freeing extent (3) of rollback segment (4))
ORA-00600: internal error code, arguments: [4194], [53], [41], [], [], [], [], []
Thu Aug 26 18:58:50 2010
..............
Errors in file /s01/10gdb/admin/YOUYUS/bdump/youyus_j000_6630.trc:
ORA-00354: corrupt redo log block header
ORA-00353: log corruption near block 2 change 1617922 time 08/26/2010 18:35:39
ORA-00334: archived log: '/s01/10gdb/flash_recovery_area/YOUYUS/onlinelog/o1_mf_3_65psr4on_.log'
Thu Aug 26 19:00:31 2010
Errors in file /s01/10gdb/admin/YOUYUS/bdump/youyus_j000_6630.trc:
ORA-00600: internal error code, arguments: [4194], [53], [41], [], [], [], [], []
Thu Aug 26 19:00:34 2010
Errors in file /s01/10gdb/admin/YOUYUS/bdump/youyus_j000_6630.trc:
ORA-00354: corrupt redo log block header
ORA-00353: log corruption near block 2 change 1617922 time 08/26/2010 18:35:39
ORA-00334: archived log: '/s01/10gdb/flash_recovery_area/YOUYUS/onlinelog/o1_mf_3_65psr4on_.log'
ORA-00600: internal error code, arguments: [4194], [53], [41], [], [], [], [], []
Thu Aug 26 19:00:35 2010
Errors in file /s01/10gdb/admin/YOUYUS/bdump/youyus_j000_6630.trc:
ORA-00354: corrupt redo log block header
ORA-00353: log corruption near block 2 change 1617922 time 08/26/2010 18:35:39
ORA-00334: archived log: '/s01/10gdb/flash_recovery_area/YOUYUS/onlinelog/o1_mf_3_65psr4on_.log'

ORA-00600: internal error code, arguments: [4194], [53], [41], [], [], [], [], []

 

 

 

If ORA-600 [4194] prevents your database from opening, you can try setting the following events:

 

 

SQL> alter system set event='10513 trace name context forever,level 2 : 10512 trace name context forever,level 1: 10511 trace name context forever,level 2: 10510 trace name context forever,level 1' scope=spfile;
System altered.

/* Event 10513 prevents SMON from performing transaction recovery after the database starts */
/* Event 10512 prevents SMON from shrinking rollback segments */
/* Event 10511 prevents SMON from checking to clean up the undo dictionary */
/* Event 10510 prevents SMON from checking to offline pending-offline rollback segments */
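
The event parameter above is a colon-separated list of "number / trace name context forever / level" clauses, which is easy to mistype by hand. A minimal Python sketch that assembles the same statement from a mapping follows; the helper name `build_event_string` is hypothetical, and the event numbers and levels are copied from the statement used in this case.

```python
# Event number -> level, as used in the ALTER SYSTEM statement above
SMON_EVENTS = {10513: 2, 10512: 1, 10511: 2, 10510: 1}


def build_event_string(events):
    """Join one 'N trace name context forever, level L' clause per event with ' : '."""
    return " : ".join(
        f"{number} trace name context forever, level {level}"
        for number, level in events.items()
    )


stmt = f"alter system set event='{build_event_string(SMON_EVENTS)}' scope=spfile;"
print(stmt)
```

The generated statement still has to be run by hand in SQL*Plus; the sketch only builds the text.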

SQL> alter system set undo_management=MANUAL scope=spfile;
System altered.

SQL> shutdown immediate;
ORA-03113: end-of-file on communication channel

SQL> startup mount;
ORACLE instance started.

Total System Global Area 2634022912 bytes
Fixed Size                  2086288 bytes
Variable Size            2382367344 bytes
Database Buffers          234881024 bytes
Redo Buffers               14688256 bytes
Database mounted.
SQL> alter database open;

Database altered.

SQL>  create undo tablespace undoc datafile size 300M;

SQL> alter system set undo_management=AUTO scope=spfile;
System altered.

SQL>  alter system set undo_tablespace=undoc scope=spfile;
System altered.

SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup mount;
ORACLE instance started.

Total System Global Area 2634022912 bytes
Fixed Size                  2086288 bytes
Variable Size            2382367344 bytes
Database Buffers          234881024 bytes
Redo Buffers               14688256 bytes
Database mounted.

SQL> alter database open;
Database altered.

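/* Rebuilding the undo tablespace can avoid some 4194 errors, but not all of them */

/* This database is now in an uncontrollable state and may crash at any time; we must export the data and import it into a new database */

/* In this situation a direct-path export may sidestep some unexpected errors */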

[maclean@rh2 dump]$ exp maclean/maclean file=full_maclean.dmp owner=maclean  direct=y statistics=none
Export: Release 10.2.0.4.0 - Production on Thu Aug 26 21:18:40 2010
Copyright (c) 1982, 2007, Oracle.  All rights reserved.
Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Export done in ZHS16GBK character set and UTF8 NCHAR character set

About to export specified users ...
. exporting pre-schema procedural objects and actions
. exporting foreign function library names for user MACLEAN
. exporting PUBLIC type synonyms
. exporting private type synonyms
. exporting object type definitions for user MACLEAN
About to export MACLEAN's objects ...
. exporting database links
. exporting sequence numbers
. exporting cluster definitions
. about to export MACLEAN's tables via Direct Path ...
Table SYS_EXPORT_TABLE_01 will be exported in conventional path.
. . exporting table            SYS_EXPORT_TABLE_01        256 rows exported
Table SYS_EXPORT_TABLE_02 will be exported in conventional path.
. . exporting table            SYS_EXPORT_TABLE_02        257 rows exported
Table SYS_EXPORT_TABLE_03 will be exported in conventional path.
..............
exporting refresh groups and children
. exporting dimensions
. exporting post-schema procedural objects and actions
. exporting statistics
Export terminated successfully with warnings.

/* we are lucky! */
