dbms_hm.run_check fails with ORA-00604 and ORA-01427

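While trying out the 11g Health Monitor feature on 11.2.0.3, I ran into ORA-00604 and ORA-01427. A search on MOS turned up Bug 12385172 (ORA-01427 WHEN EXECUTING DBMS_HM.RUN_CHECK), which is triggered when the database contains a function-based index built on a CASE WHEN ... THEN expression: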

 

SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
PL/SQL Release 11.2.0.3.0 - Production
CORE    11.2.0.3.0      Production
TNS for Linux: Version 11.2.0.3.0 - Production
NLSRTL Version 11.2.0.3.0 - Production

SQL> select * from global_name;

GLOBAL_NAME
--------------------------------------------------------------------------------
www.askmac.cn

SQL> exec dbms_hm.run_check('Dictionary Integrity Check','check-2');
BEGIN dbms_hm.run_check('Dictionary Integrity Check','check-2'); END;

*
ERROR at line 1:
ORA-00604: error occurred at recursive SQL level 1
ORA-01427: single-row subquery returns more than one row
ORA-06512: at "SYS.DBMS_HM", line 191
ORA-06512: at line 1
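The first argument to dbms_hm.run_check has to be the name of a check registered in the instance. As a small sketch (not part of the original session), the available non-internal checks can be listed from V$HM_CHECK before picking one:

```sql
-- List the Health Monitor checks that can be run manually.
-- INTERNAL_CHECK = 'N' filters out checks reserved for internal use.
select name, offline_capable
  from v$hm_check
 where internal_check = 'N'
 order by name;
```

'Dictionary Integrity Check', the check used throughout this post, should appear in that list on an 11g database.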

 

 

The following script locates function-based indexes of the CASE WHEN ... THEN kind in the database:

 

 

-- Determine DDL statements (note: this will take a while to return results!)

 set long 100000

 exec dbms_metadata.set_transform_param(dbms_metadata.session_transform,'PRETTY',true);
 exec dbms_metadata.set_transform_param(dbms_metadata.session_transform,'TABLESPACE',false);
 exec dbms_metadata.set_transform_param(dbms_metadata.session_transform,'SEGMENT_ATTRIBUTES',false);
 exec dbms_metadata.set_transform_param(dbms_metadata.session_transform,'STORAGE',false);

 -- Checking the DDL statement
 col DDL form a100 word_wrapped
 select dbms_metadata.get_ddl(RTRIM(UPPER(object_type)),
                              RTRIM(UPPER(object_name)),
                              RTRIM(UPPER(owner))) DDL
  from DBA_OBJECTS
 where object_type='INDEX'
   and object_id
    in (select x from (select obj# x, obj#||','||intcol#,  count(obj#||','||intcol#)
          from ICOLDEP$
         group by obj#, obj#||','||intcol# having count(*) > 1)
 );
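The script above reads the ICOLDEP$ base table and can be slow. As a lighter first pass, and only as a sketch under the assumption that an overview of all function-based indexes is enough to start with, the documented DBA_INDEXES and DBA_IND_EXPRESSIONS views can be queried instead. This lists every function-based index, not only the CASE WHEN ... THEN ones, so the expressions still have to be inspected by eye:

```sql
set long 4000
col column_expression form a60 word_wrapped

-- All function-based indexes together with their indexed expressions.
-- COLUMN_EXPRESSION is a LONG column, so it cannot be filtered with LIKE here.
select i.owner, i.index_name, i.table_name, e.column_position, e.column_expression
  from dba_indexes i
  join dba_ind_expressions e
    on e.index_owner = i.owner
   and e.index_name  = i.index_name
 where i.index_type like 'FUNCTION-BASED%'
 order by i.owner, i.index_name, e.column_position;
```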

 

 

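Any database that has the APEX component installed, or that was created in DBCA from the General Purpose template (cloned from the seed database) rather than as a Custom Database, contains three such function-based indexes: "APEX_030200"."WWV_FLOW_WORKSHEETS_UNQ_IDX", "APEX_030200"."WWV_FLOW_WS_UNQ_ALIAS_IDX" and "APEX_030200"."WWV_FLOW_WORKSHEET_RPTS_UK".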

If APEX is not actually in use, we can simply drop the APEX_030200 user:

 

SQL> drop user "APEX_030200" cascade;

User dropped.

SQL> set long 100000
SQL>
SQL> exec dbms_metadata.set_transform_param(dbms_metadata.session_transform,'PRETTY',true);

PL/SQL procedure successfully completed.

SQL> exec dbms_metadata.set_transform_param(dbms_metadata.session_transform,'TABLESPACE',false);

PL/SQL procedure successfully completed.

SQL> exec dbms_metadata.set_transform_param(dbms_metadata.session_transform,'SEGMENT_ATTRIBUTES',false);

PL/SQL procedure successfully completed.

SQL> exec dbms_metadata.set_transform_param(dbms_metadata.session_transform,'STORAGE',false);

PL/SQL procedure successfully completed.

SQL>
SQL> -- Checking the DDL statement
SQL> col DDL form a100 word_wrapped
SQL> select dbms_metadata.get_ddl(RTRIM(UPPER(object_type)),
  2                               RTRIM(UPPER(object_name)),
  3                               RTRIM(UPPER(owner))) DDL
  4   from DBA_OBJECTS
  5  where object_type='INDEX'
  6    and object_id
  7     in (select x from (select obj# x, obj#||','||intcol#,  count(obj#||','||intcol#)
  8           from ICOLDEP$
  9          group by obj#, obj#||','||intcol# having count(*) > 1)
 10  );

no rows selected

 

 

Running the dictionary health check again shows that the problem is still there:

 

SQL>  exec dbms_hm.run_check('Dictionary Integrity Check','check-mac3');
BEGIN dbms_hm.run_check('Dictionary Integrity Check','check-mac3'); END;

*
ERROR at line 1:
ORA-00604: error occurred at recursive SQL level 1
ORA-01427: single-row subquery returns more than one row
ORA-06512: at "SYS.DBMS_HM", line 191
ORA-06512: at line 1

 

 

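At this point I decided to track down the root cause of the ORA-01427 myself. Because the failure happens at the recursive SQL level, an ERRORSTACK dump is the tool to use for a closer look: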

 

 

SQL> oradebug setmypid;
Statement processed.

SQL> oradebug event 1427 trace name errorstack level 4;
Statement processed.

/* The commands above request a level 4 ERRORSTACK dump whenever error 1427 is raised */

SQL> exec dbms_hm.run_check('Dictionary Integrity Check','check-mac4');
BEGIN dbms_hm.run_check('Dictionary Integrity Check','check-mac4'); END;

*
ERROR at line 1:
ORA-00604: error occurred at recursive SQL level 1
ORA-01427: single-row subquery returns more than one row
ORA-06512: at "SYS.DBMS_HM", line 191
ORA-06512: at line 1

/* Raising ORA-01427 writes the corresponding trace information */

SQL> oradebug tracefile_name
/s01/orabase/diag/rdbms/vprod/VPROD1/trace/VPROD1_ora_7781.trc
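oradebug requires a SYSDBA connection. As a sketch of an equivalent setup for a session without oradebug (the level 3 used here is my own choice, not taken from the session above), the same event can be set with ALTER SESSION, and the trace file name can be read from V$DIAG_INFO:

```sql
-- Dump an error stack to the session trace file whenever ORA-01427 is raised.
alter session set events '1427 trace name errorstack level 3';

-- ... run the failing dbms_hm.run_check call here ...

-- Switch the event off again once the trace has been captured.
alter session set events '1427 trace name errorstack off';

-- Locate the trace file written by the current session.
select value
  from v$diag_info
 where name = 'Default Trace File';
```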

 

Now let's look at the generated trace file:

 

*** 2012-04-30 09:20:55.438
dbms_hm: (In run_check)
Begin dbkhicd_run_check
dbkh_run_check_internal: BEGIN; check_namep=Dictionary Integrity Check, run_namep=check-mac4
dbkh_run_check_internal: BEGIN; timeout=0
dbkh_run_check_internal: AFTER RUN CREATE; run_id=1281

*** 2012-04-30 09:20:55.603
dbkedDefDump(): Starting a non-incident diagnostic dump (flags=0x0, level=4, mask=0x0)
----- Error Stack Dump -----
ORA-01427: single-row subquery returns more than one row
----- Current SQL Statement for this session (sql_id=gxjzd1s7m8xfj) -----
select 52, rowid, 'ind$.obj#'
  from IND$
 where obj# < 0
union all
select 57, rowid, 'ind$.type#'
  from IND$
 where type# not between 1 and 9
union all
select 58, rowid, 'ind$.pctfree$'
  from IND$
 where pctfree$ not between 0 and 99
union all
select 59, rowid, 'ind$.analyzetime <= SYSDATE'
  from IND$
 where analyzetime > SYSDATE
union all
select 51, rowid, 'ind$.obj# pk'
  from IND$
 where obj# is null
union all
select 51, rowid, 'ind$.obj# pk'
  from IND$
 where 1 > (select obj# from IND$ group by obj# having count(*) > 1)
union all
select 53, rowid, 'ind$.dataobj# range'
  from IND$
 where 1 >
       (select dataobj# from IND$ group by dataobj# having count(*) > 1)
union all
select 54, rowid, 'ind$.ts# fk'
  from IND$
 where (ts#) in (select ts#
                   from IND$
                  where (ts#) not in (select ts# from ts$)
                    and ts# != 2147483647)
union all
select 55, rowid, 'ind$.ts,file,block fk'
  from IND$
 where (ts#, file#, block#) in (select ts#, file#, block#
                                  from IND$
                                 where (ts#, file#, block#) not in
                                       (select ts#, file#, block# from seg$)
                                   and file# != 0
                                   and block# != 0)
union all
select 56, rowid, 'ind$.obj# fk_obj$'
  from IND$
 where (obj#) in
       (select obj# from IND$ where (obj#) not in (select obj# from obj$))

----- PL/SQL Stack -----
----- PL/SQL Call Stack -----
  object      line  object
  handle    number  name
0xb1269160       191  package body SYS.DBMS_HM
0xb1d9f600         1  anonymous block

 

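The statement that actually raises ORA-01427 is a long recursive SQL statement, made up of several branches joined by UNION ALL, that checks the IND$ base table for logical inconsistencies. Running the branches individually shows that the one really at fault is this piece: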

 

select 53, rowid, 'ind$.dataobj# range'
  from IND$
 where 1 >
       (select dataobj# from IND$ group by dataobj# having count(*) > 1)

ERROR at line 4:
ORA-01427: single-row subquery returns more than one row

SQL>  select dataobj# from IND$ group by dataobj# having count(*) > 1;

  DATAOBJ#
----------

     75601
     75599
     75594
     75605
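The reason this branch fails rather than just reporting a finding is that the subquery is used as a scalar value on the right-hand side of the comparison, so it may return at most one row. A minimal illustration on DUAL (my own example, not taken from the trace):

```sql
-- Scalar subquery returning exactly one row: the comparison is evaluated normally.
select 'ok' as result
  from dual
 where 1 > (select 0 from dual);

-- Scalar subquery returning two rows: raises
-- ORA-01427: single-row subquery returns more than one row
select 'ok' as result
  from dual
 where 1 > (select 0 from dual union all select 0 from dual);
```

In the check SQL, the subquery returns one row per duplicated dataobj#, so with four duplicated values it returns four rows and the whole UNION ALL statement aborts.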

 

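Surprisingly, the IND$ base table contains several dataobj# values that each appear in more than one row. Let's see which objects they belong to: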

 

select /*+ first_rows */
 owner, object_name, data_object_id
  from dba_objects
 where data_object_id in
       (select dataobj# from IND$ group by dataobj# having count(*) > 1)
       order by 3 ;

OWNER                          OBJECT_NAME                    DATA_OBJECT_ID
------------------------------ ------------------------------ --------------
SYS                            SYS_C0010990                            75594
OE                             WHS_LOCATION_IX                         75594
OE                             ORD_CUSTOMER_IX                         75599
SYS                            SYS_IOT_TOP_75598                       75599
SYS                            SYS_IOT_TOP_75600                       75601
OE                             CUST_ACCOUNT_MANAGER_IX                 75601
OE                             PROD_SUPPLIER_IX                        75605
SYS                            SYS_IOT_TOP_75603                       75605

8 rows selected.

 

 

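Several indexes in the OE sample schema turn out to share a DATA_OBJECT_ID with indexes owned by SYS. We cannot modify the objects under SYS, but the OE schema is of no importance here, so we drop the offending OE indexes: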

 

SQL> drop index oe.WHS_LOCATION_IX;

Index dropped.

SQL> drop index oe.ORD_CUSTOMER_IX;

Index dropped.

SQL> drop index oe.CUST_ACCOUNT_MANAGER_IX;

Index dropped.

SQL> drop index oe.PROD_SUPPLIER_IX;

Index dropped.

SQL> select dataobj# from IND$ group by dataobj# having count(*) > 1;

  DATAOBJ#
----------
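A more cautious variant of the drop step above, offered as a suggestion rather than something done in this session, would save the DDL of each index first so that it can be recreated later if the OE sample schema is ever needed:

```sql
set long 100000
set pagesize 0

-- Capture the CREATE INDEX statements before issuing DROP INDEX.
select dbms_metadata.get_ddl('INDEX', index_name, owner) as ddl
  from dba_indexes
 where owner = 'OE'
   and index_name in ('WHS_LOCATION_IX',
                      'ORD_CUSTOMER_IX',
                      'CUST_ACCOUNT_MANAGER_IX',
                      'PROD_SUPPLIER_IX');
```

Spooling the output to a file keeps a record of exactly what was removed.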

 

After that, the Dictionary Integrity Check runs successfully:

 

SQL> exec dbms_hm.run_check('Dictionary Integrity Check','check-mac5');

PL/SQL procedure successfully completed.

SQL> set pause on;
SQL> spool dic_check
SQL> SET LONG 100000
SQL> SET LONGCHUNKSIZE 1000
SQL> SET PAGESIZE 100
SQL> SET LINESIZE 512
SQL> SELECT DBMS_HM.GET_RUN_REPORT('CHECK-MAC5') FROM DUAL;

DBMS_HM.GET_RUN_REPORT('CHECK-MAC5')
-----------------------------------------------------
Basic Run Information
 Run Name                     : check-mac5
 Run Id                       : 1301
 Check Name                   : Dictionary Integrity Check
 Mode                         : MANUAL
 Status                       : COMPLETED
 Start Time                   : 2012-04-30 09:33:28.540140 -04:00
 End Time                     : 2012-04-30 09:33:32.303679 -04:00
 Error Encountered            : 0
 Source Incident Id           : 0
 Number of Incidents Created  : 0

Input Paramters for the Run
 TABLE_NAME=ALL_CORE_TABLES
 CHECK_MASK=ALL

Run Findings And Recommendations
 Finding
 Finding Name  : Dictionary Inconsistency
 Finding ID    : 1302
 Type          : FAILURE
 Status        : OPEN
 Priority      : CRITICAL
 Message       : SQL dictionary health check: syn$.owner fk 95 on object SYN$
               failed
 Message       : Damaged rowid is AAAABEAABAAANWgAB7 - description: Synonymn
               APEX is referenced
 Finding
 Finding Name  : Dictionary Inconsistency
 Finding ID    : 1305
 Type          : FAILURE
 Status        : OPEN
 Priority      : CRITICAL
 Message       : SQL dictionary health check: syn$.owner fk 95 on object SYN$
               failed
 Message       : Damaged rowid is AAAABEAABAAANWhAAu - description: Synonymn
               APEXWS is referenced
 Finding
 Finding Name  : Dictionary Inconsistency
 Finding ID    : 1308
 Type          : FAILURE
 Status        : OPEN
 Priority      : CRITICAL
 Message       : SQL dictionary health check: syn$.owner fk 95 on object SYN$
               failed
 Message       : Damaged rowid is AAAABEAABAAANWgACO - description: Synonymn
               APEX_ACTIVITY_LOG is referenced
 Finding
 Finding Name  : Dictionary Inconsistency
 Finding ID    : 1311
 Type          : FAILURE
 Status        : OPEN
 Priority      : CRITICAL
 Message       : SQL dictionary health check: syn$.owner fk 95 on object SYN$
               failed
 Message       : Damaged rowid is AAAABEAABAAANWgABl - description: Synonymn
               APEX_ADMIN is referenced
 Finding
 Finding Name  : Dictionary Inconsistency
 Finding ID    : 1314
 Type          : FAILURE
 Status        : OPEN
 Priority      : CRITICAL
 Message       : SQL dictionary health check: syn$.owner fk 95 on object SYN$
               failed
 Message       : Damaged rowid is AAAABEAABAAANWgACB - description: Synonymn
               APEX_APPLICATION is referenced
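The findings in this report all concern synonyms named APEX*. My reading, which is an inference and not something the report states, is that these are synonyms left behind when the APEX_030200 user was dropped earlier. A sketch of how to check that, together with a way to list the findings of the run without the text report:

```sql
-- Synonyms that still point at the dropped schema (inference: these are the
-- rows flagged by the syn$.owner check in the report).
select owner, synonym_name, table_owner, table_name
  from dba_synonyms
 where table_owner = 'APEX_030200'
 order by owner, synonym_name;

-- The findings of one run, read from the Health Monitor views.
select f.finding_id, f.priority, f.status, f.description
  from v$hm_finding f
  join v$hm_run r
    on r.run_id = f.run_id
 where r.name = 'check-mac5'
 order by f.finding_id;
```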

 

 

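What I hope readers take away from this case is that for errors raised at the recursive SQL level, such as ORA-00604, the error message on its own gives incomplete diagnostic information. We need tools such as ERRORSTACK to find out which SQL statement actually raised the error. When recursive SQL fails, the cause is usually either a bug or an inconsistency in the data dictionary. The case also shows how to work around such an error safely without help from MOS or an SR.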
