ORA-26786造成逻辑备库无法应用SQL一例

一套Linux x86-64 上的11.2.0.2逻辑备库因为出现ORA-26786: A row with key exists but has conflicting column(s) “导致APPLIER 进程无法继续工作，详细信息如下:

1) ### Primary OS Details ###
Hostname - vrh6
OS details -
[oracle@vrh6 ~]$ uname -a
Linux vrh6 2.6.18-194.8.1.el5 #1 SMP Wed Jun 23 10:52:51 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux

2) ### Standby OS Details ###
Hostname - vrh7

[oracle@vrh7 sqlscripts]$ uname -a
Linux vrh7 2.6.18-194.8.1.el5 #1 SMP Wed Jun 23 10:52:51 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux

DB Version - 11.2.0.2

3) ### Primary DB Details ###
11.2.0.2

4) ### Standby DB Details ###
11.2.0.2

5) ### Standby Configuration Type ###
Logical Standby(s) Only.

6) ### RAC Usage ###
Primary only uses RAC.

7) ### Documentation Reference ###
We configured logical standby successfully, but the log apply process on
standby is very slow and lagging by many archive log files

8) ### Data Guard Management Method ###
Data Guard GUI (Console/Grid Control).

9) ### Issue or Question Details ###
The logical standby was applying logs very slow, so raised the LCR cache size to 3058M.
After this the performance has improved, but we still see lag (about 3-4 hours)

现象是logical standby逻辑备库应用日志十分缓慢，加大LCR cache也无济于事。

该问题提交SR后，MOS要求通过以下脚本收集primary 与 logical standby的信息:

Script to Collect Data Guard Logical Standby Diagnostic Information (Doc ID 241512.1)
Script to Collect Data Guard Primary Site Diagnostic Information (Doc ID 241374.1)

I reviewed the trace file and could not see any errors or stuck SQL apply
, please refer to note:

Oracle10g Data Guard SQL Apply Troubleshooting (Doc ID 312434.1)
MAA - SQL Apply Best Practices 10gR2 (Doc ID 387450.1)

to verify the SQL Apply progress and review some tuning recommendation in note 387450.1

分析脚本运行结果以后发现逻辑备库的日志应用其实停止了：

The logical standby database in now almost stopped.In last two days it is not applying any changes,
but in log files it says " loading previously applied transaction information".
In the alert_log file either it is mining the same set of log files many times.

It has been mining the same files foew about 6 hours now. The logical standby is lagging production by 4 days. 

This morning i have enabled few parameters as per the document 10G bestpractices. Here are the parameters

Exec DBMS_LOGSTDBY.APPLY_SET('MAX_SERVERS',24);
ALTER SYSTEM SET PARALLEL_MAX_SERVERS=24; scope=BOTH;
Exec DBMS_LOGSTDBY.APPLY_SET('LOG_AUTO_DELETE', 'TRUE');

We are getting below error in logical standby. I am uploading the alert log file as well.

Tue Aug 30 02:22:04 2011
LOGSTDBY: SQL Apply about to stop with ORA-26786
Tue Aug 30 02:22:04 2011
LOGSTDBY: SQL Apply about to stop with ORA-26786
Tue Aug 30 02:22:04 2011
LOGSTDBY: SQL Apply about to stop with ORA-26786
Tue Aug 30 02:22:04 2011
LOGSTDBY: SQL Apply about to stop with ORA-26786
Tue Aug 30 02:22:04 2011
LOGSTDBY: SQL Apply about to stop with ORA-26786
LOGSTDBY: SQL Apply finished logging error information
LOGSTDBY Apply process AS02 server id=2 pid=37 OS id=23940 stopped
Errors in file /d02/app/prrp081/diag/rdbms/prrp081/prrp081/trace/prrp081_as02_23940.trc:
ORA-26786: A row with key ("COGS_OM_LINE_ID", "PAC_COST_TYPE_ID") = (84243, ) exists but has
conflicting column(s) "LAST_UPDATE_LOGIN", "ORIGINAL_SHIPPED_QTY", "REQUEST_ID" in table
BOM.CST_REVENUE_COGS_MATCH_LINES
LOGSTDBY: SQL Apply finished logging error information
LOGSTDBY Apply process AS05 server id=5 pid=40 OS id=23946 stopped
Tue Aug 30 02:22:05 2011
Errors in file /d02/app/prrp081/diag/rdbms/prrp081/prrp081/trace/prrp081_lsp0_30249.trc:
ORA-26808: Apply process AS02 died unexpectedly.
ORA-26786: A row with key ("COGS_OM_LINE_ID", "PAC_COST_TYPE_ID") = (84243, ) exists but has
conflicting column(s) "LAST_UPDATE_LOGIN", "ORIGINAL_SHIPPED_QTY", "REQUEST_ID" in table
BOM.CST_REVENUE_COGS_MATCH_LINES
Errors in file /d02/app/prrp081/diag/rdbms/prrp081/prrp081/trace/prrp081_as05_23946.trc:
ORA-26786: A row with key ("COGS_OM_LINE_ID", "PAC_COST_TYPE_ID") = (84243, ) exists but has
conflicting column(s) "LAST_UPDATE_LOGIN", "ORIGINAL_SHIPPED_QTY", "REQUEST_ID" in table
BOM.CST_REVENUE_COGS_MATCH_LINES

We have re-started the apply process yesterday and it was still doing
"ORA-16106: loading previously applied transaction information" .
When i see the alert.log file, it has been mining the same archive log files since yesterday
(more than 30 hours for now). My archive log files are piling up and the gap is widening.

10G best practices have been implemented yesterday.

sys@vrh7> ;
1* SELECT sid,SERIAL#,TYPE, LOGSTDBY_ID,STATUS_CODE, STATUS FROM V$LOGSTDBY_PROCESS
sys@vrh7> /

SID SERIAL# TYPE LOGSTDBY_ID STATUS_CODE STATUS
----- ---------- ------------------------------ ----------- -----------
754 9 COORDINATOR -1 16116 ORA-16116: no work available
1010 63 ANALYZER 0 16116 ORA-16116: no work available
1257 165 APPLIER 1 16116 ORA-16116: no work available
1509 51 APPLIER 2 16116 ORA-16116: no work available
1760 59 APPLIER 3 16116 ORA-16116: no work available
10 51 APPLIER 4 16116 ORA-16116: no work available
254 59 APPLIER 5 16116 ORA-16116: no work available
508 267 APPLIER 6 16116 ORA-16116: no work available
757 81 APPLIER 7 16116 ORA-16116: no work available
1009 53 APPLIER 8 16116 ORA-16116: no work available
1256 73 APPLIER 9 16116 ORA-16116: no work available
1510 55 APPLIER 10 16116 ORA-16116: no work available
1762 69 APPLIER 11 16116 ORA-16116: no work available
11 51 APPLIER 12 16116 ORA-16116: no work available
256 59 APPLIER 13 16116 ORA-16116: no work available
504 85 APPLIER 14 16116 ORA-16116: no work available
760 105 APPLIER 15 16116 ORA-16116: no work available
1006 63 APPLIER 16 16116 ORA-16116: no work available
1255 161 APPLIER 17 16116 ORA-16116: no work available
1511 51 APPLIER 18 16116 ORA-16116: no work available
1761 59 APPLIER 19 16116 ORA-16116: no work available
252 61 READER 1024 16242 ORA-16242: Processing log file (thread# 1, sequence# 11585)
510 67 BUILDER 1025 16119 ORA-16119: building transaction at SCN 0x0abb.d76aec86
1004 55 PREPARER 1026 16116 ORA-16116: no work available
5 61 PREPARER 1027 16116 ORA-16116: no work available

25 rows selected.

The logical standby is a week behind the primary database and the logmining is happening on the
same files when we re-started the standby apply. With this speed it never catch up and clear
backlog arch files. I see some data descriphancy errors in the alert log file on table
BOM.CST_REVENUE_COGS_MATCH_LINES. The apply is not updating many of the records in this
table compalining "ORA-26786: A row with key ("COGS_OM_LINE_ID", "PAC
_COST_TYPE_ID") = (84243, ) exists but has conflic
ting column(s) "ORIGINAL_SHIPPED_QTY" in table BOM
.CST_REVENUE_COGS_MATCH_LINES"

I ran logical standby diag script and uploading the output. Please do the needful ASAP as
this is directly impacting the revenue/business loss.

MOS 最后给出的建议:

Looks like you identify the issue, you may want to skip the stuck table or transactions
and re-instantiate the table when both databases are synced, please refer to the below notes:

Oracle10g Data Guard SQL Apply Troubleshooting (Doc ID 312434.1)

Handling ORA-1403 ora-12801 on logical standby apply (Doc ID 1178284.1)

Applies to:
Oracle Server - Enterprise Edition - Version: 9.2.0.8 to 11.2.0.1.0 - Release: 9.2 to 11.2
Information in this document applies to any platform.
Purpose
Logical standby apply may report ora-1403: No data found during apply.
This note explains how to check the transaction that is failing and ways to fix the same problematic transaction.

Sample Alert log contents
==================

LOGMINER: Begin mining logfile: D:\ORADATA\LOGI\SREDO05.LOG
LOGSTDBY Apply process P012 started with pid=50 OS id=5300
LOGMINER: session# = 41, builder process P003 started with pid=41 OS id=4740
Tue Aug 17 09:53:08 2010
LOGSTDBY status: ORA-01403: no data found
LOGSTDBY Apply process P006 pid=44 OS id=2892 stopped
Tue Aug 17 09:53:08 2010
Errors in file c:\oracle\product.2.0\admin\logi\bdump\logi_lsp0_2144.trc:
ORA-12801: error signaled in parallel query server P006
ORA-01403: no data found
LOGSTDBY Apply process P006 pid=44 OS id=2892 stopped
Tue Aug 17 09:53:08 2010
Errors in file c:\oracle\product.2.0\admin\logi\bdump\logi_lsp0_2144.trc:
ORA-12801: error signaled in parallel query server P006
ORA-01403: no data found
Last Review Date
August 18, 2010
Instructions for the Reader
A Troubleshooting Guide is provided to assist in debugging a specific issue. When possible, 
diagnostic tools are included in the document to assist in troubleshooting.
Troubleshooting Details
ORA-1403 is mostly caused due some of below reasons

1) Skip rule defined on DML operations earlier in Logical standby database that cause the 
data mismatch between PRIMARY and STANDBY database.
2) User modified the data directly in Logical standby database by setting database guard status OFF.

Solution:1
**************

1) Identify the failing sql statement using DBA_LOGSTDBY_EVENTS from logical standby database

SQL> select XIDUSN, XIDSLT, XIDSQN , status , event from dba_logstdby_events order by event_time;

XIDUSN XIDSLT XIDSQN STATUS EVENT
---------- ---------- ---------- ------------------------------ -------------------------------
5 22 1826 ORA-01403: no data found delete from "SHANKAR"."TEST1" where "ID" = 1 and
"ENAME" = 'shankar' and ROWID = 'AAAAAAAABAAAJHaAAA'

You should not compare the ROWID of PRIMARY & STANDBY database as it will be complete 
different and sql apply will not check the row based on ROWID that is shown in the sql statement above.

2) Skip the failing transaction from logical standby database.

Execute the procedure dbms_lostdby.skip_transaction(XIDUSN , XIDSLT ,XIDSQN) of the failing transaction.

SQL> exec dbms_logstdby.skip_transaction (5,22,1826);

3) Restart the sql apply.
SQL> alter database start logical standby apply immediate;
If using broker
DGMGRL>edit database logi set state='online';

Solution:2
*************
If the data mismatch is huge that may cause repeated ora-1403 on same table then its 
advisable to instantiate the complete table from PRIMARY to STANDBY.

TableName : TEST1
SchemaName : SHANKAR

1. At primary, as sys user grant below roles to the user whose table(s) is being reinstantiated

SQL> grant SELECT_CATALOG_ROLE to SHANKAR;
SQL> grant EXP_FULL_DATABASE to SHANKAR;
SQL> grant IMP_FULL_DATABASE to SHANKAR;

2. As sys user, create a connected user database link at logical standby.

SQL> create public database link synctable connect to shankar identified by shankar using 'prim';

database link created.

prim - is the service name in TNSNAMES.ORA that points to primary database.

3. verify the database link from logical standby to ensure that it gets info from primary

SQL> select db_unique_name,database_role from v$database@synctable;

DB_UNIQUE_NAME DATABASE_ROLE
------------------------------ ----------------
PRIM PRIMARY

4. Stop the sql apply if it is already on. You can skip this step otherwise

SQL> select count(*) from v$logstdby; ## will result zero rows if sql apply is not running

SQL> alter database stop logical standby apply;
If using BROKER
DGMGRL> edit database logi set state='log-apply-off';

5. Remove the skip rules in LOGICAL standby database, if there is any, from the table that is being instantiated

SQL> select STATEMENT_OPT,NAME from DBA_LOGSTDBY_SKIP where OWNER='SHANKAR';
SQL> exec dbms_logstdby.unskip('SCHEMA_DDL', 'SHANKAR','TEST1');
SQL> exec dbms_logstdby.unskip('DML', 'SHANKAR','TEST1');

6. From logical standby ,start instantiating the the table as sys user

SQL> exec dbms_logstdby.instantiate_table('SHANKAR','TEST1','SYNCTABLE')
Note:- Ensure that the Schema Name, Table Name and dblink name are in Uppercase .

7. Once you are done with instantiation, resume sql apply

SQL> alter database start logical standby apply immediate;
If using BROKER
DGMGRL> edit database logi set state='online';

8.At primary, Revoke the roles granted to the application user - SHANKAR

SQL> revoke SELECT_CATALOG_ROLE from SHANKAR;
SQL> revoke EXP_FULL_DATABASE from SHANKAR;
SQL> revoke IMP_FULL_DATABASE from SHANKAR;

Solution:3
*************

If the sql apply fails even though the data is in sync, you may need to log a service 
request with oracle support with the below events set in Logical standby and the issue being reproduced.

1) In logical standby, stop the sql apply

sql> alter database stop logical standby apply;
if using broker
DGMGRL> edit database logi set state='log-apply-off';

2) Set the below events in logical standby (you may backup and remove all the files in 
/bdump so that you can zip and upload all the files generated in /bdump to Oracle Support)

SQL> alter system set max_dump_file_size=unlimited;

SQL>alter system set events '1349 trace name context forever, level 4095';
SQL> alter system set events '16300 trace name context forever, level 15';
SQL> alter system set events '26700 trace name context forever, level 1544';
SQL>alter system set events '10308 trace name context forever, level 8';

SQL>alter system set events '1403 trace name errorstack level 3';

3) Start the sql apply

sql> alter database start logical standby apply immediate;
if using broker

DGMGRL> edit database logi set state='online';

4) Monitor the alert log to ensure issue is reproduced.

Errors in file c:\oracle\product.2.0\admin\logi\bdump\logi_lsp0_4532.trc:
ORA-12801: error signaled in parallel query server P006
ORA-01403: no data found

Log a service request and upload the alert log and trace files generated in /bdump folder

5) Turn off the trace events set earlier.

SQL>alter system set events '1349 trace name context off';
SQL>alter system set events '16300 trace name context off';
SQL>alter system set events '26700 trace name context off';
SQL>alter system set events '10308 trace name context off';
SQL>alter system set events '1403 trace name context off';

6. End

ORA-26786造成逻辑备库无法应用SQL一例

Comments

Comment 取消回复