OGG-01738 BOUNDED RECOVERY

2012-10-20 10:28:02  INFO    OGG-01738  BOUNDED RECOVERY: CHECKPOINT: for object pool 1: p17448_extr: start=SeqNo: 343, RBA: 79286800, SCN: 0.3712874 (3712874), Timesta
mp: 2012-10-19 22:27:45.000000, Thread: 1, end=SeqNo: 343, RBA: 79287296, SCN: 0.3712874 (3712874), Timestamp: 2012-10-19 22:27:45.000000, Thread: 1.

2012-10-20 14:28:05  INFO    OGG-01738  BOUNDED RECOVERY: CHECKPOINT: for object pool 1: p17448_extr: start=SeqNo: 343, RBA: 107000336, SCN: 0.3725744 (3725744), Timest
amp: 2012-10-20 02:27:14.000000, Thread: 1, end=SeqNo: 343, RBA: 107000832, SCN: 0.3725744 (3725744), Timestamp: 2012-10-20 02:27:14.000000, Thread: 1.

2012-10-20 18:28:06  INFO    OGG-01738  BOUNDED RECOVERY: CHECKPOINT: for object pool 1: p17448_extr: start=SeqNo: 343, RBA: 128054288, SCN: 0.3739371 (3739371), Timest
amp: 2012-10-20 06:28:02.000000, Thread: 1, end=SeqNo: 343, RBA: 128054784, SCN: 0.3739371 (3739371), Timestamp: 2012-10-20 06:28:02.000000, Thread: 1.

2012-10-20 22:28:06  INFO    OGG-01738  BOUNDED RECOVERY: CHECKPOINT: for object pool 1: p17448_extr: start=SeqNo: 343, RBA: 153368080, SCN: 0.3752583 (3752583), Timest
amp: 2012-10-20 10:27:46.000000, Thread: 1, end=SeqNo: 343, RBA: 153368576, SCN: 0.3752583 (3752583), Timestamp: 2012-10-20 10:27:46.000000, Thread: 1.

2012-10-21 02:28:08  INFO    OGG-01738  BOUNDED RECOVERY: CHECKPOINT: for object pool 1: p17448_extr: start=SeqNo: 343, RBA: 165712912, SCN: 0.3763760 (3763760), Timest
amp: 2012-10-20 14:28:00.000000, Thread: 1, end=SeqNo: 343, RBA: 165713408, SCN: 0.3763760 (3763760), Timestamp: 2012-10-20 14:28:00.000000, Thread: 1.

2012-10-21 06:28:15  INFO    OGG-01738  BOUNDED RECOVERY: CHECKPOINT: for object pool 1: p17448_extr: start=SeqNo: 343, RBA: 179789328, SCN: 0.3774866 (3774866), Timest

...skipping one line
2012-10-21 10:28:16  INFO    OGG-01738  BOUNDED RECOVERY: CHECKPOINT: for object pool 1: p17448_extr: start=SeqNo: 343, RBA: 201859088, SCN: 0.3788193 (3788193), Timest
amp: 2012-10-20 22:26:32.000000, Thread: 1, end=SeqNo: 343, RBA: 201859584, SCN: 0.3788193 (3788193), Timestamp: 2012-10-20 22:26:32.000000, Thread: 1.

2012-10-21 14:28:26  INFO    OGG-01738  BOUNDED RECOVERY: CHECKPOINT: for object pool 1: p17448_extr: start=SeqNo: 343, RBA: 246480912, SCN: 0.3803284 (3803284), Timest
amp: 2012-10-21 02:27:31.000000, Thread: 1, end=SeqNo: 343, RBA: 246481408, SCN: 0.3803284 (3803284), Timestamp: 2012-10-21 02:27:31.000000, Thread: 1.

2012-10-21 18:28:33  INFO    OGG-01738  BOUNDED RECOVERY: CHECKPOINT: for object pool 1: p17448_extr: start=SeqNo: 343, RBA: 291493392, SCN: 0.3821051 (3821051), Timest
amp: 2012-10-21 06:28:22.000000, Thread: 1, end=SeqNo: 343, RBA: 291493888, SCN: 0.3821051 (3821051), Timestamp: 2012-10-21 06:28:22.000000, Thread: 1.

 

 

 

在Oracle GoldenGate版本11.x中,引入了Bounded Recovery(BR)的概念,即允许extract对于长事务(long running transaction 比BRINTERVAL指定值更长的事务)写入到本地BR目录。当extract重启时,它会首先读取BR文件,取而代之读取恢复检查点指定的归档日志,这样有助于提升性能以及减少对旧归档文件的依赖。

但是当在RAC环境中使用Bounded Recovery(BR)特性来恢复一个异常abend掉的extract的话,小概率可能会遇到extract hang住或丢失特性的事务。该BUG仅在RAC环境中或者单实例情况下使用多个thread设置时出现。

1. bug 10368242: transaction loss with BR
When a transaction is committed, it will be flushed to trail file. But when BR writing started (after the transaction commit) and extract abends abnormally, the extract may not have chance to flush the committed transaction to trail. When extract restarted, it will read from BR, and leave that committed transaction as persist committed transaction in memory and never be written to trail. So this committed transaction may be lost.

The problem will not happen when the extract stops in normal mode.

2.bug 12532428 (base bug 10408077 ): extract hung when using BR and new objects are added to extract
With BR setup, when new objects (table, sequence, DDL, et al) are including in the extract, restarted extract will pick up more data that causes the producer queue limit (a fixed number) used by BR be reached. Because the extract is still in BR recovery, the consumer thread is stopped and not processing data from the producer queues. This caused a deadlock, and the extract will appear hung.

解决方案

1. 对于BUG 12532428引起的事务丢失,该BUG在11.1.1.1中被修复,且会在11.1.1.0中被backport。

2. 对于BUG 10408077 引起的extract hang,该BUG在11.1.1.1和 11.1.1.0.30中被修复,也可以如下workaround绕过:

A workaround with earlier 11.1.1.0 version is to start extract with BRRESET, when new object is added to an extract. All the archived logs since recovery checkpoint need to be available.

 

ggsci> start extract, BRRESET

 

When running Oracle Golden Gate 11.1.1.0.6 or higher, extract is “abending” every 4 hours on the hour. This approximates the same time or interval that Bounded Recovery is set to by default.
Extract can be restarted and continues to work but then fails again after 4 hours with the same errors as shown below.

ERROR
———
2011-02-06 05:15:38 WARNING OGG-01573 br_validate_bcp: failed in call to: ggcrc64valid.

2011-02-06 05:15:38 WARNING OGG-01573 br_validate_bcp: failed in call to: br_validate_bcp.

2011-02-06 05:15:38 INFO OGG-01639 BOUNDED RECOVERY: ACTIVE: for object pool 1: p7186_Redo Thread 1.

2011-02-06 05:15:38 INFO OGG-01640 BOUNDED RECOVERY: recovery start XID: 0.0.0.

2011-02-06 09:15:46 INFO OGG-01738 BOUNDED RECOVERY: CHECKPOINT: for object pool 1: p7186_Redo Thread 1: start=SeqNo: 21659, RBA: 117520912, SCN: 0.2984644709 (2984644709), Timestamp: 2011-02-06 09:15:44.000000, end=SeqNo: 21659, RBA: 117602816, SCN: 0.2984644709 (2984644709), Timestamp: 2011-02-06 09:15:44.000000.

Cause

Under these conditions, this may be a problem with the Bounded Recovery Checkpoint file. It is likely corrupted.

Solution

The solution is to reset the Bounded Recovery Checkpoint file when restarting the extract like:

GGSCI> start <extract> BRRESET

 

 

BOUNDED RECOVERY
错误信息:BOUNDED RECOVERY: reset to initial or altered checkpoint.
数据库问题,不能读取第2个节点的archivelog文件

Comment

*

沪ICP备14014813号-2

沪公网安备 31010802001379号