Examine 11g automatic block Corruption recovery

11g的高可用框架中DataGuard为我们带来了大量有用的特性,最引入注目的显然是Active Data Guard,这一特性几乎彻底改观了Oracle HA的原有格局。除了Active Data Guard外Automatic Block Media Repair即自动的块介质恢复也是11g中数据卫士一个不容忽视的特色。该特性通过后台进程ABMR自动将物理备库(physical standby)上的健康数据块传输到主库(primary database)上以替换在主库发现的已损坏的数据块。同样的若物理备库上发现数据块损坏那么也可以利用到以上特性来修复。注意使用该特性无需额外设置db_lost_write_protect参数为非默认的NONE值,ABMR的自动修复不依赖于该参数。

深入研究的话可以发现实际控制Automatic Block Media Repair特性的是一系列隐藏参数,它们包括:

_auto_bmr enabled enable/disable Auto BMR
_auto_bmr_req_timeout 60 Auto BMR Requester Timeout
_auto_bmr_sess_threshold 30 Auto BMR Request Session Threshold
_auto_bmr_pub_timeout 10 Auto BMR Publish Timeout
_auto_bmr_fc_time 60 Auto BMR Flood Control Time
_auto_bmr_bg_time 3600 Auto BMR Process Run Time
_auto_bmr_sys_threshold 100 Auto BMR Request System Threshold

显然这里面_auto_bmr隐藏参数是ABMR特性的开关,其默认值为enabled,而其他参数则定义了abmr的超时和限定阀值,这里不做展开。

这里我们来实地体验一下这种高可用的block repair特性:

SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
PL/SQL Release 11.2.0.2.0 - Production
CORE	11.2.0.2.0	Production
TNS for Linux: Version 11.2.0.2.0 - Production
NLSRTL Version 11.2.0.2.0 - Production

/* 演示所用数据库版本为较新的11.2.0.2 */

SQL> conn maclean/maclean
Connected.

SQL> create tablespace abmr datafile size 10M;
Tablespace created.

SQL> select file_name from dba_data_files where tablespace_name='ABMR';

FILE_NAME
--------------------------------------------------------------------------------
/standby/oradata/PROD/datafile/o1_mf_abmr_6rvqp087_.dbf

SQL>  create table need_repair tablespace abmr  as select 1 t1 from dual;
Table created.

SQL> exec dbms_stats.gather_table_stats('MACLEAN','NEED_REPAIR');
PL/SQL procedure successfully completed.

SQL> select * from need_repair;
	T1
----------
	 1

SQL>  select dbms_rowid.rowid_block_number(rowid) from need_repair;

DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID)
------------------------------------
				 131

SQL> alter system flush buffer_cache;
System altered.

SQL> alter system flush buffer_cache;
System altered.

[maclean@rh6 ~]$ bbed FILENAME=/standby/oradata/PROD/datafile/o1_mf_abmr_6rvqp087_.dbf mode=edit
Password: 

BBED: Release 2.0.0.0.0 - Limited Production on Sat Mar 26 20:42:25 2011

Copyright (c) 1982, 2007, Oracle.  All rights reserved.

************* !!! For Oracle Internal Use only !!! ***************

BBED> set block 131
	BLOCK#         	131

BBED> corrupt block 131
Warning: contents of previous BIFILE will be lost. Proceed? (Y/N) y
Block marked media corrupt.

/* 以上我们使用bbed工具将need_repair表上唯一一行数据所在的数据块损坏 
   这样如果没有11g中automatic block Corruption recovery特性的话,
   应当报ORA-01578:ORACLE data block corrupted错误
*/

SQL> conn maclean/maclean
Connected.
SQL> select * from need_repair;

	T1
----------
	 1

/* 以上查询并未出错,显然已经通过后台调用ABMR进程修复了该数据块 */

/* 告警日志中记录了ABMR的修复过程 */

Hex dump of (file 5, block 131) in trace file /s01/diag/rdbms/prod/PROD/trace/PROD_ora_24289.trc
Corrupt block relative dba: 0x01400083 (file 5, block 131)
Bad header found during buffer read
Data in bad block:
 type: 6 format: 2 rdba: 0x00000083
 last change scn: 0x0000.00000000 seq: 0xff flg: 0x04
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x000006ff
 check value in block header: 0x154d
 computed block checksum: 0x0
Reading datafile '/standby/oradata/PROD/datafile/o1_mf_abmr_6rvqp087_.dbf' for corruption at rdba: 0x01400083 (file 5, block 131)
Reread (file 5, block 131) found same corrupt data (no logical check)

/* 以上dedicated server process 2次从disk读取该块都发现损坏后,启动了后台进程ABMR,
   在实例启动时abmr进程并不随instance启动,仅当需要时被服务进程启动 
 */

Starting background process ABMR
ABMR started with pid=33, OS id=24293 
Automatic block media recovery service is active.
Automatic block media recovery requested for (file# 5, block# 131)
Automatic block media recovery successful for (file# 5, block# 131)
Automatic block media recovery successful for (file# 5, block# 131)
WARNING: AutoBMR fixed mismatched on-disk single block 83 with in-mem rdba 1400083.

/* 同时abmr不是fatal的后台进程,杀死该进程不会导致不良影响 */

[maclean@rh6 ~]$ ps -ef|grep abmr
maclean  24293     1  0 20:43 ?        00:00:00 ora_abmr_PROD
maclean  24390 22254  0 20:49 pts/0    00:00:00 grep abmr
[maclean@rh6 ~]$ kill -9 24293

/* 如果不想使用Automatic Block Media Repair特性,抑或者因为该特性出现了一些问题的话,
   设置_auto_bmr为diabled即禁用该特性一般可以帮助我们绕过问题 */


SQL> alter system set "_auto_bmr"=disabled;
System altered.

SQL> select * from need_repair;
select * from need_repair
*
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 5, block # 131)
ORA-01110: data file 5:
'/standby/oradata/PROD/datafile/o1_mf_abmr_6rvqp087_.dbf'

/* 如预期的出现了ORA-01578错误 */

相关的告警日志内容 :
Hex dump of (file 5, block 131) in trace file /s01/diag/rdbms/prod/PROD/trace/PROD_ora_24742.trc
Corrupt block relative dba: 0x01400083 (file 5, block 131)
Bad header found during buffer read
Data in bad block:
 type: 6 format: 2 rdba: 0x00000083
 last change scn: 0x0000.00000000 seq: 0xff flg: 0x04
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x000006ff
 check value in block header: 0x154d
 computed block checksum: 0x0
Reading datafile '/standby/oradata/PROD/datafile/o1_mf_abmr_6rvqp087_.dbf' for corruption at rdba: 0x01400083 (file 5, block 131)
Reread (file 5, block 131) found same corrupt data (no logical check)
Corrupt Block Found
         TSN = 5, TSNAME = ABMR
         RFN = 5, BLK = 131, RDBA = 20971651
         OBJN = 13773, OBJD = 13773, OBJECT = NEED_REPAIR, SUBOBJECT = 
         SEGMENT OWNER = MACLEAN, SEGMENT TYPE = Table Segment
Errors in file /s01/diag/rdbms/prod/PROD/trace/PROD_ora_24742.trc  (incident=5081):
ORA-01578: ORACLE data block corrupted (file # 5, block # 131)
ORA-01110: data file 5: '/standby/oradata/PROD/datafile/o1_mf_abmr_6rvqp087_.dbf'
Incident details in: /s01/diag/rdbms/prod/PROD/incident/incdir_5081/PROD_ora_24742_i5081.trc
2011-03-26 21:08:18.718000 +08:00
Sweep [inc][5081]: completed
Hex dump of (file 5, block 131) in trace file /s01/diag/rdbms/prod/PROD/incident/incdir_5081/PROD_m000_24753_i5081_a.trc
Corrupt block relative dba: 0x01400083 (file 5, block 131)
Bad header found during validation
Data in bad block:
 type: 6 format: 2 rdba: 0x00000083
 last change scn: 0x0000.00000000 seq: 0xff flg: 0x04
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x000006ff
 check value in block header: 0x154d
 computed block checksum: 0x0
Reread of blocknum=131, file=/standby/oradata/PROD/datafile/o1_mf_abmr_6rvqp087_.dbf. found same corrupt data
Reread of blocknum=131, file=/standby/oradata/PROD/datafile/o1_mf_abmr_6rvqp087_.dbf. found same corrupt data
Reread of blocknum=131, file=/standby/oradata/PROD/datafile/o1_mf_abmr_6rvqp087_.dbf. found same corrupt data
Reread of blocknum=131, file=/standby/oradata/PROD/datafile/o1_mf_abmr_6rvqp087_.dbf. found same corrupt data
Reread of blocknum=131, file=/standby/oradata/PROD/datafile/o1_mf_abmr_6rvqp087_.dbf. found same corrupt data
Checker run found 1 new persistent data failures
Dumping diagnostic data in directory=[cdmp_20110326210819], requested by (instance=1, osid=24742), summary=[incident=5081].
2011-03-26 21:08:21.458000 +08:00
Sweep [inc2][5081]: completed

to be continued …………

Duplicate standby database from active database

11g Release1 中引入了新的RMAN duplicate命令,即duplicate from active database命令。利用该命令可以更加便捷地创建Data Guard环境,你甚至不需要将Primary Database shutdown(整个过程中主库都可以处于打开状态下),也不需要在配置前做一些额外的备份操作,仅需要配置起auxiliary辅助实例,同时创建密码文件,并在监听(LISTENER)中加入静态注册信息后就可以开始工作了!

以下示例中我们会在Primary Database所在的同一台服务器上部署一套物理备库(Physical Standby),实际上这样比在不同的2台服务器上配置Data Guard要复杂些,原因是那样的话我们无需做File Name的转换,具体的环境:

Database Role DB_UNIQUE_NAME
Primary Database PROD
Standby Database SBDB

 

/* 需要注意的是DG环境中force logging,不要用了新特性就将这个基本的要求忘记了 ! */

SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
PL/SQL Release 11.2.0.2.0 - Production
CORE	11.2.0.2.0	Production
TNS for Linux: Version 11.2.0.2.0 - Production
NLSRTL Version 11.2.0.2.0 - Production

/* 显然PROD主库应当处于归档模式并且是FORCE LOGGING状态下 */

SQL> alter database force logging;
Database altered.

同时在Primary Database上创建必要的standby logfile:

SQL> alter database add standby logfile group 7 size 50M;
Database altered.

SQL> alter database add standby logfile group 8 size 50M;
Database altered.
 ..............

/* standby实例只需要db_name参数就可以启动到nomount模式了,
   并为standby数据库创建密码文件 */

[maclean@rh6 ~]$ echo "db_name=PROD" >$ORACLE_HOME/dbs/initSBDB.ora

[maclean@rh6 ~]$ orapwd file=$ORACLE_HOME/dbs/orapwSBDB password=XXXX entries=10

[maclean@rh6 ~]$ cd $ORACLE_HOME/network/admin

我们需要保证PROD和SBDB实例在监听器Listener中被静态注册,同时也包括DGMGRL需要用到的*_DGMGRL服务名

[maclean@rh6 admin]$ cat listener.ora
# listener23920.ora Network Configuration File: /tmp/listener23920.ora
# Generated by Oracle configuration tools.

DGLSN =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = rh6.oracle.com)(PORT = 1588))
  )

SID_LIST_DGLSN =
  (SID_LIST =
    (SID_DESC =
      (GLOBAL_DBNAME = PROD)
      (ORACLE_HOME = /s01/product/11.2.0/dbhome_2)
      (SID_NAME = PROD)
    )
    (SID_DESC =
      (GLOBAL_DBNAME = SBDB)
      (ORACLE_HOME = /s01/product/11.2.0/dbhome_2)
      (SID_NAME = SBDB)
    )
    (SID_DESC =
      (GLOBAL_DBNAME = PROD_DGMGRL)
      (ORACLE_HOME = /s01/product/11.2.0/dbhome_2)
      (SID_NAME = PROD)
    )
    (SID_DESC =
      (GLOBAL_DBNAME = SBDB_DGMGRL)
      (ORACLE_HOME = /s01/product/11.2.0/dbhome_2)
      (SID_NAME = SBDB)
    )
  )

/* 并在tnsnames.ora文件中加入必要的服务别名 */

[maclean@rh6 admin]$ cat tnsnames.ora
# tnsnames.ora Network Configuration File: /s01/product/11.2.0/dbhome_2/network/admin/tnsnames.ora
# Generated by Oracle configuration tools.

PROD =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = rh6.oracle.com)(PORT = 1588))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = PROD)
    )
  )

SBDB =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = rh6.oracle.com)(PORT = 1588))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = SBDB)
    )
  )

DGLSN =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = rh6.oracle.com)(PORT = 1588))
  )

接着我们来启动辅助实例

[maclean@rh6 ~]$ export ORACLE_SID=SBDB
[maclean@rh6 ~]$ sqlplus  / as sysdba
SQL> startup  nomount;
ORACLE instance started.

/* 使用远程登录测试是否可以以SYSDBA登录SBDB实例 */

[maclean@rh6 ~]$ sqlplus  sys/XXXXXX@SBDB as sysdba

下面我们需要准备一份duplicate standby database的脚本,
因为我们是在同一台主机上配置Data Guard所以这里要用到db_file_name_convert和log_file_name_convert,
如果你的环境中不需要这么做那么你可以指定NOFILENAMECHECK;

以下为示例的脚本,可以看到它并不复杂只是指定了必要的初始化参数,十分易于记忆。

[maclean@rh6 ~]$ cat duplicate_act_standby.rcv 

duplicate target database
for standby
from active database
DORECOVER
spfile
set db_unique_name='SBDB'
set log_archive_dest_1='location=/standby/arch02'
set log_archive_dest_2='service=PROD async lgwr register valid_for=(online_logfile,primary_role) db_unique_name=PROD'
set standby_file_management='AUTO'
set fal_server='PROD'
set fal_client='SBDB'
set control_files='/standby/oradata/SBDB/controlfile/control1.ctl','/standby/oradata/SBDB/controlfile/control2.ctl'
set db_file_name_convert='PROD','SBDB'
set log_file_name_convert='PROD','SBDB'
set memory_target='0'
set sga_target='400M';

具体执行以上脚本,我们需要同时登录target database PROD和auxiliary instance辅助实例SBDB:

[maclean@rh6 ~]$ echo $ORACLE_SID
PROD

[maclean@rh6 ~]$ rman target / auxiliary sys/oracle@SBDB cmdfile=duplicate_act_standby.rcv log=das.log

connected to target database: PROD (DBID=158660885)
connected to auxiliary database: PROD (not mounted)

RMAN> duplicate target database
2> for standby
3> from active database
4> DORECOVER
5> spfile
6> set db_unique_name='SBDB'
7> set log_archive_dest_1='location=/standby/arch02'
8> set log_archive_dest_2='service=PROD async lgwr register valid_for=(online_logfile,primary_role) db_unique_name=PROD'
9> set standby_file_management='AUTO'
10> set fal_server='PROD'
11> set fal_client='SBDB'
12> set control_files='/standby/oradata/SBDB/controlfile/control1.ctl','/standby/oradata/SBDB/controlfile/control2.ctl'
13> set db_file_name_convert='PROD','SBDB'
14> set log_file_name_convert='PROD','SBDB'
15> set memory_target='0'
16> set sga_target='400M';
17>
Starting Duplicate Db at 26-MAR-11
using target database control file instead of recovery catalog
allocated channel: ORA_AUX_DISK_1
channel ORA_AUX_DISK_1: SID=96 device type=DISK

contents of Memory Script:
{
   backup as copy reuse
   targetfile  '/s01/product/11.2.0/dbhome_2/dbs/orapwPROD' auxiliary format
 '/s01/product/11.2.0/dbhome_2/dbs/orapwSBDB'   targetfile
 '/s01/product/11.2.0/dbhome_2/dbs/spfilePROD.ora' auxiliary format
 '/s01/product/11.2.0/dbhome_2/dbs/spfileSBDB.ora'   ;
   sql clone "alter system set spfile= ''/s01/product/11.2.0/dbhome_2/dbs/spfileSBDB.ora''";
}
executing Memory Script

Starting backup at 26-MAR-11
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=17 device type=DISK
Finished backup at 26-MAR-11

sql statement: alter system set spfile= ''/s01/product/11.2.0/dbhome_2/dbs/spfileSBDB.ora''

contents of Memory Script:
{
   sql clone "alter system set  db_unique_name =
 ''SBDB'' comment=
 '''' scope=spfile";
   sql clone "alter system set  log_archive_dest_1 =
 ''location=/standby/arch02'' comment=
 '''' scope=spfile";
   sql clone "alter system set  log_archive_dest_2 =
 ''service=PROD async lgwr register valid_for=(online_logfile,primary_role) db_unique_name=PROD'' comment=
 '''' scope=spfile";
   sql clone "alter system set  standby_file_management =
 ''AUTO'' comment=
 '''' scope=spfile";
   sql clone "alter system set  fal_server =
 ''PROD'' comment=
 '''' scope=spfile";
   sql clone "alter system set  fal_client =
 ''SBDB'' comment=
 '''' scope=spfile";
   sql clone "alter system set  control_files =
 ''/standby/oradata/SBDB/controlfile/control1.ctl'', ''/standby/oradata/SBDB/controlfile/control2.ctl'' comment=
 '''' scope=spfile";
   sql clone "alter system set  db_file_name_convert =
 ''PROD'', ''SBDB'' comment=
 '''' scope=spfile";
   sql clone "alter system set  log_file_name_convert =
 ''PROD'', ''SBDB'' comment=
 '''' scope=spfile";
   sql clone "alter system set  memory_target =
 0 comment=
 '''' scope=spfile";
   sql clone "alter system set  sga_target =
 400M comment=
 '''' scope=spfile";
   shutdown clone immediate;
   startup clone nomount;
}
executing Memory Script

sql statement: alter system set  db_unique_name =  ''SBDB'' comment= '''' scope=spfile

sql statement: alter system set  log_archive_dest_1 =  ''location=/standby/arch02'' comment= '''' scope=spfile

sql statement: alter system set  log_archive_dest_2 =  
''service=PROD async lgwr register valid_for=(online_logfile,primary_role) db_unique_name=PROD'' 
comment= '''' scope=spfile

sql statement: alter system set  standby_file_management =  ''AUTO'' comment= '''' scope=spfile

sql statement: alter system set  fal_server =  ''PROD'' comment= '''' scope=spfile

sql statement: alter system set  fal_client =  ''SBDB'' comment= '''' scope=spfile

sql statement: alter system set  control_files =  ''/standby/oradata/SBDB/controlfile/control1.ctl'',
''/standby/oradata/SBDB/controlfile/control2.ctl''
comment= '''' scope=spfile

sql statement: alter system set  db_file_name_convert =  ''PROD'', ''SBDB'' comment= '''' scope=spfile

sql statement: alter system set  log_file_name_convert =  ''PROD'', ''SBDB'' comment= '''' scope=spfile

sql statement: alter system set  memory_target =  0 comment= '''' scope=spfile

sql statement: alter system set  sga_target =  400M comment= '''' scope=spfile

Oracle instance shut down

connected to auxiliary database (not started)
Oracle instance started

Total System Global Area     417546240 bytes

Fixed Size                     2227072 bytes
Variable Size                134218880 bytes
Database Buffers             272629760 bytes
Redo Buffers                   8470528 bytes

contents of Memory Script:
{
   backup as copy current controlfile for standby auxiliary format  '/standby/oradata/SBDB/controlfile/control1.ctl';
   restore clone controlfile to  '/standby/oradata/SBDB/controlfile/control2.ctl' from
 '/standby/oradata/SBDB/controlfile/control1.ctl';
}
executing Memory Script

Starting backup at 26-MAR-11
using channel ORA_DISK_1
channel ORA_DISK_1: starting datafile copy
copying standby control file
output file name=/s01/product/11.2.0/dbhome_2/dbs/snapcf_PROD.f tag=TAG20110326T195144 RECID=2 STAMP=746826704
channel ORA_DISK_1: datafile copy complete, elapsed time: 00:00:01
Finished backup at 26-MAR-11

Starting restore at 26-MAR-11
allocated channel: ORA_AUX_DISK_1
channel ORA_AUX_DISK_1: SID=134 device type=DISK

channel ORA_AUX_DISK_1: copied control file copy
Finished restore at 26-MAR-11

contents of Memory Script:
{
   sql clone 'alter database mount standby database';
}
executing Memory Script

sql statement: alter database mount standby database

contents of Memory Script:
{
   set newname for tempfile  1 to
 "/standby/oradata/SBDB/datafile/o1_mf_temp_6rvjsmr4_.tmp";
   switch clone tempfile all;
   set newname for datafile  1 to
 "/standby/oradata/SBDB/datafile/o1_mf_system_6rvjrtxh_.dbf";
   set newname for datafile  2 to
 "/standby/oradata/SBDB/datafile/o1_mf_sysaux_6rvjs6vh_.dbf";
   set newname for datafile  3 to
 "/standby/oradata/SBDB/datafile/o1_mf_undotbs1_6rvjsjjg_.dbf";
   set newname for datafile  4 to
 "/standby/oradata/SBDB/datafile/o1_mf_users_6rvjsy5q_.dbf";
   backup as copy reuse
   datafile  1 auxiliary format
 "/standby/oradata/SBDB/datafile/o1_mf_system_6rvjrtxh_.dbf"   datafile
 2 auxiliary format
 "/standby/oradata/SBDB/datafile/o1_mf_sysaux_6rvjs6vh_.dbf"   datafile
 3 auxiliary format
 "/standby/oradata/SBDB/datafile/o1_mf_undotbs1_6rvjsjjg_.dbf"   datafile
 4 auxiliary format
 "/standby/oradata/SBDB/datafile/o1_mf_users_6rvjsy5q_.dbf"   ;
   sql 'alter system archive log current';
}
executing Memory Script

executing command: SET NEWNAME

renamed tempfile 1 to /standby/oradata/SBDB/datafile/o1_mf_temp_6rvjsmr4_.tmp in control file

executing command: SET NEWNAME

executing command: SET NEWNAME

executing command: SET NEWNAME

executing command: SET NEWNAME

Starting backup at 26-MAR-11
using channel ORA_DISK_1
channel ORA_DISK_1: starting datafile copy
input datafile file number=00001 name=/standby/oradata/PROD/datafile/o1_mf_system_6rvjrtxh_.dbf
output file name=/standby/oradata/SBDB/datafile/o1_mf_system_6rvjrtxh_.dbf tag=TAG20110326T195152
channel ORA_DISK_1: datafile copy complete, elapsed time: 00:00:15
channel ORA_DISK_1: starting datafile copy
input datafile file number=00002 name=/standby/oradata/PROD/datafile/o1_mf_sysaux_6rvjs6vh_.dbf
output file name=/standby/oradata/SBDB/datafile/o1_mf_sysaux_6rvjs6vh_.dbf tag=TAG20110326T195152
channel ORA_DISK_1: datafile copy complete, elapsed time: 00:00:15
channel ORA_DISK_1: starting datafile copy
input datafile file number=00003 name=/standby/oradata/PROD/datafile/o1_mf_undotbs1_6rvjsjjg_.dbf
output file name=/standby/oradata/SBDB/datafile/o1_mf_undotbs1_6rvjsjjg_.dbf tag=TAG20110326T195152
channel ORA_DISK_1: datafile copy complete, elapsed time: 00:00:07
channel ORA_DISK_1: starting datafile copy
input datafile file number=00004 name=/standby/oradata/PROD/datafile/o1_mf_users_6rvjsy5q_.dbf
output file name=/standby/oradata/SBDB/datafile/o1_mf_users_6rvjsy5q_.dbf tag=TAG20110326T195152
channel ORA_DISK_1: datafile copy complete, elapsed time: 00:00:01
Finished backup at 26-MAR-11

sql statement: alter system archive log current

contents of Memory Script:
{
   backup as copy reuse
   archivelog like  "/standby/arch01/1_17_746822549.dbf" auxiliary format
 "/standby/arch02/1_17_746822549.dbf"   ;
   catalog clone archivelog  "/standby/arch02/1_17_746822549.dbf";
   switch clone datafile all;
}
executing Memory Script

Starting backup at 26-MAR-11
using channel ORA_DISK_1
channel ORA_DISK_1: starting archived log copy
input archived log thread=1 sequence=17 RECID=3 STAMP=746826751
output file name=/standby/arch02/1_17_746822549.dbf RECID=0 STAMP=0
channel ORA_DISK_1: archived log copy complete, elapsed time: 00:00:01
Finished backup at 26-MAR-11

cataloged archived log
archived log file name=/standby/arch02/1_17_746822549.dbf RECID=1 STAMP=746826752

datafile 1 switched to datafile copy
input datafile copy RECID=2 STAMP=746826752 file name=/standby/oradata/SBDB/datafile/o1_mf_system_6rvjrtxh_.dbf
datafile 2 switched to datafile copy
input datafile copy RECID=3 STAMP=746826752 file name=/standby/oradata/SBDB/datafile/o1_mf_sysaux_6rvjs6vh_.dbf
datafile 3 switched to datafile copy
input datafile copy RECID=4 STAMP=746826752 file name=/standby/oradata/SBDB/datafile/o1_mf_undotbs1_6rvjsjjg_.dbf
datafile 4 switched to datafile copy
input datafile copy RECID=5 STAMP=746826752 file name=/standby/oradata/SBDB/datafile/o1_mf_users_6rvjsy5q_.dbf

contents of Memory Script:
{
   set until scn  242517;
   recover
   standby
   clone database
    delete archivelog
   ;
}
executing Memory Script

executing command: SET until clause

Starting recover at 26-MAR-11
using channel ORA_AUX_DISK_1

starting media recovery

archived log for thread 1 with sequence 17 is already on disk as file /standby/arch02/1_17_746822549.dbf
archived log file name=/standby/arch02/1_17_746822549.dbf thread=1 sequence=17
media recovery complete, elapsed time: 00:00:00
Finished recover at 26-MAR-11
Finished Duplicate Db at 26-MAR-11

Recovery Manager complete.

[maclean@rh6 ~]$ export ORACLE_SID=PROD
[maclean@rh6 ~]$ sqlplus  / as sysdba

/* 在主库PROD上设置到物理备库SBDB的归档目的地 */

SQL> alter system set log_archive_dest_2='service=sbdb lgwr async  
valid_for=(online_logfiles,primary_role) db_unique_name=SBDB';
System altered.

以上完成了对物理备库Physical Standby的配置,紧接着我们来配合Data Broker:

SQL> alter system set dg_broker_start=true;
System altered.

[maclean@rh6 ~]$ export ORACLE_SID=SBDB
[maclean@rh6 ~]$ sqlplus  / as sysdba

SQL>  alter system set dg_broker_start=true;
System altered.

[maclean@rh6 ~]$ dgmgrl sys/oracle@PROD
DGMGRL for Linux: Version 11.2.0.2.0 - 64bit Production

Copyright (c) 2000, 2009, Oracle. All rights reserved.

Welcome to DGMGRL, type "help" for information.
Connected.

DGMGRL>  create CONFIGURATION PROD as PRIMARY DATABASE IS PROD  CONNECT IDENTIFIER IS PROD;
Configuration "prod" created with primary database "prod"

DGMGRL> add database sbdb  AS CONNECT IDENTIFIER IS sbdb MAINTAINED AS PHYSICAL;
Database "sbdb" added

DGMGRL> enable configuration;
Enabled.

DGMGRL> edit database prod set property LogXptMode='sync';
Property "logxptmode" updated
DGMGRL> edit database sbdb set property LogXptMode='sync';
Property "logxptmode" updated

/* 修改当前DG的保护模式为最大可用模式MaxAvailability */

DGMGRL> edit CONFIGURATION SET PROTECTION MODE  as MaxAvailability;
Succeeded.

DGMGRL> show configuration;

Configuration - prod

  Protection Mode: MaxAvailability
  Databases:
    prod - Primary database
    sbdb - Physical standby database
      Error: ORA-16766: Redo Apply is stopped

Fast-Start Failover: DISABLED

Configuration Status:
ERROR

/* 以上可以看到物理备库SBDB上的REDO APPLY被停止了,我们可以在DGMGRL中启动其REDO APPLY */

DGMGRL> edit database sbdb set state='APPLY-ON';
Succeeded.

DGMGRL> show configuration;

Configuration - prod

  Protection Mode: MaxAvailability
  Databases:
    prod - Primary database
    sbdb - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS

/* 以下使用DGMGRL来回切换主备库角色,十分方便 */

DGMGRL> switchover to sbdb;
Performing switchover NOW, please wait...
New primary database "sbdb" is opening...
Operation requires shutdown of instance "PROD" on database "prod"
Shutting down instance "PROD"...
ORACLE instance shut down.
Operation requires startup of instance "PROD" on database "prod"
Starting instance "PROD"...
ORACLE instance started.
Database mounted.
Switchover succeeded, new primary is "sbdb"

DGMGRL> switchover to prod;
Performing switchover NOW, please wait...
New primary database "prod" is opening...
Operation requires shutdown of instance "SBDB" on database "sbdb"
Shutting down instance "SBDB"...
ORACLE instance shut down.
Operation requires startup of instance "SBDB" on database "sbdb"
Starting instance "SBDB"...
ORACLE instance started.
Database mounted.
Switchover succeeded, new primary is "prod"

11g新特性:Rolling Upgrade With Physical Standby

从Oracle 10.1.0.3开始引入了利用逻辑备库(logical standby)实施滚动升级(rolling upgrade)的特性;在滚动升级期间(rolling upgrade),允许主库(primary database)与逻辑备库(logical standby)间运行不同的数据库版本,以最小化宕机时间。

到了11g中增加了可以将物理备库(physical standby)临时性转换成逻辑备库(logical standby)以完成滚动升级,之后将该临时逻辑备库反转为物理备库的功能。使用以上临时转换功能只需要在转换语句”ALTER DATABASE RECOVER TO LOGICAL STANDBY”后加上”KEEP IDENTITY”选项。除去转换细节的区别外,使用物理备库进行滚动升级的过程与10g中的逻辑备库滚动升级没有太大的区别。

这里我们通过实验来亲身体验一下这一新特性,我们假设环境中存在一套11.1.0.7的Data Guard系统:

Database Role DB_UNIQUE_NAME Version
Primary Database PROD 11.1.0.7
Physical Standby SBDB2 11.1.0.7

当前的需求是将生产数据库升级到11.2.0.2,但要求最小化应用downtime;这里我们就可以充分利用Data Guard的环境借势升级,同时也不会破坏原有的HA可用性:

1.滚动升级前需要为Primary Database做必要的准备工作,这里如果数据库之前没有启用闪回数据库的话需要enable

SQL> SHUTDOWN IMMEDIATE;
SQL> STARTUP MOUNT;
SQL> ALTER DATABASE FLASHBACK ON;

SQL> select flashback_on from v$database;

FLASHBACK_ON
------------------
YES

SQL> ALTER DATABASE OPEN;

/* 同时创建保证还原点以保证升级失败时可以回退 */

SQL> create restore point pre_rolling_upgrd guarantee flashback database;

2.将物理备库临时转换为逻辑备库,不同于普通的转换这里要用KEEP IDENTITY

SQL> shutdown immediate;
SQL> startup mount;

之后需要在Primary Database中创建Logminer dictionary日志挖掘字典:
SQL> EXECUTE DBMS_LOGSTDBY.BUILD;

以上步骤完成后可以开始将physical standby转换为logical standby了:
SQL> ALTER DATABASE RECOVER TO LOGICAL STANDBY KEEP IDENTITY;

SQL> alter database open;

以下为转换期间的日志:
ALTER DATABASE RECOVER TO LOGICAL STANDBY KEEP IDENTITY
Media Recovery Start: Managed Standby Recovery (SBDB2)
Fast Parallel Media Recovery enabled
Managed Standby Recovery not using Real Time Apply
 parallel recovery started with 2 processes
Media Recovery Waiting for thread 1 sequence 184
2011-03-25 19:36:51.012000 +08:00
Primary database is in MAXIMUM AVAILABILITY mode
Standby controlfile consistent with primary
kcrrvslf: active RFS archival for log 8 thread 1 sequence 188
RFS[3]: Successfully opened standby log 7: '/standby/oradata/SBDB2/onlinelog/o1_mf_7_6qf9b6cw_.log'
2011-03-25 19:40:43.074000 +08:00
Primary database is in MAXIMUM AVAILABILITY mode
Standby controlfile consistent with primary
RFS[3]: Successfully opened standby log 8: '/standby/oradata/SBDB2/onlinelog/o1_mf_8_6qf9b8j4_.log'
Media Recovery Log /standby/flash_recovery_area/SBDB2/archivelog/2011_03_25/o1_mf_1_188_6rrzsqvm_.arc
2011-03-25 19:40:45.459000 +08:00
Media Recovery Log /standby/flash_recovery_area/SBDB2/archivelog/2011_03_25/o1_mf_1_189_6rrzsv10_.arc
Incomplete Recovery applied until change 11881985 time 03/25/2011 19:40:41
Media Recovery Complete (SBDB2)
tkcrrxms: Killing 140733193388036 processes (all RFS)
2011-03-25 19:40:48.011000 +08:00
Begin: Standby Redo Logfile archival
End: Standby Redo Logfile archival
RESETLOGS after incomplete recovery UNTIL CHANGE 11881985
Resetting resetlogs activation ID 157033137 (0x95c22b1)
Online log /standby/oradata/SBDB2/onlinelog/o1_mf_1_6qf99xt6_.log: Thread 1 Group 1 was previously cleared
Online log /standby/flash_recovery_area/SBDB2/onlinelog/o1_mf_1_6qf99y1p_.log: Thread 1 Group 1 was previously cleared
Online log /standby/oradata/SBDB2/onlinelog/o1_mf_2_6qf9b01v_.log: Thread 1 Group 2 was previously cleared
Online log /standby/flash_recovery_area/SBDB2/onlinelog/o1_mf_2_6qf9b06y_.log: Thread 1 Group 2 was previously cleared
Online log /standby/oradata/SBDB2/onlinelog/o1_mf_3_6qf9b21o_.log: Thread 1 Group 3 was previously cleared
Online log /standby/flash_recovery_area/SBDB2/onlinelog/o1_mf_3_6qf9b268_.log: Thread 1 Group 3 was previously cleared
Standby became primary SCN: 11881983
Setting recovery target incarnation to 4
Converting standby mount to primary mount.
ACTIVATE STANDBY: Complete - Database mounted as primary (SBDB2)
Completed: ALTER DATABASE RECOVER TO LOGICAL STANDBY KEEP IDENTITY

我们还需要完成一系列逻辑备库的配置:

a.禁止在逻辑备库端删除外籍日志 

SQL> EXECUTE DBMS_LOGSTDBY.APPLY_SET('LOG_AUTO_DELETE', 'FALSE');
BEGIN DBMS_LOGSTDBY.APPLY_SET('LOG_AUTO_DELETE', 'FALSE'); END;

ERROR at line 1:
ORA-06550: line 1, column 7:
PLS-00201: identifier 'DBMS_LOGSTDBY.APPLY_SET' must be declared
ORA-06550: line 1, column 7:
PL/SQL: Statement ignored

/* 缺少DBMS_LOGSTDBY包,可以从以下脚本创建 */

SQL> @?/rdbms/admin/dbmslsby.sql

SQL> EXECUTE DBMS_LOGSTDBY.APPLY_SET('LOG_AUTO_DELETE', 'FALSE');
PL/SQL procedure successfully completed.

b.执行以下语句可以将逻辑备库不支持的而又在主库上运行过的事务记录到DBA_LOGSTDBY_EVENTS中

SQL> EXECUTE DBMS_LOGSTDBY.APPLY_SET('MAX_EVENTS_RECORDED',DBMS_LOGSTDBY.MAX_EVENTS);
PL/SQL procedure successfully completed.

c.启动在逻辑备库上的SQL APPLY:

SQL> ALTER DATABASE START LOGICAL STANDBY APPLY IMMEDIATE;
Database altered.

3.在步骤3中我们实际实施对逻辑备库的升级操作

1).在逻辑备库上停止SQL APPLY 

SQL> ALTER DATABASE STOP LOGICAL STANDBY APPLY;

2).升级逻辑备库(upgrade logical standby)

SQL> shutdown immediate;

[maclean@rh6 admin]$ cp $ORACLE_HOME/dbs/spfileSBDB2.ora /s01/product/11.2.0/dbhome_2/dbs
[maclean@rh6 dbs]$ orapwd file=orapwSBDB2 password=XXXXX entries=10 

/* 将原SBDB2实例的spfile复制到新的11.2.0.2版本的ORACLE_HOME下,并在新的ORACLE_HOME/dbs目录下创建
   与主库一致的密码文件 */

[maclean@rh6 ~]$ source 11gr2env.sh

[maclean@rh6 ~]$ export ORACLE_SID=SBDB2

SQL> startup upgrade;

SQL> @?/rdbms/admin/catupgrd
..................
/* 确认升级数据字典成功,值得一提的是11g的数据字典升级工作极度复杂,因此该阶段耗费大量时间 */

3)在逻辑备库上启用SQL APPLY

SQL>  ALTER DATABASE START LOGICAL STANDBY APPLY IMMEDIATE;

SQL> ALTER SESSION SET NLS_DATE_FORMAT = 'DD-MON-YY HH24:MI:SS';
Session altered.

SQL> SELECT SYSDATE, APPLIED_TIME FROM V$LOGSTDBY_PROGRESS;
SYSDATE 	   APPLIED_TIME
------------------ ------------------
25-MAR-11 21:26:26 25-MAR-11 21:26:21

/* 监控以上SQL APPLY进度,是否与当前时间接近 */

4)审查升级过程中的监控事件,可以直接查询DBA_LOGSTDBY_EVENTS视图,
该视图记录了在逻辑备库上没有applied的DDL与DML操作

SQL> SET LONG 1000
SQL> SET PAGESIZE 180
SQL> SET LINESIZE 79
SQL> SELECT EVENT_TIMESTAMP, EVENT, STATUS FROM DBA_LOGSTDBY_EVENTS ORDER BY EVENT_TIMESTAMP;

EVENT_TIMESTAMP
---------------------------------------------------------------------------
EVENT
-------------------------------------------------------------------------------
STATUS
-------------------------------------------------------------------------------
25-MAR-11 07.53.39.593831 PM

ORA-16111: log mining and apply setting up

25-MAR-11 07.53.39.594478 PM

Apply LWM 11881984, HWM 11881984, SCN 11881984

25-MAR-11 07.57.54.467786 PM

Shutdown acknowledged

25-MAR-11 07.57.57.516431 PM

ORA-16128: User initiated stop apply successfully completed

25-MAR-11 09.06.42.150573 PM

ORA-16111: log mining and apply setting up

25-MAR-11 09.06.42.345581 PM

Apply LWM 11883610, HWM 11883615, SCN 11883614

25-MAR-11 09.12.56.429392 PM

ORA-16128: User initiated stop apply successfully completed

25-MAR-11 09.26.16.124581 PM

ORA-16111: log mining and apply setting up

25-MAR-11 09.26.16.145925 PM

Apply LWM 11883610, HWM 11883615, SCN 11883614

25-MAR-11 09.26.20.723913 PM
ALTER DATABASE OPEN
ORA-16226: DDL skipped due to lack of support

ORA-16226错误表示相关DDL操作在逻辑备库上不受支持,常见的原因是该类操作涉及内部对象
ORA-16129错误表示相关DML操作在逻辑备库上不受支持

5)开始主备库之间的切换(switchover)

ALTER DATABASE COMMIT TO SWITCHOVER TO LOGICAL STANDBY;

SQL> SELECT SWITCHOVER_STATUS FROM V$DATABASE;

SWITCHOVER_STATUS
--------------------
TO PRIMARY

SQL> ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY;

Database altered.

以上切换完成后当前主库(Primary Database)的日志将无法再传输到逻辑备库(PROD)中,当前主库SBDB的版本为11.2.0.2,
而逻辑备库RPOD(原主库)仍为11.1.0.7;因为当前主库的日志都无法传送到低版本的logical standby中,这意味着当前主库是不受
任何保护的状态,这是滚动升级中无法避免的一个真空期。
现在我们可以在原逻辑备库上启动用户应用程序了。

4.将原主库(PROD)闪回到之前创建的还原点上
SQL> SHUTDOWN IMMEDIATE;
SQL> STARTUP MOUNT;

SQL> select name from v$restore_point;

NAME
--------------------------------------------------------------------------------
PRE_ROLLING_UPGRD

SQL> flashback database to restore point pre_rolling_upgrd;

5.以新的ORACLE_HOME binary加载原主库(PROD)实例
SQL> shutdown immediate;
/* 先将原主库(PROD)实例关闭  */

[maclean@rh6 dbs]$ cp /standby/product/11.1.0/db_3/dbs/spfilePROD.ora $ORACLE_HOME/dbs
[maclean@rh6 dbs]$ orapwd file=orapwPROD password=XXXXX entries=10 

/* 在新版本的ORACLE_HOME下复制spfile并重建密码文件 */

[maclean@rh6 dbs]$ export ORACLE_SID=PROD

SQL> startup mount;

/* 这里我们不需要运行升级脚本,后面的redo apply会自动升级PROD上的数据字典 */

6.将原主库(PROD)切换会物理备库(physical standby)
SQL> ALTER DATABASE CONVERT TO PHYSICAL STANDBY;
SQL> SHUTDOWN IMMEDIATE;

7.启动原主库(PROD)当前物理备库上的介质恢复进程
SQL> STARTUP MOUNT;
SQL>  ALTER DATABASE RECOVER MANAGED STANDBY DATABASE
USING CURRENT LOGFILE DISCONNECT FROM SESSION;

8.在主库与物理备库间实施切换

SBDB2 switchover回物理备库状态
SQL>  alter database commit to switchover to physical standby;

PROD切换为Primary Database
SQL> alter database commit to switchover to primary;

因为数据字典升级期间产生了大量的重做日志,切换之前需要完成对这些redo的应用,
所以该阶段将耗费大量时间,可以从PROD的告警日志中看到类似下面的内容:

Media Recovery Log /standby/arch/1_51_746739648.dbf
2011-03-26 00:32:09.625000 +08:00
Media Recovery Log /standby/arch/1_52_746739648.dbf
2011-03-26 00:32:16.931000 +08:00
Media Recovery Log /standby/arch/1_53_746739648.dbf
2011-03-26 00:32:23.149000 +08:00
Media Recovery Log /standby/arch/1_54_746739648.dbf
2011-03-26 00:32:31.732000 +08:00
Media Recovery Log /standby/arch/1_55_746739648.dbf
2011-03-26 00:32:43.807000 +08:00
Media Recovery Log /standby/arch/1_56_746739648.dbf

9.出于节约磁盘空间的考虑将PROD上最初建立的还原点删除:
SQL> DROP RESTORE POINT PRE_ROLLING_UPGRD

Script to Collect Data Guard Diagnostic Information

Overview
——–

This script is intended to provide an easy method to provide information
necessary to troubleshoot Data Guard issues.

Script Notes
————-

This script is intended to be run via sqlplus as the SYS or Internal user.

Script
——-

- - - - - - - - - - - - - - - - Script begins here - - - - - - - - - - - - - - - -

-- NAME: dg_prim_diag.sql  (Run on PRIMARY with a LOGICAL or PHYSICAL STANDBY)


set echo off
set feedback off
column timecol new_value timestamp
column spool_extension new_value suffix
select to_char(sysdate,'Mondd_hhmi') timecol,
'.out' spool_extension from sys.dual;
column output new_value dbname
select value || '_' output
from v$parameter where name = 'db_name';
spool dg_prim_diag_&&dbname&&timestamp&&suffix
set linesize 79
set pagesize 35
set trim on
set trims on
alter session set nls_date_format = 'MON-DD-YYYY HH24:MI:SS';
set feedback on
select to_char(sysdate) time from dual;

set echo on

-- In the following the database_role should be primary as that is what
-- this script is intended to be run on.  If protection_level is different
-- than protection_mode then for some reason the mode listed in
-- protection_mode experienced a need to downgrade.  Once the error
-- condition has been corrected the protection_level should match the
-- protection_mode after the next log switch.

column role format a7 tru
column name format a10 wrap

select name,database_role role,log_mode,
protection_mode,protection_level
from v$database;

-- ARCHIVER can be (STOPPED | STARTED | FAILED). FAILED means that the
-- archiver failed to archive a log last time, but will try again within 5
-- minutes. LOG_SWITCH_WAIT The ARCHIVE LOG/CLEAR LOG/CHECKPOINT event log
-- switching is waiting for.  Note that if ALTER SYSTEM SWITCH LOGFILE is
-- hung, but there is room in the current online redo log, then value is
-- NULL

column host_name format a20 tru
column version format a9 tru

select instance_name,host_name,version,archiver,log_switch_wait
from v$instance;

-- The following query give us information about catpatch.
-- This way we can tell if the procedure doesn't match the image.

select version, modified, status from dba_registry
where comp_id = 'CATPROC';

-- Force logging is not mandatory but is recommended.  Supplemental
-- logging must be enabled if the standby associated with this primary is
-- a logical standby. During normal operations it is acceptable for
-- SWITCHOVER_STATUS to be SESSIONS ACTIVE or TO STANDBY.

column force_logging format a13 tru
column remote_archive format a14 tru
column dataguard_broker format a16 tru

select force_logging,remote_archive,
supplemental_log_data_pk,supplemental_log_data_ui,
switchover_status,dataguard_broker
from v$database;

-- This query produces a list of all archive destinations.  It shows if
-- they are enabled, what process is servicing that destination, if the
-- destination is local or remote, and if remote what the current mount ID
-- is.

column destination format a35 wrap
column process format a7
column archiver format a8
column ID format 99
column mid format 99

select dest_id "ID",destination,status,target,
schedule,process,mountid  mid
from v$archive_dest order by dest_id;

-- This select will give further detail on the destinations as to what
-- options have been set.  Register indicates whether or not the archived
-- redo log is registered in the remote destination control file.

set numwidth 8
column ID format 99

select dest_id "ID",archiver,transmit_mode,affirm,async_blocks async,
net_timeout net_time,delay_mins delay,reopen_secs reopen,
register,binding
from v$archive_dest order by dest_id;

-- The following select will show any errors that occured the last time
-- an attempt to archive to the destination was attempted.  If ERROR is
-- blank and status is VALID then the archive completed correctly.

column error format a55 wrap

select dest_id,status,error from v$archive_dest;

-- The query below will determine if any error conditions have been
-- reached by querying the v$dataguard_status view (view only available in
-- 9.2.0 and above):

column message format a80

select message, timestamp
from v$dataguard_status
where severity in ('Error','Fatal')
order by timestamp;

-- The following query will determine the current sequence number
-- and the last sequence archived.  If you are remotely archiving
-- using the LGWR process then the archived sequence should be one
-- higher than the current sequence.  If remotely archiving using the
-- ARCH process then the archived sequence should be equal to the
-- current sequence.  The applied sequence information is updated at
-- log switch time.

select ads.dest_id,max(sequence#) "Current Sequence",
max(log_sequence) "Last Archived"
from v$archived_log al, v$archive_dest ad, v$archive_dest_status ads
where ad.dest_id=al.dest_id
and al.dest_id=ads.dest_id
group by ads.dest_id;

-- The following select will attempt to gather as much information as
-- possible from the standby.  SRLs are not supported with Logical Standby
-- until Version 10.1.

set numwidth 8
column ID format 99
column "SRLs" format 99
column Active format 99

select dest_id id,database_mode db_mode,recovery_mode,
protection_mode,standby_logfile_count "SRLs",
standby_logfile_active ACTIVE,
archived_seq#
from v$archive_dest_status;

-- Query v$managed_standby to see the status of processes involved in
-- the shipping redo on this system.  Does not include processes needed to
-- apply redo.

select process,status,client_process,sequence#
from v$managed_standby;

-- The following query is run on the primary to see if SRL's have been
-- created in preparation for switchover.

select group#,sequence#,bytes from v$standby_log;

-- The above SRL's should match in number and in size with the ORL's
-- returned below:

select group#,thread#,sequence#,bytes,archived,status from v$log;

-- Non-default init parameters.

set numwidth 5
column name format a30 tru
column value format a48 wra
select name, value
from v$parameter
where isdefault = 'FALSE';

spool off

- - - - - - - - - - - - - - - -  Script ends here  - - - - - - - - - - - - - - - -

Overview
——–

This script is intended to provide an easy method to provide information
necessary to troubleshoot Data Guard issues.

Script Notes
————-

This script is intended to be run via sqlplus as the SYS or Internal user.

Script
——-

- - - - - - - - - - - - - - - - Script begins here - - - - - - - - - - - - - - - -

-- NAME: DG_phy_stby_diag.sql


set echo off
set feedback off
column timecol new_value timestamp
column spool_extension new_value suffix
select to_char(sysdate,'Mondd_hhmi') timecol,
'.out' spool_extension from sys.dual;
column output new_value dbname
select value || '_' output
from v$parameter where name = 'db_name';
spool dgdiag_phystby_&&dbname&&timestamp&&suffix
set lines 200
set pagesize 35
set trim on
set trims on
alter session set nls_date_format = 'MON-DD-YYYY HH24:MI:SS';
set feedback on
select to_char(sysdate) time from dual;

set echo on

--
-- ARCHIVER can be  (STOPPED | STARTED | FAILED) FAILED means that the archiver failed
-- to archive a -- log last time, but will try again within 5 minutes. LOG_SWITCH_WAIT
-- The ARCHIVE LOG/CLEAR LOG/CHECKPOINT event log switching is waiting for. Note that
-- if ALTER SYSTEM SWITCH LOGFILE is hung, but there is room in the current online
-- redo log, then value is NULL

column host_name format a20 tru
column version format a9 tru
select instance_name,host_name,version,archiver,log_switch_wait from v$instance;

-- The following select will give us the generic information about how this standby is
-- setup.  The database_role should be standby as that is what this script is intended
-- to be ran on.  If protection_level is different than protection_mode then for some
-- reason the mode listed in protection_mode experienced a need to downgrade.  Once the
-- error condition has been corrected the protection_level should match the protection_mode
-- after the next log switch.

column ROLE format a7 tru
select name,database_role,log_mode,controlfile_type,protection_mode,protection_level
from v$database;

-- Force logging is not mandatory but is recommended.  Supplemental logging should be enabled
-- on the standby if a logical standby is in the configuration. During normal
-- operations it is acceptable for SWITCHOVER_STATUS to be SESSIONS ACTIVE or NOT ALLOWED.

column force_logging format a13 tru
column remote_archive format a14 tru
column dataguard_broker format a16 tru
select force_logging,remote_archive,supplemental_log_data_pk,supplemental_log_data_ui,
switchover_status,dataguard_broker from v$database;

-- This query produces a list of all archive destinations and shows if they are enabled,
-- what process is servicing that destination, if the destination is local or remote,
-- and if remote what the current mount ID is. For a physical standby we should have at
-- least one remote destination that points the primary set but it should be deferred.

COLUMN destination FORMAT A35 WRAP
column process format a7
column archiver format a8
column ID format 99

select dest_id "ID",destination,status,target,
archiver,schedule,process,mountid
from v$archive_dest;

-- If the protection mode of the standby is set to anything higher than max performance
-- then we need to make sure the remote destination that points to the primary is set
-- with the correct options else we will have issues during switchover.

select dest_id,process,transmit_mode,async_blocks,
net_timeout,delay_mins,reopen_secs,register,binding
from v$archive_dest;

-- The following select will show any errors that occured the last time an attempt to
-- archive to the destination was attempted.  If ERROR is blank and status is VALID then
-- the archive completed correctly.

column error format a55 tru
select dest_id,status,error from v$archive_dest;

-- Determine if any error conditions have been reached by querying thev$dataguard_status
-- view (view only available in 9.2.0 and above):

column message format a80
select message, timestamp
from v$dataguard_status
where severity in ('Error','Fatal')
order by timestamp;

-- The following query is ran to get the status of the SRL's on the standby.  If the
-- primary is archiving with the LGWR process and SRL's are present (in the correct
-- number and size) then we should see a group# active.

select group#,sequence#,bytes,used,archived,status from v$standby_log;

-- The above SRL's should match in number and in size with the ORL's returned below:

select group#,thread#,sequence#,bytes,archived,status from v$log;

-- Query v$managed_standby to see the status of processes involved in the
-- configuration.

select process,status,client_process,sequence#,block#,active_agents,known_agents
from v$managed_standby;

-- Verify that the last sequence# received and the last sequence# applied to standby
-- database.

select al.thrd "Thread", almax "Last Seq Received", lhmax "Last Seq Applied"
from (select thread# thrd, max(sequence#) almax
from v$archived_log
where resetlogs_change#=(select resetlogs_change# from v$database)
group by thread#) al,
(select thread# thrd, max(sequence#) lhmax
from v$log_history
where first_time=(select max(first_time) from v$log_history)
group by thread#) lh
where al.thrd = lh.thrd;

-- The V$ARCHIVE_GAP fixed view on a physical standby database only returns the next
-- gap that is currently blocking redo apply from continuing. After resolving the
-- identified gap and starting redo apply, query the V$ARCHIVE_GAP fixed view again
-- on the physical standby database to determine the next gap sequence, if there is
-- one.

select * from v$archive_gap;

-- Non-default init parameters.

set numwidth 5
column name format a30 tru
column value format a50 wra
select name, value
from v$parameter
where isdefault = 'FALSE';

spool off

- - - - - - - - - - - - - - - -  Script ends here  - - - - - - - - - - - - - - - -

Overview
——–

This script is intended to provide an easy method to provide information
necessary to troubleshoot Data Guard issues.

Script Notes
————-

This script is intended to be run via sqlplus as the SYS or Internal user.

Script
——-

- - - - - - - - - - - - - - - - Script begins here - - - - - - - - - - - - - - - -

-- NAME: dg_lsby_diag.sql  (Run on LOGICAL STANDBY)


set echo off
set feedback off
column timecol new_value timestamp
column spool_extension new_value suffix
select to_char(sysdate,'Mondd_hhmi') timecol,
'.out' spool_extension from sys.dual;
column output new_value dbname
select value || '_' output
from v$parameter where name = 'db_name';
spool dg_lsby_diag_&&dbname&&timestamp&&suffix

set linesize 79
set pagesize 180
set long 1000
set trim on
set trims on
alter session set nls_date_format = 'MM/DD HH24:MI:SS';
set feedback on
select to_char(sysdate) time from dual;

set echo on

-- The following select will give us the generic information about how
-- this standby is setup.  The database_role should be logical standby as
-- that is what this script is intended to be ran on.

column ROLE format a7 tru
column NAME format a8 wrap
select name,database_role,log_mode,protection_mode
from v$database;

-- ARCHIVER can be (STOPPED | STARTED | FAILED). FAILED means that the
-- archiver failed to archive a log last time, but will try again within 5
-- minutes. LOG_SWITCH_WAIT The ARCHIVE LOG/CLEAR LOG/CHECKPOINT event log
-- switching is waiting for. Note that if ALTER SYSTEM SWITCH LOGFILE is
-- hung, but there is room in the current online redo log, then value is
-- NULL

column host_name format a20 tru
column version format a9 tru
select instance_name,host_name,version,archiver,log_switch_wait
from v$instance;

-- The following query give us information about catpatch.
-- This way we can tell if the procedure doesn't match the image.

select version, modified, status from dba_registry
where comp_id = 'CATPROC';

-- Force logging and supplemental logging are not mandatory but are
-- recommended if you plan to switchover.  During normal operations it is
-- acceptable for SWITCHOVER_STATUS to be SESSIONS ACTIVE or NOT ALLOWED.

column force_logging format a13 tru
column remote_archive format a14 tru
column dataguard_broker format a16 tru

select force_logging,remote_archive,supplemental_log_data_pk,
supplemental_log_data_ui,switchover_status,dataguard_broker
from v$database;

-- This query produces a list of all archive destinations.  It shows if
-- they are enabled, what process is servicing that destination, if the
-- destination is local or remote, and if remote what the current mount ID
-- is.

column destination format a35 wrap
column process format a7
column archiver format a8
column ID format 99
column mid format 99

select dest_id "ID",destination,status,target,
schedule,process,mountid  mid
from v$archive_dest order by dest_id;

-- This select will give further detail on the destinations as to what
-- options have been set.  Register indicates whether or not the archived
-- redo log is registered in the remote destination control file.

set numwidth 8
column ID format 99

select dest_id "ID",archiver,transmit_mode,affirm,async_blocks async,
net_timeout net_time,delay_mins delay,reopen_secs reopen,
register,binding
from v$archive_dest order by dest_id;

-- Determine if any error conditions have been reached by querying the
-- v$dataguard_status view (view only available in 9.2.0 and above):

column message format a80

select message, timestamp
from v$dataguard_status
where severity in ('Error','Fatal')
order by timestamp;

-- Query v$managed_standby to see the status of processes involved in
-- the shipping redo on this system.  Does not include processes needed to
-- apply redo.

select process,status,client_process,sequence#
from v$managed_standby;

-- Verify that log apply services on the standby are currently
-- running. If the query against V$LOGSTDBY returns no rows then logical
-- apply is not running.

column status format a50 wrap
column type format a11
set numwidth 15

SELECT TYPE, STATUS, HIGH_SCN
FROM V$LOGSTDBY;

-- The DBA_LOGSTDBY_PROGRESS view describes the progress of SQL apply
-- operations on the logical standby databases.  The APPLIED_SCN indicates
-- that committed transactions at or below that SCN have been applied. The
-- NEWEST_SCN is the maximum SCN to which data could be applied if no more
-- logs were received. This is usually the MAX(NEXT_CHANGE#)-1 from
-- DBA_LOGSTDBY_LOG.  When the value of NEWEST_SCN and APPLIED_SCN are the
-- equal then all available changes have been applied.  If your
-- APPLIED_SCN is below NEWEST_SCN and is increasing then SQL apply is
-- currently processing changes.

set numwidth 15

select
(case
when newest_scn = applied_scn then 'Done'
when newest_scn <= applied_scn + 9 then 'Done?'
when newest_scn > (select max(next_change#) from dba_logstdby_log)
then 'Near done'
when (select count(*) from dba_logstdby_log
where (next_change#, thread#) not in
(select first_change#, thread# from dba_logstdby_log)) > 1
then 'Gap'
when newest_scn > applied_scn then 'Not Done'
else '---' end) "Fin?",
newest_scn, applied_scn, read_scn from dba_logstdby_progress;

select newest_time, applied_time, read_time from dba_logstdby_progress;

-- Determine if apply is lagging behind and by how much.  Missing
-- sequence#'s in a range indicate that a gap exists.

set numwidth 15
column trd format 99

select thread# trd, sequence#,
first_change#, next_change#,
dict_begin beg, dict_end end,
to_char(timestamp, 'hh:mi:ss') timestamp,
(case when l.next_change# < p.read_scn then 'YES'
when l.first_change# < p.applied_scn then 'CURRENT'
else 'NO' end) applied
from dba_logstdby_log l, dba_logstdby_progress p
order by thread#, first_change#;

-- Get a history on logical standby apply activity.

set numwidth 15

select to_char(event_time, 'MM/DD HH24:MI:SS') time,
commit_scn, current_scn, event, status
from dba_logstdby_events
order by event_time, commit_scn, current_scn;

-- Dump logical standby stats

column name format a40
column value format a20

select * from v$logstdby_stats;

-- Dump logical standby parameters

column name format a33 wrap
column value format a33 wrap
column type format 99

select name, value, type from system.logstdby$parameters
order by type, name;

-- Gather log miner session and dictionary information.

set numwidth 15

select * from system.logmnr_session$;
select * from system.logmnr_dictionary$;
select * from system.logmnr_dictstate$;
select * from v$logmnr_session;

-- Query the log miner dictionary for key tables necessary to process
-- changes for logical standby Label security will move AUD$ from SYS to
-- SYSTEM.  A synonym will remain in SYS but Logical Standby does not
-- support this.

set numwidth 5
column name format a9 wrap
column owner format a6 wrap

select o.logmnr_uid, o.obj#, o.objv#, u.name owner, o.name
from system.logmnr_obj$ o, system.logmnr_user$ u
where
o.logmnr_uid = u.logmnr_uid and
o.owner# = u.user# and
o.name in ('JOB$','JOBSEQ','SEQ$','AUD$',
'FGA_LOG$','IND$','COL$','LOGSTDBY$PARAMETER')
order by u.name;

-- Non-default init parameters.

column name format a30 tru
column value format a48 wra
select name, value
from v$parameter
where isdefault = 'FALSE';

spool off

- - - - - - - - - - - - - - - -  Script ends here  - - - - - - - - - - - - - - - -

沪ICP备14014813号-2

沪公网安备 31010802001379号