[Oracle Data Recovery] Resolving Redo Log Corruption (Corrupt Blocks in Redo Log Files): ORA-16038 ORA-00354 ORA-00353 ORA-00367 ORA-01624

When the database is open and one of the redo log files contains corruption (corrupt blocks), errors such as the following may be raised:

 

If you cannot resolve this yourself, the ParnassusData (诗檀软件) professional Oracle database recovery team can help you recover!

ParnassusData professional database recovery team

Service hotline: 13764045638   QQ: 47079569   Email: service@parnassusdata.com

 

ORA-16038 log %s sequence# %s cannot be archived
ORA-354 corrupt redo log block header
ORA-353 log corruption near block %s change %s time %s
ORA-367 checksum error in log file header

[oracle@mlab2 ~]$ oerr ora 16038
16038, 00000, "log %s sequence# %s cannot be archived"
// *Cause:  An attempt was made to archive the named file, but the
//          file could not be archived. Examine the secondary error
//          messages to determine the cause of the error.
// *Action: No action is required.

[oracle@mlab2 ~]$ oerr ora 354
00354, 00000, "corrupt redo log block header"
// *Cause:  The block header on the redo block indicated by the accompanying
//          error, is not reasonable.
// *Action: Do recovery with a good version of the log or do time based
//          recovery up to the indicated time. If this happens when archiving,
//          archiving of the problem log can be skipped by clearing the log
//          with the UNARCHIVED option. This must be followed by a backup of
//          every datafile to insure recoverability of the database.
[oracle@mlab2 ~]$ oerr ora 353
00353, 00000, "log corruption near block %s change %s time %s"
// *Cause:  Some type of redo log corruption has been discovered. This error
//          describes the location of the corruption. Accompanying errors
//          describe the type of corruption.
// *Action: Do recovery with a good version of the log or do incomplete
//          recovery up to the indicated change or time.
[oracle@mlab2 ~]$ oerr ora 367
00367, 00000, "checksum error in log file header"
// *Cause:  The file header for the redo log contains a checksum that does
//          not match the value calculated from the file header as read from
//          disk. This means the file header is corrupted
// *Action: Find the correct file and try again.

 

 

 

 

Different solutions apply depending on the status of the corrupt online redo log file.

An online redo log file cannot be dropped in the following two situations:

  1. There are only two redo log groups.
  2. The corrupt redo log file belongs to the current log group (V$LOG.STATUS=CURRENT); see the status query sketched after this list.
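
Before choosing an approach, it helps to confirm which group the corrupt member belongs to and what state that group is in. A minimal query sketch against the standard V$LOG and V$LOGFILE views:

SQL> SELECT l.group#, l.thread#, l.sequence#, l.status, l.archived, f.member
       FROM v$log l, v$logfile f
      WHERE f.group# = l.group#
      ORDER BY l.group#;

-- STATUS = CURRENT or ACTIVE means the group is still needed for crash recovery
-- and cannot simply be dropped or cleared.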

 

 

The solution is roughly as follows:

The problematic log file can be cleared with the CLEAR command. The syntax is:

alter database clear <unarchived> logfile group <integer>;
alter database clear <unarchived> logfile '<filename>';

 

For example:

alter database clear logfile group 1;
alter database clear unarchived logfile group 1;
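
Whether the UNARCHIVED keyword is needed depends on whether the group has already been archived. A quick check, sketched against the standard V$LOG columns:

SQL> SELECT group#, sequence#, status, archived FROM v$log;

-- ARCHIVED = 'NO' on the group to be cleared means CLEAR UNARCHIVED LOGFILE is required;
-- the archive log sequence is then broken at that point, so take a full backup afterwards.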

 

 

A log with status = CURRENT or status = ACTIVE cannot be cleared; attempting to do so raises ORA-01624:

 

SQL> alter database clear unarchived logfile group 5;
alter database clear unarchived logfile group 5
*
ERROR at line 1:
ORA-01624: log 5 needed for crash recovery of instance G10R25 (thread 1)
ORA-00312: online log 5 thread 1: '/s01/G10R25/onlinelog/o1_mf_5_954q1vdo_.log'
[oracle@vrh8 ~]$ oerr ora 1624
01624, 00000, "log %s needed for crash recovery of instance %s (thread %s)"
// *Cause:  A log cannot be dropped or cleared until the thread's checkpoint
//          has advanced out of the log.
// *Action: If the database is not open, then open it. Crash recovery will
//          advance the checkpoint. If the database is open force a global
//          checkpoint. If the log is corrupted so that the database cannot
//          be opened, it may be necessary to do incomplete recovery until
//          cancel at this log.

 

 

A log with status = ACTIVE is relatively easy to handle: once an ALTER SYSTEM CHECKPOINT completes successfully, its status changes to INACTIVE and it can then be cleared, as sketched below.
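
A minimal sketch of that sequence (group 5 is only an illustrative number):

SQL> alter system checkpoint;
SQL> select group#, status, archived from v$log;   -- wait until the group shows INACTIVE
SQL> alter database clear unarchived logfile group 5;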

A log with status = CURRENT is much more troublesome; handling it generally involves the hidden parameter "_ALLOW_RESETLOGS_CORRUPTION".
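
For reference only, here is a rough sketch of the kind of forced-open procedure usually described in connection with this parameter. It is a last-resort, unsupported approach that can leave the database logically inconsistent, and the exact steps vary from case to case, so treat it as an assumption-laden outline rather than a recipe, and involve Oracle Support or a recovery specialist first:

SQL> shutdown abort
SQL> startup mount
SQL> alter system set "_allow_resetlogs_corruption"=true scope=spfile;
SQL> shutdown immediate
SQL> startup mount
SQL> recover database until cancel;
-- cancel recovery when prompted at the corrupt log
SQL> alter database open resetlogs;
-- after a forced resetlogs open, rebuild/export the database and take a fresh full backup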

ORA-1624 and redo log corruption




When the database is open and one of its several redo log files becomes corrupt, error messages such as the following may appear:

ORA-16038 log %s sequence# %s cannot be archived
ORA-354 corrupt redo log block header
ORA-353 log corruption near block <num> change <str> time <str>
ORA-367 checksum error in log file header
ORA-368 checksum error in redo log block

 

[oracle@mlab2 ~]$ oerr ora 16038
16038, 00000, "log %s sequence# %s cannot be archived"
// *Cause:  An attempt was made to archive the named file, but the
//          file could not be archived. Examine the secondary error
//          messages to determine the cause of the error.
// *Action: No action is required.

 

In other scenarios the problem can be worked around by dropping the redo log file (sketched below), but if the affected redo log file is needed for crash/instance recovery it cannot be dropped.
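
For completeness, the drop syntax itself; a sketch in which group 4 is only an illustrative number, and the group must be INACTIVE (and archived, if required) before it can be dropped:

SQL> alter database drop logfile group 4;

-- DROP LOGFILE only removes the group from the control file; the operating-system
-- files are left behind and can be deleted manually afterwards.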

There are two reasons an online redo log file may not be droppable:

  • There are only two redo log groups.
  • The corrupt redo log file belongs to the current log group.

For the situations described above, CLEAR LOGFILE can be used instead, for example:

 

 

alter database clear <unarchived> logfile group <integer>;
alter database clear <unarchived> logfile '<filename>';
alter database clear logfile group 1;
alter database clear unarchived logfile group 1;

 

An online redo log file whose status in V$LOG is CURRENT or ACTIVE generally cannot be cleared; the attempt fails with an error:

 

oerr ora 01624
01624, 00000, "log %s needed for crash recovery of instance %s (thread %s)"
// *Cause:  A log cannot be dropped or cleared until the thread's checkpoint
//          has advanced out of the log.
// *Action: If the database is not open, then open it. Crash recovery will
//          advance the checkpoint. If the database is open force a global
//          checkpoint. If the log is corrupted so that the database cannot
//          be opened, it may be necessary to do incomplete recovery until
//          cancel at this log.

 

Note that 'alter database clear logfile' should not be used routinely: the archived logs that are skipped as a result make complete recovery impossible. Take a full database backup immediately after running CLEAR LOGFILE (a sketch follows)!
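
A minimal RMAN sketch of that follow-up backup, assuming a default disk channel; adjust the target connection and channel configuration to your environment:

$ rman target /
RMAN> backup database plus archivelog;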

 

 

Why does Oracle sometimes produce archived logs that are far smaller than the online redo log size?

See the thread "Why does RAC switch logs before the redo log is full?" (《rac不到redo大小就切换是什么原因?》)

 

That thread discusses in detail why, in both RAC and single-instance environments, a log switch can happen before the online redo log reaches its full size. The behaviour is determined by Oracle's internal algorithms and generally causes no instance performance or availability problems; the only annoyance is that it produces a larger number of small archived logs.
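
To see the effect on a given system, a quick comparison sketch of recent archived log sizes against the online redo log size, using the standard V$ARCHIVED_LOG and V$LOG columns:

SQL> SELECT sequence#, ROUND(blocks * block_size / 1024 / 1024) AS archived_mb
       FROM v$archived_log
      WHERE first_time > SYSDATE - 1
      ORDER BY sequence#;

SQL> SELECT group#, bytes / 1024 / 1024 AS online_redo_mb FROM v$log;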

 

The new Exadata Smart Flash Logging feature

Starting with Exadata Storage Software 11.2.2.4, the Exadata Smart Flash Logging feature was introduced. It lets the LGWR process write redo in parallel to the flash cache and to the disk controller; as soon as either write completes, the RDBMS is notified and continues working. The feature is aimed at improving Exadata's redo write response time and throughput. (The feature allows redo writes to be written to both flash cache and disk controller cache, with an acknowledgement sent to the RDBMS as soon as either of these writes complete; this improves response times and throughput.) It is particularly useful for systems where the log file sync wait event is the main performance bottleneck.

When the I/O from frequent redo log writes became a major performance bottleneck on the Exadata Database Machine, the Exadata development team naturally turned to the flash cache already present on the DBM to reduce response times, while also guaranteeing that introducing the flash cache could not lead to the loss of critical redo information:

 

The main problem is that the number of writes to the redo logs is very high, even when there is no activity in the database. A flash cache disk would therefore reach its write limit very quickly, within days or months (exact test results were not available). You would then lose the flash cache disk and all data on it, and losing the redo logs is a very unpleasant cause of database unavailability that can mean long downtime and possible data loss.
As you already know, 11.2.2.4.0 introduced the Smart Flash Log feature. For customers that are not on 11.2.2.4.0, we do not suggest putting the redo logs on a diskgroup that uses griddisks carved from flash disks. There are various issues when using redo logs on the flash cache in previous versions, and those should be avoided.

 

 

Any system with the Exadata Storage Software 11.2.2.4 patch installed enables the Exadata Smart Flash Logging feature implicitly, but it also requires the database to be at Database 11.2.0.2 Bundle Patch 11 or later.

Metalink (MOS) currently does not document how to disable the feature on a DBM where Exadata Smart Flash Logging is already enabled.

 

In practice each cell allocates 512 MB of flash cache for Smart Flash Logging, so the flash space available per cell is now 364.75 GB.

 

It is not only the online redo logs that benefit from Smart Flash Logging; standby redo logs also gain a performance improvement from the feature, provided the required software combination of cell patch 11.2.2.4 and Database 11.2.0.2 Bundle Patch 11 or greater is in place.

The existing Smart Flash Logging configuration can be inspected from the CellCLI command line; if the following command returns output, Smart Flash Logging is configured.

 

CellCLI> LIST FLASHLOG DETAIL
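
Beyond LIST FLASHLOG DETAIL, CellCLI also exposes flash log metrics such as FL_DISK_FIRST and FL_FLASH_FIRST (redo writes completed first on disk vs. first on flash). A hedged example; the FLASHLOG object type and these metric names are assumed from Exadata documentation and may vary between storage software versions:

CellCLI> LIST METRICCURRENT WHERE objectType = 'FLASHLOG' DETAIL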

 

More information is available in the document "Exadata Smart Flash Logging Explained", quoted below:

 

Smart Flash Logging works as follows. When receiving a redo log write request, Exadata will do
parallel writes to the on-disk redo logs as well as a small amount of space reserved in the flash
hardware. When either of these writes has successfully completed the database will be
immediately notified of completion. If the disk drives hosting the logs experience slow response
times, then the Exadata Smart Flash Cache will provide a faster log write response time.
Conversely, if the Exadata Smart Flash Cache is temporarily experiencing slow response times
(e.g., due to wear leveling algorithms), then the disk drive will provide a faster response time.
Given the speed advantage the Exadata flash hardware has over disk drives, log writes should be
written to Exadata Smart Flash Cache, almost all of the time, resulting in very fast redo write
performance. This algorithm will significantly smooth out redo write response times and provide
overall better database performance.

The Exadata Smart Flash Cache is not used as a permanent store for redo data – it is just a
temporary store for the purpose of providing fast redo write response time. The Exadata Smart
Flash Cache is a cache for storing redo data until this data is safely written to disk. The Exadata
Storage Server comes with a substantial amount of flash storage. A small amount is allocated for
database logging and the remainder will be used for caching user data. The best practices and
configuration of redo log sizing, duplexing and mirroring do not change when using Exadata
Smart Flash Logging. Smart Flash Logging handles all crash and recovery scenarios without
requiring any additional or special administrator intervention beyond what would normally be
needed for recovery of the database from redo logs. From an end user perspective, the system
behaves in a completely transparent manner and the user need not be aware that flash is being
used as a temporary store for redo. The only behavioral difference will be consistently low
latencies for redo log writes.

By default, 512 MB of the Exadata flash is allocated to Smart Flash Logging. Relative to the 384
GB of flash in each Exadata cell this is an insignificant investment for a huge performance
benefit. This default allocation will be sufficient for most situations. Statistics are maintained to
indicate the number and frequency of redo writes serviced by flash and those that could not be
serviced, due to, for example, insufficient flash space being allocated for Smart Flash Logging.
For a database with a high redo generation rate, or when many databases are consolidated on to
one Exadata Database Machine, the size of the flash allocated to Smart Flash Logging may need
to be enlarged. In addition, for consolidated deployments, the Exadata I/O Resource Manager
(IORM) has been enhanced to enable or disable Smart Flash Logging for the different databases
running on the Database Machine, reserving flash for the most performance critical databases.

 

as well as in the official white paper "Exadata Smart Flash Cache Features and the Oracle Exadata Database Machine", publicly available at:

 

 

http://www.oracle.com/technetwork/database/exadata/exadata-smart-flash-cache-366203.pdf
