Know more about RBA redo block address

A “Redo Block Address” (RBA) describes a physical location within a redo log file. An RBA consists of the following:

Log sequence number
Block number within log file
Byte offset within block

The structure of the redo block address is given below :
1 – The number of the redo log.
[4 bytes]

2 – The block offset from the beginning of the redo log file.
[4 bytes]

3 – The byte offset within the block.
[2 bytes]

Log sequence number v$logmnr_contents.RBASQN
Block number within log file v$logmnr_contents.RBABLK
Byte offset within block v$logmnr_contents.RBABYTE
So in this case, RBA [0x19.2.10] maps to Log squence 25, Block number 2 with byte offset 16.
With this information you should be able to find the details from v$logmnr_contents.

In particular, the buffer header information includes two fields: low-RBA and high-SCN. The low-RBA field indicates the point in the redo-log corresponding to the block’s first update after it came into the buffer cache. The high-SCN field indicates the SCN of the last update to the block.

***************************************************************************
CHECKPOINT PROGRESS RECORDS
***************************************************************************
 (size = 8180, compat size = 8180, section max = 11, section in-use = 0,
  last-recid= 0, old-recno = 0, last-recno = 0)
 (extent = 1, blkno = 2, numrecs = 11)
THREAD #1 - status:0x2 flags:0x0 dirty:47
low cache rba:(0x37a.11c.0) on disk rba:(0x37a.151.0)
on disk scn: 0x0000.0045d2f2 12/19/2011 01:03:21
resetlogs scn: 0x0000.00080634 11/07/2011 04:08:50
heartbeat: 770246919 mount id: 2811012380
THREAD #2 - status:0x0 flags:0x0 dirty:0
low cache rba:(0x0.0.0) on disk rba:(0x0.0.0)
on disk scn: 0x0000.00000000 01/01/1988 00:00:00
resetlogs scn: 0x0000.00000000 01/01/1988 00:00:00
heartbeat: 0 mount id: 0

Script:列出Oracle每小时的redo重做日志产生量

以下脚本可以用于列出最近Oracle数据库每小时估算的redo重做日志产生量,因为估算数据来源于archivelog的产生量和大小,所以数据是近似值,可供参考:

WITH times AS
 (SELECT /*+ MATERIALIZE */
   hour_end_time
    FROM (SELECT (TRUNC(SYSDATE, 'HH') + (2 / 24)) - (ROWNUM / 24) hour_end_time
            FROM DUAL
          CONNECT BY ROWNUM <= (1 * 24) + 3),
         v$database
   WHERE log_mode = 'ARCHIVELOG')
SELECT hour_end_time, NVL(ROUND(SUM(size_mb), 3), 0) size_mb, i.instance_name
  FROM(
SELECT hour_end_time, CASE WHEN(hour_end_time - (1 / 24)) > lag_next_time THEN(next_time + (1 / 24) - hour_end_time) * (size_mb / (next_time - lag_next_time)) ELSE 0 END + CASE WHEN hour_end_time < lead_next_time THEN(hour_end_time - next_time) * (lead_size_mb / (lead_next_time - next_time)) ELSE 0 END + CASE WHEN lag_next_time > (hour_end_time - (1 / 24)) THEN size_mb ELSE 0 END + CASE WHEN next_time IS NULL THEN(1 / 24) * LAST_VALUE(CASE WHEN next_time IS NOT NULL AND lag_next_time IS NULL THEN 0 ELSE(size_mb / (next_time - lag_next_time)) END IGNORE NULLS) OVER(
 ORDER BY hour_end_time DESC, next_time DESC) ELSE 0 END size_mb
  FROM(
SELECT t.hour_end_time, arc.next_time, arc.lag_next_time, LEAD(arc.next_time) OVER(
 ORDER BY arc.next_time ASC) lead_next_time, arc.size_mb, LEAD(arc.size_mb) OVER(
 ORDER BY arc.next_time ASC) lead_size_mb
  FROM times t,(
SELECT next_time, size_mb, LAG(next_time) OVER(
 ORDER BY next_time) lag_next_time
  FROM(
SELECT next_time, SUM(size_mb) size_mb
  FROM(
SELECT DISTINCT a.sequence#, a.next_time, ROUND(a.blocks * a.block_size / 1024 / 1024) size_mb
  FROM v$archived_log a,(
SELECT /*+ no_merge */
CASE WHEN TO_NUMBER(pt.VALUE) = 0 THEN 1 ELSE TO_NUMBER(pt.VALUE) END VALUE
  FROM v$parameter pt
 WHERE pt.name = 'thread') pt
 WHERE a.next_time > SYSDATE - 3 AND a.thread# = pt.VALUE AND ROUND(a.blocks * a.block_size / 1024 / 1024) > 0)
 GROUP BY next_time)) arc
 WHERE t.hour_end_time = (TRUNC(arc.next_time(+), 'HH') + (1 / 24)))
 WHERE hour_end_time > TRUNC(SYSDATE, 'HH') - 1 - (1 / 24)), v$instance i
 WHERE hour_end_time <= TRUNC(SYSDATE, 'HH')
 GROUP BY hour_end_time, i.instance_name
 ORDER BY hour_end_time
 /

Sample Output:

HOUR_END_TIME    SIZE_MB INSTANCE_NAME
------------- ---------- ----------------
2011/9/29 1:0       2.92 VPROD1
2011/9/29 2:0       2.92 VPROD1
2011/9/29 3:0       2.92 VPROD1
2011/9/29 4:0       2.92 VPROD1
2011/9/29 5:0       2.92 VPROD1
2011/9/29 6:0       2.92 VPROD1
2011/9/29 7:0       2.92 VPROD1
2011/9/29 8:0       2.92 VPROD1
2011/9/29 9:0       2.92 VPROD1
2011/9/29 10:       2.92 VPROD1
2011/9/29 11:       2.92 VPROD1
2011/9/29 12:      3.537 VPROD1
2011/9/29 13:       3.55 VPROD1
2011/9/29 14:       3.55 VPROD1
2011/9/29 15:       3.55 VPROD1
2011/9/29 16:       3.55 VPROD1
2011/9/29 17:       3.55 VPROD1
2011/9/29 18:       3.55 VPROD1
2011/9/29 19:       3.55 VPROD1
2011/9/29 20:       3.55 VPROD1

Know more about commit

COMMIT操作是RDBMS中事务结束的标志,在Oracle中与commit紧密相关的是SCN(System Change Number)。

引入SCN的最根本目的在于:

  1. 为读一致性所用
  2. 为redolog中的记录排序,以及恢复

SCN由SCN Base和Scn Wrap组成,是一种6个字节的结构(structure)。其中SCN Base占用4个字节,而SCN wrap占用2个字节。但在实际存储时SCN-like的stucture常会占用8个字节。

 ub4 kscnbas
 ub2 kscnwrp

struct kcvfhcrs, 8 bytes                 @100                              Creation Checkpointed at scn
      ub4 kscnbas                        @100      0x000a8849
      ub2 kscnwrp                        @104      0x0000

在Oracle中一个事务的开始包含以下操作:

  1. 绑定一个可用的rollback segment
  2. 在事务表(transaction table)上分配一个必要的槽位
  3. 从rollback segment中分配undo block

注意system rollback segment是一种特殊的回滚段,在10g以后普通回滚段的类型都变成了”TYPE2 UNDO”,而唯有system rollback segment的类型仍为”ROLLBACK”,这是由其特殊性造就的:

SQL> col segment_name for a20
SQL> col rollback for a20
SQL> select segment_name,segment_type from dba_segments where segment_type='ROLLBACK';

SEGMENT_NAME         SEGMENT_TYPE
-------------------- ------------------
SYSTEM               ROLLBACK

System rollback segment面向的是SYSTEM表空间上数据字典对象相关事务的数据,以及由对用户数据产生的递归SQL调用所产生的数据。

Oracle不使用基于内存锁管理器的行锁,Oracle中的row lock是基于数据块的。数据块中的Interested Transaction List(ITL)是行锁的重要标志。
ITL的分配遵循以下的原则:

  1. 找出未被使用的ITL
  2. 找出最老的已经事务提交的ITL
  3. 做部分的块清理,直到有可用的ITL
  4. 扩展ITL区域,一条ITL占用24字节

当事务提交COMMIT时,需要完成以下步骤的操作:

  1. 得到一个SCN值
  2. 使用得到的SCN更新事务表中的槽位
  3. 在redo log buffer中创建一条commit记录
  4. 将redo log buffer刷新到磁盘上的在线日志文件
  5. 释放表和行上的锁(may cause delayed block cleanout)

Only ARCH Bgprocess may create archivelog?

我们在学习Oracle入门知识时都会介绍到ARCH归档进程,归档进程ARCH负责将在线重做日志归档,注意ARCH只会将日志文件中存在的重做内容复制到归档日志文件中,举例来说重做日志文件的大小是512MB,但当前写入的redo entry只占用10MB空间,此时由某些条件触发了日志切换,那么产生的归档文件的大小仍是10MB。

一直以来存在着这样一种误解:归档操作只会由ARCH进程负责完成。

实际上归档操作可能由多种进程完成,举例来说前台进程Fore ground process就可以完成归档操作,当在前台进程中执行alter system archive log current命令时实际执行归档操作的是前台进程(server process),而非ARCH,这可能是大多数人料想之外的,口说无凭,我们具体验证一下:

[oracle@rh2 ~]$ ps -ef|grep LOCAL=YES|grep G10R2
oracle   20790 20789  0 19:17 ?        00:00:00 oracleG10R2 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))

SQL> set linesize 200 pagesize 1400;
SQL> select le.leseq "Current log sequence No",
  2         100 * cp.cpodr_bno / le.lesiz "Percent Full",
  3         (cpodr_bno - 1) * 512  "bytes used exclude header",
  4         le.lesiz * 512 - cpodr_bno * 512 "Left space",
  5         le.lesiz  *512       "logfile size"
  6    from x$kcccp cp, x$kccle le
  7   where LE.leseq = CP.cpodr_seq
  8     and bitand(le.leflg, 24) = 8;

Current log sequence No Percent Full bytes used exclude header Left space logfile size
----------------------- ------------ ------------------------- ---------- ------------
                    720   62.0244141                  32518144   19910144     52428800

/* 通过以上脚本可以了解当前日志文件(current logfile)的使用情况,
    这里可以看到日志文件大小为50MB,已使用32MB                    */

[oracle@rh2 ~]$ strace -p 20972 -o archive.log
Process 20972 attached - interrupt to quit

SQL> alter system archive log current;
System altered.

[oracle@rh2 ~]$ cat archive.log |egrep "open|read|write"
read(8, "\1\2\0\0\6\0\0\0\0\0\3^\10!\200\0\0\0\0\0\0\30\352\341\0\0\0\0\0`\0\0"..., 2064) = 258
open("/s01/admin/G10R2/bdump/alert_G10R2.log", O_WRONLY|O_CREAT|O_APPEND, 0660) = 6
writev(6, [{"Mon May 30 19:31:23 2011\n", 25}, {"ALTER SYSTEM ARCHIVE LOG", 24}, {"\n", 1}], 3) = 50
open("/flashcard/oradata/G10R2/controlfile/1.ctl", O_RDWR|O_SYNC|O_DIRECT) = 12
pread(12, "\25\302\0\0\1\0\0\0\0\0\0\0\0\0\1\4\361\374\0\0\0\0\0\0\0\3 \n\22\235\205\254"..., 16384, 16384) = 16384
pread(12, "\25\302\0\0\17\0\0\0\2324\0\0\377\377\1\4?^\0\0\0\6\0\0\0\0\0\0\0\0\0\2"..., 16384, 245760) = 16384
pread(12, "\25\302\0\0\21\0\0\0\2324\0\0\377\377\1\4\377\37\0\0\0\0\0\0\0\0\0\0Q\350\255,"..., 16384, 278528) = 16384
pread(12, "\25\302\0\0\26\0\0\0\2234\0\0\377\377\1\4\327\312\0\0\0\220\1\0\316\2\0\0\2\0\0\0"..., 16384, 360448) = 16384
pread(12, "\25\302\0\0\23\0\0\0\2174\0\0\377\377\1\4\372\202\0\0\17\0\0\0\265!\315\0\0\0/\234"..., 16384, 311296) = 16384
pwrite(12, "\25\302\0\0\22\0\0\0\2334\0\0\377\377\1\4B\35\0\0\0\0\0\0\0\0\0\0Q\350\255,"..., 16384, 294912) = 16384
pwrite(12, "\25\302\0\0\20\0\0\0\2334\0\0\377\377\1\4\240^\0\0\200\6\0\0\0\0\0\0\0\0\0\2"..., 16384, 262144) = 16384
pwrite(12, "\25\302\0\0\1\0\0\0\0\0\0\0\0\0\1\4\t\377\0\0\0\0\0\0\0\3 \n\22\235\205\254"..., 16384, 16384) = 16384
pread(12, "\25\302\0\0\1\0\0\0\0\0\0\0\0\0\1\4\t\377\0\0\0\0\0\0\0\3 \n\22\235\205\254"..., 16384, 16384) = 16384
pread(12, "\25\302\0\0\1\0\0\0\0\0\0\0\0\0\1\4\r\377\0\0\0\0\0\0\0\3 \n\22\235\205\254"..., 16384, 16384) = 16384
pread(12, "\25\302\0\0\20\0\0\0\2354\0\0\377\377\1\4\241\177\0\0\200'\0\0\0\0\0\0\0\0\0\2"..., 16384, 262144) = 16384
pread(12, "\25\302\0\0\22\0\0\0\2354\0\0\377\377\1\4X\35\0\0\0\0\0\0\0\0\0\0Q\350\255,"..., 16384, 294912) = 16384
pread(12, "\25\302\0\0\26\0\0\0\2354\0\0\377\377\1\4\236W\0\0\0\220\1\0\321\2\0\0\1\0\0\0"..., 16384, 360448) = 16384
pread(12, "\25\302\0\0\36\0\0\0\2354\0\0\377\377\1\4\367\5\0\0\3\0\3\0\2\0\0\0\0\0/f"..., 16384, 491520) = 16384
open("/flashcard/oradata/G10R2/onlinelog/o1_mf_3_6v34jpmp_.log", O_RDONLY|O_DIRECT) = 13
open("/flashcard/oradata/G10R2/onlinelog/o1_mf_3_6v34jpmp_.log", O_RDWR|O_SYNC|O_DIRECT) = 13
open("/s01/flash_recovery_area/G10R2/onlinelog/o1_mf_3_6v34jpyn_.log", O_RDONLY|O_DIRECT) = 14
open("/s01/flash_recovery_area/G10R2/onlinelog/o1_mf_3_6v34jpyn_.log", O_RDWR|O_SYNC|O_DIRECT) = 14
pread(13, "\1\"\0\0\1\0\0\0\320\2\0\0\0\200Um\0\0\0\0\0\3 \n\22\235\205\254G10R"..., 512, 512) = 512
pread(14, "\1\"\0\0\1\0\0\0\320\2\0\0\0\200Um\0\0\0\0\0\3 \n\22\235\205\254G10R"..., 512, 512) = 512
open("/flashcard/oradata/G10R2/onlinelog/o1_mf_3_6v34jpmp_.log", O_RDONLY|O_DIRECT) = 13
open("/flashcard/oradata/G10R2/onlinelog/o1_mf_3_6v34jpmp_.log", O_RDWR|O_SYNC|O_DIRECT) = 13
open("/s01/flash_recovery_area/G10R2/onlinelog/o1_mf_3_6v34jpyn_.log", O_RDONLY|O_DIRECT) = 14
open("/s01/flash_recovery_area/G10R2/onlinelog/o1_mf_3_6v34jpyn_.log", O_RDWR|O_SYNC|O_DIRECT) = 14
pread(13, "\1\"\0\0\1\0\0\0\320\2\0\0\0\200Um\0\0\0\0\0\3 \n\22\235\205\254G10R"..., 512, 512) = 512
pread(14, "\1\"\0\0\1\0\0\0\320\2\0\0\0\200Um\0\0\0\0\0\3 \n\22\235\205\254G10R"..., 512, 512) = 512
pread(12, "\25\302\0\0\24\0\0\0\2354\0\0\377\377\1\4\341\202\0\0\17\0\0\0\265!\315\0\0\0/\234"..., 16384, 327680) = 16384
pread(12, "\25\302\0\0\1\0\0\0\0\0\0\0\0\0\1\4\r\377\0\0\0\0\0\0\0\3 \n\22\235\205\254"..., 16384, 16384) = 16384
pread(12, "\25\302\0\0\20\0\0\0\2354\0\0\377\377\1\4\241\177\0\0\200'\0\0\0\0\0\0\0\0\0\2"..., 16384, 262144) = 16384
pread(12, "\25\302\0\0\22\0\0\0\2354\0\0\377\377\1\4X\35\0\0\0\0\0\0\0\0\0\0Q\350\255,"..., 16384, 294912) = 16384
pread(12, "\25\302\0\0D\1\0\0\2234\0\0\377\377\1\4L\250\0\0\0\0\0\0\0\0\0\0008\251\332,"..., 16384, 5308416) = 16384
pread(12, "\25\302\0\0A\1\0\0\2234\0\0\377\377\1\4\3715\0\0\0\236<\216\5\0\0\0\0\300[\310"..., 16384, 5259264) = 16384
pwrite(12, "\25\302\0\0C\1\0\0\2364\0\0\377\377\1\4\210\232\0\0\0000\360\1\0\0\0\0\v\252\332,"..., 16384, 5292032) = 16384
pwrite(12, "\25\302\0\0\21\0\0\0\2364\0\0\377\377\1\4[\35\0\0\0\0\0\0\0\0\0\0Q\350\255,"..., 16384, 278528) = 16384
pwrite(12, "\25\302\0\0\17\0\0\0\2364\0\0\377\377\1\4?\177\0\0\0'\0\0\0\0\0\0\0\0\0\2"..., 16384, 245760) = 16384
pwrite(12, "\25\302\0\0\1\0\0\0\0\0\0\0\0\0\1\4\r\377\0\0\0\0\0\0\0\3 \n\22\235\205\254"..., 16384, 16384) = 16384
pread(12, "\25\302\0\0\1\0\0\0\0\0\0\0\0\0\1\4\r\377\0\0\0\0\0\0\0\3 \n\22\235\205\254"..., 16384, 16384) = 16384
open("/s01/flash_recovery_area/G10R2/archivelog/2011_05_30/o1_mf_1_720_6y700ch9_.arc", O_RDWR|O_CREAT|O_EXCL|O_SYNC, 0660) = 15
open("/s01/flash_recovery_area/G10R2/archivelog/2011_05_30/o1_mf_1_720_6y700ch9_.arc", O_RDWR|O_SYNC|O_DIRECT) = 15
write(15, "\0\"\0\0\0\0\300\377\0\0\0\0\0\0\0\0~\240\0\0\0\2\0\0\30\370\0\0}|{z"..., 512) = 512
open("/s01/flash_recovery_area/G10R2/archivelog/2011_05_30/o1_mf_1_720_6y700ch9_.arc", O_RDONLY|O_DIRECT) = 15
open("/s01/flash_recovery_area/G10R2/archivelog/2011_05_30/o1_mf_1_720_6y700ch9_.arc", O_RDWR|O_DIRECT) = 15
pread(13, "\1\"\0\0\1\0\0\0\320\2\0\0\0\200Um\0\0\0\0\0\3 \n\22\235\205\254G10R"..., 1048576, 512) = 1048576
pwrite(15, "\1\"\0\0\1\0\0\0\320\2\0\0\0\200tm\0\0\0\0\0\3 \n\22\235\205\254G10R"..., 1048576, 512) = 1048576
pread(13, "\1\"\0\0\1\10\0\0\320\2\0\0\0\200\271\242:22:48:01\5VALID\1"..., 1048576, 1049088) = 1048576
pwrite(15, "\1\"\0\0\1\10\0\0\320\2\0\0\0\200\271\242:22:48:01\5VALID\1"..., 1048576, 1049088) = 1048576
pread(13, "\1\"\0\0\1\20\0\0\320\2\0\0\0\200\220\271\n\0232011-05-12:22:"..., 1048576, 2097664) = 1048576
pwrite(15, "\1\"\0\0\1\20\0\0\320\2\0\0\0\200\220\271\n\0232011-05-12:22:"..., 1048576, 2097664) = 1048576
pread(13, "\1\"\0\0\1\30\0\0\320\2\0\0\364\201\261/\21\0232011-05-12:22:"..., 1048576, 3146240) = 1048576
pwrite(15, "\1\"\0\0\1\30\0\0\320\2\0\0\364\201\261/\21\0232011-05-12:22:"..., 1048576, 3146240) = 1048576
pread(13, "\1\"\0\0\1 \0\0\320\2\0\0\0\200\31~:22:54:21\5VALID\1"..., 1048576, 4194816) = 1048576
pwrite(15, "\1\"\0\0\1 \0\0\320\2\0\0\0\200\31~:22:54:21\5VALID\1"..., 1048576, 4194816) = 1048576
pread(13, "\1\"\0\0\1(\0\0\320\2\0\0\0\200\320\210\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576, 5243392) = 1048576
pwrite(15, "\1\"\0\0\1(\0\0\320\2\0\0\0\200\320\210\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576, 5243392) = 1048576
pread(13, "\1\"\0\0\0010\0\0\320\2\0\0\374\200\265$\5\f\27<\10\7xo\5\f\27<\10\02320"..., 1048576, 6291968) = 1048576
pwrite(15, "\1\"\0\0\0010\0\0\320\2\0\0\374\200\265$\5\f\27<\10\7xo\5\f\27<\10\02320"..., 1048576, 6291968) = 1048576
pread(13, "\1\"\0\0\0018\0\0\320\2\0\0\0\200W&796508\377\5\304\2\6\2\t\377\tP"..., 1048576, 7340544) = 1048576
pwrite(15, "\1\"\0\0\0018\0\0\320\2\0\0\0\200W&796508\377\5\304\2\6\2\t\377\tP"..., 1048576, 7340544) = 1048576
pread(13, "\1\"\0\0\1@\0\0\320\2\0\0\0\200*2SYS\7D807355\377\5\304\2\7"..., 1048576, 8389120) = 1048576
pwrite(15, "\1\"\0\0\1@\0\0\320\2\0\0\0\200*2SYS\7D807355\377\5\304\2\7"..., 1048576, 8389120) = 1048576
pread(13, "\1\"\0\0\1H\0\0\320\2\0\0\0\200~\5\tPROCEDURE\7xo\5\f\30"..., 1048576, 9437696) = 1048576
pwrite(15, "\1\"\0\0\1H\0\0\320\2\0\0\0\200~\5\tPROCEDURE\7xo\5\f\30"..., 1048576, 9437696) = 1048576
pread(13, "\1\"\0\0\1P\0\0\320\2\0\0\0\200ll\5\f\30\10'\0232011-05-12"..., 1048576, 10486272) = 1048576
pwrite(15, "\1\"\0\0\1P\0\0\320\2\0\0\0\200ll\5\f\30\10'\0232011-05-12"..., 1048576, 10486272) = 1048576
pread(13, "\1\"\0\0\1X\0\0\320\2\0\0\0\200\320\370\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576, 11534848) = 1048576
pwrite(15, "\1\"\0\0\1X\0\0\320\2\0\0\0\200\320\370\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576, 11534848) = 1048576
pread(13, "\1\"\0\0\1`\0\0\320\2\0\0\0\200\27\216 \377\tPROCEDURE\7xo\5"..., 1048576, 12583424) = 1048576
pwrite(15, "\1\"\0\0\1`\0\0\320\2\0\0\0\200\27\216 \377\tPROCEDURE\7xo\5"..., 1048576, 12583424) = 1048576
pread(13, "\1\"\0\0\1h\0\0\320\2\0\0008\201fj\1N\1N,\0\r\3SYS\7D860"..., 1048576, 13632000) = 1048576
pwrite(15, "\1\"\0\0\1h\0\0\320\2\0\0008\201fj\1N\1N,\0\r\3SYS\7D860"..., 1048576, 13632000) = 1048576
pread(13, "\1\"\0\0\1p\0\0\320\2\0\0\0\200\25%\7xo\5\f\30\20:\0232011-05"..., 1048576, 14680576) = 1048576
pwrite(15, "\1\"\0\0\1p\0\0\320\2\0\0\0\200\25%\7xo\5\f\30\20:\0232011-05"..., 1048576, 14680576) = 1048576
pread(13, "\1\"\0\0\1x\0\0\320\2\0\0\0\200G\36\7xo\5\f\30\23\6\7xo\5\f\30\23\6"..., 1048576, 15729152) = 1048576
pwrite(15, "\1\"\0\0\1x\0\0\320\2\0\0\0\200G\36\7xo\5\f\30\23\6\7xo\5\f\30\23\6"..., 1048576, 15729152) = 1048576
pread(13, "\1\"\0\0\1\200\0\0\320\2\0\0004\200\"\34311-05-12:23:20:1"..., 1048576, 16777728) = 1048576
pwrite(15, "\1\"\0\0\1\200\0\0\320\2\0\0004\200\"\34311-05-12:23:20:1"..., 1048576, 16777728) = 1048576
pread(13, "\1\"\0\0\1\210\0\0\320\2\0\0\0\200\320(\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576, 17826304) = 1048576
pwrite(15, "\1\"\0\0\1\210\0\0\320\2\0\0\0\200\320(\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576, 17826304) = 1048576
pread(13, "\1\"\0\0\1\220\0\0\320\2\0\0\0\200\343\367\377\tPROCEDURE\7xo\5\f"..., 1048576, 18874880) = 1048576
pwrite(15, "\1\"\0\0\1\220\0\0\320\2\0\0\0\200\343\367\377\tPROCEDURE\7xo\5\f"..., 1048576, 18874880) = 1048576
pread(13, "\1\"\0\0\1\230\0\0\320\2\0\0\350\201\200\263\33\32\0232011-05-12:23"..., 1048576, 19923456) = 1048576
pwrite(15, "\1\"\0\0\1\230\0\0\320\2\0\0\350\201\200\263\33\32\0232011-05-12:23"..., 1048576, 19923456) = 1048576
pread(13, "\1\"\0\0\1\240\0\0\320\2\0\0\34\201\36\326-12:23:28:30\5VAL"..., 1048576, 20972032) = 1048576
pwrite(15, "\1\"\0\0\1\240\0\0\320\2\0\0\34\201\36\326-12:23:28:30\5VAL"..., 1048576, 20972032) = 1048576
pread(13, "\1\"\0\0\1\250\0\0\320\2\0\0\0\200P\331VALID\1N\1N\1N,\0\r\3S"..., 1048576, 22020608) = 1048576
pwrite(15, "\1\"\0\0\1\250\0\0\320\2\0\0\0\200P\331VALID\1N\1N\1N,\0\r\3S"..., 1048576, 22020608) = 1048576
pread(13, "\1\"\0\0\1\260\0\0\320\2\0\0\0\200\342X\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576, 23069184) = 1048576
pwrite(15, "\1\"\0\0\1\260\0\0\320\2\0\0\0\200\342X\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576, 23069184) = 1048576
pread(13, "\1\"\0\0\1\270\0\0\320\2\0\0\0\200k\251,\0\r\3SYS\7D968280\377"..., 1048576, 24117760) = 1048576
pwrite(15, "\1\"\0\0\1\270\0\0\320\2\0\0\0\200k\251,\0\r\3SYS\7D968280\377"..., 1048576, 24117760) = 1048576
pread(13, "\1\"\0\0\1\300\0\0\320\2\0\0\310\200Ix:37:19\5VALID\1N\1N"..., 1048576, 25166336) = 1048576
pwrite(15, "\1\"\0\0\1\300\0\0\320\2\0\0\310\200Ix:37:19\5VALID\1N\1N"..., 1048576, 25166336) = 1048576
pread(13, "\1\"\0\0\1\310\0\0\320\2\0\0\0\200\34*#\7\377\tPROCEDURE\7xo"..., 1048576, 26214912) = 1048576
pwrite(15, "\1\"\0\0\1\310\0\0\320\2\0\0\0\200\34*#\7\377\tPROCEDURE\7xo"..., 1048576, 26214912) = 1048576
pread(13, "\1\"\0\0\1\320\0\0\320\2\0\0\0\200\343\336A\377\tPROCEDURE\7xo\5"..., 1048576, 27263488) = 1048576
pwrite(15, "\1\"\0\0\1\320\0\0\320\2\0\0\0\200\343\336A\377\tPROCEDURE\7xo\5"..., 1048576, 27263488) = 1048576
pread(13, "\1\"\0\0\1\330\0\0\320\2\0\0P\200\367{/\377\tPROCEDURE\7xo\5"..., 1048576, 28312064) = 1048576
pwrite(15, "\1\"\0\0\1\330\0\0\320\2\0\0P\200\367{/\377\tPROCEDURE\7xo\5"..., 1048576, 28312064) = 1048576
pread(13, "\1\"\0\0\1\340\0\0\320\2\0\0\0\200\315_-05-12:23:46:08\5"..., 1048576, 29360640) = 1048576
pwrite(15, "\1\"\0\0\1\340\0\0\320\2\0\0\0\200\315_-05-12:23:46:08\5"..., 1048576, 29360640) = 1048576
pread(13, "\1\"\0\0\1\350\0\0\320\2\0\0\0\200\263Xo\5\f\0302\31\7xo\5\f\0302\31\0232"..., 1048576, 30409216) = 1048576
pwrite(15, "\1\"\0\0\1\350\0\0\320\2\0\0\0\200\263Xo\5\f\0302\31\7xo\5\f\0302\31\0232"..., 1048576, 30409216) = 1048576
pread(13, "\1\"\0\0\1\360\0\0\320\2\0\0\0\200\213K\5VALID\1N\1N\1N,\0\r\3"..., 1048576, 31457792) = 1048576
pwrite(15, "\1\"\0\0\1\360\0\0\320\2\0\0\0\200\213K\5VALID\1N\1N\1N,\0\r\3"..., 1048576, 31457792) = 1048576
pread(13, "\1\"\0\0\1\370\0\0\320\2\0\0\300\200\333N\24\0\t\0T\0\210\22\"\0\0\0\t\0\t\0"..., 12288, 32506368) = 12288
pwrite(15, "\1\"\0\0\1\370\0\0\320\2\0\0\300\200\333N\24\0\t\0T\0\210\22\"\0\0\0\t\0\t\0"..., 12288, 32506368) = 12288
open("/s01/flash_recovery_area/G10R2/archivelog/2011_05_30/o1_mf_1_720_6y700ch9_.arc", O_RDWR|O_DIRECT) = 16
write(16, "\0\"\0\0\0\0\300\377\0\0\0\0\0\0\0\0~\240\0\0\0\2\0\0\30\370\0\0}|{z"..., 512) = 512
open("/s01/flash_recovery_area/G10R2/archivelog/2011_05_30/o1_mf_1_720_6y700ch9_.arc", O_RDWR|O_DIRECT) = 16
write(16, "\0\"\0\0\0\0\300\377\0\0\0\0\0\0\0\0~\240\0\0\0\2\0\0\30\370\0\0}|{z"..., 512) = 512
pread(12, "\25\302\0\0\1\0\0\0\0\0\0\0\0\0\1\4\r\377\0\0\0\0\0\0\0\3 \n\22\235\205\254"..., 16384, 16384) = 16384
pread(12, "\25\302\0\0\17\0\0\0\2364\0\0\377\377\1\4?\177\0\0\0'\0\0\0\0\0\0\0\0\0\2"..., 16384, 245760) = 16384
pread(12, "\25\302\0\0\21\0\0\0\2364\0\0\377\377\1\4[\35\0\0\0\0\0\0\0\0\0\0Q\350\255,"..., 16384, 278528) = 16384
pread(12, "\25\302\0\0\26\0\0\0\2354\0\0\377\377\1\4\236W\0\0\0\220\1\0\321\2\0\0\1\0\0\0"..., 16384, 360448) = 16384
pread(12, "\25\302\0\0\313\0\0\0\2234\0\0\377\377\1\4\2\245\0\0005\251\332,D\"\315\0\0\0\0\0"..., 16384, 3325952) = 16384
pread(12, "\25\302\0\0(\1\0\0d4\0\0\377\377\1\4\6*\0\0\33\0\4\0\333\7\0\0\0\0\0\0"..., 16384, 4849664) = 16384
pread(12, "\25\302\0\0A\1\0\0\2234\0\0\377\377\1\4\3715\0\0\0\236<\216\5\0\0\0\0\300[\310"..., 16384, 5259264) = 16384
pread(12, "\25\302\0\0\313\0\0\0\2234\0\0\377\377\1\4\2\245\0\0005\251\332,D\"\315\0\0\0\0\0"..., 16384, 3325952) = 16384
pwrite(12, "\25\302\0\0B\1\0\0\2374\0\0\377\377\1\4\3725\0\0\0\236<\216\5\0\0\0\0\300[\310"..., 16384, 5275648) = 16384
pread(12, "\25\302\0\0C\1\0\0\2364\0\0\377\377\1\4\210\232\0\0\0000\360\1\0\0\0\0\v\252\332,"..., 16384, 5292032) = 16384
pread(12, "\25\302\0\0B\1\0\0\2374\0\0\377\377\1\4\3725\0\0\0\236<\216\5\0\0\0\0\300[\310"..., 16384, 5275648) = 16384
pwrite(12, "\25\302\0\0D\1\0\0\2374\0\0\377\377\1\4\177\253\0\0\0\0\0\0\0\0\0\0\v\252\332,"..., 16384, 5308416) = 16384
pread(12, "\25\302\0\0\313\0\0\0\2234\0\0\377\377\1\4\2\245\0\0005\251\332,D\"\315\0\0\0\0\0"..., 16384, 3325952) = 16384
pwrite(12, "\25\302\0\0B\1\0\0\2374\0\0\377\377\1\4\353{\0\0\0\316,\220\5\0\0\0\0\300[\310"..., 16384, 5275648) = 16384
pread(12, "\25\302\0\0\26\0\0\0\2354\0\0\377\377\1\4\236W\0\0\0\220\1\0\321\2\0\0\1\0\0\0"..., 16384, 360448) = 16384
pwrite(12, "\25\302\0\0\25\0\0\0\2374\0\0\377\377\1\4\236W\0\0\0\220\1\0\321\2\0\0\1\0\0\0"..., 16384, 344064) = 16384
pwrite(12, "\25\302\0\0\314\0\0\0\2374\0\0\377\377\1\4\231@\0\0005\251\332,D\"\315\0\0\0\0\0"..., 16384, 3342336) = 16384
pwrite(12, "\25\302\0\0\22\0\0\0\2374\0\0\377\377\1\4\323\35\0\0\0\0\0\0\0\0\0\0Q\350\255,"..., 16384, 294912) = 16384
pwrite(12, "\25\302\0\0\20\0\0\0\2374\0\0\377\377\1\4\261\375\0\0\200%\0\0\0\0\0\0\0\0\0\2"..., 16384, 262144) = 16384
pwrite(12, "\25\302\0\0\1\0\0\0\0\0\0\0\0\0\1\0043\377\0\0\0\0\0\0\0\3 \n\22\235\205\254"..., 16384, 16384) = 16384
pread(12, "\25\302\0\0\1\0\0\0\0\0\0\0\0\0\1\0043\377\0\0\0\0\0\0\0\3 \n\22\235\205\254"..., 16384, 16384) = 16384
write(11, "\0\313\0\0\6\0\0\0\0\0\10\6\0\315\"\315\0\0\0\0\0\1\0\0\0\0\0\0\0\0\0\0"..., 203) = 203

在以上system call trace中可以看到前台进程首先读取并更新了控制文件(controlfile/1.ctl),之后读取了在线日志文件的头部(read o1_mf_3_6v34jpmp_.log=512bytes),接着以相关属性创建了归档日志文件(o1_mf_1_720_6y700ch9_.arc,注意这里使用了O_CREAT、O_EXCL、O_SYNC、0660属性)并得到该打开文件描述符15,接着以每次buffer为1048576字节来读取在线日志并写入归档日志pread(onlinelog)->pwrite(archivelog),直到在线日志中的redo记录被写完。这里的buffer大小1048576实际上受到隐藏参数_log_archive_buffer_size(Size of each archival buffer in log file blocks)的影响,该参数指定了归档缓冲的大小,其单位为日志文件块(log file block),日志文件块大小在不同平台上各有差异,具体可以参考这里。目前主流的UNIX平台上日志文件标准块大小为512 bytes,而_log_archive_buffer_size的默认值一般为2048,所以得出了这里的(2048*512)=1048576=1MB buffer大小。需要注意的是与”alter system archive log current”命令不同,”switch logfile”不会引起前台进程去执行归档操作,归档操作仍将由ARCH进程完成。

我们可以很方便的通过查询v$archived_log视图来了解具体完成某个日志文件归档操作的是何种进程,这得益于其creator列:


SQL> select distinct creator  from v$archived_log;

CREATOR
-------
RMAN
FGRD
ARCH


CREATOR 	VARCHAR2(7) 	Creator of the archivelog:

    ARCH - Archiver process
    FGRD - Foreground process
    RMAN - Recovery Manager
    SRMN - RMAN at standby
    LGWR - Logwriter process

以上可以看到可能创建归档日志的进程不仅仅局限于FGRD与ARCH,RMAN(注意实际上并非直接由rman进程完成)在执行热备操作时也可能掺合进来归档一把,此外在DataGuard环境中LGWR也会参与归档操作。

V$archived_log视图中另外一个有趣的字段是REGISTRAR,一般来说在非DataGuard环境中CREATOR总是等于REGISTRAR,而在DataGuard环境中REGISTRAR常是RFS(remote file server process (RFS) on the standby system where the RFS process writes the redo data to either archived redo log files or standby redo log files)后台进程,而实际的CREATOR则往往仍是ARCH或LGWR。

REGISTRAR 	VARCHAR2(7) 	Registrar of the entry:

    RFS - Remote File Server process
    ARCH - Archiver process
    FGRD - Foreground process
    RMAN - Recovery manager
    SRMN - RMAN at standby
    LGWR - Logwriter process

Does LGWR use synchronous IO even AIO enabled?

在Oracle中开启AIO异步IO后可以一定程度上提升数据库IO性能,但同时也引入了丢失commit数据的风险。具体可以参见小荷同学的文章,但是这里存在一个疑问就是redo重做记录是同步写出还是在启用AIO后同样使用异步IO的API,我们可以通过跟踪lgwr后台进程的system call系统调用来说明该问题:


SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bi
PL/SQL Release 10.2.0.4.0 - Production
CORE    10.2.0.4.0      Production
TNS for Linux: Version 10.2.0.4.0 - Production
NLSRTL Version 10.2.0.4.0 - Production


SQL> show parameter disk_asynch_io

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
disk_asynch_io                       boolean     TRUE

SQL> show parameter filesystem

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
filesystemio_options                 string      SETALL

[oracle@rh2 ~]$ ps -ef|grep dbw0_G10R2|grep -v grep
oracle   29168     1  0 19:02 ?        00:00:01 ora_dbw0_G10R2

[oracle@rh2 ~]$ strace -p 29168

.............................
io_submit(140140183375872, 34, {{0x7f74ee284e10, 0, 1, 0, 16},
{0x7f74ee290920, 0, 1, 0, 16}, {0x7f74ee286970, 0, 1, 0, 16}, {0x7f74ee290db0, 0, 1, 0, 16}  = 34

io_getevents(140140183375872, 1, 1024, {{0x7f74ee284e10, 0x7f74ee284e10, 8192, 0},
0x7f74ee289710, 8192, 0}}, {600, 0}) = 12

times(NULL)                             = 480509951
times(NULL)                             = 480509951
times(NULL)                             = 480509951
times(NULL)                             = 480509951
times(NULL)                             = 480509951
times(NULL)                             = 480509951
times(NULL)                             = 480509951
times(NULL)                             = 480509951
times(NULL)                             = 480509951
times(NULL)                             = 480509951
times(NULL)                             = 480509951
times(NULL)                             = 480509951
getrusage(RUSAGE_SELF, {ru_utime={0, 417936}, ru_stime={0, 823874}, ...}) = 0
getrusage(RUSAGE_SELF, {ru_utime={0, 417936}, ru_stime={0, 823874}, ...}) = 0
io_getevents(140140183375872, 1, 1024, {{0x7f74ee28c4b0, 0x7f74ee28c4b0, 8192, 0},
{0x7f74ee287290, 0x7f74ee287290, 8192, 0}, {0x7f74ee283988, 0x7f74ee283988, 8192, 0},
{0x7f74ee27fbf0, 0x7f74ee27fbf0, 8192, 0}, {0x7f74ee28a030, 0x7f74ee28a030, 8192, 0}, {0x7f74ee28fdb8, = 22

/* 此处dbwr使用io_submit函数将I/O请求入列,
    io_submit是典型的asynchronous I/O system calls,
    可见Oracle针对数据文件写出已在使用异步IO */

[oracle@rh2 ~]$ ps -ef|grep lgwr_G10R2|grep -v grep
oracle   29170     1  0 19:02 ?        00:00:01 ora_lgwr_G10R2

[oracle@rh2 ~]$ strace -p 29170
.............................
io_submit(139932588023808, 2, {{0x7f4497f423c8, 0, 1, 0, 20}, {0x7f4497f42590, 0, 1, 0, 21}}) = 2

io_getevents(139932588023808, 1, 1024, {{0x7f4497f423c8, 0x7f4497f423c8, 3584, 0}}, {600, 0}) = 1
times(NULL)                             = 480533371
io_getevents(139932588023808, 1, 1023, {{0x7f4497f42590, 0x7f4497f42590, 3584, 0}}, {600, 0}) = 1

以上io_submit system call说明当数据库启用AIO后lgwr同样使用异步IO写出重做记录到online logfile,换而言之当存储crash时的确可能出现redo记录丢失而造成的记录丢失情况。

实际控制lgwr是否使用异步IO的是一个隐藏参数_lgwr_async_io,该参数一般默认为false:

SQL> SELECT x.ksppinm NAME, y.ksppstvl VALUE, x.ksppdesc describ
  2   FROM SYS.x$ksppi x, SYS.x$ksppcv y
  3   WHERE x.inst_id = USERENV ('Instance')
  4   AND y.inst_id = USERENV ('Instance')
  5   AND x.indx = y.indx
  6  AND x.ksppinm='_lgwr_async_io';

NAME
--------------------------------------------------------------------------------
VALUE
--------------------------------------------------------------------------------
DESCRIB
--------------------------------------------------------------------------------
_lgwr_async_io
FALSE
LGWR Asynchronous IO enabling boolean flag

你可能要这样问:”既然该_lgwr_async_io参数默认为false,那么lgwr就应当使用同步IO而非async,这岂不矛盾?”

是的,理论上来说默认_lgwr_async_io参数为false,lgwr应当使用sync。但是在版本10.1.0.2到11.1.0.6存在一个“Bug:8357698 LGWR USES ASYNC IO INSPITE OF SETTING _LGWR_ASYNC_IO=FALSE”:

Abstract: LGWR USES ASYNC IO INSPITE OF SETTING _LGWR_ASYNC_IO=FALSE

PROBLEM:
——–
+ Lgwr uses asynch IO irrespective of the setting _lgwr_async_io=false
and consumes high CPU performing IO poll operations.
+ short stack and truss output show that the lgwr is waiting for asynch IO
completion notification.

This particular problem was dormant for a long time and highly affects the
database performance.

DIAGNOSTIC ANALYSIS:
——————–
LGWR Shortstack
~~~~~~~~~~~~~~~

aiowait()+540<-skgfospo()+216<-skgfrwat()+80<-ksfdwtio()+476<-ksfdwat1()+84<-k
sfdrwat

0()+520<-kcrfw_post()+500<-kcrfw_redo_write()+2964<-ksbabs()+764<-ksbrdp()

aiowait()+540<-skgfospo()+216<-skgfrwat()+80<-ksfdwtio()+476<-ksfdwat1()+84<-k
sfdrwat

0()+520<-kcrfw_post()+500<-kcrfw_redo_write()+2964<-ksbabs()+764<-ksbrdp()

aiowait()+540<-skgfospo()+216<-skgfrwat()+80<-ksfdwtio()+476<-ksfdwat1()+84<-k
sfdrwat

0()+520<-kcrfw_post()+500<-kcrfw_redo_write()+2964<-ksbabs()+764<-ksbrdp()

Truss output
~~~~~~~~~~~~~
= 1048576
67072
27640/38: kaio(AIONOTIFY, 0) = 0
27640/40: kaio(AIONOTIFY, 0) = 0
27640/43: kaio(AIONOTIFY, 0) = 0
27640/45: kaio(AIONOTIFY, 0) = 0
27640/1: kaio(AIOWAIT, 0xFFFFFFFF7FFFD480) = 1
27640/1: kaio(AIOWAIT, 0xFFFFFFFF7FFFD480) = 1
27640/1: kaio(AIOWAIT, 0xFFFFFFFF7FFFD480) = 1
27640/1: kaio(AIOWAIT, 0xFFFFFFFF7FFFD480) = 1
= 1048576
27640/42: kaio(AIONOTIFY, 0) = 0
27640/1: kaio(AIOWAIT, 0xFFFFFFFF7FFFD480) = 1

Note : The TRUSS output was taken with the filesystemio_options=setall.

1. Performed the test on solaris Box with the below settings :

filesystemio_options=none
disk_asynch_io=false
_lgwr_async_io=TRUE

2. Performed another test on Linux with the following settings and it looks
like the behaviour matches with Solaris and hence it looks like it may not be
port specific .

filesystemio_options=SETALL
disk_asynch_io=TRUE
_lgwr_async_io=TRUE/FALSE

Linux_lgwr_true.out ) .

Request you to review the uploaded files and update with the results of your
analysis .

As per Bug:8357698 LGWR USES ASYNC IO INSPITE OF SETTING _LGWR_ASYNC_IO=FALSE , the filesystemio_options & disk_asynch_io are overwriting _lgwr_async_io settings .

Solution
Most often high CPU consumption by lgwr is related to Async IO.

You can try setting filesystemio_options=none & disk_asynch_io=false with obvious performance impact , but this may not be acceptable to you .

As per Bug:8357698 LGWR USES ASYNC IO INSPITE OF SETTING _LGWR_ASYNC_IO=FALSE , the filesystemio_options & disk_asynch_io are overwriting _lgwr_async_io settings .

1] Move the Redologs to the filesystem where the async IO is not permitted . This will cause the lgwr process not to use async calls.

OR

2] Use direct I/O, which is best done by mounting the corresponding filesystems unbuffered ( (using the”sync” option in Linux ext3, or the “forcedirectio” option in Solaris ufs, for example).

For example :
=========
On solaris:
—————
# mount_ufs -o forcedirectio /dev/rdsk/c2t3d0s0 /d17

If forcedirectio is specified and supported by the file system, then for the duration of the mount forced direct I/O will be used. If the filesystem is mounted using forcedirectio, then data is transferred directly between user address space and the disk. If the filesystem is mounted using noforcedirectio, then data is buffered in kernel address space when data is transferred between user address space and the disk. forcedirectio is a performance option that benefits only from large sequential data transfers. The default behavior is noforcedirectio.

On linux :
————–
# mount -o sync /dev/rdsk/c2t3d0s0 /d17

The sync option has effect only for ext2, ext3, fat, vfat and ufs . The default is async .

PS: We recommend you to involve your system administrator in chainging the moutn options of the OS file system .

由于Bug:8357698的存在,导致filesystemio_options和disk_asynch_io 2个参数可以覆盖_lgwr_async_io的设置,这个问题一直到11.1.0.6以后才得到修复,而在11.2中_lgwr_async_io参数干脆被取消了:

SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
PL/SQL Release 11.2.0.2.0 - Production
CORE    11.2.0.2.0      Production
TNS for Linux: Version 11.2.0.2.0 - Production
NLSRTL Version 11.2.0.2.0 - Production

SQL> SELECT x.ksppinm NAME, y.ksppstvl VALUE, x.ksppdesc describ
  2   FROM SYS.x$ksppi x, SYS.x$ksppcv y
  3   WHERE x.inst_id = USERENV ('Instance')
  4   AND y.inst_id = USERENV ('Instance')
  5   AND x.indx = y.indx
  6  AND x.ksppinm='_lgwr_async_io';

no rows selected

SQL> show parameter filesystem

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
filesystemio_options                 string      setall
SQL> show parameter async

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
disk_asynch_io                       boolean     TRUE
tape_asynch_io                       boolean     TRUE

io_submit(139755827974144, 2, {{0x7f1b6e992450, 0, 1, 0, 256}, {0x7f1b6e9921f8, 0, 1, 0, 259}}) = 2
io_getevents(139755827974144, 2, 128, {{0x7f1b6e992450, 0x7f1b6e992450, 1024, 0}, {0x7f1b6e9921f8, 0x7f1b6e9921f8, 1024, 0}}, {600, 0}) = 2
times({tms_utime=3, tms_stime=4, tms_cutime=0, tms_cstime=0}) = 480863691
semtimedop(2654212, 0x7fffadc3db80, 1, {3, 0}) = 0
times({tms_utime=3, tms_stime=4, tms_cutime=0, tms_cstime=0}) = 480863808
times({tms_utime=3, tms_stime=4, tms_cutime=0, tms_cstime=0}) = 480863808
io_submit(139755827974144, 2, {{0x7f1b6e9921f8, 0, 1, 0, 256}, {0x7f1b6e992450, 0, 1, 0, 259}}) = 2
io_getevents(139755827974144, 2, 128, {{0x7f1b6e9921f8, 0x7f1b6e9921f8, 512, 0}, {0x7f1b6e992450, 0x7f1b6e992450, 512, 0}}, {600, 0}) = 2

/* 在11.2中若启用了AIO,那么lgwr仍会沿用async IO */

总结:
lgwr是否启用async IO取决于_lgwr_async_io隐藏参数,该参数默认为false,也就是说lgwr默认情况下应当使用sync同步IO。但是因为10.1.0.2到11.1.0.6间存在bug会导致_lgwr_async_io被disk_asynch_io和filesystemio_options参数覆盖而不起作用;在以上版本中若启用了AIO那么lgwr无疑会使用AIO,如果既想要启用AIO又不想因为lgwr使用async而造成隐患,那么可以将日志文件所在移动到不允许async IO的位置,或者使用direct/sync/forcedirectio选项来mount文件系统。在版本11.2以后_lgwr_async_io参数被废弃,lgwr是否启用async io完全取决于disk_asynch_io及filesystemio_options。

Latches and Tuning:Redo Log Buffer and Latches

1. The size of the redo log buffer is determined by the LOG_BUFFER parameter.

a.  Determine the size of the LOG_BUFFER by executing the following query:> select * from v$sga;

b.  View the LOG_BUFFER setting in the init.ora file.

2.  There are three latches associated with the redo log buffer.

a.  Execute the following query to view the latches:> select name from v$latchname where name like ‘redo%’;

There will be more discussion about these latches later in the module.

3.  A number of parameters that effect latch performance have been made obsolete in Oracle8i.
a.  Execute the following query to view the obsolete redo buffer parameters:

> select ksppinm from x$ksppi where ksppinm like ‘_log%’;

Of the parameters listed, _LOG_SIMULTANEOUS_COPIES  and _LOG_IO_SIZE are now obsolete.  LOG_SMALL_ENTRY_MAX_SIZE and LOG_ENTRY_PREBUILD_THRESHOLD are no longer available in Oracle8i.

_LOG_SIMULTANEOUS_COPIES effects the redo copy latch, but since the redo copy is always acquired there is no need to set this parameter.
_LOG_IO_SIZE effects the redo writing latch.

b.  Execute the following query to view the redo related parameters that are still available in Oracle8i:

> select name from v$system_parameter where name like ‘log%’;

You should see the parameters log_buffer, log_checkpoint_interval, and log_checkpoint_timeout.

4. There are 10 wait events associated with the redo log buffer and latches.

a.  View the wait events by executing the following query:> select name from v$event_name where name like ‘log%’;

Of the events that are listed, three are of primary importance:

Log buffer space: a process is waiting for space in the log buffer
Log file sync: the time it takes for LGWR to complete the write then post the requester.
Log file parallel write: the main event waited while writing the redo records to the current log file.

b. You can also gather event information from the V$SYSTEM_EVENT view by executing the following query:

> select event,total_waits,total_timeouts,time_waited from v$system_event where
event in (‘log buffer space’,’log file sync’,’log file parallel write’);

In a well tuned system redo-related waits should be none or minimal.  If one of the events is not listed in the output that means no event has occurred.

5.  There are 14 statistics associated with the redo log buffer and latches.

a.  Execute the following query to view the statistics:> select name from v$sysstat where name like ‘redo%’;

The most important statistics are:

Redo Writes: number of times the log buffer is written.
Redo Blocks Written: total number of blocks written.
Redo Log Space Requests: number of times requested to allocate space in the log file.
Redo Size, Redo Entries: how much redo is generated since instance statrup
Redo Buffer Allocation Retries: total number of retries necessary to allocate space in the redo buffer.

6.  Contention for redo log buffers does not generally cause major performance problems on the database. Waits for
the event log buffer space could indicate contention for the redo log buffers.  However, it may be more useful to find
out the proportion of redo entries which had to wait for space to become available.
a.  Execute the following query to determine the ratio:

> select r.value, s.value, r.value/s.value
from v$sysstat r, v$sysstat s
where r.name = ‘redo buffer allocation retries’ and
s.name = ‘redo entries’;;

The ratio should be near zero.

7. Remember from the first practice that there are three latches associated with the redo log buffer; redo allocation,
redo copy, and redo writing.  The tuning goal is to minimize contention and waits for these latches.

a.  To tune the latches, look at the latch statistics and the ratio of MISSES to GETS by executing the following query:> select n.name, gets, misses, immediate_gets, immediate_misses
from v$latch l, v$latchname n
where n.name in (‘redo allocation’,’redo copy’,’redo writing’)
and n.latch#= l.latch#;

If the ratio of MISSES to GETS exceeds 1%, or the ratio of IMMEDIATE_MISSES to (IMMEDIATE_MISSES+IMMEDIATE_GETS) exceeds 1%, there is latch contention.

To return the ratios you can execute the following query.  However, you may receive a division by zero error.

> select n.name, gets, misses, immediate_gets, immediate_misses, (misses/gets),
(immediate_misses/(immediate_misses+immediate_gets))
from v$latch l, v$latchname n
where n.name in (‘redo allocation’,’redo copy’,’redo writing’)
and n.latch#= l.latch#;

8.  You can increase the number of redo copy latches by setting the parameter LOG_SIMULTANEOUS_COPIES.
This parameter is a hidden parameter and must be set in the init.ora and the database must be restarted for it
to take effect.

a.  One method for determing the number of redo copy latches is to query the V$LATCH_CHILDREN view.  There will be one latch for the value of _LOG_SIMULTANEOUS_COPIES.  If there is only one latch, you will have to query V$LATCH_PARENT as there will be no children.> select name from v$latch_children where name = ‘redo copy’;

b.  If you want to increase or decrease the number of redo copy latches, modify the parameter _LOG_SIMULTANEOUS_COPIES in the init.ora file.  Once you have done this, you can bounce the
database and execute the query in step a.

9.  You can increase the number of redo writing latches by modifying the parameter _LOG_IO_SIZE.  This
parameter is a hidden parameter and must be set in the init.ora and the database must be restarted for it to take
effect.
a.  One method for determing the number of redo copy latches is to query the V$LATCH_CHILDREN view.  There will be one latch for the value of _LOG_IO_SIZE.  If there is only one latch, you will have to query V$LATCH_PARENT as there will be no children.

> select name from v$latch_children where name = ‘redo writing’;

b.  If you want to increase or decrease the number of redo copy latches, modify the parameter
_LOG_IO_SIZE in the init.ora file.  Once you have done this, you can bounce the database and execute the
query in step a.

Know more about DML

The change vectors as constructed by the transaction layer are used by the cache layer to apply the physical block changes.

Redo log ordering and recovery as well as read consistency are the fundamental purposes of SCNs.

The change vector:
Change vectors are built in the PGA process of the process modifying a block, have the DBA and SCN/SEQ of the block before the change, and only ever perform the change for one block.

The system commit number:
The SCN is a 6-byte structure consisting of the SCN base and SCN wrap.
Is a 6-byte structure consisting of the SCN base and SCN wrap, 4 bytes for the SCN base and 2 bytes for the SCN wrap

The SCN is used within the data block to reflect the committed version of the block.
There is only one global SCN generator for each database.
A large transaction is defined as a transaction that is participating in a parallel DML (PDML) operation or a distributed transaction.
One of the first checks for rollback segment allocation is whether the transaction is participating in a PDML or distributed transaction. These are used to determine the largest rollback segment to be allocated.

steps below to illustrate the order necessary to begin a transaction.

1.bind to the available rollback segment
2.allocate a slot in the transaction table
3.allocate an undo block to the rollback segment

When a free slot cannot be allocated in the transaction table, exclusive access is acquired in the rollback segment header.
Only when a free slot can be allocated in the transaction table do you gain exclusive access to the RBS header.

The system rollback segment will be used only for data for the system tablespace and data that is being created from recursive SQL calls generated by user data.

The search for the allocation of an undo block to a rollback segment will occur in:
Searching for an undo block to be allocated will begin in the current extent if the extent boundary has not been reached, in the next extent provided there are no uncommitted transactions, or in a newly allocated extent if the next extent has uncommitted data.

Rollback segments are shrunk either explicitly through the ’alter rollback segment … shrink to …’ command or implicitly by the SMON process. Implicit shrinkage can be disabled by:
PCTINCREASE has no affect on rollback segments. All extents will be sized by the setting of the next_extent storage attribute.
Setting the event 10512 will disable the implicit shrink operation performed by SMON.
Setting the initialization parameter ‘rbs_disable_shk’. There is no such parameter. Setting the event 10512 will disable the implicit shrinkage operation performed by SMON.

The Oracle server begins shrinking a rollback segment starting from the current extent +2

Rollback segment shrinkage will occur while optimal or minextents is not reached, there are at least two remaining extents, the extent is not busy, and the extent is not extent #0.

An extent is reused if all its blocks are inactive. RBS undo blocks are reused on a per extent basis. An undo block becomes inactive when the active transaction in the undo block commits.

correct order of Interested Transaction List (ITL) allocation.

1. Find an  Unused ITL
2. FInd ITL of oldest commited TX
3. DO a partial block cleanout that stops when an ITL becomes available
4. Extend ITL area

The variable portion of the transaction layer is dictated by the MAXTRANS parameter.
The INITRANS parameter dictates the number of ITL’s and hence the variable portion of the Transaction Layer.

An ITL must be acquired by any transaction wishing to modify a block. If an ITL is unavailable and there is no space left in the block, a transaction will wait.

A transaction that currently has an ITL within a block cannot update any row in the block if the required row is locked by a previous transaction. That transaction will wait on the TX enqueue.When attempting to update a row already held by a previous transaction, a transaction will wait on the TX enqueue in exclusive mode.

The free space credit field of the transaction layer maintains the space as freed for a given transaction, and that is returned on commit of the transaction.

When a transaction is unable to allocate an ITL within a data block for which it needs to modify a block, the transaction will:
That transaction will wait on the TX enqueue in share mode until an ITL becomes available within the block where the modification is required.

The Oracle server maintains the row header flag and lock byte (total 2 bytes) for the deleted row. It is necessary to keep these first two bytes to maintain the row lock. This is referred to as stub space.

The correct statistic Rollback changes – undo records applied is incremented for each link in the undo record chain.

The Data Block Address (DBA) in the Transaction Table slot indicates:
The beginning of the undo chain for the transaction  Correct! The DBA in the Transaction Table slot is the starting point for rolling back a transaction

the correct sequence of events at the COMMIT of a transaction.

1. FIND an scn value
2. update the transaction table slot with the scn
3. create a commit record in the redo log buffer
4. flush the redo log buffer to disk
5. release locks held on the rows and tables

The application of the undo to the data block is performed by the
The data layer ,and the transaction layer do not have the right to perform physical block-level changes.

EVENT:10228 trace application of redo by kcocbk

Error:  ORA 10228
Text:   trace application of redo by kcocbk
-------------------------------------------------------------------------------

Explanation:
        This is NOT an error but is a special EVENT code.
        It should *NOT* be used unless explicitly requested by RD support.

        Event 10228 is potentially useful in recovery situations since it
        traces the application of redo. Therefore, if cache recovery
        appears to hang or corrupt blocks then this can help track the dba
        and operations upon it.

        This event is available in Oracle 6 and 7.

        The code just calls the same routines used to dump a redo log using
        'alter system dump logfile' so has the same output.
        The benefit is that it'll stop when it hits the corruption.

Articles:
        Interpreting DUMP LOGFILE Output <Note:29726.1>

沪ICP备14014813号-2

沪公网安备 31010802001379号