11.2.0.2 “_datafile_write_errors_crash_instance”

可以通过隐藏参数_datafile_write_errors_crash_instance控制DBWR进程写出到数据文件遇到I/O问题时的表现:
在版本11.2.0.2中:
1. 默认情况下,若是RAC或未启用归档的实例或出现问题的是SYSTEM表空间的数据文件会导致实例终止;其他情况下,会造成相关数据文件被OFFLINE。
2.若该参数设置为TRUE,则在DBWR写文件出现问题时总是终止本实例
3. 若设置为FALSE,仅在未启用归档或SYSTEM表空间的数据文件的情况下会终止实例;

 

” The new behavior is defined by _datafile_write_errors_crash_instance
parameter. Parameter has no default value and 3 different behaviors depending
if it is set and to which value. For default behavior (param not set) the
instance is crashed if RAC or archivelog mode is disabled or this is a system
tablespace. Otherwise the file is offlined. If param is set to TRUE, it is
crashed in all cases. If set to FALSE, it is crashed only if archivelog mode
is disabled or this is a system tablespace.”

 

参数并不能完全保证实例不受IO错误影响而意外终止,且设置该参数可能导致数据文件被OFFLINE。 具体设置方法,该参数可以在线设置:

 

SQL> alter system set “_datafile_write_errors_crash_instance”=false;

提高DBWR进程的吞吐量

Improve DBWR Throughput

To improve DBWR’s throughput, consider the following:

  • Disk capabilities and setup (stripe size, speed, controllers etc…)
  • Using raw devices instead of File System files — depending on their efficiency on your system.
  • Spreading database file across drives and controllers
  • Using Async IO if supported
  • If asynch IO is not possible, using multiple database writers. This is done with the DB_WRITERS database parameter in Oracle7, and the DBWR_IO_SLAVES database parameter in Oracle8 and 9.
  • Using multiple DB Writer gatherer processes in Oracle8 using the DB_WRITER_PROCESSES parameter.
  • Using the “Multiple buffer pools” feature in Oracle8 and Higher. See Note:135223.1 on the Oracle Metalink website.

 

Achieving the optimal setup for DBWR is highly dependent on the characteristics of your platform. The maximum I/O size the platform supports and the ability to support Asynchronous I/O are examples of these characteristics.

Who pulls the background process dbwr’s trigger?

到底是谁扣动了database writer的扳机?初学Oracle的朋友都会对dbwr这个后台进程有一种模糊的印象,dbwr何时会被触发?很多人大约会回答当发生检查点或者当某些脏块在LRU链表上处于较冷的一端时。同时又有许多关注于宏观架构的工程师会将dbwr的写出规律归结为是lazy(懒)的。Oracle作为目前市场占有率最高的商用数据库,其各种内部算法都可以算得上是商业机密;虽然不断有专家为我们”解密”,但在我的观念中这些内部原理都与真理之冠有着不大不小的差别。所以显然我要描述的是我个人对于database writer以及cache management(缓存管理)的理解,这些理解在一定程度上是能够自洽的,但我无法保证它们必然准确无误。

要详细描述dbwr的工作原理,我们需要从久远年代的版本V7323说起,当时的db writer和cache management已经十分成熟了,8i以后只是引入了增量检查点等特性,dba不用再关心db writer受一些细节参数的影响,而只需要关注增量检查点的活跃程度就可以了。以下我们列出在V7323中,dbwr可能被触发写出的几种情况:

a.当前台进程需要将磁盘上的物理数据块读取到数据库高速缓存中(db cache)时,其首先需要在数据库缓存中寻找到一块可用的free(空闲) buffer。为了寻找这样free buffer,该前台进程首先需要以排他方式持有相关LRU链表的latch(闩),并在该LRU链表上扫描所可用的Free buffer,扫描都会从LRU链表的尾端开始,也就是”较冷”的一端。在此过程中,前台进程沿着由尾到头的方向所遍历到的脏块将被移动到LRUW链表上(注意:一个buffer同时只可能处于一个链表上);此外相关的统计信息如dirty buffer inspectedfree buffers inspected将会累增。若该前台进程在LRU链表上搜索的范围超过了整个LRU链表长度*(隐式参数_db_block_max_scan_count/100)所规定的阀值时,其搜索操作将自行中止,该前台进程还会以信号通知dbwr进程并释放其所持有的LRU latch。dbwr后台进程在收到前台进程的信号信息后,会执行一次大批量的写出操作以使得LRU链表上有干净的clean buffer可用,在此过程中前台进程将处于free buffer wait等待事件中。dbwr后台进程为了写出LRUW与(LRU链表尾部)的脏块,其会主动去持有LRU latch并扫描该LRU链表(也是从尾部开始)试图找出脏块,并批量写出这些收集到得脏块。该DBWR的扫描深度(DBWR scan depth)由隐式参数_db_writer_scan_depth_pct的所指定,当DBWR所扫描的LRU链表长度等于整个LRU链表长度*(_db_writer_scan_depth_pct/100)时,DBWR将停止继续扫描LRU链表。

8i以后:以上这种情况一言以蔽之就是DBWR write for Free Request,这种情况在8i以后仍然奏效;hidden parameter _db_block_max_scan_pct依然健在,其默认值为40,当然也可以从x$kvit视图中”Max percentage of LRU list foreground can scan for free”相关列观察到。到10.1版本中_db_writer_scan_depth_pct(Percentage of LRU buffers for dbwr to scan when looking for dirty)仍健在其默认值为25,在10.2中被彻底废弃。由于引入了增量检查点,DBWn也会主动去遍历LRU链表,将发现的Dirty Buffer移至Checkpoint Queue(dirty queue)上,该扫描同样也受到隐式参数_db_writer_scan_depth_pct的限制。

b.若前台进程在遍历LRU链表,顺带将脏块(dirty block)移动到LRUW链表上时,前台进程可能意识到LRUW链表的长度已经达到了某种阀值。该阀值定义了dirty queue(脏队列)的最大长度,该阀值一般受隐式参数2*_db_block_write_batch或_db_large_dirty_queue的影响。当此时刻LRUW链表将不在接受任何脏块(dirty buffer)。DBWR后台进程将被通知完成批量清理脏块的任务。在此情形下,DBWR处于一种”慌乱”状态中将不遗余力地去清理LRUW与LRU(仍需扫描,扫描深度如上述参数)链表中的脏块。与此同时,前台进程将被禁止访问LRU链表以避免产生进一步弄脏块和扫描操作。这种情形很像操作系统空闲内存小于所设minfree时,所发生的paging。此外在V7323中DBWR进程批量写出IO的大小受制于隐式参数_db_block_write_batch的值,因为该参数在之后的版本中被彻底废弃了,所以不再介绍。

8i以后:此情形仍将触发DBWR写出,但_db_block_write_batch隐式参数被彻底废弃,所以实际限制dirty queue(checkpoint queue)长度的是隐式参数_db_large_dirty_queue,其默认为25个buffer;该参数也可以通过x$kvit内部视图的”kcbldq large dirty queue if kcbclw reaches this”观察,该参数到11.2中依然健在。

c.DBWR后台进程每三秒空闲超时后被唤醒,每次超时均会唤醒DBWR去遍历buffer headers(扫描2*_db_block_write_batch个块)以寻找并写出任意脏块(排序块,临时块会被跳过)。这样做的目的是不让DBWR过于空闲。

8i以后:DBWR仍会通过调用semtimedop函数陷入3s一次的睡眠中,若在过去三秒中DBWR始终空闲,那么在它醒来后会写出少量的脏块到磁盘上。但不同于早期版本,目前版本中DBWR会每三秒根据增量检查点的要求写出脏块(脏块来源于CKPT-Q链表及由DBWR进程主动去扫描LRU链表的tail)。详见Jonathan Lewis的<How have the log_checkpoint_interval and log_checkpoint_timeout changed from version 7?>:

“(Oracle decided to keep trickling dirty blocks to disc at a higher rate than had been effected by the old 3-second idle write rate (every 3 seconds, dbwr wakes up and writes a few blocks to disc if it has had no other work in the interval).? To achieve this, they changed the meaning of the two log checkpoint parameters. This change was made possible by a change in the architecture of the buffer management, which now allows Oracle to queue dirty buffers in the order that they were first made dirty ??.Amongst other things, Oracle already kept a low redo block address (lrba)on each buffer header for each dirty buffer. This identifies the first redo block that started the process of changing that buffered block from the state that is currently on disc to the dirty state that is in the buffer. The function of the log_checkpoint_interval is simply to limit the distance between a buffer’s lrba and the addreess of the redo block that lgwr is currently writing. If the lrba is too low, then the block was first dirtied too long ago and it has to be written to disc (and its lrba set to zero). Since Oracle now queues dirty blocks in the order they were first dirtied (i.e. lrba order) it is a quick and cheap process to find such blocks.For example: if lgwr is currently writing redo block 12,100 and the log_checkpoint_interval is set to 4,000, then dbwr will be cued to write any dirty blocks with an lrba less than 8,100. This check is carried out every 3 seconds, and I believe the control files and possibly any affected data files are updated with the SCN at which this incremental checkpoint took place.)” http://www.jlcomp.demon.co.uk/faq/log_checkpoint.html

d.8i以前不存在incremental checkpoint增量检查点,完全检查点发生时,LGWR将通知DBWR写出一系列的current,dirty和非临时buffer到磁盘上。注意因检查点写出的buffer块并不会被置为free状态,这些脏块仍保留在数据库高速缓存中,以减少不必要的物理IO。因为8i以后的检查点已发生巨大转变,故不再展开。

8i以后:完全检查点为增量检查点所”取代”,完全检查点仅在alter system checkpoint或shutdown(非abort)等少数情况下发生,增量检查点会引发checkpoint queue(dirty queue)上的脏块递进地被写出,每三秒CKPT后台进程将计算检查点目标RBA(Redo Block Address),该目标RBA基于:当前RBA,log_checkpoint_timeout,log_checkpoint_interval,fast_start_mttr_target,fast_start_io_target,最小在线日志的大小等因素。当增量检查点发生时所有在目标RBA相应时间之前被弄脏的buffer块都当被写出(When a checkpoint is initiated, DBWR writes all buffers on the queue until the checkpoint RBA is less than the head of the CKPTQ RBA)。在Oracle 10g中实现了自动调整检查点,只要不显示地(explicitly)设置参数FAST_START_MTTR_TARGET为零,自动检查点调整都将被启用,数据库将以较低的I/O负载写出脏块以提高性能。需要注意的是当启用mttr advisor时有必要将log_checkpoint_timeout,log_checkpoint_interval,fast_start_io_target三个参数设置为零。– DBWR write For checkpoint。


e.9i以前的rac称作”ops oracle parallel server”,因为当时还没有出现cache fusion功能,所以节点间同步缓存需要通过将current块写入到磁盘上来完成,这种因为block ping request的需求而导致的DBWR写出,可以归类为write for ping request (The biggest performance robber in the OPS architecture was the DB block ping. A DB block ping would occur when an instance participating in an OPS database had a block in its cache that another participating instance required. In OPS, if another instance required the block in the cache of a second instance, the block would have to be written out to disk, the locks transferred, and then the block re-read into the requesting instance.)

8i以后:在Oracle 8i的ops中初步实现了cache fusion(Oracle 8i (OPS) introduced the initial phase of cache fusion. The data blocks were transferred from the SGA of one instance to the SGA of another instance without the need to write the blocks to disk.)到9i中cache fusion技术日渐成熟,ops也更名为”RAC real application cluster”以区别于老式的cluster。因为cache fusion这一革命性的特性出现,9i以后的oracle cluster中dbwr被从需要不断完成write for ping request的性能窘境中拯救出来了;因而block ping request这种触发写出的条件也不再成立。

Latches and Tuning:Buffer Cache

1.  The buffer pool is structured into pools; RECYCLE, KEEP, and DEFAULT.  Unless otherwise specified at create time or altered, an object will reside in the DEFAULT pool.

a. Suppose you have a table that is frequently updated, but is a relatively small table.  You would want to ensure the blocks for this table are stored in the KEEP pool of the buffer cache.  Execute the following command to create the table TEST1 in the KEEP pool.> create table TEST1 (n number) storage (buffer_pool KEEP);

b.  Suppose you have a large table that is randomly accessed.  You want to ensure the blocks for this table are stored in the RECYCLE pool.  Execute the following command to create the table TEST2 in the RECYCLE pool:

> create table TEST2 (n nmber) storage (buffer_pool RECYCLE);

c.  Suppose you have a table that is subjected to normal access. You want to ensure the blocks for this table are stored in the DEFAULT pool. Execute the following command to create the table TEST3 in the DEFAULT pool:

> create table TEST3 (n number) storage (buffer_pool DEFAULT);

d.  Query the USER_TABLES view to see the table names and the pools to which they are assigned:

> select table_name, buffer_pool from user_tables where table_name like ‘TEST%’;

You can use the ALTER TABLE command to change the buffer pool for a table.

2. Various statistics are incremented during buffer management.
a.  Execute the following query to view the relevants statistics:

> select * from v$sysstat where name in (‘dirty buffers inspected’,’free buffer inspected’);

– dirty buffers inspected: incremented when dirty buffers are read and moved to the LRUW list.
– free buffers inspected: incremented when free buffers are detected during the search for a free buffer on the LRU list.

3. The parameter DB_BLOCK_LRU_LATCHES defines the number of working sets for the instance.

a.   Determine the value of DB_BLOCK_LRU_LATCHES by executing the following query:> select name,value from v$system_parameter where name = ‘db_block_lru_latches’;

4.  There is one working set per DB_BLOCK_LRU_LATCHES.

a.  Query the X$KCBWDS view to determine the number of working sets for the instance.> select count(*) from x$kcbwds;

The returned values should be the same as the value for DB_BLOCK_LRU_LATCHES.

5.  Each working set also has its own checkpoint latch.

a.  Execute the following query to view the checkpoint latch and set id’s for the instance:> select addr,set_id,ckpt_latch,set_latch,dbwr_num from x$kcbwds;

b.  Using one of the ckpt_latch values, execute the following query:

> select * from v$latch_children where addr = <address from query>;

With this you can see that there is one child latch for each checkpoint latch.

6.  The maximum number of DBWR cannot be more than the number of working sets.
a.  Set the number of DBWR processes to a value greater than DB_BLOCK_LRU_LATCHES.  Do this by changing the DB_WRITER_PROCESSES value in the init.ora.

b.  Bounce your database instance.

c.  Execute the following query:

> select dbwr_num from x$kcbwds;

You will see that the number of DBWR processes is equal to the number DB_BLOCK_LRU_LATCHES.

7. The number of DBWR processes are set using the DB_WRITER_PROCESSES parameter.

a.  View your init.ora file and determine if the number of DB_WRITER_PROCESSES has been explicitly set.

8.  Two parameters effect the way DBWR scans for purposes of making free buffers.

a  Execute the following query:

> select ksppinm from x$ksppi where ksppinm like ‘_db%scan%’;

_DB_WRITER_SCAN_DEPTH – Sets the scan depth for DBWR
_DB_BLOCK_MAX_SCAN_CNT – Sets the foregrond scan depth

The maximum number of dirty buffers in the write list, which will not cause DBWR to write is
max(DB_BLOCK_BUFFERS/8, 2*write-batch-size

9. A target value for the buffer cache hit% is 90.

a.  Determine the buffer cache % by executing the following query:> select 1 – (phy.value / (log.value + con.value )) “Buffer Cache Hit%”
from v$sysstat phy, v$sysstat log, v$sysstat con
where phy.name = ‘physical reads’ and
log.name = ‘db block gets’ and
con.name = ‘consistent gets’;

Descriptions of the buffer cache statistics used are:

– db block gets: tracks the number of blocks obtained in CURRENT mode.
– physical reads: stores the number of physical blocks when the OS retrieves a database block from disk
– consistent gets: number of times a consistent read was requested for a block.

Other buffer cache statistics include:

– physical writes
– db block changes
– free buffer requested
– dirty buffers inspected
– free buffer inspected
– CR blocks created

10.  Numerous statistics are associated with the DBWR process.

a.  Execute the following query to view the statistics associated with the DBWR process:> select name from v$sysstat where name like ‘DBWR%’;

Of the statistics listed two of the most important are DBWR checkpoints and DBWR make free requests.

11.  The number of cache buffers chains latches is equal to prime(db_block_buffers/4).  The number of cache buffers
lru chain latches is equal to the value of DB_BLOCK_LRU_LATCHES.

a.  Execute the following query to determine the approximate number of cache buffers chains latches:> select (value/4) from v$system_parameter where name = ‘db_block_buffers’;

b.  The value from the above should be approximately equal to the row count returned from the following
query:

> select count(*) from v$latch_children where name = ‘cache buffers chains’;

c.  Execute the following query to determine the number of cache buffers lru chain latches:

> select count(*) from v$latch_children where name = ‘cache buffers lru chain’;

d.  Check the value of DB_BLOCK_LRU_LATCHES and it should equal the row count returned from the first query.

12.  Wait events are also important for diagnosing buffer cache problems.  It’s the time_waited for each
event that is used for diagnostics.
a.  Execute the following query to view the two primary events associated with tuning the buffer cache:

> select * from v$system_event
where name in (‘buffer busy waits’,’free buffer waits’)
order by time_waited;

You may not see free buffer waits in the result set if there has been no wait time for this event.  It is listed in V$EVENT_NAME.

b.  Additional wait information can be derived from the views V$SESSION_WAIT and V$WAITSTAT views, and the X$KCBFWAIT table.  Query the V$WAITSTAT view to list the different types of
blocks:

> select class from v$waitstat;

Some of the most likely reasons for each block type are:

Segment Header – not enough free lists and high number of inserts
Data Blocks – too many rows in each block
Freelist Blocks – not enough free lists
Undo Header – too many transactions per rollback segment
Undo Blocks – not enough rollback segment blocks

沪ICP备14014813号-2

沪公网安备 31010802001379号