latch: shared pool等待事件

The shared pool latch is used to protect critical operations when allocating and freeing memory in the shared pool. Typically a miss rate for this latch less than 95% results in poor performance. There is only one latch of this type in the shared pool, so when there is contention for this latch, significant degradation in performance can occur.

Problem

ORA-4031 errors are seen in the alert log if a trace file is generated (such as when a background process receives this error instead of when a user proc may get it. This means the session cannot allocate contiguous memory in shared pool for incoming SQL statement.

Solutions

Since the number of shared pool latch gets is influenced by the volume of shared pool activity like parse operations, anything that can reduce the this activity will improve the availability of this latch and overall database performance (see Reducing the Number of SQL Statements).

latch: row cache objects等待事件

This latch comes into play when user processes are attempting to access or update the cached data dictionary values.

Solutions

Problem:

To determine if the row cache is being used efficiently, execute the following SQL. If the ratio is not close to 1 then some tuning required.

SELECT parameter, sum(gets), sum(getmisses),
100*sum(gets – getmisses) / sum(gets) pct_succ_gets,
sum(modifications) updates
FROM V$ROWCACHE
WHERE gets > 0
GROUP BY parameter;

Options for tuning row cache are so limited that some resources call it non-tunable. However if the value of the column pct_succ_gets in the above query is not close to “1” or if Ignite for Oracle shows significant waits for the “Row Cache Objects” latch, then consider increasing SHARED_POOL_SIZE.

Another way to adjust the dictionary cache is to increase the library cache. Since the dictionary cache is a part of the library cache, the increase will indirectly increase the dictionary cache. Also, consider the following:

Using Locally Managed tablespaces for application objects, especially indexes, may decrease Row Cache locks.

Review and amend database logical design. For example, try to decrease the number of indexes on tables with frequent inserts.

latch: library cache pin等待事件

Library cache pins are used to manage library cache concurrence. Pinning an object causes the heaps to be loaded into memory (if not already loaded). PINS can be acquired in NULL, SHARE or EXCLUSIVE modes and can be considered like a special form of lock. A wait for a “library cache pin” implies some other session holds that PIN in an incompatible mode.

Solutions

Solution: If this is the case, the shared pool may need tuning. See Metalink Note:62143.1.

Problem: If the wait only appears on one or two sessions, the locking scenario is session specific. Collect the following evidence and contact Oracle support.

At a time when a wait is occurring:

select P1RAW from V$SESSION_WAIT where EVENT = ‘library cache pin’

Use the P1RAW value for the following query to determine the object being waited on:

SELECT kglnaown “Owner”, kglnaobj “Object”
FROM x$kglob
WHERE kglhdadr=’&P1RAW’

The following SQL can be used to show the sessions that are holding and/or requesting pins on the object that given in P1RAW in the wait:

SELECT s.sid, kglpnmod “Mode”, kglpnreq “Req”
FROM x$kglpn p, v$session s
WHERE p.kglpnuse=s.saddr
AND kglpnhdl=’&P1RAW’

– An X request (3) will be blocked by any pins held S mode (2) on the object.

– An S request (2) will be blocked by any X mode (3) pin held, or may queue behind some other X request.

This information can be used to identify the blocker and kill the session for a quick fix.

latch:library cache lock等待事件

This event controls the concurrency between clients of the library cache. It acquires a lock on the object handle so that either:

One client can prevent other clients from accessing the same object

The client can maintain a dependency for a long time (for example, no other client can change the object)

This lock is also obtained to locate an object in the library cache.

Oracle needs Library Cache Lock and Library Cache Pin for the compilation and parsing of packages, procedures, functions, and other objects. This is to ensure that no one is using the object during compilation/parsing since changes are being made to the definition of the object.

Problem

This wait can occur when DDL is being executed on an object or one of its dependant objects. This causes a database object invalidation. Typically these are the object maintenance operations such as: ALTER, GRANT, and REVOKE. After object invalidation, Oracle tries to recompile the object at the time of the next access attempt. This may be a problem in a case where other sessions have pinned the object in the library cache because they are actively accessing the object. This is more likely to occur with more active users and with more complex dependencies (e.g. many cross-dependent packages or package bodies). In some cases, waiting for object recompilation may take hours, blocking all the sessions trying to access it.

Typical Scenarios

A DML operation that is hanging because the table which is accessed is currently undergoing changes (ALTER TABLE). This may take quite a long time depending on the size of the table and the type of the modification (e.g. ALTER TABLE MODIFY col1 CHAR(200) on millions of records).

The compilation of package will hang on Library Cache Lock and Library Cache Pin if some users are executing any procedure or function defined in the same package.

Sessions trying to access objects that have recently been modified in some way may have to wait while objects are recompiled.

Solution

If a session is hanging on this wait, it is possible to find its blocker and kill it. Refer to MetaLink Note: 122793.1 for details on how to find the blocker.

For a long-term fix, look at any processes or applications that might cause object invalidations and limit or time their execution for minimal impact. Be very careful with altering, granting or revoking privileges on database objects that frequently used PL/SQL is dependent on.

library cache latch等待事件

This latch serializes access to the objects in the library cache. Every time a SQL statement, a PL/SQL block or a stored object (procedure, package, function, trigger) is parsed or executed, this latch is acquired.

Problem – Fragmentation of the shared pool

Excessive execution of a statement can cause library cache contention.

Many versions of one SQL statement in the shared pool

Excessive parsing of SQL statements (even soft parsing) because of non-reusable queries or large objects being loaded in the shared pool forcing out the smaller heavily used queries

Solutions

Use bind variables whenever possible. Reuse of code means less parsing and less use of shared pool space.

Pin packages and procedures that are heavily executed in the shared pool. This prevents the heavily used code from being flushed out and thus needing to be parsed in.

Increase SESSION_CACHED_CURSORS parameter. This helps if the user repeatedly parses the same statements.

Use fully qualified table names. Example: “select * from owner.table” instead of “select * from table”

If using EXECUTE IMMEDIATE with bind variables, try using DBMS_SQL instead.

If the statement is complex and heavily executed, the load on the library cache could be reduced by breaking it up into multiple smaller statements.

Reducing the versions of SQL statements reduces the load on library cache as well. Oracle deals with various bind variable sizes by creating multiple versions of the statement. One possible solution would be writing more smaller queries rather than one large one.

Specify objects over a certain size be loaded into the large pool instead of the shared pool.

It’s possible that the shared pool is too small. If the above remedies have been attempted and contention is still evident, look into increasing the shared pool gradually.

Please also refer to Oracle’s MetaLink’s Note:146599.1 for information on this latch.

Oracle Latch:一段描绘Latch运作的伪代码

以下这段伪代码来自于OraPubCraig A. Shallahamer,这段代码并不长但基本对获取latch、spin、sleep的行为都描述清楚了,如果你对latch仍不甚了了,那么这段代码会对你很有帮助:

 

 

 

Function Get_Latch(latch_name,mode)
{
  If Mode eq ‘immediate’ {
    If Fast_Get(latch_name) {
      return TRUE
    Else {
      return FALSE
    }
  }
  Else {
    If Fast_Get(latch_name)
    Then {
      v$latch.gets++
      return TRUE
    }
    Else {
      v$latch.misses++
      for try = 0 .. large_number
      {
        if Spin_Get(latch_name)
        Then {
          return TRUE
        }
        Else {
          T0 = time
          Sleep(try)
          T1=time
          Register_Event("latch free",T1-T0)
        }
      } -- spin/sleep loop
    }
  }
}

Function Fast_Get(latch_name)
{
  If try_to_get_latch(latch_name)
  Then {
    return TRUE
  }
  Else {
    return FALSE
  }
}

Function Spin_Get(latch_name)
{
  v$latch.spin_get++
  for i = 1 to _spin_count
  {
    If Fast_Get(latch_name)
    Then {
      return TRUE
    }
  }
}

Function Sleep(try)
{
  v$latch.sleeps++
  v$latch.sleep[try]++
  sleeptime =
    decode(try,0,0,1,10,2,20,3,~40,4,~80,...~2000)
  sleep(sleeptime)
}

Waiting For KKSFBC CHILD COMPLETION?

客户有一套AIX 5.3上的10.2.0.4.5生产库系统,最近频繁出现”KKSFBC CHILD COMPLETION”等待,同时导致session不断spin消耗CPU并hang住,从表象看这似乎是由bug引起的。以KKSFBC CHILD COMPLETION为关键字到MOS查询可以找到<Bug 6795880 – Session spins / OERI after ‘kksfbc child completion’ wait – superceded [ID 6795880.8]>,该Bug的症状为进程不断spin且hang住、出现’KKSFBC CHILD COMPLETION’等待事件、还可能伴有’Waits for “cursor: pin S”‘等待事件,直接影响的版本有11.1.0.6、10.2.0.3和10.2.0.4。

对于该Bug的描述是在发生’kksfbc child completion’等待事件后会话陷入无休止的自旋(spins)中,这种自旋(spins)发生在由堆栈调用(stack call)kksSearchChildList->kkshgnc陷入对kksSearchChildList函数的无限循环中。

就当前用户提供的版本号及等待事件信息仍不足以定位到该Bug,我们需要更详细的stack call。所幸的是这个trouble是可以重现的(reproduceable),在之后的一次案发现场我们得到了必要的信息:

Name            PID  CPU%  PgSp Owner
oracle      3723390  10.0   7.0 oracle

SQL> oradebug setospid 3723390
Oracle pid: 155, Unix process pid: 3723390, image: oracle@lmsdb1

SQL> oradebug unlimit;
Statement processed.

SQL> oradebug short_stack;
ksdxfstk+002c<-ksdxcb+04e4<-sspuser+0074<-000044F0
<-kksfbc+0b9c<-kkspsc0+0ffc<-kksParseCursor+00d4
<-opiosq0+0ae0<-kpooprx+0168<-kpoal8+0400<-opiodr+0ae0<-ttcpip+1020<-opitsk+1124
<-opiino+0990<-opiodr+0ae0<-opidrv+0484<-sou2o+0090<-opimai_real+01bc<-main+0098<-__start+0098

SQL> oradebug dump processstate 10;
Statement processed.

SQL> oradebug dump systemstate 266;
Statement processed.

=========================process state dump=========================

    SO: 7000003f72e3378, type: 4, owner: 7000003f225c798, flag: INIT/-/-/0x00
    (session) sid: 270 trans: 0, creator: 7000003f225c798, flag: (41) USR/- BSY/-/-/-/-/-
              DID: 0001-009B-0000017F, short-term DID: 0000-0000-00000000
              txn branch: 0
              oct: 0, prv: 0, sql: 700000243ef9540, psql: 7000002fe76d9c0, user: 28/OLSUSER
    service name: SYS$USERS
    O/S info: user: newprepay, term: unknown, ospid: 1234, machine: newprepayC
              program: JDBC Thin Client
    application name: JDBC Thin Client, hash value=2546894660
    last wait for 'kksfbc child completion' blocking sess=0x0 seq=4397 wait_time=48840 seconds since wait started=144918
                =0, =0, =0
    Dumping Session Wait History
     for 'kksfbc child completion' count=1 wait_time=48840
                =0, =0, =0

可以从以上trace中看到会话确实曾长时间处于’kksfbc child completion’等待中,之后陷入无限自旋(spins)中消耗了大量CPU时间。但这里实际的表现又存有差异,引发无限循环的函数是kksfbc而不是kksSearchChildList(常规的调用序列是:kksParseCursor->kkspsc0->kksfbc ->kksSearchChildList->kkshgnc)。

kksfbc意为K[Kernel]K[Kompile]S[Shared]F[Find]B[Best]C[Child]该函数用以在软解析时找寻合适的子游标,在10.2.0.2以后引入了mutex互斥体来取代原有的Cursor Pin机制,Mutex较Latch更为轻量级。著名的Tanel Poder指出虽然mutex的引入改变了众多cursor pin的内部机制,但kksfbc仍需要持有library cache latches才能扫描library cache hash chains(However the traversing of library cache hash chains (the right child cursor lookup using kksfbc()) was still protected by library cache latches)。另一方面当kksfbc函数针对某个parent cursor找到合适child cursor后,可能使用KKSCHLPINx方法将该child cursor pin住,这个时候就需要exclusive地持有该child cursor相应的mutex,如:

SQL>  select mutex_addr,requesting_session rs,blocking_session bs,location from x$mutex_sleep_history;

MUTEX_ADDR               RS         BS LOCATION
---------------- ---------- ---------- ----------------------------------------
00000000A3CF8928        159        148 kksfbc [KKSCHLPIN1]
00000000A3CEA8A8        159          0 kksfbc [KKSCHLPIN1]
00000000A3CF5508        159          0 kksfbc [KKSCHLPIN1]
00000000A3CF10E8        148        159 kksLockDelete [KKSCHLPIN6]

当有一个进程执行kksfbc,而其他进程可能需要陷入’kksfbc child completion’等待中(更多的是cursor:pin S等待事件),但这种等待一般是十分轻微的,你很难从某个”仪表”上观察到这一事件。因为一系列kksfbc相关的bug仅发生在10.2.0.2以后,可以猜测是由于mutex的引入引起的。

打破’kksfbc child completion’异常等待的一种行之有效的workaround方法是设置隐藏参数_use_kks_mutex_pin为false,即使用传统的由latch保护的Cursor pin;设置该hidden parameter需要重启实例,具体的设置方法如下:

alter system set "_kks_use_mutex_pin"=false scope=spfile;

restart instance...........

但这种方法也仅仅是行之有效,而非万试万灵。从消极的角度来说,因为放弃了mutex保护cursor pin的机制,在解析频繁的系统中library cache latch的争用将白热化,有可能成为Top 5等待事件。

另一种值得推荐的workaround是减少硬编码的SQL语句,使用session_cached_cursor减少实际的软解析操作也有助于避免引发该Bug;当然这要求我们修改代码,但从长远来看这种改良是值得的。

最后Oracle在10.2.0.4上提供了该Bug的one-off Patch 8575528,其在10.2.0.4 psu4以后的等价补丁为(Equivalent patch)为merge patch 9696904:

8557428 9696904 7527908 Both fixes are needed. 6795880 superceded by 8575528 in 9696904 which includes extra files so may cause new conflicts

但merge patch 9696904目前仅有Linux x86/64平台上的版本,而问题数据库所在平台为IBM AIX on POWER Systems (64-bit)。如果要通过补丁来fix这个问题的话,AIX平台的用户可以要求Oracle development开发部门build一个Power版的9696904 patch,也可以升级到10.2.0.5上去

注意Patch 8575528: MISSING ENTRIES IN V$MUTEX_SLEEP.LOCATION目前有IBM AIX on POWER Systems (64-bit)平台上10.2.0.4.4的版本,经过验证该one-off patch可以在10.2.0.4 PSU4以后的版本上实施(包括10.2.0.4.5/10.2.0.4.6等),不需要如以上描述地去apply 9696904这个merge patch。当然升级到10.2.0.5/11.1.0.7或以上版本依然有效。

此外apply以上8575528:后需要修改隐藏参数”_cursor_features_enabled”并重启才能使fix生效,具体设置防范如下:

ALTER SYSTEM SET "_cursor_features_enabled"=10 scope=spfile;
restart instance.............

注意以上参数仅在使用one-off patch时需要设置,而当通过升级到10.2.0.5/11.1.0.7来修复问题时无需设置上述”_cursor_features_enabled”参数。

Oracle闩:Cache Buffers chains

Latch cache buffers chains大约是Oracle中child latch数量最多,使用最为频繁的闩锁了。其子闩总数受到初始化参数(8i中的db_block_buffers/4)的影响,Oracle中有大量kernel函数有机会接手持有cache buffer chains latch:

col parent_name for a25
col location for a40

SELECT t1.ksllasnam "parent_name",
       t2.ksllwnam  "location"
  FROM x$ksllw t2, x$kslwsc t1
 WHERE t2.indx = t1.indx
   AND ksllasnam = 'cache buffers chains';

parent_name               location
------------------------- ----------------------------------------
cache buffers chains      kcbw_activate_granule
cache buffers chains      kcbw_first_buffer_free
cache buffers chains      kcbwxb
cache buffers chains      kcbw_examine_granule
cache buffers chains      kcbw_next_free
cache buffers chains      kcbw_first_buffer_free_2
cache buffers chains      kcbbckb
cache buffers chains      kcbbioe
cache buffers chains      kcbbic1
cache buffers chains      kcbbcwd
cache buffers chains      kcbbxsv

parent_name               location
------------------------- ----------------------------------------
cache buffers chains      kcbbwdb
cache buffers chains      kcbbic2
cache buffers chains      kcbkzs
cache buffers chains      kcbrmf2so
cache buffers chains      kcbget: exchange rls
cache buffers chains      kcbralloc
cache buffers chains      kcbgcur: deadlock
cache buffers chains      kcbgcur: buf pinned
cache buffers chains      kcbgtcr
cache buffers chains      kcbchg: no fast path
cache buffers chains      kcbchg: apply change

parent_name               location
------------------------- ----------------------------------------
cache buffers chains      kcbrra: buf exists
cache buffers chains      kcbrra: update buf flags
cache buffers chains      kcbema: find buf
cache buffers chains      kcbtema: find buf
cache buffers chains      kcbget: prewarm wait
cache buffers chains      kcbrfrebuf
cache buffers chains      kcbsod1
cache buffers chains      kcbrbrl
cache buffers chains      kcbgcur: kslbegin
cache buffers chains      kcbgtcr: kslbegin shared
cache buffers chains      kcbrls: kslbegin

parent_name               location
------------------------- ----------------------------------------
cache buffers chains      kcbchg: kslbegin: bufs not pinned
cache buffers chains      kcbchg: kslbegin: call CR func
cache buffers chains      kcbnlc
cache buffers chains      kcbget: exchange
cache buffers chains      kcbget: pin buffer
cache buffers chains      kcbnew: new latch again
cache buffers chains      kcbgkcbcr
cache buffers chains      kcbget: in cur_read
cache buffers chains      kcbesc: escalate
cache buffers chains      kcblbi
cache buffers chains      kcbcge

parent_name               location
------------------------- ----------------------------------------
cache buffers chains      kcbfrl
cache buffers chains      kcbzsc
cache buffers chains      kcbibr
cache buffers chains      kcbnew_1
cache buffers chains      kcbema
cache buffers chains      kcbsrbd
cache buffers chains      kcbso1: set no access
cache buffers chains      kcbtema
cache buffers chains      kcbso1: in done_clr
cache buffers chains      kcbsod2
cache buffers chains      kcbzcg

parent_name               location
------------------------- ----------------------------------------
cache buffers chains      kcbzre1
cache buffers chains      kcbrlb1
cache buffers chains      kcbchkrsod
cache buffers chains      kcbxbh
cache buffers chains      kcbzsck
cache buffers chains      kcbgtcr: fast path
cache buffers chains      kcbgtcr: kslbegin excl
cache buffers chains      kcbgtcrf
cache buffers chains      kcbfdgd
cache buffers chains      kcbdng
cache buffers chains      kcbbufaddr2hdr

parent_name               location
------------------------- ----------------------------------------
cache buffers chains      kcbo_ivd_process
cache buffers chains      kcbo_write_process
cache buffers chains      kcbo_exam_buf
cache buffers chains      kcb_pre_apply: kcbhq61
cache buffers chains      kcb_post_apply: kcbhq62
cache buffers chains      kcb_post_apply: kcbhq63
cache buffers chains      kcbnew : new esc failed
cache buffers chains      kcbesc : escalate failed
cache buffers chains      kcb_private_owner
cache buffers chains      kcb_is_private
cache buffers chains      kcb_unprivatize

parent_name               location
------------------------- ----------------------------------------
cache buffers chains      kcb_restore_block_headers
cache buffers chains      kcb_flush_undo_buffers
cache buffers chains      kcbgcur - DEADL
cache buffers chains      kcbtbd
cache buffers chains      kcbzwc
cache buffers chains      kcbzwx
cache buffers chains      kcbrmflx
cache buffers chains      kcbzwb
cache buffers chains      kcbzgb: get latch after post
cache buffers chains      kcbzgb: scan from tail. nowait
cache buffers chains      kcbzgb: exit_loop

parent_name               location
------------------------- ----------------------------------------
cache buffers chains      kcbzib: multi-block read: nowait
cache buffers chains      kcbzib: finish free bufs
cache buffers chains      kcbzcb
cache buffers chains      kcbzdh
cache buffers chains      kcbdpr
cache buffers chains      kcbcxx
cache buffers chains      kcbzrn
cache buffers chains      kcbdpd: for specific dba
cache buffers chains      kcbdpd: dump all buffers
cache buffers chains      kcbzib: exchange rls
cache buffers chains      kcbzpnd: dump buffers

parent_name               location
------------------------- ----------------------------------------
cache buffers chains      kcbzhngcbk1: get hash chain latch no wai
                          t

cache buffers chains      kcbo_cxx
cache buffers chains      kcbz_check_obj_reuse_sanity
cache buffers chains      kcbzib_grlk
cache buffers chains      kcbz_force_maps
cache buffers chains      kcbrldflx: recover in-flux bufs
cache buffers chains      kcbra1fbuf: recover in-flux bufs
cache buffers chains      kcbrafb: flashback bufs:1
cache buffers chains      kcbrafb: flashback bufs:2

parent_name               location
------------------------- ----------------------------------------
cache buffers chains      kcbr_media_apply: find buffer
cache buffers chains      kcbr_issue_read: alloc buffer
cache buffers chains      kcbr_issue_read: retry alloc
cache buffers chains      kcbr_validate_read: mark corrupt
cache buffers chains      kcbr_apply_change: after apply
cache buffers chains      kcbr_mapply_change
cache buffers chains      kcbr_mrcv_clear_fgda
cache buffers chains      kclwlr
cache buffers chains      kclebs_1
cache buffers chains      kclcls
cache buffers chains      kclcsr_1

parent_name               location
------------------------- ----------------------------------------
cache buffers chains      kclpred
cache buffers chains      kclcls_1
cache buffers chains      kclple_1
cache buffers chains      kclple_2
cache buffers chains      kclcls_2
cache buffers chains      kcllwr
cache buffers chains      kclwcrs
cache buffers chains      kclcrs_1
cache buffers chains      kclcsr
cache buffers chains      kclrls
cache buffers chains      kclwcrs_1

parent_name               location
------------------------- ----------------------------------------
cache buffers chains      kclfbst_1
cache buffers chains      kclpdc_1
cache buffers chains      kclwcrs_2
cache buffers chains      kclwcrs_3
cache buffers chains      kclfpdb
cache buffers chains      kclfpdb_2
cache buffers chains      kclpdc_2
cache buffers chains      kcllkopb
cache buffers chains      kclgrantlk
cache buffers chains      kclwrt
cache buffers chains      kcllkopb_1

parent_name               location
------------------------- ----------------------------------------
cache buffers chains      kclwcrs_4
cache buffers chains      kcllkopb_2
cache buffers chains      kclcls_4
cache buffers chains      kclpred_1
cache buffers chains      kclrclr_2
cache buffers chains      kclrecbst
cache buffers chains      kclgrantlk_1
cache buffers chains      kclcls_5
cache buffers chains      kclrwrite_1
cache buffers chains      kclrwrite_2
cache buffers chains      kclcopy

parent_name               location
------------------------- ----------------------------------------
cache buffers chains      kclswrite
cache buffers chains      kclchash
cache buffers chains      kclcfusion
cache buffers chains      kclfchk_1
cache buffers chains      kclcfusion_1
cache buffers chains      kclblkdone
cache buffers chains      kclcfusion_2
cache buffers chains      kclrenounce
cache buffers chains      kclbla
cache buffers chains      kclpto_1
cache buffers chains      kclgrantlk_2

parent_name               location
------------------------- ----------------------------------------
cache buffers chains      kclcomplete
cache buffers chains      kclshrshr
cache buffers chains      kclclaim
cache buffers chains      kclhngcbk1
cache buffers chains      kclblkdone_1
cache buffers chains      kclgvlk
cache buffers chains      kclblkdone_2
cache buffers chains      kclcclaim
cache buffers chains      kclrechk_1
cache buffers chains      kclrechk_2
cache buffers chains      kclbr

parent_name               location
------------------------- ----------------------------------------
cache buffers chains      kclpto
cache buffers chains      kclpdcl
cache buffers chains      kclpdc_3
cache buffers chains      kclpdc_4
cache buffers chains      kclgcr_1
cache buffers chains      kclcls_6
cache buffers chains      kclevict
cache buffers chains      kcldle
cache buffers chains      kclrcopy
cache buffers chains      kclenter
cache buffers chains      kclrbast

parent_name               location
------------------------- ----------------------------------------
cache buffers chains      kclexpand
cache buffers chains      kclcls_3
cache buffers chains      kclverify
cache buffers chains      kclaffinity
cache buffers chains      kclassert
cache buffers chains      kclobj
cache buffers chains      kclobj_1
cache buffers chains      kclobj_2
cache buffers chains      kclgclk
cache buffers chains      kclwcrs_5
cache buffers chains      kclscrs

parent_name               location
------------------------- ----------------------------------------
cache buffers chains      kclstalepi
cache buffers chains      kclstalepi_1
cache buffers chains      kclstalepi_2
cache buffers chains      kclgvlk_1
cache buffers chains      kclgclk_1
cache buffers chains      kclgclk_2
cache buffers chains      kclcsr_2
cache buffers chains      kcbvm

204 rows selected.

我们一般认为Latch结构是Mostly exclusive access的,也就是极少会有共享访问闩的机会。但Oracle一般对外宣称读取数据时服务进程是以共享模式使用cache buffers chains闩,这就造成了许多人误以为读读是不会出现latch: cache buffers chains争用的。
但是实际上查询语句大部分情况下仍需要以exclusive模式持有该类子闩(有时会以SHARED模式持有,这取决于读取时是使用kcbgtcr: kslbegin shared还是kcbgtcr: kslbegin excl;kcbgtcr是Oracle rdbms中重要的获取一致性读的函数,其含义为Kernal Cache Buffer GeT Cosistents Read,显然该函数存在两种获取cache buffers chains的方式即kslbegin shared和excl。与之相对应的是kcbgcur: kslbegin,kcbgcur的含义为Kernel Cache Buffer Get Current,该函数用以获取当前块以便修改,也就是”写”;很显然kcbgcur: kslbegin函数只有以excl排他方式持有child cache buffers chains latch的必要),原因在于虽然是查询语句但同样需要修改buffer header结构,譬如修改tch抢手度、holder list的hash变量us_nxt、us_prv以及waiter list的hash变量wa_prv、wa_nxt等。换而言之读读是会引起Latch free:cache buffers chains等待的,而非许多人认为的仅有读写、写写会导致缓存链闩争用。

这个问题我们再用实验证明一遍:

SQL> drop table maclean;
drop table maclean
           *
ERROR at line 1:
ORA-00942: table or view does not exist

SQL> create table maclean tablespace users as select * from dba_objects;
Table created.

SQL> select count(*) from maclean;

  COUNT(*)
----------
     51944

SQL> /

  COUNT(*)
----------
     51944

SQL> /

  COUNT(*)
----------
     51944

SQL> select spid from v$process where addr =(select paddr from v$session where sid=(select distinct sid from v$mystat));

SPID
------------
6023

另开一个命令行窗口:

[maclean@rh2 ~]$ gdb $ORACLE_HOME/bin/oracle 6023
GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-23.el5)
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
...
Reading symbols from /s01/10gdb/bin/oracle...(no debugging symbols found)...done.
Attaching to program: /s01/10gdb/bin/oracle, process 6023
Reading symbols from /s01/10gdb/lib/libskgxp10.so...(no debugging symbols found)...done.
Loaded symbols for /s01/10gdb/lib/libskgxp10.so
Reading symbols from /s01/10gdb/lib/libhasgen10.so...(no debugging symbols found)...done.
Loaded symbols for /s01/10gdb/lib/libhasgen10.so
Reading symbols from /s01/10gdb/lib/libskgxn2.so...(no debugging symbols found)...done.
Loaded symbols for /s01/10gdb/lib/libskgxn2.so
Reading symbols from /s01/10gdb/lib/libocr10.so...(no debugging symbols found)...done.
Loaded symbols for /s01/10gdb/lib/libocr10.so
Reading symbols from /s01/10gdb/lib/libocrb10.so...(no debugging symbols found)...done.
Loaded symbols for /s01/10gdb/lib/libocrb10.so
Reading symbols from /s01/10gdb/lib/libocrutl10.so...(no debugging symbols found)...done.
Loaded symbols for /s01/10gdb/lib/libocrutl10.so
Reading symbols from /s01/10gdb/lib/libjox10.so...(no debugging symbols found)...done.
Loaded symbols for /s01/10gdb/lib/libjox10.so
Reading symbols from /s01/10gdb/lib/libclsra10.so...(no debugging symbols found)...done.
Loaded symbols for /s01/10gdb/lib/libclsra10.so
Reading symbols from /s01/10gdb/lib/libdbcfg10.so...(no debugging symbols found)...done.
Loaded symbols for /s01/10gdb/lib/libdbcfg10.so
Reading symbols from /s01/10gdb/lib/libnnz10.so...(no debugging symbols found)...done.
Loaded symbols for /s01/10gdb/lib/libnnz10.so
Reading symbols from /usr/lib64/libaio.so.1...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libaio.so.1
Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libm.so.6
Reading symbols from /lib64/libpthread.so.0...(no debugging symbols found)...done.
[Thread debugging using libthread_db enabled]
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/libnsl.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libnsl.so.1
Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
0x00000039f280d8e0 in __read_nocancel () from /lib64/libpthread.so.0
(gdb) break kcbgtcr
Breakpoint 1 at 0x108c72c

回到原sqlplus窗口再次执行查询语句,会hang住:
SQL> select count(*) from maclean;

在gdb窗口:
(gdb) break kslfre
Breakpoint 2 at 0x7a133c

Breakpoint 1, 0x000000000108c72c in kcbgtcr ()
(gdb) c
Continuing.

Breakpoint 1, 0x000000000108c72c in kcbgtcr ()
(gdb) c
Continuing.

Breakpoint 1, 0x000000000108c72c in kcbgtcr ()
(gdb) c
Continuing.

Breakpoint 2, 0x00000000007a133c in kslfre ()
(gdb) c
Continuing.

多次continue直到出现kslfre内核函数,开一个新的sqlplus窗口:
SQL> set autotrace on;
SQL> select count(*) from v$latchholder;

  COUNT(*)
----------
         3

Execution Plan
----------------------------------------------------------
Plan hash value: 1575818826

--------------------------------------------------------------------------------
| Id  | Operation         | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |            |     1 |    13 |     0   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE   |            |     1 |    13 |            |          |
|*  2 |   FIXED TABLE FULL| X$KSUPRLAT |     1 |    13 |     0   (0)| 00:00:01 |
--------------------------------------------------------------------------------

v$latchholder的数据来源于X$KSUPRLAT内部视图,因为v$latchholder不带mode字段,所以我们直接观察X$KSUPRLAT

SQL> select KSUPRLAT "address",KSUPRLNM "name",KSUPRLMD "mode" from X$KSUPRLAT;

address          name                      mode
---------------- ------------------------- ---------------
00000000FCE40040 cache buffers chains      EXCLUSIVE
00000000FA696978 simulator lru latch       EXCLUSIVE
00000000FA6CDCE0 simulator hash latch      EXCLUSIVE

/* 可以看到即便是查询语句也是以EXCLUSIVE mode持有child cache buffers chains latch */

(gdb) quit
A debugging session is active.

        Inferior 1 [process 6235] will be detached.

Quit anyway? (y or n) y
Detaching from program: /s01/10gdb/bin/oracle, process 6235

/* 可以通过以下statistics可以看到以上读取为纯粹的逻辑读,没有物理读取的部分干扰*/

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        719  consistent gets
          0  physical reads
          0  redo size
        517  bytes sent via SQL*Net to client
        492  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

Latches and Tuning:Latches

1. Query the V$FIXED_TABLE to determine the V$ views that provide latch information.

a. Execute the following query:> select name from v$fixed_table where name like ‘V$LATCH%’;

2. Remember that there are 14 levels of latches, numbered 0 – 13.

a. Take a look at three common latches and their level by executing the following query:> select name, level# from v$latch where name in
(‘cache buffers chain’,’library cache’,’redo allocation’);

You may or may not see each of these latches listed depending upon the state and activity on your database.

3. Each parent and child latch has one instance in X$KSLLT.

a. Execute the following query to view a KSLLT structure for a latch:> select v$latch_parent.addr,v$latch_parent.level#,v$latch_parent.latch#,
v$latch_parent.name
from v$latch_parent,x$ksllt
where v$latch_parent.addr = x$ksllt.addr;

You should at least see information for the ‘latch wait list’ latch. However, your results will vary depending upon the state and activity of your database.

4. Using the latch# from the previous query, you can query the X$KSLLD table to see the array of latch descriptors.  One of these arrays is stored in every PGA.

a. Execute the following query to view the array for the associated latch#:> select * from X$KSLLD where indx = <latch# from previous query>;

A given process can select from this table to view the latch array.

Note:  As you can see, output from the X$ tables is somewhat cryptic.  However, knowledge of where to gather information regarding latches will help you to more effectively diagnose latch contention and communicate with Oracle support.  The follow on lessons to this module will use the dynamic performance views and system tables for diagnosing and resolving latch contention.

5. Query the V$SYSTEM_PARAMETER to view the SPIN_COUNT and MAX_EXPONENTIAL_SLEEP parameters:

a. Execute the following query:> select name from v$system_parameter where name like ‘%spin%’ or name like ‘%max%’;

Note: If you are running Oracle8i you will get no rows returned.  That’s because each of these parameters are now hidden.

b.  Execute the following query using the X$KSPPI table which lists hidden parameters:

>  select ksppinm from x$ksppi where (ksppinm like ‘%spin%’ or ksppinm like ‘%max%’) and ksppinm like ‘\_%’ escape ‘\’ ;

Also note the _MAX_SLEEPING_HOLDING_LATCH and _LATCH_WAIT_POSTING parameters.  You can query the ksppidesc column to get a description of these parameters.

6. Remember that some latches are held for long periods of time, while others are not.

a. Execute the following query to view three latches that are held for long periods of time:> select name, level# from v$latch where
name in (‘library cache’,’shared pool’,’library cache load lock’,’latch wait list’);

Also notice the level numbers of these latches taking into account that most latches are in the range of  0 – 7.  The latch wait list latch protects the linked list of waiters.

7. Certain wait events are associated with latches.  A session process may wait on a latch and the wait time will be updated in the V$SESSION_EVENT and
V$SYSTEM_EVENT dynamice performance views.  V$SESSION_WAIT also provides event informaton.

a. Execute the following query to view real time information on which sessions are currently waiting or have just waited for an event:> select sid, event, wait_time
from v$session_wait
order by wait_time, event;

Latch Free will appear in the output if there were statistics for this event.

b.  Query V$SYSTEM_EVENT to view aggregate statistics for the latch free event:

> select * from v$system_event where event = ‘latch free’;

c.  Query V$SESSION_EVENT to view statistics for the latch free event for each session:

> select * from v$session_event where event = ‘latch free’;

8. You can obtain information about specific latches by using both the V$SESSION_WAIT and V$LATCH views..

a. Execute the following query to view three latches that are held for long periods of time:> select distinct l.name “Latch”
from v$session_wait s, v$latch l
where s.p2 = l.latch#
and s.event = ‘latch free’;

You may or may not get a result set from this query if there are no session latch free events.  If so then execute the next query to get an idea of wait events:

> select DISTINCT l.name “Latch”, p1, p2
from v$session_wait s, v$latch l
where s.p2 = l.latch#
and s.wait_time = 0;

Note: Remember that P1 is the latch address and P2 is the latch number.  The DISTINCT operator is used because a process may hold more than one of the same latch.  Run the same query without the DISTINCT operator and notice the result set will have multiple P1 values for the same latch.

> select l.name “Latch”, p1, p2
from v$session_wait s, v$latch l
where s.p2 = l.latch#
and s.wait_time = 0;

9.  Latch contention occurs when a latch is requested by several processes at the same time. The impact of latch contention can be assessed using the sleep histogram in V$LATCH.

a.  Execute the following query to derive information about latch sleeps:> select name, sleeps * sleeps / (misses – spin_gets ) impact
from v$latch
where sleeps > 0;

b.  The SLEEP columns in V$LATCH represent the sleep buckets in the histogram.  Describe V$LATCH to view the sleep buckets:

> desc v$latch

Note: SPIN_GETS is also a sleep bucket.  Only the first four buckets are used.
GETS = immediate_gets + (MISSES(SPIN_GETS+sleep_gets(SLEEP1 + SLEEP2 + SLEEP3))

10.  You can also derive information about latches by creating a trace file.

a.  To create a trace file execute the following command:> alter session set events ‘immediate trace name latches level 10’

b.  The trace file will be created in the directory that is specified by the parameter USER_DUMP_DEST.      View this file and you will see information about sleeps and nowait times.

c.  If child latch#’s are listed in the trace file, you can query V$LATCH_CHILDREN to acquire additional information.

Latches and Tuning:Redo Log Buffer and Latches

1. The size of the redo log buffer is determined by the LOG_BUFFER parameter.

a.  Determine the size of the LOG_BUFFER by executing the following query:> select * from v$sga;

b.  View the LOG_BUFFER setting in the init.ora file.

2.  There are three latches associated with the redo log buffer.

a.  Execute the following query to view the latches:> select name from v$latchname where name like ‘redo%’;

There will be more discussion about these latches later in the module.

3.  A number of parameters that effect latch performance have been made obsolete in Oracle8i.
a.  Execute the following query to view the obsolete redo buffer parameters:

> select ksppinm from x$ksppi where ksppinm like ‘_log%’;

Of the parameters listed, _LOG_SIMULTANEOUS_COPIES  and _LOG_IO_SIZE are now obsolete.  LOG_SMALL_ENTRY_MAX_SIZE and LOG_ENTRY_PREBUILD_THRESHOLD are no longer available in Oracle8i.

_LOG_SIMULTANEOUS_COPIES effects the redo copy latch, but since the redo copy is always acquired there is no need to set this parameter.
_LOG_IO_SIZE effects the redo writing latch.

b.  Execute the following query to view the redo related parameters that are still available in Oracle8i:

> select name from v$system_parameter where name like ‘log%’;

You should see the parameters log_buffer, log_checkpoint_interval, and log_checkpoint_timeout.

4. There are 10 wait events associated with the redo log buffer and latches.

a.  View the wait events by executing the following query:> select name from v$event_name where name like ‘log%’;

Of the events that are listed, three are of primary importance:

Log buffer space: a process is waiting for space in the log buffer
Log file sync: the time it takes for LGWR to complete the write then post the requester.
Log file parallel write: the main event waited while writing the redo records to the current log file.

b. You can also gather event information from the V$SYSTEM_EVENT view by executing the following query:

> select event,total_waits,total_timeouts,time_waited from v$system_event where
event in (‘log buffer space’,’log file sync’,’log file parallel write’);

In a well tuned system redo-related waits should be none or minimal.  If one of the events is not listed in the output that means no event has occurred.

5.  There are 14 statistics associated with the redo log buffer and latches.

a.  Execute the following query to view the statistics:> select name from v$sysstat where name like ‘redo%’;

The most important statistics are:

Redo Writes: number of times the log buffer is written.
Redo Blocks Written: total number of blocks written.
Redo Log Space Requests: number of times requested to allocate space in the log file.
Redo Size, Redo Entries: how much redo is generated since instance statrup
Redo Buffer Allocation Retries: total number of retries necessary to allocate space in the redo buffer.

6.  Contention for redo log buffers does not generally cause major performance problems on the database. Waits for
the event log buffer space could indicate contention for the redo log buffers.  However, it may be more useful to find
out the proportion of redo entries which had to wait for space to become available.
a.  Execute the following query to determine the ratio:

> select r.value, s.value, r.value/s.value
from v$sysstat r, v$sysstat s
where r.name = ‘redo buffer allocation retries’ and
s.name = ‘redo entries’;;

The ratio should be near zero.

7. Remember from the first practice that there are three latches associated with the redo log buffer; redo allocation,
redo copy, and redo writing.  The tuning goal is to minimize contention and waits for these latches.

a.  To tune the latches, look at the latch statistics and the ratio of MISSES to GETS by executing the following query:> select n.name, gets, misses, immediate_gets, immediate_misses
from v$latch l, v$latchname n
where n.name in (‘redo allocation’,’redo copy’,’redo writing’)
and n.latch#= l.latch#;

If the ratio of MISSES to GETS exceeds 1%, or the ratio of IMMEDIATE_MISSES to (IMMEDIATE_MISSES+IMMEDIATE_GETS) exceeds 1%, there is latch contention.

To return the ratios you can execute the following query.  However, you may receive a division by zero error.

> select n.name, gets, misses, immediate_gets, immediate_misses, (misses/gets),
(immediate_misses/(immediate_misses+immediate_gets))
from v$latch l, v$latchname n
where n.name in (‘redo allocation’,’redo copy’,’redo writing’)
and n.latch#= l.latch#;

8.  You can increase the number of redo copy latches by setting the parameter LOG_SIMULTANEOUS_COPIES.
This parameter is a hidden parameter and must be set in the init.ora and the database must be restarted for it
to take effect.

a.  One method for determing the number of redo copy latches is to query the V$LATCH_CHILDREN view.  There will be one latch for the value of _LOG_SIMULTANEOUS_COPIES.  If there is only one latch, you will have to query V$LATCH_PARENT as there will be no children.> select name from v$latch_children where name = ‘redo copy’;

b.  If you want to increase or decrease the number of redo copy latches, modify the parameter _LOG_SIMULTANEOUS_COPIES in the init.ora file.  Once you have done this, you can bounce the
database and execute the query in step a.

9.  You can increase the number of redo writing latches by modifying the parameter _LOG_IO_SIZE.  This
parameter is a hidden parameter and must be set in the init.ora and the database must be restarted for it to take
effect.
a.  One method for determing the number of redo copy latches is to query the V$LATCH_CHILDREN view.  There will be one latch for the value of _LOG_IO_SIZE.  If there is only one latch, you will have to query V$LATCH_PARENT as there will be no children.

> select name from v$latch_children where name = ‘redo writing’;

b.  If you want to increase or decrease the number of redo copy latches, modify the parameter
_LOG_IO_SIZE in the init.ora file.  Once you have done this, you can bounce the database and execute the
query in step a.

沪ICP备14014813号-2

沪公网安备 31010802001379号