How does SGA/PGA allocate on AMM?

Oracle 11g中引入了革命性的Automatic Memory Management(AMM)特性,通过该特性DBA只需要为Instance指定一个参数(memory_target),数据库软件就会根据SGA/PGA内存的实际使用统计信息来调优SGA/PGA内存区域的大小。从技术上说这是一个很cool的特性,可以说是Oracle所提倡的self_tuned即自身调优数据库软件大成的一个标志。但从另一方面来看,AMM也会给我们带来不少问题和困惑,DBA需要面对更多黑盒内隐藏的秘密了。

虽然我们无法彻底了解AMM的所有细节,但有一些关键性问题肯定会在我们使用AMM的过程造成许多的不确定性,在此我列出部分问题及其解答。

Question1:在使用AMM特性时,即设置了memory_target的情况下sga/pga内存区域默认各占百分之多少?

Answer:
这个问题存在多种不同的情景:

1)设置memory_target的同时,设置了sga_target及pga_aggregate_target参数:这种情况下设置的sga_target与pga_aggregate_target之和不能大于memory_target;默认sga大小等同于sga_target设定的值,可以从V$memory_Dynamic_Components视图中查询到pga target的current size可能远大于pga_aggregate_target(这里查询出来的current size仅代表pga的一个目标值,不是pga当前实际占用的内存),一般来说这里的pga current size等于(memory_target-sga_target),显然pga target的最小值会是pga_aggregate_target,而sga的最小值为sga_target.在此前提下sga/pga内存区域有较小的灵活性,实际上仅当memory_target>sga_target+pga_aggregate_target的情况下,sga/pga才可能发生扩展(grow)和收缩(shrink)

2)设置memory_target的同时,设置了sga_target而未设置pga_aggregate_target参数:这种情况下Oracle会自动调优以上2个参数,pga_aggregate_target的初始值为(memory_target-sga_target),注意这里说的是初始值;这种情况下因为没有硬性地设置pga_aggregate_target,故而sga/pga内存区域有较大的灵活性去扩展(grow)和收缩(shrink)。更倾向于sga的扩展,当然这并不绝对。

3)设置memory_target的同时,设置了pga_aggregate_target而未设置sga_target:这种情况下Oracle会自动调优以上2个参数,sga_target的初始值为min(memory_target-pga_aggregate_target,sga_max_size),注意这里说的是初始值;类似于以上的情况因为没有硬性地设置sga_target,故而sga/pga内存区域有较大的灵活性去扩展(grow)和收缩(shrink)。更倾向于pga的扩展,当然这并不绝对。

4)仅仅设置了memory_target参数,Oracle会自动调优sga/pga内存区域,且没有任何最小值或默认值的约束。当在初始化分配操作系统内存的时候存在一个分配pga/sga的策略;该策略会在实例启动阶段(instance startup)为SGA分配memory_target 60%的内存(该比例受到隐藏参数_memory_initial_sga_split_perc Initial default sga target percentage with memory target的影响,该参数默认为60),而为PGA分配40%的内存。这种情况下sga/pga内存区域的游标最为灵活,可以不受限制地上下浮动,是真正意义上的AMM,但也最为危险!

5)没有设置memory_target参数,该情况下Oracle 11g的表现与10g中”一致”(并非完全一致,黑盒中藏了很多秘密,只是看上去这样,后面会提到)

Question2:”我在10g就感到ASMM很不好用,产生了很多问题;我是个保守派,对11g中的AMM一定也不感兴趣。我要彻底禁用AMM和ASMM,在11g中是否只要设置sga_target和memory_target参数为零就ok了?”

Answer:
在11.2.0.1之前的版本(包括11.1.0.7,11.1.0.6等)我们仅需要设置sga_target和memory_target参数为零就可以避免sga/pga内存区域的resize,但在Oracle 11g Release 2中引入了immediate memory allocation requests立即内存分配的特性,该特性会在automatic memory management被禁用的情况下始终生效,引入这一特性的直接目的是尽可能的避免ORA-04031错误的发生,一般来说该特性更多地表现出了其积极的一面,但如果你实际需要的是一块安宁的、如死水寂静的sga/pga内存区域的话,我们还是有办法禁用该immediate memory allocation requests特性的,这样可以彻底杜绝sga中内存组建resize的发生:

SQL> col DESCRIB for a60
SQL> set linesize 200 pagesize 2000
SQL> col name for a32
SQL> col value for a10

SQL> SELECT x.ksppinm NAME, y.ksppstvl VALUE, x.ksppdesc describ
  2   FROM SYS.x$ksppi x, SYS.x$ksppcv y
  3   WHERE x.inst_id = USERENV ('Instance')
  4   AND y.inst_id = USERENV ('Instance')
  5   AND x.indx = y.indx
  6  AND x.ksppinm LIKE '%_memory_imm_mode_without_autosga%';

NAME                             VALUE      DESCRIB
-------------------------------- ---------- ------------------------------------------------------------
_memory_imm_mode_without_autosga TRUE       Allow immediate mode without sga/memory target

SQL> alter system set "_memory_imm_mode_without_autosga"=FALSE scope=both;
System altered.

/* 修改该参数需要重启实例 !*/

Question3:”我是一个激进派,我喜欢Oracle self-tuned自动调优的种种特性,为了充分利用AMM的优势,我决定仅设置memory_target参数,而不设置sga_target和pga_aggregate_target参数,让Oracle自行去调优这2个参数。但是我有担心如果pga扩展过头,是不是可能造成sga频繁shrink,造成系统颠簸甚至实例hang住?”

Answer:你的担心是完全有理由的,已知的文档604080.1指出了在11.1.0.6到11.1.0.7版本中存在AMM下未设置PGA_AGGREGATE_TARGET导致Oracle所使用的总内存超过MEMORY_MAX_TARGET的bug 6346293(When not setting PGA_AGGREGATE_TARGET to an explicit value, MEMORY_MAX_TARGET can be exceeded)。该Bug在11.2以上版本中得到了修复。

显然通过R1多个版本的历练后,Oracle充分认识到了AMM”自由调度”所可能带来的问题,我相信Oracle已经得到了某些内存阀值的黄金比例(Another magic number),当然这些幻数隐藏得很深(可能由某个_memory开头的隐藏参数控制着,但我没有兴致去研究,这需要花费太多的时间)。通过这些经典阀值Oracle可以有效控制pga/sga在不过度扩展或收缩的情况下仍保留其灵活性。

针对那些和我一样热爱新特性的朋友,最好的建议可能是在设置memory_target的同时为sga_target、pga_aggregate_target设置最小的期望值:

Memory_Max_Target>=Memory_Target>>(Sga_Target+Pga_Aggregate_Target)

这样可以避免由AMM过于活跃所造成的系统颠簸,而又不失去其灵活性。

to be continued………….

Oracle内部错误:ORA-00600[kgskdecrstat1]一例

一套Itanium HP-UX上的9.2.0.5系统最近出现了ORA-00600: internal error code, arguments: [kgskdecrstat1], [], [], [], [], [],内部错误,其错误日志如下:

Mon Apr 18 12:32:20 2011
ORA-00600: internal error code, arguments: [kgskdecrstat1], [], [], [], [], [], [], []
ARC0: Completed archiving  log 6 thread 1 sequence 831803
Mon Apr 18 12:32:22 2011
ORA-00600: internal error code, arguments: [kgskdecrstat1], [], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [kgskdecrstat1], [], [], [], [], [], [], []
Mon Apr 18 12:32:23 2011
ORA-00600: internal error code, arguments: [kgskdecrstat1], [], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [kgskdecrstat1], [], [], [], [], [], [], []

Trace file
----------
Oracle9i Enterprise Edition Release 9.2.0.5.0 - 64bit Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.5.0 - Production
ORACLE_HOME = /opt/oracle/product/9.2.0.5
System name:	HP-UX
Release:	B.11.23
Version:	U
Machine:	ia64
Redo thread mounted by this instance: 1
Oracle process number: 28
Unix process pid: 14827, image: oracle@(TNS V1-V3)

*** 2011-04-18 02:39:15.043
*** SESSION ID:(276.16183) 2011-04-18 02:39:15.042
DEADLOCK DETECTED
Current SQL statement for this session:
The following deadlock is not an ORACLE error. It is a
deadlock due to user error in the design of an application
or from issuing incorrect ad-hoc SQL. The following
information may aid in determining the deadlock:
Deadlock graph:
                      ---------Blocker(s)--------  ---------Waiter(s)---------
Resource Name          process session holds waits  process session holds waits
TX-00030022-0070d56f        28     276     X            344     259           X
TX-000a0012-0058e4fc       344     259     X             28     276           X
session 276: DID 0001-001C-0000002C	session 259: DID 0001-0158-000020E6
session 259: DID 0001-0158-000020E6	session 276: DID 0001-001C-0000002C
Rows waited on:
Session 259: obj - rowid = 00001F0D - AABNCDAAEAAAAXYAAL
 (dictionary objn - 7949, file - 4, block - 1496, slot - 11)
Session 276: obj - rowid = 00001F0B - AABNCAAAEAAAAXCAAY
 (dictionary objn - 7947, file - 4, block - 1474, slot - 24)
Information on the OTHER waiting sessions:
Session 259:
 pid=344 serial=17280 audsid=106170227 user: 41
           program: JDBC Thin Client
 application name: JDBC Thin Client, hash value=0
 Current SQL Statement:
End of information on OTHER waiting sessions.
*** 2011-04-18 12:32:20.392
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [kgskdecrstat1], [], [], [], [], [], [], []
----- Call Stack Trace -----
... kgskdecrstat  kskdecrstat  ktudecrustat  ktcdso  ktcrcm  ktdcmt  k2lcom ...
----- End of Call Stack Trace -----

PROCESS STATE
-------------
   program: JDBC Thin Client
   application name: JDBC Thin Client, hash value=0
   last wait for 'SQL*Net message from client' blocking sess=0x0 seq=39332 wait_time=21242
               driver id=74637000, #bytes=1, =0

可以看到以上引发ORA-00600[kgskdecrstat1]内部错误的进程同时发现了死锁(DEADLOCK),在MOS上搜索可以发现”Bug 2894072: ORA-00600: INTERNAL ERROR CODE, ARGUMENTS: [KGSKDECRSTAT1]”,但是该Bug最早发生在9.2.0.3上,已经确定影响9.2.0.x所有版本,并且在9.2.0.5上没有backport的bug fix。

提交sr后,Oracle Gcs给出了解决的solution:
1.升级数据库到10.2.0.5,11.1.0.7,11.2.0.2等目前支持的版本
2.解决引发该bug的ORA-00060 dead lock问题
3.如果在解决ORA-00060后仍出现以上ORA-00600[kgskdecrstat1]内部错误,且启用了9i早期版本中的resource manager的话,可以尝试禁用该特性:
ALTER SYSTEM SET resource_manager_plan=” SCOPE=BOTH;
以便绕过bug 2494790。

ORA-20001错误一例

一套11.1.0.7上的Oracle Application Object Library应用程序,在收集schema统计信息时出现了ORA-20001错误,具体错误日志如下:

SQL> exec fnd_stats.gather_schema_statistics('AP');

PL/SQL procedure successfully completed.

SQL> show error
No errors.
============================================
Concurrent request error Log
------------------------------------
**Starts**14-APR-2011 02:20:53
**Ends**14-APR-2011 04:40:43
ORA-0000: normal, successful completion
+---------------------------------------------------------------------------+
Start of log messages from FND_FILE
+---------------------------------------------------------------------------+
In GATHER_SCHEMA_STATS , schema_name= ALL percent= 10 degree = 8 internal_flag= NOBACKUP
stats on table AQ$_WF_CONTROL_P is locked 
stats on table FND_CP_GSM_IPC_AQTBL is locked 
stats on table WF_NOTIFICATION_OUT is locked 
Error #1: ERROR: While GATHER_TABLE_STATS:
object_name=AP.JE_FR_DAS_010***ORA-20001: invalid column name or 
duplicate columns/column groups/expressions in method_opt***
Error #2: ERROR: While GATHER_TABLE_STATS:
object_name=AP.JE_FR_DAS_010_NEW***ORA-20001: invalid column name or 
duplicate columns/column groups/expressions in method_opt***
Error #3: ERROR: While GATHER_TABLE_STATS:
object_name=AP.JG_ZZ_SYS_FORMATS_ALL_B***ORA-20001: invalid column name or 
duplicate columns/column groups/expressions in method_opt***
+---------------------------------------------------------------------------+
End of log messages from FND_FILE
+---------------------------------------------------------------------------+

经确认该问题由bug 7601966: GATHER SCHEMA STATS ON AP SCHEMA FAILS WITH ORA-20001: INVALID COLUMN NAME 引起,可以通过follow文档<Gather Schema Statistics fails with Ora-20001 errors after 11G database upgrade [ID 781813.1]>解决该问题。

Oracle Patch Set Update and Critical Patch Update April 2011 Released

2011 April的Oracle Patch set Update与Critical Patch Update发布了,本次发布包括了对Oracle Database Server, Oracle Fusion Middleware, Oracle Enterprise Manager Grid Control, Oracle E-Business Suite and Supply Chain Products Suite, Oracle PeopleSoft Enterprise, and Oracle JDEdwards EntepriseOne, Oracle Siebel CRM, Oracle Industry Applications, Oracle Sun products suite, and Oracle OpenOffice Suite等多个产品的补丁更新。

其中与数据库相关的主要有补丁集更新和紧急补丁更新如下:

  1. Database 11.2.0.2上的CPU Patch 11724984, or PSU Patch 11724916
  2. Database 11.2.0.1上的CPU Patch 11724991, or PSU Patch 11724930
  3. Database 11.1.0.7上的CPU Patch 11724999, or PSU Patch 11724936
  4. Database 10.2.0.5上的CPU Patch 11725006, or PSU Patch 11724962
  5. Database 10.2.0.4上的CPU Patch 11725015, or PSU Patch 11724977

ORA-07445:[SIGFPE] [Integer divide by zero]内部错误一例

一套SUNOS 5.10上的单节点10.2.0.3系统出现了ORA-07445: exception encountered: core dump [SIGFPE] [Integer divide by zero] [42788866] [] [] []内部错误,具体trace日志如下:

mon_ora_17633.trc

Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - Production
With the Partitioning, OLAP and Data Mining options
ORACLE_HOME = /oracle/oracle/product/10.2.0
System name: SunOS
Node name: monitor-a
Release: 5.10
Version: Generic_139556-08
Machine: i86pc

ksedmp: internal or fatal error
ORA-07445: exception encountered: core dump [SIGFPE] [Integer divide by zero] [42788866] [] [] []
Current SQL statement for this session:
select req_time into :b0 from t_FIX_TranSerial where
((((tran_bank=:b1 and tran_type=:b2) and term_no=:b3)
and trace_no=:b4) and local_time='0')
----- Call Stack Trace -----

sigsetjmp <- call_user_handler
<- sigacthandler <- kpopfr <- kposdi <- kpopsdi <- opiefn0
<- kpoal8 <- opiodr <- ttcpip <- opitsk <- opiino
<- opiodr <- opidrv <- sou2o <- opimai_real <- main
<- 0000000000E54FE7

PROCESS STATE
-------------
O/S info: user: monitor, term: pts/5, ospid: 17621, machine: monitor-a
program: CTPDATA@monitor-a (TNS V1-V3)
application name: CTPDATA@monitor-a (TNS V1-V3), hash value=0
last wait for 'SQL*Net message to client' blocking sess=0x0 seq=6213 wait_time=1 seconds since wait started=0
driver id=62657100, #bytes=1, =0

通过在MOS上查询以上ORA-07445错误的arguement可以发现Note <ORA-7445 [KPOPFR] [SIGFPE] [INTEGER DIVIDE BY ZERO] When Repeatedly Executing a Query (Doc ID 421203.1)> :

Applies to:

Oracle Server - Enterprise Edition - Version: 10.2.0.1 to 10.2.0.3
This problem can occur on any platform.
Symptoms

1. Repeatedly executing a query can lead to the following error: 

ORA-7445 [kpopfr] [SIGFPE] [INTEGER DIVIDE BY ZERO]

2. The call stack from the ORA-07445 trace file should contain the following functions:
kposdi  kpopsdi

The error is caused by BUG 5753629.
Abstract: QUERY FAILS BY ORA-7445 [KPOPFR]
Repeatedly executing a query can lead to an ORA-7445[kpopfr] error.

Solution
To implement the solution, do one of the following: 

1. Upgrade to 11.1 or 10.2.0.4, when available.
At the time of writing the article these version were not yet available. (July 2007).
2. Apply one-off Patch 5753629 from MetaLink, if available for your platform and version.

There is no known workaround available for this bug.

References

BUG:5753629 - QUERY FAILS BY ORA-7445 [KPOPFR].

Hdr: 5753629 10.2.0.2 RDBMS 10.2.0.2 PRG INTERFACE PRODID-5 PORTID-23 ORA-7445
Abstract: QUERY FAILS BY ORA-7445 [KPOPFR].

*** 01/09/07 06:12 pm ***
TAR:
----

PROBLEM:
--------
When executing query again and again from one session, query fails
by ORA-7445[kpopfr].

  ====================================================================
  Sat Dec 30 00:22:39 2006
  Errors in file /var/log/oracle/trace/felica2_ora_6156.trc:
  ORA-7445: exception encountered: core dump [kpopfr()+536] [SIGFPE]
  [Integer divide by zero] [0x1023BCE18] [] []
  ====================================================================

DIAGNOSTIC ANALYSIS:
--------------------
From disassemble, %o3 is devided by %o0 and %o0 seems to be 0x0.

  0x1023f43d0 :       umul  %g5, %g4, %o0
  0x1023f43d4 :       mov  %g0, %y
  0x1023f43d8 :       udiv  %o3, %o0, %o3

%o0 is calcurated by %g5 X %g4 at kpopfr+528.

From our trace file, this value(%g5) is 0x100000.

  ub4 kponc_p [FFFFFFFF7B22AAAC, FFFFFFFF7B22AAB0) = 00100000

And %g4 seems to be 0x1000 from below trace file output.
 ========== FRAME [6] (kpopsdi()+148 -> kposdi()) ==========
  %l0 0000000100302B00 %l1 0000000000000002 %l2 FFFFFFFF7B263CA8
  %l3 00000003C36AFB80 %l4 0000000105D7CE20 %l5 0000000000000001
  %l6 0000000000000007 %l7 0000000105D7A920 %i0 0000000000000000
  %i1 FFFFFFFF7FFFB9EC %i2 0000000000001000 %i3 0000000105E03220
                       ~~~~~~~~~~~~~~~~~~~~ <--(*) here
  %i4 0000000000800000 %i5 0000000000105C00 %fp FFFFFFFF7FFFB241 

If %g4=0x1000 and %g5=0x100000, %g4 X %g5 = 0x100000000.
0x100000000 is 0x0 as ub4, and this may bring 0 divide and ORA-7445.

I can reproduce the similar problem in my house, so I'll upload testcase.
Problem reproduce at the following case.

 * sum of all column size is 1048576(0x100000)
 * run query again and again from one session (about 4096(0x1000) times)

From this results, above guess seems to be correct.

WORKAROUND:
-----------
n/a

RELATED BUGS:
-------------
n/a

REPRODUCIBILITY:
----------------
I have confirmed that this problem reproduces at the below env.

 * Linux x86 32bit, 10.2.0.3 : ORA-7445[kpopfr()+300]
 * Linux x86 64bit, 10.2.0.2 : ORA-7445[kpopfr()+339]
 * Solaris 64bit, 10.2.0.2   : ORA-7445[kpopfr()+536]
 * HP-UX Itanium, 10.2.0.2   : ORA-7445[_div32U()+34]

TEST CASE:
----------
At first, creating table like follows.

conn scott/tiger
drop table test;
create table test
( c000 char(2000),
  c001 char(2000),
    ... 
  c523 char(2000),
  c524 char(576));

  --> sum of all column size is 1048576(0x100000).

Run next shell script.

  while [ 1 ]
  do
  echo "set feedback off"
  echo "select * from test where c001 = 'A';"
  done | sqlplus -s scott/tiger

It takes 3-10 minutes to reproduce the problem.
Required time for reproducing depends on hardware spec.

STACK TRACE:
------------
 ksedmp ssexhd sighndlr call_user_handler kposdi kpopsdi kpoal8
 opiodr ttcpip opitsk opiino opiodr opidrv sou2o opimai_real
 main start

具体向Oracle GCS提交SR以后确认为 BUG 5753629. Oracle GCS给出了2种解决方案:
1.升级到10.2.0.4或更高版本
2.应用Apply one-off Patch 5753629

fast incremental backup failed on standby database

一套Linux上的11.1.0.7的physical standby物理备库在使用fast incremental backup进行高于0级的增量备份时会出现ORA-19648错误,其出错记录如下:

RMAN>  backup incremental level 1 database;

Starting backup at 22-MAR-11
using channel ORA_DISK_1
using channel ORA_DISK_2
channel ORA_DISK_1: starting incremental level 1 datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00002 name=/standby/oradata/SBDB2/datafile/o1_mf_sysaux_22m6ov92_.dbf
input datafile file number=00003 name=/standby/oradata/SBDB2/datafile/o1_mf_undotbs1_23m6ovap_.dbf
input datafile file number=00006 name=/standby/oradata/SBDB2/datafile/o1_mf_enc_25m6ovba_.dbf
channel ORA_DISK_1: starting piece 1 at 22-MAR-11
channel ORA_DISK_2: starting incremental level 1 datafile backup set
channel ORA_DISK_2: specifying datafile(s) in backup set
input datafile file number=00001 name=/standby/oradata/SBDB2/datafile/o1_mf_system_21m6ov92_.dbf
input datafile file number=00005 name=/standby/oradata/SBDB2/datafile/o1_mf_example_24m6ovar_.dbf
input datafile file number=00004 name=/standby/oradata/SBDB2/datafile/o1_mf_users_26m6ovba_.dbf
channel ORA_DISK_2: starting piece 1 at 22-MAR-11
RMAN-03009: failure of backup command on ORA_DISK_1 channel at 03/22/2011 20:20:28
ORA-19648: datafile 2: incremental-start SCN equals checkpoint SCN
ORA-19640: datafile checkpoint is SCN 1249353 time 03/09/2011 05:50:27
continuing other job steps, job failed will not be re-run
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of backup command on ORA_DISK_2 channel at 03/22/2011 20:22:33
ORA-19648: datafile 1: incremental-start SCN equals checkpoint SCN
ORA-19640: datafile checkpoint is SCN 1249352 time 03/09/2011 05:50:26

经过分析该ORA-19640->ORA-19648是由11.1.0.7上的Bug引起的,MOS已经确认其为Bug 9288598:

Affects:
Product (Component)	 Oracle Server (Rdbms)
Range of versions believed to be affected	 Versions BELOW 12.1
Versions confirmed as being affected	
11.1.0.7
Platforms affected	 Generic (all / most platforms affected)
Fixed:

This issue is fixed in	
12.1 (Future Release)
11.2.0.2 (Server Patch Set)

An RMAN Backup incremental level 1 can fail with ORA-19648 / ORA-19640 if the standby database is open read only rather than just skipping the datafile.
PROBLEM:
--------
Following error occurring during incremental backup:

Starting backup at 02-DEC-2009 20:37:44
allocated channel: ORA_SBT_TAPE_1
channel ORA_SBT_TAPE_1: SID=154 device type=SBT_TAPE
channel ORA_SBT_TAPE_1: Veritas NetBackup for Oracle - Release 6.5 
(2008052301)
channel ORA_SBT_TAPE_1: starting incremental level 1 datafile backup set
channel ORA_SBT_TAPE_1: specifying datafile(s) in backup set
input datafile file number=00002 
name=/data/ora_data01/KAHCB4P/KAHCB4P_sysaux_01.dbf
input datafile file number=00001 
name=/data/ora_data01/KAHCB4P/KAHCB4P_system_01.dbf
input datafile file number=00003 
name=/data/ora_data01/KAHCB4P/KAHCB4P_undo_01.dbf
input datafile file number=00004 
name=/data/ora_data01/KAHCB4P/KAHCB4P_users_01.dbf
input datafile file number=00005 
name=/data/ora_data01/KAHCB4P/KAHCB4P_tools_01.dbf
input datafile file number=00006 
name=/data/ora_data01/KAHCB4P/KAHCB4P_backup_test_01.dbf
channel ORA_SBT_TAPE_1: starting piece 1 at 02-DEC-2009 20:38:01
RMAN-3009: failure of backup command on ORA_SBT_TAPE_1 channel at 12/02/2009 
20:38:02
ORA-19648: datafile 2: incremental-start SCN equals checkpoint SCN
ORA-19640: datafile checkpoint is SCN 3911695 time 11/27/2009 12:36:48
continuing other job steps, job failed will not be re-run
channel ORA_SBT_TAPE_1: starting incremental level 1 datafile backup set
channel ORA_SBT_TAPE_1: specifying datafile(s) in backup set
including current control file in backup set
including current SPFILE in backup set

DIAGNOSTIC ANALYSIS:
--------------------
Same problem as Bug 6903819
Requested backport, but this hasn't fixed the problem.
Confirmed that the patch was installed correctly and relinked successfully.

RELEASE NOTES:
]]Backup incremental leve 1 fails with ORA-19648 and ORA-19640 when standby
]]database opened for read only.
*** 02/14/10 02:43 pm *** (ADD: Impact/Symptom->FEATURE UNUSABLE )

REDISCOVERY INFORMATION:
Backup incremental level 1 fails with ORA-19648 and the standby database is
open read-only.
WORKAROUND:
None

可以从上述Note中看到该Bug需要到11.2.0.2中才得到fix,那么在11.1.0.7上我们有什么办法能解决或者绕过该问题吗?
经过测试,我得到2种workaround的方法:
该ORA-19648错误首先可以通过在物理备库(physical standby)上停止介质恢复(MRP)进程的方式来workaround:

SQL>  alter database recover managed standby database cancel;
Database altered.

RMAN> backup incremental level 1 database;

Starting backup at 22-MAR-11
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=141 device type=DISK
allocated channel: ORA_DISK_2
channel ORA_DISK_2: SID=140 device type=DISK
channel ORA_DISK_1: starting incremental level 1 datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00002 name=/standby/oradata/SBDB2/datafile/o1_mf_sysaux_22m6ov92_.dbf
input datafile file number=00004 name=/standby/oradata/SBDB2/datafile/o1_mf_users_26m6ovba_.dbf
input datafile file number=00006 name=/standby/oradata/SBDB2/datafile/o1_mf_enc_25m6ovba_.dbf
channel ORA_DISK_1: starting piece 1 at 22-MAR-11
channel ORA_DISK_2: starting incremental level 1 datafile backup set
channel ORA_DISK_2: specifying datafile(s) in backup set
input datafile file number=00001 name=/standby/oradata/SBDB2/datafile/o1_mf_system_21m6ov92_.dbf
input datafile file number=00003 name=/standby/oradata/SBDB2/datafile/o1_mf_undotbs1_23m6ovap_.dbf
input datafile file number=00005 name=/standby/oradata/SBDB2/datafile/o1_mf_example_24m6ovar_.dbf
channel ORA_DISK_2: starting piece 1 at 22-MAR-11
channel ORA_DISK_1: finished piece 1 at 22-MAR-11
piece handle=/standby/backup/2km7suui_1_1.bak tag=TAG20110322T212538 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:33
channel ORA_DISK_2: finished piece 1 at 22-MAR-11
piece handle=/standby/backup/2lm7suv1_1_1.bak tag=TAG20110322T212538 comment=NONE
channel ORA_DISK_2: backup set complete, elapsed time: 00:00:27
Finished backup at 22-MAR-11

Starting Control File and SPFILE Autobackup at 22-MAR-11
piece handle=/standby/backup/c-157018592-20110322-01.bak comment=NONE
Finished Control File and SPFILE Autobackup at 22-MAR-11

该ORA-19648错误也可以通过禁用fast incremental backup的块跟踪(block change tracking)特性来绕过问题,当然在可能的情况下我们更推荐使用上面那种方法,因为毕竟还可以利用到fast incremental backup特性:

SQL> alter database recover managed standby database using current logfile disconnect;
Database altered.

SQL> select process,status from v$managed_standby;

PROCESS   STATUS
--------- ------------
ARCH	  CONNECTED
ARCH	  CONNECTED
ARCH	  CONNECTED
ARCH	  CONNECTED
MRP0	  APPLYING_LOG

SQL> select status from v$block_change_tracking;

STATUS
----------
ENABLED


SQL> alter database disable block change tracking;
Database altered.


RMAN>  backup incremental level 2 database;

Starting backup at 22-MAR-11
using channel ORA_DISK_1
using channel ORA_DISK_2
channel ORA_DISK_1: starting incremental level 2 datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00002 name=/standby/oradata/SBDB2/datafile/o1_mf_sysaux_22m6ov92_.dbf
input datafile file number=00004 name=/standby/oradata/SBDB2/datafile/o1_mf_users_26m6ovba_.dbf
input datafile file number=00006 name=/standby/oradata/SBDB2/datafile/o1_mf_enc_25m6ovba_.dbf
channel ORA_DISK_1: starting piece 1 at 22-MAR-11
channel ORA_DISK_2: starting incremental level 2 datafile backup set
channel ORA_DISK_2: specifying datafile(s) in backup set
input datafile file number=00001 name=/standby/oradata/SBDB2/datafile/o1_mf_system_21m6ov92_.dbf
input datafile file number=00003 name=/standby/oradata/SBDB2/datafile/o1_mf_undotbs1_23m6ovap_.dbf
input datafile file number=00005 name=/standby/oradata/SBDB2/datafile/o1_mf_example_24m6ovar_.dbf
channel ORA_DISK_2: starting piece 1 at 22-MAR-11
channel ORA_DISK_1: finished piece 1 at 22-MAR-11
piece handle=/standby/backup/2nm7sv8d_1_1.bak tag=TAG20110322T213052 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:15
channel ORA_DISK_2: finished piece 1 at 22-MAR-11
piece handle=/standby/backup/2om7sv8s_1_1.bak tag=TAG20110322T213052 comment=NONE
channel ORA_DISK_2: backup set complete, elapsed time: 00:00:15
Finished backup at 22-MAR-11

Starting Control File and SPFILE Autobackup at 22-MAR-11
piece handle=/standby/backup/c-157018592-20110322-02.bak comment=NONE
Finished Control File and SPFILE Autobackup at 22-MAR-11

解决Oracle错误ORA-15061一例

一套Linux上的11.2.0.1系统,告警日志中出现以下错误:

ORA-00202: control file: '+DATA/controlfile/current.256.7446483424'
ORA-17505: ksfdrsz:1 Failed to resize file to size 612 blocks
ORA-15061: ASM operation not supported [41]
WARNING: Oracle Managed File +FRA in the recovery area is orphaned by the control file.
The control file can not keep all recovery area files due to space limitation.
krse.c
Archived Log entry 200 added for thread 1 sequence 200 ID 0x3739c2f0 dest 1:

RMAN backup failed with

Rman backup error:
RMAN-03009: failure of backup command on ORA_DISK_1 channel at 
ORA-19510: failed to set size of 6800 blocks for file "+FRA" (block size=8192)
ORA-17505: ksfdrsz:1 Failed to resize file to size 6800 blocks
ORA-15061: ASM operation not supported [41]

[oracle@rh2 ~]$ oerr ora 15061
15061, 00000, "ASM operation not supported [%s]"
// *Cause:  An ASM operation was attempted that is invalid or not supported
//          by this version of the ASM instance.
// *Action: This is an internal error code that is used for maintaining
//          compatibility between software versions and should never be
//          visible to the user; contact Oracle support Services.
//

提交SR后,根据Oracle GCS确认为BUG:9788316:

1.The following error indicates that it failed to resize the controlfile to 612 blocks. If the DB_BLOCK_SIZE is 8192, 
then 612 blocks is not more than 5MB. According to the results of the query on V$ASM_DISKGROUP in 'results01.txt' file, 
the ASM diskgroup +DATA has 108605 MB free space. So, the ASM diskgroup +DATA has enough space for the 612 blocks.

2. By the way, please confirm whether you have recently applied PSU #1. Anyway, 
please try to relink the Oracle executables, as shown here. Before you run the "relink" command, 
make sure to shutdown both the ASM instance and target database.

$ORACLE_HOME/bin/relink all

3 After relinking the Oracle executables, please confirm whether you are still 
experiencing the same ORA-15061 error.

ORA-15061: ASM Operation Not Supported [41] After Apply PSU #1 (Doc ID 1126113.1)
ORA-15061 reported while doing a file operation with 11.1 or 11.2 ASM after PSU applied in database home (Doc ID 1070880.1)

Hdr: 9788316 11.2.0.1.1 RDBMS 11.2.0.1.0 ASM PRODID-5 PORTID-267
Abstract: AFTER APPLY PSU 1 (11.2.0.1.1) ON RDBMS HOME UNABLE TO RESIZE ASM DATAFILE.

*** 06/07/10 02:09 pm ***
—-
3-1827355081

PROBLEM:
——–
1) After apply PSU 1 (11.1.0.2.1) on the 11.2.0.1.0 RDBMS Oracle Home
customer is unable to resize an ASM datafile:
============================================================
SQL> alter database datafile
‘+DATA/brg13ed/datafile/ts_brg_users.1279.711064345’ resize 560M;
alter database datafile ‘+DATA/brg13ed/datafile/ts_brg_users.1279.711064345’
resize 560M
*
ERROR at line 1:
ORA-1237: cannot extend datafile 4
ORA-1110: data file 4: ‘+DATA/brg13ed/datafile/ts_brg_users.1279.711064345’
ORA-17505: ksfdrsz:1 Failed to resize file to size 71680 blocks
ORA-15061: ASM operation not supported [41]

============================================================

2) DB alertlog reports (alert_brg13ed1.log):
============================================================

Mon Jun 07 11:27:57 2010
Stopping background process CJQ0
Mon Jun 07 12:30:28 2010
alter database datafile ‘+DATA/brg13ed/datafile/ts_brg_users.1279.711064345’
resize 536870915
ORA-1237 signalled during: alter database datafile
‘+DATA/brg13ed/datafile/ts_brg_users.1279.711064345’ resize 536870915…
Mon Jun 07 13:07:40 2010
alter database datafile ‘+DATA/brg13ed/datafile/ts_brg_users.1279.711064345’
resize 560m
ORA-1237 signalled during: alter database datafile
‘+DATA/brg13ed/datafile/ts_brg_users.1279.711064345’ resize 560m…
Mon Jun 07 13:49:50 2010
alter database datafile ‘+DATA/brg13ed/datafile/ts_brg_users.1279.711064345’
resize 800M
ORA-1237 signalled during: alter database datafile
‘+DATA/brg13ed/datafile/ts_brg_users.1279.711064345’ resize 800M…
Mon Jun 07 13:58:51 2010
alter database datafile ‘+DATA/brg13ed/datafile/ts_brg_users.1279.711064345’
resize 560M
ORA-1237 signalled during: alter database datafile
‘+DATA/brg13ed/datafile/ts_brg_users.1279.711064345’ resize 560M…
Mon Jun 07 14:25:43 2010

============================================================

3) ASM alert.log does not report any problem.

DIAGNOSTIC ANALYSIS:
——————–
(see below)

WORKAROUND:
———–
None

RELATED BUGS:
————-

REPRODUCIBILITY:
—————-

TEST CASE:
———-

STACK TRACE:
————

SUPPORTING INFORMATION:
———————–

24 HOUR CONTACT INFORMATION FOR P1 BUGS:
—————————————-

DIAL-IN INFORMATION:
——————–

IMPACT DATE:
————

*** 06/07/10 02:09 pm ***
4) I can confirm that at (Jun 07 10:55:59 EDT 2010) customer installed the
Patch: 9355126 (on the Grid OH), which includes the fix for Bug: 8898852:
============================================================

Invoking OPatch 11.2.0.1.2

Oracle Interim Patch Installer version 11.2.0.1.2

Oracle Home : /u01/grid/oracle/product/grid
Central Inventory : /u01/app/oracle/oraInventory
from : /var/opt/oracle/oraInst.loc
OPatch version : 11.2.0.1.2
OUI version : 11.2.0.1.0
OUI location : /u01/grid/oracle/product/grid/oui
——————————————————————————

Installed Top-level Products (1):

Oracle Grid Infrastructure
11.2.0.1.0
There are 1 products installed in this Oracle Home.

Installed Products (87):

There are 87 products installed in this Oracle Home.

Interim patches (1) :

Patch 9355126 : applied on Mon Jun 07 10:55:59 EDT 2010
Unique Patch ID: 12175902
Created on 5 Feb 2010, 07:38:30 hrs PST8PDT
Bugs fixed:
8974548, 8898852
============================================================

5) Also, I can confirm that at (Mon May 03 00:01:14 EDT 2010) customer
installed the 11.2.0.1.1 Patch Set Update (PSU #1) which include as well the
patch for Bug: 8898852 (on the RDBMS OH):
============================================================
Invoking OPatch 11.2.0.1.2

Oracle Interim Patch Installer version 11.2.0.1.2

Oracle Home : /u01/app/oracle/product/11.2.0/db1
Central Inventory : /u01/app/oracle/oraInventory
from : /var/opt/oracle/oraInst.loc
OPatch version : 11.2.0.1.2
OUI version : 11.2.0.1.0
OUI location : /u01/app/oracle/product/11.2.0/db1//oui

——————————————————————————

Installed Top-level Products (1):

Oracle Database 11g
11.2.0.1.0
There are 1 products installed in this Oracle Home.

Installed Products (134):

. 11.2.0.1.0
There are 134 products installed in this Oracle Home.

Interim patches (1) :

Patch 9352237 : applied on Mon May 03 00:01:14 EDT 2010
Unique Patch ID: 12366369
Created on 6 Apr 2010, 05:03:41 hrs PST8PDT
Bugs fixed:
8661168, 8769239, 8898852, 8801119, 9054253, 8706590, 8725286, 8974548
8778277, 8780372, 8769569, 9027691, 9454036, 9454037, 9454038, 8761974
7705591, 8496830, 8702892, 8639114, 8723477, 8729793, 8919682, 8818983
9001453, 8475069, 9328668, 8891929, 8798317, 8820324, 8733749, 8702535
8565708, 9036013, 8735201, 8684517, 8870559, 8773383, 8933870, 8812705
8405205, 8822365, 8813366, 8761260, 8790767, 8795418, 8913269, 8897784
8760714, 8717461, 8671349, 8775569, 8898589, 8861700, 8607693, 8642202
8780281, 9369797, 8780711, 8784929, 8834636, 9015983, 8891037, 8828328
8570322, 8832205, 8665189, 8717031, 8685253, 8718952, 8799099, 8633358
9032717, 9321701, 8588519, 8783738, 8796511, 8782971, 8756598, 9454385
8856497, 8703064, 9066116, 9007102, 8721315, 8818175, 8674263, 9352237
8753903, 8720447, 9057443, 8790561, 8733225, 9197917, 8928276, 8991997,
8837736
============================================================

6) But the problem persists:
============================================================
SQL> alter database datafile
‘+DATA/brg13ed/datafile/ts_brg_users.1279.711064345’ resize 560M;
alter database datafile ‘+DATA/brg13ed/datafile/ts_brg_users.1279.711064345’
resize 560M
*
ERROR at line 1:
ORA-1237: cannot extend datafile 4
ORA-1110: data file 4: ‘+DATA/brg13ed/datafile/ts_brg_users.1279.711064345’
ORA-17505: ksfdrsz:1 Failed to resize file to size 71680 blocks
ORA-15061: ASM operation not supported [41]

============================================================

7) 11.2.0.1.1 Patch Set Update (PSU #1) should include the fix for bug:
8898852 (ORA-15061: ASM operation not supported [41]).
This PSU contains a fix that adds an operation to the ASM protocol.
The problem is that the PSU has been applied correctly to the RDBMS
home, but the fix is not correctly applied to the ASM HOME.

So either install that patch to the ASM HOME, or if that has been done
check why it failed.

A common reason for failure is a permission problem. Check bug 9711074
and its duplicates for more background. And check documentation bug 8629483
for the solution.

How do I know the fix is not missing? This error message is the key:
ORA-15061: ASM operation not supported [41]
(especially with operation 41)

If the patch was applied to the grid home, then probably the relink failed
due to permission problems.

ORA-15061: ASM Operation Not Supported [41] After Apply PSU #1 [ID 1126113.1]

pplies to:
Oracle Server – Enterprise Edition – Version: 11.2.0.1 and later [Release: 11.2 and later ]
Information in this document applies to any platform.
Symptoms
1) After apply PSU 1 (11.2.0.1.1 ) on the Grid Infrastructure Oracle Home
customer is unable to resize an ASM datafile:

SQL> alter database datafile
‘+DATA/brg13ed/datafile/ts_brg_users.1279.711064345’ resize 560M;
alter database datafile ‘+DATA/brg13ed/datafile/ts_brg_users.1279.711064345’
resize 560M
*
ERROR at line 1:
ORA-1237: cannot extend datafile 4
ORA-1110: data file 4: ‘+DATA/brg13ed/datafile/ts_brg_users.1279.711064345’
ORA-17505: ksfdrsz:1 Failed to resize file to size 71680 blocks
ORA-15061: ASM operation not supported [41]

2) DB alertlog reports (alert_brg13ed1.log):

Mon Jun 07 11:27:57 2010
Stopping background process CJQ0
Mon Jun 07 12:30:28 2010
alter database datafile ‘+DATA/brg13ed/datafile/ts_brg_users.1279.711064345’
resize 536870915
ORA-1237 signalled during: alter database datafile
‘+DATA/brg13ed/datafile/ts_brg_users.1279.711064345’ resize 536870915…
Mon Jun 07 13:07:40 2010
alter database datafile ‘+DATA/brg13ed/datafile/ts_brg_users.1279.711064345’
resize 560m
ORA-1237 signalled during: alter database datafile
‘+DATA/brg13ed/datafile/ts_brg_users.1279.711064345’ resize 560m…
Mon Jun 07 13:49:50 2010
alter database datafile ‘+DATA/brg13ed/datafile/ts_brg_users.1279.711064345’
resize 800M
ORA-1237 signalled during: alter database datafile
‘+DATA/brg13ed/datafile/ts_brg_users.1279.711064345’ resize 800M…
Mon Jun 07 13:58:51 2010
alter database datafile ‘+DATA/brg13ed/datafile/ts_brg_users.1279.711064345’
resize 560M
ORA-1237 signalled during: alter database datafile
‘+DATA/brg13ed/datafile/ts_brg_users.1279.711064345’ resize 560M…
Mon Jun 07 14:25:43 2010

3) ASM alert.log does not report any problem.

Changes
PSU 1 (11.2.0.1.1 ) patchset was installed on the Grid Infrastructure Oracle Home
Cause

The PSU #1 (11.2.0.1.1 ) patch on the Grid Infrastructure Oracle Home was not correctly linked by opatch.

The PSU #1 (11.2.0.1.1 ) patch on the Grid Infrastructure Oracle Home needs to be relinked.
Solution
Relink the PSU #1 patch on the Grid Infrastructure Oracle Home as follow:

1) Shutdown the database instances.

2) Shutdown the ASM instance.

3) Relink the Grid Infrastructure Oracle Home as follow:

script /tmp/relink_GI.txt

env | sort

$ORACLE_HOME/bin/relink all

exit

4) Startup the ASM instance.

5) Startup the databases.

6) Resize the datafile:

SQL> alter database datafile ‘+DATA/brg13ed/datafile/ts_brg_users.1279.711064345’ resize 560M;

该bug一般是由于对grid infrastructure实施了PSU 1 (11.2.0.1.1 )后,Oracle binary没有正确link导致的,可行的解决方案是relink all,但这需要重启instance!

Google Nexus S重启bug被官方确认

Google员工已经官方确认了这个意外重启bug,并宣布其会主动协同Samsung解决这一麻烦的根源;就最近的修复申明来看他们似乎已经找到了问题的root cause,可以预期这个reboot bug可以在短期内得到修复。据分析这次的问题出在制造商身上,而非由Android 2.3姜饼操作系统引起。至少现在已购买了Nexus S(譬如我)的用户可以松一口气了,修复补丁已经在路上了!

对于那些还没有购买Nexus S且目前仍跃跃欲试的朋友,等待一段时间可以说是最好的选择;或者你希望了解更多更详实的关于Nexus S的消息,那么G4Games会是一个最佳的信息来源!

Oracle RAC内部错误:ORA-00600[keltnfy-ldmInit]一例

一套SUNOS上的2节点10.2.0.2 RAC系统日前出现ORA-00600: internal error code, arguments: [keltnfy-ldmInit], [46], [1], [], [], [], [], []内部错误,错误发生时系统操作人员误使用hostname命令修改了1号主机的主机名,之后陆续出现以上ora-00600错误,同时操作系统日志显示RAC CSS进程意外终止,具体日志如下:

================== OS Message=====================
Jan 10 11:15:10 cupd25k-a root: [ID 702911 user.error] Cluster Ready Services completed waiting on dependencies.
Jan 10 11:15:16 cupd25k-a root: [ID 702911 user.error] Duplicate Oracle CLSMON found. Killing and restarting it.
Jan 10 11:15:16 cupd25k-a root: [ID 702911 user.error] Oracle CSS daemon failed to start up. Check CRS logs for diagnostics.
Jan 10 11:15:16 cupd25k-a root: [ID 702911 user.error] Oracle CLSMON terminated with unexpected status 137. Respawning

/* 这里的Duplicate Oracle CLSMON found 因该指的是OCLSMON进程,
"In Oracle 10.2.0.2 and above there is an additional process called OCLSOMON
which monitors the CSS daemon for hangs or scheduling issues and can reboot a
node if there is a perceived hang. OCLSOMON is spawned in init.cssd and runs
as the Oracle user."
   oclsmon进程在10.2.0.2以后版本被引入,用以监视css进程,
   若发生hang或操作系统调度问题时该进程可能会reboot节点,
   oclsmon进程会被init.cssd脚本spawned.  */

==================oclsmon.log======================
2011-01-10 11:15:11.376
unspecified member number is (1)
Member 1 group OCLSMON_ in use. Is oclsmon already up?
2011-01-10 11:15:11.479
Internal Error Information:
  Category: 8
  Operation: skgxnreg: the member number is i
  Location: skgxnreg_7
  Other:
  Dep: 1
2011-01-10 11:15:11.737
unspecified member number is (1)
Member 1 group OCLSMON_ in use. Is oclsmon already up?
2011-01-10 11:15:11.751
Internal Error Information:
  Category: 8
  Operation: skgxnreg: the member number is i
  Location: skgxnreg_7
  Other:
  Dep: 1
2011-01-10 11:15:12.006
unspecified member number is (1)
Member 1 group OCLSMON_ in use. Is oclsmon already up?
2011-01-10 11:15:12.023
Internal Error Information:
  Category: 8
  Operation: skgxnreg: the member number is i
  Location: skgxnreg_7
  Other:
  Dep: 1
2011-01-10 11:15:12.278
unspecified member number is (1)
Member 1 group OCLSMON_ in use. Is oclsmon already up?
2011-01-10 11:15:12.293
Internal Error Information:
  Category: 8
  Operation: skgxnreg: the member number is i
  Location: skgxnreg_7
  Other:
  Dep: 1

/*  skgxn是Oracle Clusterware用以监视skgxn事件(即第三方CLUSTERWARE相关的事宜,他们应该有用sun的cluster);
    似乎是修改hostname导致了Oracle CSS出现了fatal error,并启动了一个以上的OCLSMON进程(Duplicate Oracle CLSMON found),
    最后"Oracle CSS daemon failed to start up. Check CRS logs for diagnostics",
    在Oracle instance启动的情况下25k-a节点的CSS进程意外终止,
    可能导致该节点上的所有实例的LMD(global Enqueue Service daemon)、LMON无法正常工作而导致实例hang住。*/

==========================alert.log====================
Errors in file /oracle/oracle/admin/BOCPCS/udump/bocpcs1_ora_12320.trc:
ORA-00600: internal error code, arguments: [keltnfy-ldmInit], [46], [1], [], [], [], [], []

=========================part of trace file===============
*** 2011-01-10 11:11:02.957
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [keltnfy-ldmInit], [46], [1], [], [], [], [], []
Current SQL information unavailable - no session.
----- Call Stack Trace -----
calling              call     entry                argument values in hex
location             type     point                (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedmp()+716         CALL     ksedst()             FFFFFFFF7FFF9D40 ?
                                                   000000000 ? 0FFFFFFFF ?
                                                   FFFFFFFF7FFF8EE8 ?
                                                   FFFFFFFF7FFFA640 ?
                                                   000000008 ?
kgerinv()+200        PTR_CALL 0000000000000000     000000002 ? 10638A1CC ?
                                                   000000001 ? 000000000 ?
                                                   10638A000 ? 10638A1CC ?
kgeasnmierr()+28     CALL     kgerinv()            106384B98 ? 000000000 ?
                                                   105D3B940 ? 000000002 ?
                                                   FFFFFFFF7FFFDFF0 ?
                                                   000001430 ?
keltnfy()+784        CALL     kgeasnmierr()        106384B98 ? 1064DCBF0 ?
                                                   105D3B940 ? 000000002 ?
                                                   000000000 ? 00000002E ?
kscnfy()+552         PTR_CALL 0000000000000000     10639B498 ? 38001E7A8 ?
                                                   1055AC5D0 ? 10639B498 ?
                                                   000102C00 ? 10638A1C0 ?
ksucrp()+2436        CALL     kscnfy()             000008000 ? 000808214 ?
                                                   100C4C220 ? 1055C6680 ?
                                                   00000000F ? 000000001 ?
opiino()+2056        CALL     ksucrp()             000106387 ? 380007608 ?
                                                   000000000 ? 000380000 ?
                                                   000106000 ? 106387618 ?
opiodr()+1488        PTR_CALL 0000000000000000     10555A000 ?
                                                   FFFFFFFF7FFFF1C8 ?
                                                   00010555A ? 000106000 ?
                                                   105C83000 ? 000000001 ?
opidrv()+828         CALL     opiodr()             106391000 ? 000000000 ?
                                                   106390DD8 ? 106390000 ?
                                                   106391BD0 ? 000106000 ?
sou2o()+80           CALL     opidrv()             106394358 ? 000000001 ?
                                                   00000003C ? 000000000 ?
                                                   00000003C ? 000106000 ?
opimai_real()+124    CALL     sou2o()              FFFFFFFF7FFFF788 ?
                                                   00000003C ? 000000004 ?
                                                   FFFFFFFF7FFFF7B0 ?
                                                   105C82000 ? 000105C82 ?
main()+152           CALL     opimai_real()        000000002 ?
                                                   FFFFFFFF7FFFF888 ?
                                                   103F1BBCC ? 10632DB10 ?
                                                   002411E44 ? 000014400 ?
_start()+380         CALL     main()               000000002 ? 000000008 ?
                                                   000000000 ?
                                                   FFFFFFFF7FFFF898 ?
                                                   FFFFFFFF7FFFF9A8 ?
                                                   FFFFFFFF7C700200 ?

/* 可以看到以上trace文件指出了no session,
    在服务进程启动阶段遭遇了该keltnfy-ldmInit内部错误*/

metalink文档Startup Database Produces Ora-00600: [Keltnfy-Ldminit] [ID 336447.1]
介绍了该内部错误一般由主机上的不当网络配置引起,很显然使用hostname命令修改了一个无法解析的
主机名时可能引发该ORA-00600[keltnfy-ldmInit]内部错误。

Applies to:
Oracle Server - Enterprise Edition - Version: 10.2.0.1 to 10.2.0.3 - Release: 10.2 to 10.2
Information in this document applies to any platform.
***Checked for relevance on 09-Jun-2010***
Symptoms

An startup nomount on Oracle 10g Release 2 database produces the following exception in alert log

Starting up ORACLE RDBMS Version: 10.2.0.1.0.
Errors in file /opt/oracle/10.2/admin/ORCL/udump/ORCL_ora_535.trc:
ORA-00600: internal error code, arguments: [keltnfy-ldmInit], [46], [1], [], [], [], [], []
USER: terminating instance due to error 600
Instance terminated by USER, pid = 535
Cause

The problem is related to getting host information.
In this case, ldmInit()/sldmInit() is failing with error 46 : LDMERR_HOST_NOT_FOUND

The following exception may also occur :

LDMERR_SOSD_INIT         OSD init failed to be specific in these OSD failures
 LDMERR_BAD_ADDR         bad address when system call gethostname failed
 LDMERR_HOST_NOT_FOUND   gethostbyname system call fails
 LDMERR_NO_SUPPORT       when specific address type is not supported

Development has fixed two bugs so far regarding this issue

Bug:5438154 - Abstract: ORA-600[KELTNFY-LDMINIT]  STARTING THE DB
Release Notes:
ldmInit returned LDMERR_HOST_NOT_FOUND for the machine huge alias list/address list
Workaround:
reduce the alais list of the machine

Bug:5486074 - Abstract: ORA-600 [KELTNFY-LDMINIT] WHEN DNS IS NOT AVAILABLE
Release Notes:
Internal error is raised by the Server Generated Alert subsystem when it can not determine Host Name or
Network Address. This can be caused by DNS server being unaavilable. 

Solution

The fix for 5486074 will not fix any underlying error from gethostbyname(), it just change the internal error to a warning message :

 "Warning: keltnfy call to ldmInit failed with error 46"

You will still need to fix the network config issue.  

These are the check you can do verify the host information 

      Check permission on /etc/hosts 

$ ls -l /etc/hosts
-rw-r--r--  2 root root 194 Oct 17  2006 /etc/hosts

      Check if /etc/hosts file is correctly configured

              ( all of this on one line ). 

Check the hostname:
$ hostname
$ ping `hostname`

Make sure you are able to ping the hostname
      Check if /etc/nodename is correctly configured

If you have DNS setup, ping is not a tool to diagnose DNS problem. A better tool to use is nslookup, dnsquery, or dig.

$ nslookup
$ nslookup
$ nslookup 

The forward and reverse lookup should succeed and return consistent address/info.  

 Check nsswitch.conf

$ more nsswitch.conf
hosts:      files dns
Make sure host lookup is also done through the /etc/hosts file and not just dns.  It is recommended that FILES come first before DNS.
Also, check the resolv.conf. This makes sure that the DNS is working properly.

显然在生产主机上使用hostname命令是危险的,因为你很难保证你在打字的时候不会因为同事的一下拍击而输错,有人说在生产环境中rm命令因该被禁用,那么这种特殊待遇对hostname命令也适用,我们可以用什么来代替hostname查看主机名呢?选择可以有非常多,这里我推荐一种:

-bash-3.00$ oslevel -r 
5300-07

-bash-3.00$ hostname
askmac.cn

-bash-3.00$ uname -n
askmac.cn

/* uname -n完全可以满足你的需要! */
That's great!

Oracle内部错误:ORA-00600[15801], [1]一例

一套Sparc Solaris上的11.1.0.7系统,在创建索引时频繁出现ORA-00600: internal error code, arguments: [15801], [1], [], [], [], [], [], [], [], [], [], []内部错误,日志信息如下:

Tue Aug 17 17:34:21 2010
WARNING: Oracle executable binary mismatch detected.
Binary of new process does not match binary which started instance
issue alter system set "_disable_image_check" = true to disable these messages
Tue Aug 17 17:34:21 2010
Errors in file /u04/app/oracle/diag/rdbms/orahcmu/ORAHCMU/trace/ORAHCMU_p023_22262.trc (incident=12505):
ORA-00600: internal error code, arguments: [15801], [1], [], [], [], [], [], [], [], [], [], []
Incident details in: /u04/app/oracle/diag/rdbms/orahcmu/ORAHCMU/incident/incdir_12505/ORAHCMU_p023_22262_i12505.trc
Tue Aug 17 17:34:21 2010
Errors in file /u04/app/oracle/diag/rdbms/orahcmu/ORAHCMU/trace/ORAHCMU_p021_22258.trc (incident=12489):
ORA-00600: internal error code, arguments: [15801], [1], [], [], [], [], [], [], [], [], [], []
Incident details in: /u04/app/oracle/diag/rdbms/orahcmu/ORAHCMU/incident/incdir_12489/ORAHCMU_p021_22258_i12489.trc

Errors in file /u04/app/oracle/diag/rdbms/orahcmu/ORAHCMU/trace/ORAHCMU_p015_9328.trc (incident=19909):
ORA-00600: internal error code, arguments: [15801], [1], [], [], [], [], [], [], [], [], [], []
Errors in file /u04/app/oracle/diag/rdbms/orahcmu/ORAHCMU/trace/ORAHCMU_p043_9388.trc (incident=20133):
ORA-00600: internal error code, arguments: [15801], [1], [], [], [], [], [], [], [], [], [], []
Mon Aug 23 14:43:42 2010
Errors in file /u04/app/oracle/diag/rdbms/orahcmu/ORAHCMU/trace/ORAHCMU_p087_9668.trc (incident=20485):
ORA-00600: internal error code, arguments: [15801], [1], [], [], [], [], [], [], [], [], [], []
Mon Aug 23 14:43:42 2010
Errors in file /u04/app/oracle/diag/rdbms/orahcmu/ORAHCMU/trace/ORAHCMU_p012_9322.trc (incident=19885):
ORA-00600: internal error code, arguments: [15801], [1], [], [], [], [], [], [], [], [], [], []
Incident details in: /u04/app/oracle/diag/rdbms/orahcmu/ORAHCMU/incident/incdir_19789/ORAHCMU_ora_8602_i19789.trc
Mon Aug 23 14:43:43 2010
WARNING: Oracle executable binary mismatch detected.
Binary of new process does not match binary which started instance
issue alter system set "_disable_image_check" = true to disable these messages

Dump continued from file: /u04/app/oracle/diag/rdbms/orahcmu/ORAHCMU/trace/ORAHCMU_ora_8602.trc
ORA-00600: internal error code, arguments: [15801], [1], [], [], [], [], [], [], [], [], [], []

*** 2010-08-23 14:43:42.974
----- Current SQL Statement for this session (sql_id=00abhfx460qm9) -----
CREATE UNIQUE iNDEX PS_HM_BEN_GP_STG ON PS_HM_BEN_GP_STG (CAL_ID, GP_PAYGROUP, 
EMPLID, EMPL_RCD, HM_INCURRED_BY, HM_SUM_ASSURED) TABLESPACE PSINDEX STORAGE 
(INITIAL 40000 NEXT 100000 MAXEXTENTS UNLIMITED PCTINCREASE 0) PCTFREE 10 PARALLEL NOLOGGING

----- Call Stack Trace -----
ksedst1 ksedst dbkedDefDump dbgexPhaseII dbgexProcessError dbgePostErrorKGE kgeade kgerem
kxfpProcessError kxfpqidqr kxfpqdqr kxfxgs kxfxcp qerpxSendParse kxfpValidateSlaveGroup kxfpgsg
kxfrAllocSlaves kxfrialo kxfralo qerpx_rowsrc_start qerpxStart kdicrws kdicdrv opiexe opiosq0
kpooprx kpoal8 opiodr ttcpip opitsk opiino opiodr opidrv sou2o main



SO: 0x3bf0bbf20, type: 4, owner: 0x3bf5452d0, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
proc=0x3bf5452d0, name=session, file=ksu.h LINE:10719 ID:, pg=0
(session) sid: 217 ser: 767 trans: 0x3bc0660f8, creator: 0x3bf5452d0
flags: (0x8000041) USR/- flags_idl: (0x1) BSY/-/-/-/-/-
flags2: (0x44008) DDLT1/-
DID: , short-term DID:
txn branch: 0x0
oct: 9, prv: 0, sql: 0x3b5d14510, psql: 0x3b6d59820, user: 31/SYSADM
ksuxds FALSE at location: 0
service name: ORAHCMU
client details:
O/S info: user: Administrator, term: UJWALTPVM, ospid: 304:2892
machine: WORKGROUP\UJWALTPVM program: pside.exe
client info: ujwal,Administrator,UJWALTPVM,,pside.exe,
application name: pside.exe, hash value=2824484291
Current Wait Stack:
Not in wait; last wait ended 2.475286 sec ago
Wait State:
auto_close=0 flags=0x21 boundary=0x0/-1
Session Wait History:
0: waited for 'lient'
=c8, =1, =0
wait_id=10483 seq_num=10484 snap_id=1
wait times: snap=0.168502 sec, exc=0.168502 sec, total=0.168502 sec
wait times: max=2.000000 sec
wait counts: calls=1 os=1
occurred after 0.000903 sec of elapsed time
1: waited for ' waiting for ruleset'
=10010063, =1, =0
wait_id=10482 seq_num=10483 snap_id=1
wait times: snap=0.008580 sec, exc=0.008580 sec, total=0.008580 sec
wait times: max=2.000000 sec
wait counts: calls=1 os=1
occurred after 0.000731 sec of elapsed time
2: waited for ' waiting for ruleset'
=1001004f, =4, =0
wait_id=10481 seq_num=10482 snap_id=1
wait times: snap=0.000132 sec, exc=0.000132 sec, total=0.000132 sec
wait times: max=2.000000 sec
wait counts: calls=1 os=1
occurred after 0.000074 sec of elapsed time
3: waited for ' waiting for ruleset'
=1001004f, =3, =0
wait_id=10480 seq_num=10481 snap_id=1
wait times: snap=0.000002 sec, exc=0.000002 sec, total=0.000002 sec
wait times: max=2.000000 sec
wait counts: calls=1 os=1
occurred after 0.000065 sec of elapsed time

----- Session Cursor Dump -----
Current cursor: 1, pgadep=0

Open cursors(pls, sys, hwm, max): 3(0, 2, 64, 300)
NULL=1 SYNTAX=0 PARSE=0 BOUND=1 FETCH=0 ROW=1
Cached frame pages(total, free):
4k(14, 14), 8k(1, 1), 16k(1, 1), 32k(0, 0)

----- Current Cursor -----


----- Plan Table -----

============
Plan Table
============
----------------------------------------------------+-----------------------------------+-------------------------+
| Id | Operation | Name | Rows | Bytes | Cost | Time | TQ |IN-OUT|PQ Distrib |
----------------------------------------------------+-----------------------------------+-------------------------+
| 0 | CREATE INDEX STATEMENT | | | | 2 | | | | |
| 1 | PX COORDINATOR | | | | | | | | |
| 2 | PX SEND QC (ORDER) | :TQ10001 | 82 | 4510 | | |:Q1001| P->S |QC (ORDER) |
| 3 | INDEX BUILD UNIQUE | PS_HM_BEN_GP_STG| | | | |:Q1001| PCWP | |
| 4 | SORT CREATE INDEX | | 82 | 4510 | | |:Q1001| PCWP | |
| 5 | PX RECEIVE | | 82 | 4510 | 2 | 00:00:01 |:Q1001| PCWP | |
| 6 | PX SEND RANGE | :TQ10000 | 82 | 4510 | 2 | 00:00:01 |:Q1000| P->P |RANGE |
| 7 | PX BLOCK ITERATOR | | 82 | 4510 | 2 | 00:00:01 |:Q1000| PCWC | |
| 8 | TABLE ACCESS FULL | PS_HM_BEN_GP_STG| 82 | 4510 | 2 | 00:00:01 |:Q1000| PCWP | |
----------------------------------------------------+-----------------------------------+-------------------------+


----------------------------------------
Cursor#1(0xffffffff7ce31928) state=BOUND curiob=0xffffffff7ce57d28
curflg=4c fl2=0 par=0x0 ses=0x3bf0bbf20
----- Dump Cursor sql_id=00abhfx460qm9 xsc=0xffffffff7ce57d28 cur=0xffffffff7ce31928 -----
Dump Parent Cursor sql_id=00abhfx460qm9 phd=0x3b5d14510 plk=0x3b0bb3318
sqltxt(0x3b5d14510)=CREATE UNIQUE iNDEX PS_HM_BEN_GP_STG ON PS_HM_BEN_GP_STG 
(CAL_ID, GP_PAYGROUP, EMPLID, EMPL_RCD, HM_INCURRED_BY, HM_SUM_ASSURED) 
TABLESPACE PSINDEX STORAGE (INITIAL 40000 NEXT 100000 MAXEXTENTS UNLIMITED PCTINCREASE 0) 
PCTFREE 10 PARALLEL NOLOGGING
hash=616eaa631fc21f4c0029707748605a69
parent=0x3ae539590 maxchild=01 plk=0x3b0bb3318 ppn=n
cursor instantiation=0xffffffff7ce57d28 used=1282545779 exec_id=16777216 exec=1
child#0(0x3b5d05e10) pcs=0x3b678c128
clk=0x3b7e200d0 ci=0x3b5b204c8 pn=0x39955d2b8 ctx=0x3b86ee988
kgsccflg=0 llk[0xffffffff7ce57d30,0xffffffff7ce57d30] idx=0
xscflg=c0102276 fl2=c000400 fl3=2202008 fl4=100
Frames pfr 0xffffffff7ce67098 siz=85976 efr 0xffffffff7ce66fb8 siz=85960
Cursor frame dump
enxt: 7.0x00000168 enxt: 6.0x00008000 enxt: 5.0x00008000 enxt: 4.0x00003978
enxt: 3.0x00000490 enxt: 2.0x000000b8 enxt: 1.0x00000fa0
pnxt: 1.0x00000010
kxscphp=0xffffffff7dd80a18 siz=984 inu=312 nps=312
kxscwhp=0xffffffff7ddd2cc8 siz=8136 inu=6264 nps=3968
kxscefhp=0xffffffff7ce51468 siz=88456 inu=86128 nps=86128


FileName
----------------
ORAHCMU_ora_8602.trc

FileComment
----------------------


Oracle Support - August 27, 2010 6:13:39 PM GMT+08:00 [ODM Data Collection]
Name
--------
=== ODM Data Collection ===

=== ODM Data Collection ===

Trace file /u04/app/oracle/diag/rdbms/orahcmu/ORAHCMU/trace/ORAHCMU_p012_9322.trc


*** 2010-08-23 14:43:00.472
WARNING: Oracle executable binary mismatch detected.
Binary of new process does not match binary which started instance
issue alter system set "_disable_image_check" = true to disable these messages
startup image information
iid info sz=245752512 inode=65458 ts=0x4c6df668
current process image information
iid info sz=245750720 inode=65427 ts=0x4c7204b0
set _disable_image_check = TRUE to disable this check
qksceLinearToCe error

*** 2010-08-23 14:43:42.974
*** SESSION ID:(220.111) 2010-08-23 14:43:42.974
*** CLIENT ID:(ujwal) 2010-08-23 14:43:42.974
*** SERVICE NAME:(ORAHCMU) 2010-08-23 14:43:42.974

DDE: Problem Key 'ORA 600 [15801]' was flood controlled (0x6) (incident: 19885)
ORA-00600: internal error code, arguments: [15801], [1], [], [], [], [], [], [], [], [], [], []
kxfxdss
KXFXSLAVESTATE dump [0, 0]
(pgakid: 0 oercnt: 0 oerrcd: -2224892588)
kxfxdss
no current cursor context.
kxfxdss
no cursors.

关于binary no match的问题已知是由于在实例启动情况下relink导致的;这个case提交了SR,metalink认为ORA-600 15801一般由QC与服务子进程通信问题引起:

The ORA-600 15801 is reporting a communication problem between QC and slaves related with messages sent/received.
Alert log reports several of the following error on the ASM instance:
ORA-600: internal error code, arguments: [15801], [1], [], [], [], [], [], 
[]

last wait was for 'eq: Msg Fragment' 

DIAGNOSTIC ANALYSIS:
--------------------
There were also several of the following message in the alert log:
WARNING: Oracle executable binary mismatch detected.
 Binary of new process does not match binary which started instance
issue alter system set "_disable_image_check" = true to disable these 
messages

So, I asked the customer to set the "_disable_image_check" = true 
This had no impact on the ora-600 errors as expected.

ORA-600 [15801] is signalled when a message overflow occurs between  PQ 
processes.

WORKAROUND:
-----------
none 
RELATED BUGS:
-------------
none
REPRODUCIBILITY:
----------------
intermittent but frequently - occurs at all different times of the day.
STACK TRACE:
------------
*** ID:(29.2904) 2006-07-05 15:50:57.972
qksceLinearToCe error
*** 15:50:58.233
ksedmp: internal or fatal error
ORA-600: internal error code, arguments: [15801], [1], [], [], [], [], [], 
[]
----- Call Stack Trace -----

kxfxGeter qks3tttdefReceive kxfxsui kxfxsp kxfxmai kxfprdp 

    SO: 0x67977018, type: 4, owner: 0x6793f208, flag: INIT/-/-/0x00
    (session) sid: 29 trans: (nil), creator: 0x6793f208, flag: (c0000041) 
USR/- BSY/-/-/-/-/-
              DID: 0000-0012-0000FADB, short-term DID: 0000-0000-00000000
              txn branch: (nil)
              oct: 3, prv: 0, sql: (nil), psql: (nil), user: 0/SYS
    O/S info: user: oracle, term: , ospid: 4558, machine: 
    last wait for 'eq: Msg Fragment' blocking sess=0x(nil) seq=2 
wait_time=4441 seconds since wait started=3
                ct path write=1002ffff, ct path write temp=2, Network=0
    Dumping Session Wait History
     for 'eq: Msg Fragment' count=1 wait_time=4441
                ct path write=1002ffff, ct path write temp=2, Network=0
     for 'eq: Msg Fragment' count=1 wait_time=31
                ct path write=1002ffff, ct path write temp=1, Network=0
    temporary object counter: 0

最后这个case通过设置10235和10501事件后错误不再产生了:

event = "10235 trace name context forever, level 2"  

10235, 00000, "check memory manager internal structures" 

event = "10501 trace name context forever, level 1"
  
10501, 00000, "periodically check selected heap"
// *Cause:
// *Action:
//    Level:  0x01 PGA
//            0x02 SGA
//            0x04 UGA
//            0x08 current call
//            0x10 user call
//            0x20 large allocation pool

沪ICP备14014813号-2

沪公网安备 31010802001379号