CRS-4258: Addition and deletion of voting files are not allowed because the voting files are on ASM

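A customer's 11.2.0.1 RAC system stores the OCR and the voting disks in an ASM disk group. One LUN in that REG disk group failed because of a hardware fault, which left one of the redundant voting disks in the OFFLINE state. The customer wanted to delete the OFFLINE voting disk and add a usable one.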

Deleting that voting disk file raised error CRS-4258, as shown here:

crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. OFFLINE  5b3380d6367e4f94bf19e9db5f2f684e ()  []
 2. ONLINE   6802e6d139354fb3bf95725dd01a02fd (/dev/ocr2) [REG]
 3. ONLINE   a433d51ebd2d4facbfc8e95b017f5393 (/dev/asm-disk1) [REG]
 4. ONLINE   3784d344bffa4f6ebff21c4dd3c873bd (/dev/asm-disk2) [REG]
Located 4 voting disk(s).

crsctl delete css votedisk 5b3380d6367e4f94bf19e9db5f2f684e
CRS-4258: Addition and deletion of voting files are not allowed because the voting files are on ASM

Surprisingly, voting files stored on ASM cannot be removed, which is rather absurd.
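When several voting files are listed, it can help to pull out just the File Universal Ids of the OFFLINE entries. The following is a minimal sketch, assuming the column layout shown in the output above (entry number, STATE, then the id); the function name offline_votedisk_ids is made up for illustration and is not part of crsctl.

```shell
# Print the File Universal Id of every OFFLINE entry found in
# "crsctl query css votedisk" output read from stdin.
# Assumes the layout shown above: "<n>. <STATE> <File Universal Id> ...".
offline_votedisk_ids() {
    awk '$1 ~ /^[0-9]+\.$/ && $2 == "OFFLINE" { print $3 }'
}
```

On a cluster node it would be fed from the real command, for example: crsctl query css votedisk | offline_votedisk_ids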

The customer found the following MOS Note on CRS-4258:

CRS-4258: Addition and Deletion of Voting Files are not Allowed Because the Voting Files are on ASM in 11gR2 [ID 1060146.1]
Applies to:
Oracle Server - Enterprise Edition - Version: 11.2.0.1 to 11.2.0.1 - Release: 11.2 to 11.2
Information in this document applies to any platform.
Symptoms

CRS-4258: Addition and deletion of voting files are not allowed because the voting files are on ASM in 11gR2.


Changes
Stale voting files are seen after accidently dropping one of ASM disks belonging to the ASM diskgroup where voting files are stored.
And CRS-4258 occurs when trying to delete the stale voting files using crsctl delete css votedisk FUID.

[root@grid]# crsctl query css votedisk
## STATE File Universal Id File Name Disk group
-- ----- ----------------- --------- ---------
1. ONLINE 5b91aad0a2184f3dbfa8f970e8ae4d49 (/dev/oracleasm/disks/ASM10) [PLAY]
2. ONLINE 53b1b40b73164f9ebf3f498f6d460187 (/dev/oracleasm/disks/ASM9) [PLAY]
3. OFFLINE 82dfd04b96f14f6dbf36f5a62b118f61 () []

[root@grid]# crsctl delete css votedisk 82dfd04b96f14f6dbf36f5a62b118f61
CRS-4258: Addition and deletion of voting files are not allowed because the voting files are on ASM
Cause

1. Seeing stale voting files is due to bug 9024611.

2. "delete" command is not available , only "replace" command is available when voting files are stored on  ASM diskgroup.    

    Please see Oracle Clusterware Administration and Deployment Guide11g Release 2 (11.2)

Solution


1. This issue is permanently fixed in 11.2.0.2.0.

2. Apply patch 9024611. Please contact Oracle support if this patch is not available on your platform.

3. If CSS has stale voting files even after applying patch 9024611, do the following workaround -

WORKAROUND:
Do something to trigger ASM to try to relocate the voting file.

e.g)  $ crsctl replace votedisk  +asm_disk_group   --- Put available ASM diskgroup

        $ crsctl query css votedisk         --- Check if voting files are all online on the new ASM diskgroup
        $ crsctl replace votedisk +PLAY    -- Put the original ASM diskgroup where voting files were 

4. If the workaround above cannot be followed for any reason then you can request the patch for unpublished bug 9409327 for your platform.

References
BUG:9294664 - NOT ABLE TO REMOVE THE VOTEDISK WHICH IS OFFILNE

Hdr: 9294664 11.2.0.1 PCW 11.2.0.1 ADMUTL PRODID-5 PORTID-226 9024611
Abstract: NOT ABLE TO REMOVE THE VOTEDISK WHICH IS OFFILNE

PROBLEM:
--------
crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
1. ONLINE   9f7f4f7f798d4f69bfe31653894421a2 (ORCL:GRID1) [GRID]
2. OFFLINE  a9b785a59c3c4f67bf15babc67ffb79a () []
3. OFFLINE  29988f37fa794f12bfea3f3672c99609 () []
4. ONLINE   a8b3a040195c4f54bfce8ef21bd4fa07 (ORCL:GRID3) [GRID]
5. ONLINE   a1e4fbd9df6f4f67bf8fc12fe9780721 (ORCL:GRID2) [GRID]
Located 5 voting disk(s).


[root@sdc-drrac01 grid]# crsctl delete css votedisk 
a9b785a59c3c4f67bf15babc67ffb79a
CRS-4258: Addition and deletion of voting files are not allowed because the 
voting files are on ASM

DIAGNOSTIC ANALYSIS:
--------------------
Ct is performing some voting disk failover scenarios in which he has removed 
the 2 votedisk which were on ASM buy drop disk using asmlib and after that 
recreating the disk again and start the cluster in exclusive mode and start 
the ASM and mount the diskgourp So that rebalancing has been done but after 
that

 crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
1. ONLINE   9f7f4f7f798d4f69bfe31653894421a2 (ORCL:GRID1) [GRID]
2. OFFLINE  a9b785a59c3c4f67bf15babc67ffb79a () []
3. OFFLINE  29988f37fa794f12bfea3f3672c99609 () []
4. ONLINE   a8b3a040195c4f54bfce8ef21bd4fa07 (ORCL:GRID3) [GRID]
5. ONLINE   a1e4fbd9df6f4f67bf8fc12fe9780721 (ORCL:GRID2) [GRID]
Located 5 voting disk(s).

and not able to drop the vote disk which is offiline 

WORKAROUND:
-----------
n/a

RELATED BUGS:
-------------
as per bug 9024611 tried the workaround:

but while running 

crsctl css votedisk delete 

we got syntax error and found that there is no command with crsctl css ...

According to this Note, the inability to remove voting files stored in ASM is resolved in 11.2.0.2.0, and it can also be fixed by applying one-off patch 9024611.

A test on 11.2.0.2, however, shows that voting files on ASM still cannot be deleted:

[root@rh2 ~]# crsctl query crs  releaseversion
Oracle High Availability Services release version on the local node is [11.2.0.2.0]

[root@rh2 ~]# crsctl query crs  activeversion
Oracle Clusterware active version on the cluster is [11.2.0.2.0]


[grid@rh2 ~]$ /s01/grid/OPatch/opatch lsinventory
Invoking OPatch 11.2.0.1.1

Oracle Interim Patch Installer version 11.2.0.1.1
Copyright (c) 2009, Oracle Corporation.  All rights reserved.


Oracle Home       : /s01/grid
Central Inventory : /s01/app/oraInventory
   from           : /etc/oraInst.loc
OPatch version    : 11.2.0.1.1
OUI version       : 11.2.0.2.0
OUI location      : /s01/grid/oui
Log file location : /s01/grid/cfgtoollogs/opatch/opatch2011-08-04_18-50-34PM.log

Patch history file: /s01/grid/cfgtoollogs/opatch/opatch_history.txt

Lsinventory Output file location : /s01/grid/cfgtoollogs/opatch/lsinv/lsinventory2011-08-04_18-50-34PM.txt

--------------------------------------------------------------------------------
Installed Top-level Products (1): 

Oracle Grid Infrastructure                                           11.2.0.2.0
There are 1 products installed in this Oracle Home.


There are no Interim patches installed in this Oracle Home.


Rac system comprising of multiple nodes
  Local node = rh2
  Remote node = rh3

--------------------------------------------------------------------------------

OPatch succeeded.


[root@rh2 ~]# crsctl delete css votedisk a433d51ebd2d4facbfc8e95b017f5393

CRS-4258: Addition and deletion of voting files are not allowed because the voting files are on ASM

Yet another bug that was only supposedly fixed!

With no other option left, I pinned my hopes on replace, only to find:

crsctl replace votedisk +DATA
Failed to create voting files on disk group DATA.
Change to configuration failed, but was successfully rolled back.
CRS-4000: Command Replace failed, or completed with errors.

As for the unpublished bug 9409327 mentioned in item 4 of the Solution (Patch 9409327: OFFLINE VF ENTRY REMAINS AFTER PATCH FOR BUG 9024611), a patch currently exists only for 11.2.0.1 on IBM AIX on POWER Systems (64-bit).
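For reference, the replace-based workaround from item 3 of the Note boils down to three commands: move the voting files to another disk group, check them, then move them back. The sketch below only prints that sequence and runs nothing; the function name and its two arguments are hypothetical. As the failed attempt on +DATA above shows, the first replace is not guaranteed to succeed, so treat this as an illustration of the order of steps rather than a tested procedure.

```shell
# Print (without executing) the command sequence described in the Note's
# workaround. $1 = temporary disk group, $2 = original disk group,
# both given without the leading "+".
print_votedisk_relocation_plan() {
    if [ "$#" -ne 2 ] || [ "$1" = "$2" ]; then
        echo "usage: print_votedisk_relocation_plan TEMP_DG ORIGINAL_DG" >&2
        return 1
    fi
    printf 'crsctl replace votedisk +%s\n' "$1"   # relocate to the temporary group
    printf 'crsctl query css votedisk\n'          # confirm all files are ONLINE
    printf 'crsctl replace votedisk +%s\n' "$2"   # relocate back to the original group
}
```

Calling print_votedisk_relocation_plan DATA REG prints the three commands for review before anyone runs them as root.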

crsctl status resource -t -init in 11.2.0.2 grid infrastructure

In 11.2.0.2 Grid Infrastructure, the crsctl stat res command no longer shows base resources such as ora.cssd, ora.ctssd, and ora.diskmon. To see the state of these resources, add the -init option:

[grid@rh2 ~]$ crsctl query crs activeversion
Oracle Clusterware active version on the cluster is [11.2.0.2.0]

[grid@rh2 ~]$ crsctl stat res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
               ONLINE  ONLINE       rh2
ora.LISTENER.lsnr
               OFFLINE OFFLINE      rh2
ora.asm
               ONLINE  ONLINE       rh2
ora.gsd
               OFFLINE OFFLINE      rh2
ora.net1.network
               ONLINE  ONLINE       rh2
ora.ons
               ONLINE  ONLINE       rh2
ora.registry.acfs
               OFFLINE OFFLINE      rh2
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        OFFLINE OFFLINE
ora.cvu
      1        OFFLINE OFFLINE
ora.dw.db
      1        OFFLINE OFFLINE
ora.maclean.db
      1        OFFLINE OFFLINE
ora.oc4j
      1        OFFLINE OFFLINE
ora.prod.db
      1        OFFLINE OFFLINE
      2        OFFLINE OFFLINE
ora.prod.maclean.svc
      1        OFFLINE OFFLINE
      2        OFFLINE OFFLINE
ora.prod.maclean_pre.svc
      1        OFFLINE OFFLINE
      2        OFFLINE OFFLINE
ora.prod.maclean_pre_preconnect.svc
      1        OFFLINE OFFLINE
ora.prod.maclean_taf.svc
      1        OFFLINE OFFLINE
      2        OFFLINE OFFLINE
ora.rh2.vip
      1        OFFLINE OFFLINE
ora.rh3.vip
      1        OFFLINE OFFLINE
ora.scan1.vip
      1        OFFLINE OFFLINE                                       

[grid@rh2 ~]$ crsctl stat res -t -init 
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  ONLINE       rh2                      Started
ora.cluster_interconnect.haip
      1        ONLINE  ONLINE       rh2
ora.crf
      1        ONLINE  ONLINE       rh2
ora.crsd
      1        ONLINE  ONLINE       rh2
ora.cssd
      1        ONLINE  ONLINE       rh2
ora.cssdmonitor
      1        ONLINE  ONLINE       rh2
ora.ctssd
      1        ONLINE  ONLINE       rh2                      OBSERVER
ora.diskmon
      1        ONLINE  ONLINE       rh2
ora.drivers.acfs
      1        ONLINE  OFFLINE
ora.evmd
      1        ONLINE  ONLINE       rh2
ora.gipcd
      1        ONLINE  ONLINE       rh2
ora.gpnpd
      1        ONLINE  ONLINE       rh2
ora.mdnsd
      1        ONLINE  ONLINE       rh2

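In addition, starting, stopping, or modifying these init resources in 11.2.0.2 Grid Infrastructure also requires the -init option; without it the command fails with CRS-2613: Could not find resource. In the session below ora.asm is found without -init, apparently because a resource of that name also appears in the regular resource list shown earlier: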

[grid@rh2 ~]$ crsctl stat res ora.asm
NAME=ora.asm
TYPE=ora.asm.type
TARGET=ONLINE
STATE=ONLINE on rh2

[grid@rh2 ~]$ crsctl modify res ora.asm -attr AUTO_START=never

[grid@rh2 ~]$ crsctl stat res ora.asm -p
NAME=ora.asm
TYPE=ora.asm.type
ACL=owner:grid:rwx,pgrp:oinstall:rwx,other::r--
ACTION_FAILURE_TEMPLATE=
ACTION_SCRIPT=
AGENT_FILENAME=%CRS_HOME%/bin/oraagent%CRS_EXE_SUFFIX%
ALIAS_NAME=ora.%CRS_CSS_NODENAME%.ASM%CRS_CSS_NODENUMBER%.asm
AUTO_START=never
CHECK_INTERVAL=60
CHECK_TIMEOUT=30
DEFAULT_TEMPLATE=PROPERTY(RESOURCE_CLASS=asm) ELEMENT(INSTANCE_NAME= %GEN_USR_ORA_INST_NAME%)
DEGREE=1
DESCRIPTION=Oracle ASM resource
ENABLED=1
GEN_USR_ORA_INST_NAME=
GEN_USR_ORA_INST_NAME@SERVERNAME(rh2)=+ASM1
GEN_USR_ORA_INST_NAME@SERVERNAME(rh3)=+ASM2
LOAD=1
LOGGING_LEVEL=1
NLS_LANG=
NOT_RESTARTING_TEMPLATE=
OFFLINE_CHECK_INTERVAL=0
PROFILE_CHANGE_TEMPLATE=
RESTART_ATTEMPTS=5
SCRIPT_TIMEOUT=60
START_DEPENDENCIES=weak(ora.LISTENER.lsnr)
START_TIMEOUT=900
STATE_CHANGE_TEMPLATE=
STOP_DEPENDENCIES=
STOP_TIMEOUT=600
TYPE_VERSION=1.2
UPTIME_THRESHOLD=1d
USR_ORA_ENV=
USR_ORA_INST_NAME=+ASM%CRS_CSS_NODENUMBER%
USR_ORA_OPEN_MODE=mount
USR_ORA_OPI=false
USR_ORA_STOP_MODE=immediate
VERSION=11.2.0.2.0

[grid@rh2 ~]$ crsctl status resource  -init -t|grep -v ONLINE|tail -13
ora.asm
ora.cluster_interconnect.haip
ora.crf
ora.crsd
ora.cssd
ora.cssdmonitor
ora.ctssd
ora.diskmon
ora.drivers.acfs
ora.evmd
ora.gipcd
ora.gpnpd
ora.mdnsd

[grid@rh2 ~]$ crsctl status resource  -init -t|grep -v ONLINE|tail -13|xargs crsctl status resource
CRS-2613: Could not find resource 'ora.cluster_interconnect.haip'.
CRS-2613: Could not find resource 'ora.crf'.
CRS-2613: Could not find resource 'ora.crsd'.
CRS-2613: Could not find resource 'ora.cssd'.
CRS-2613: Could not find resource 'ora.cssdmonitor'.
CRS-2613: Could not find resource 'ora.ctssd'.
CRS-2613: Could not find resource 'ora.diskmon'.
CRS-2613: Could not find resource 'ora.drivers.acfs'.
CRS-2613: Could not find resource 'ora.evmd'.
CRS-2613: Could not find resource 'ora.gipcd'.
CRS-2613: Could not find resource 'ora.gpnpd'.
CRS-2613: Could not find resource 'ora.mdnsd'.
NAME=ora.asm
TYPE=ora.asm.type
TARGET=ONLINE
STATE=ONLINE on rh2
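The grep -v ONLINE | tail -13 pipeline above depends on the exact number of rows in the listing, and the xargs call omits -init, which is why every resource except ora.asm fails with CRS-2613. A minimal sketch of a less fragile variant follows: it picks resource names by their ora. prefix and passes -init to each query. Both function names are made up for illustration, and the second one needs a running cluster.

```shell
# Print the resource names (lines starting with "ora.") from
# "crsctl stat res -t [-init]" output read from stdin.
init_resource_names() {
    awk '/^ora\./ { print $1 }'
}

# Query each lower-stack resource individually, passing -init so that
# the lookup does not fail with CRS-2613. Requires a live cluster.
status_each_init_resource() {
    crsctl stat res -t -init | init_resource_names |
    while read -r res; do
        crsctl status resource "$res" -init
    done
}
```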

[grid@rh2 ~]$ crsctl status res ora.crsd -init -p
NAME=ora.crsd
TYPE=ora.crs.type
ACL=owner:root:rw-,pgrp:oinstall:rw-,other::r--,user:grid:r-x
ACTION_FAILURE_TEMPLATE=
ACTION_SCRIPT=
ACTIVE_PLACEMENT=0
AGENT_FILENAME=%CRS_HOME%/bin/orarootagent%CRS_EXE_SUFFIX%
AUTO_START=always
CARDINALITY=1
CHECK_ARGS=
CHECK_COMMAND=
CHECK_INTERVAL=30
CLEAN_ARGS=
CLEAN_COMMAND=
DAEMON_LOGGING_LEVELS=AGENT=1,AGFW=0,CLSFRAME=0,CLSVER=0,CLUCLS=0,COMMCRS=0,COMMNS=0,CRSAPP=0,CRSCCL=0,CRSCEVT=0,CRSCOMM=0,CRSD=0,CRSEVT=0,CRSMAIN=0,CRSOCR=0,CRSPE=0,CRSPLACE=0,CRSRES=0,CRSRPT=0,CRSRTI=0,CRSSE=0,CRSSEC=0,CRSTIMER=0,CRSUI=0,CSSCLNT=0,SuiteTes=1,UiServer=0,OCRAPI=1,OCRCLI=1,OCRSRV=1,OCRMAS=1,OCRMSG=1,OCRCAC=1,OCRRAW=1,OCRUTL=1,OCROSD=1,OCRASM=1
DAEMON_TRACING_LEVELS=AGENT=0,AGFW=0,CLSFRAME=0,CLSVER=0,CLUCLS=0,COMMCRS=0,COMMNS=0,CRSAPP=0,CRSCCL=0,CRSCEVT=0,CRSCOMM=0,CRSD=0,CRSEVT=0,CRSMAIN=0,CRSOCR=0,CRSPE=0,CRSPLACE=0,CRSRES=0,CRSRPT=0,CRSRTI=0,CRSSE=0,CRSSEC=0,CRSTIMER=0,CRSUI=0,CSSCLNT=0,SuiteTes=0,UiServer=0,OCRAPI=1,OCRCLI=1,OCRSRV=1,OCRMAS=1,OCRMSG=1,OCRCAC=1,OCRRAW=1,OCRUTL=1,OCROSD=1,OCRASM=1
DEFAULT_TEMPLATE=
DEGREE=1
DESCRIPTION="Resource type for CRSD"
DETACHED=true
ENABLED=1
FAILOVER_DELAY=0
FAILURE_INTERVAL=3
FAILURE_THRESHOLD=5
HOSTING_MEMBERS=
LOAD=1
LOGGING_LEVEL=1
NOT_RESTARTING_TEMPLATE=
OFFLINE_CHECK_INTERVAL=0
ORA_VERSION=11.2.0.2.0
PID_FILE=
PLACEMENT=balanced
PROCESS_TO_MONITOR=
PROFILE_CHANGE_TEMPLATE=
RESTART_ATTEMPTS=10
SCRIPT_TIMEOUT=60
SERVER_POOLS=
START_ARGS=
START_COMMAND=
START_DEPENDENCIES=hard(ora.asm,ora.cssd,ora.ctssd,ora.gipcd)pullup(ora.asm,ora.cssd,ora.ctssd,ora.gipcd)
START_TIMEOUT=600
STATE_CHANGE_TEMPLATE=
STOP_ARGS=
STOP_COMMAND=
STOP_DEPENDENCIES=hard(shutdown:ora.asm,intermediate:ora.cssd,intermediate:ora.gipcd)
STOP_MODE=NONE
STOP_TIMEOUT=43200
UPTIME_THRESHOLD=1m
USR_ORA_ENV=

[grid@rh2 ~]$ crsctl modify res ora.crsd -init -attr "SCRIPT_TIMEOUT"=65   
CRS-0245:  User doesn't have enough privilege to perform the operation
CRS-4000: Command Modify failed, or completed with errors.

/* Modifying the attributes of some resources requires root privileges */

[root@rh2 ~]# crsctl modify res ora.crsd -init -attr "SCRIPT_TIMEOUT"=65 

[root@rh2 ~]# crsctl status res ora.crsd -init -p|grep SCRIPT_TIMEOUT
SCRIPT_TIMEOUT=65

[root@rh2 ~]# crsctl status res ora.ctssd -p -init
NAME=ora.ctssd
TYPE=ora.ctss.type
ACL=owner:root:rw-,pgrp:oinstall:rw-,other::r--,user:grid:r-x
ACTION_FAILURE_TEMPLATE=
ACTION_SCRIPT=
ACTIVE_PLACEMENT=0
AGENT_FILENAME=%CRS_HOME%/bin/orarootagent%CRS_EXE_SUFFIX%
AUTO_START=always
CARDINALITY=1
CHECK_ARGS=
CHECK_COMMAND=
CHECK_INTERVAL=30
CLEAN_ARGS=
CLEAN_COMMAND=
DAEMON_LOGGING_LEVELS=CLUCLS=0,CSSCLNT=0,CRSCCL=1,CTSS=5,OCRAPI=1,OCRCLI=1,OCRMSG=1
DAEMON_TRACING_LEVELS=CLUCLS=0,CSSCLNT=0,CRSCCL=1,CTSS=5,OCRAPI=1,OCRCLI=1,OCRMSG=1
DEFAULT_TEMPLATE=
DEGREE=1
DESCRIPTION="Resource type for Ctss Agents"
DETACHED=true
ENABLED=1
FAILOVER_DELAY=0
FAILURE_INTERVAL=3
FAILURE_THRESHOLD=5
HOSTING_MEMBERS=
LOAD=1
LOGGING_LEVEL=1
NOT_RESTARTING_TEMPLATE=
OFFLINE_CHECK_INTERVAL=0
ORA_VERSION=11.2.0.2.0
PID_FILE=
PLACEMENT=balanced
PROCESS_TO_MONITOR=
PROFILE_CHANGE_TEMPLATE=
RESTART_ATTEMPTS=5
SCRIPT_TIMEOUT=60
SERVER_POOLS=
START_ARGS=
START_COMMAND=
START_DEPENDENCIES=hard(ora.cssd,ora.gipcd)pullup(ora.cssd,ora.gipcd)
START_TIMEOUT=60
STATE_CHANGE_TEMPLATE=
STOP_ARGS=
STOP_COMMAND=
STOP_DEPENDENCIES=hard(ora.cssd,ora.gipcd)
STOP_TIMEOUT=60
UPTIME_THRESHOLD=1m
USR_ORA_ENV=

[root@rh2 ~]# crsctl status res ora.diskmon -p -init
NAME=ora.diskmon
TYPE=ora.diskmon.type
ACL=owner:root:rw-,pgrp:oinstall:rw-,other::r--,user:grid:r-x
ACTION_FAILURE_TEMPLATE=
ACTION_SCRIPT=
ACTIVE_PLACEMENT=0
AGENT_FILENAME=%CRS_HOME%/bin/orarootagent%CRS_EXE_SUFFIX%
AUTO_START=never
CARDINALITY=1
CHECK_ARGS=
CHECK_COMMAND=
CHECK_INTERVAL=3
CHECK_TIMEOUT=30
CLEAN_ARGS=
CLEAN_COMMAND=
DAEMON_LOGGING_LEVELS=
DAEMON_TRACING_LEVELS=
DEFAULT_TEMPLATE=
DEGREE=1
DESCRIPTION="Resource type for Diskmon"
DETACHED=true
ENABLED=1
FAILOVER_DELAY=0
FAILURE_INTERVAL=3
FAILURE_THRESHOLD=5
HOSTING_MEMBERS=
LOAD=1
LOGGING_LEVEL=1
NOT_RESTARTING_TEMPLATE=
OFFLINE_CHECK_INTERVAL=0
ORA_VERSION=11.2.0.2.0
PID_FILE=
PLACEMENT=balanced
PROCESS_TO_MONITOR=
PROFILE_CHANGE_TEMPLATE=
RESTART_ATTEMPTS=10
SCRIPT_TIMEOUT=60
SERVER_POOLS=
START_ARGS=
START_COMMAND=
START_DEPENDENCIES=weak(concurrent:ora.cssd)pullup:always(ora.cssd)
START_TIMEOUT=600
STATE_CHANGE_TEMPLATE=
STOP_ARGS=
STOP_COMMAND=
STOP_DEPENDENCIES=
STOP_TIMEOUT=60
UPTIME_THRESHOLD=5s
USR_ORA_ENV=ORACLE_USER=grid
VERSION=11.2.0.2.0

[root@rh2 ~]# crsctl status res ora.cssd -init -p
NAME=ora.cssd
TYPE=ora.cssd.type
ACL=owner:root:rw-,pgrp:oinstall:rw-,other::r--,user:grid:r-x
ACTION_FAILURE_TEMPLATE=
ACTION_SCRIPT=
ACTIVE_PLACEMENT=0
AGENT_FILENAME=%CRS_HOME%/bin/cssdagent%CRS_EXE_SUFFIX%
AGENT_HB_INTERVAL=0
AGENT_HB_MISCOUNT=10
AUTO_START=always
CARDINALITY=1
CHECK_ARGS=
CHECK_COMMAND=
CHECK_INTERVAL=30
CLEAN_ARGS=abort
CLEAN_COMMAND=
CSSD_MODE=
CSSD_PATH=%CRS_HOME%/bin/ocssd%CRS_EXE_SUFFIX%
CSS_USER=grid
DAEMON_LOGGING_LEVELS=CSSD=2,GIPCNM=2,GIPCGM=2,GIPCCM=2,CLSF=0,SKGFD=0,GPNP=1,OLR=0
DAEMON_TRACING_LEVELS=CSSD=0,GIPCNM=0,GIPCGM=0,GIPCCM=0,CLSF=0,SKGFD=0,GPNP=0,OLR=0
DEFAULT_TEMPLATE=
DEGREE=1
DESCRIPTION="Resource type for CSSD"
DETACHED=true
ENABLED=1
ENV_OPTS=
FAILOVER_DELAY=0
FAILURE_INTERVAL=3
FAILURE_THRESHOLD=5
HOSTING_MEMBERS=
LOAD=1
LOGGING_LEVEL=1
NOT_RESTARTING_TEMPLATE=
OFFLINE_CHECK_INTERVAL=0
OMON_INITRATE=1000
OMON_POLLRATE=500
ORA_OPROCD_MODE=
ORA_VERSION=11.2.0.2.0
PID_FILE=
PLACEMENT=balanced
PROCD_TIMEOUT=1000
PROCESS_TO_MONITOR=
PROFILE_CHANGE_TEMPLATE=
RESTART_ATTEMPTS=3
SCRIPT_TIMEOUT=600
SERVER_POOLS=
START_ARGS=
START_COMMAND=
START_DEPENDENCIES=weak(concurrent:ora.diskmon)hard(ora.cssdmonitor,ora.gpnpd,ora.gipcd)pullup(ora.gpnpd,ora.gipcd)
START_TIMEOUT=600
STATE_CHANGE_TEMPLATE=
STOP_ARGS=
STOP_COMMAND=
STOP_DEPENDENCIES=hard(intermediate:ora.gipcd,shutdown:ora.diskmon,intermediate:ora.cssdmonitor)
STOP_TIMEOUT=900
UPTIME_THRESHOLD=1m
USR_ORA_ENV=
VMON_INITLIMIT=16
VMON_INITRATE=500
VMON_POLLRATE=500

Oracle RAC internal error: a case of ORA-00600 [keltnfy-ldmInit]

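A two-node 10.2.0.2 RAC system on SunOS recently hit the internal error ORA-00600: internal error code, arguments: [keltnfy-ldmInit], [46], [1], [], [], [], [], []. At the time, an operator had mistakenly used the hostname command to change the host name of node 1. The ORA-00600 error then appeared repeatedly, and the operating system log showed that the RAC CSS process had terminated unexpectedly. The logs are as follows: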

================== OS Message=====================
Jan 10 11:15:10 cupd25k-a root: [ID 702911 user.error] Cluster Ready Services completed waiting on dependencies.
Jan 10 11:15:16 cupd25k-a root: [ID 702911 user.error] Duplicate Oracle CLSMON found. Killing and restarting it.
Jan 10 11:15:16 cupd25k-a root: [ID 702911 user.error] Oracle CSS daemon failed to start up. Check CRS logs for diagnostics.
Jan 10 11:15:16 cupd25k-a root: [ID 702911 user.error] Oracle CLSMON terminated with unexpected status 137. Respawning

/* "Duplicate Oracle CLSMON found" here presumably refers to the OCLSMON process:
"In Oracle 10.2.0.2 and above there is an additional process called OCLSOMON
which monitors the CSS daemon for hangs or scheduling issues and can reboot a
node if there is a perceived hang. OCLSOMON is spawned in init.cssd and runs
as the Oracle user."
   In short: this monitor was introduced in 10.2.0.2 to watch the CSS daemon,
   it may reboot the node on a hang or an OS scheduling problem,
   and it is spawned by the init.cssd script.  */

==================oclsmon.log======================
2011-01-10 11:15:11.376
unspecified member number is (1)
Member 1 group OCLSMON_ in use. Is oclsmon already up?
2011-01-10 11:15:11.479
Internal Error Information:
  Category: 8
  Operation: skgxnreg: the member number is i
  Location: skgxnreg_7
  Other:
  Dep: 1
2011-01-10 11:15:11.737
unspecified member number is (1)
Member 1 group OCLSMON_ in use. Is oclsmon already up?
2011-01-10 11:15:11.751
Internal Error Information:
  Category: 8
  Operation: skgxnreg: the member number is i
  Location: skgxnreg_7
  Other:
  Dep: 1
2011-01-10 11:15:12.006
unspecified member number is (1)
Member 1 group OCLSMON_ in use. Is oclsmon already up?
2011-01-10 11:15:12.023
Internal Error Information:
  Category: 8
  Operation: skgxnreg: the member number is i
  Location: skgxnreg_7
  Other:
  Dep: 1
2011-01-10 11:15:12.278
unspecified member number is (1)
Member 1 group OCLSMON_ in use. Is oclsmon already up?
2011-01-10 11:15:12.293
Internal Error Information:
  Category: 8
  Operation: skgxnreg: the member number is i
  Location: skgxnreg_7
  Other:
  Dep: 1

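/*  skgxn is what Oracle Clusterware uses to monitor skgxn events, that is, matters related to third-party clusterware (this site is presumably running Sun Cluster).
    It appears that changing the host name caused a fatal error in Oracle CSS and led to more than one OCLSMON process being started (Duplicate Oracle CLSMON found),
    ending with "Oracle CSS daemon failed to start up. Check CRS logs for diagnostics".
    With the Oracle instances already running, the CSS process on node 25k-a terminated unexpectedly,
    which may have left LMD (Global Enqueue Service Daemon) and LMON unable to work properly for every instance on that node, causing the instances to hang. */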

==========================alert.log====================
Errors in file /oracle/oracle/admin/BOCPCS/udump/bocpcs1_ora_12320.trc:
ORA-00600: internal error code, arguments: [keltnfy-ldmInit], [46], [1], [], [], [], [], []

=========================part of trace file===============
*** 2011-01-10 11:11:02.957
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [keltnfy-ldmInit], [46], [1], [], [], [], [], []
Current SQL information unavailable - no session.
----- Call Stack Trace -----
calling              call     entry                argument values in hex
location             type     point                (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedmp()+716         CALL     ksedst()             FFFFFFFF7FFF9D40 ?
                                                   000000000 ? 0FFFFFFFF ?
                                                   FFFFFFFF7FFF8EE8 ?
                                                   FFFFFFFF7FFFA640 ?
                                                   000000008 ?
kgerinv()+200        PTR_CALL 0000000000000000     000000002 ? 10638A1CC ?
                                                   000000001 ? 000000000 ?
                                                   10638A000 ? 10638A1CC ?
kgeasnmierr()+28     CALL     kgerinv()            106384B98 ? 000000000 ?
                                                   105D3B940 ? 000000002 ?
                                                   FFFFFFFF7FFFDFF0 ?
                                                   000001430 ?
keltnfy()+784        CALL     kgeasnmierr()        106384B98 ? 1064DCBF0 ?
                                                   105D3B940 ? 000000002 ?
                                                   000000000 ? 00000002E ?
kscnfy()+552         PTR_CALL 0000000000000000     10639B498 ? 38001E7A8 ?
                                                   1055AC5D0 ? 10639B498 ?
                                                   000102C00 ? 10638A1C0 ?
ksucrp()+2436        CALL     kscnfy()             000008000 ? 000808214 ?
                                                   100C4C220 ? 1055C6680 ?
                                                   00000000F ? 000000001 ?
opiino()+2056        CALL     ksucrp()             000106387 ? 380007608 ?
                                                   000000000 ? 000380000 ?
                                                   000106000 ? 106387618 ?
opiodr()+1488        PTR_CALL 0000000000000000     10555A000 ?
                                                   FFFFFFFF7FFFF1C8 ?
                                                   00010555A ? 000106000 ?
                                                   105C83000 ? 000000001 ?
opidrv()+828         CALL     opiodr()             106391000 ? 000000000 ?
                                                   106390DD8 ? 106390000 ?
                                                   106391BD0 ? 000106000 ?
sou2o()+80           CALL     opidrv()             106394358 ? 000000001 ?
                                                   00000003C ? 000000000 ?
                                                   00000003C ? 000106000 ?
opimai_real()+124    CALL     sou2o()              FFFFFFFF7FFFF788 ?
                                                   00000003C ? 000000004 ?
                                                   FFFFFFFF7FFFF7B0 ?
                                                   105C82000 ? 000105C82 ?
main()+152           CALL     opimai_real()        000000002 ?
                                                   FFFFFFFF7FFFF888 ?
                                                   103F1BBCC ? 10632DB10 ?
                                                   002411E44 ? 000014400 ?
_start()+380         CALL     main()               000000002 ? 000000008 ?
                                                   000000000 ?
                                                   FFFFFFFF7FFFF898 ?
                                                   FFFFFFFF7FFFF9A8 ?
                                                   FFFFFFFF7C700200 ?

/* The trace file above reports "no session":
    the keltnfy-ldmInit internal error was hit while the server process was still starting up */

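The Metalink note Startup Database Produces Ora-00600: [Keltnfy-Ldminit] [ID 336447.1]
explains that this internal error is usually caused by improper network configuration on the host. Evidently, using the hostname command
to set a host name that cannot be resolved can trigger the ORA-00600 [keltnfy-ldmInit] internal error.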

Applies to:
Oracle Server - Enterprise Edition - Version: 10.2.0.1 to 10.2.0.3 - Release: 10.2 to 10.2
Information in this document applies to any platform.
***Checked for relevance on 09-Jun-2010***
Symptoms

An startup nomount on Oracle 10g Release 2 database produces the following exception in alert log

Starting up ORACLE RDBMS Version: 10.2.0.1.0.
Errors in file /opt/oracle/10.2/admin/ORCL/udump/ORCL_ora_535.trc:
ORA-00600: internal error code, arguments: [keltnfy-ldmInit], [46], [1], [], [], [], [], []
USER: terminating instance due to error 600
Instance terminated by USER, pid = 535
Cause

The problem is related to getting host information.
In this case, ldmInit()/sldmInit() is failing with error 46 : LDMERR_HOST_NOT_FOUND

The following exception may also occur :

LDMERR_SOSD_INIT         OSD init failed to be specific in these OSD failures
 LDMERR_BAD_ADDR         bad address when system call gethostname failed
 LDMERR_HOST_NOT_FOUND   gethostbyname system call fails
 LDMERR_NO_SUPPORT       when specific address type is not supported

Development has fixed two bugs so far regarding this issue

Bug:5438154 - Abstract: ORA-600[KELTNFY-LDMINIT]  STARTING THE DB
Release Notes:
ldmInit returned LDMERR_HOST_NOT_FOUND for the machine huge alias list/address list
Workaround:
reduce the alais list of the machine

Bug:5486074 - Abstract: ORA-600 [KELTNFY-LDMINIT] WHEN DNS IS NOT AVAILABLE
Release Notes:
Internal error is raised by the Server Generated Alert subsystem when it can not determine Host Name or
Network Address. This can be caused by DNS server being unaavilable. 

Solution

The fix for 5486074 will not fix any underlying error from gethostbyname(), it just change the internal error to a warning message :

 "Warning: keltnfy call to ldmInit failed with error 46"

You will still need to fix the network config issue.  

These are the check you can do verify the host information 

      Check permission on /etc/hosts 

$ ls -l /etc/hosts
-rw-r--r--  2 root root 194 Oct 17  2006 /etc/hosts

      Check if /etc/hosts file is correctly configured

              ( all of this on one line ). 

Check the hostname:
$ hostname
$ ping `hostname`

Make sure you are able to ping the hostname
      Check if /etc/nodename is correctly configured

If you have DNS setup, ping is not a tool to diagnose DNS problem. A better tool to use is nslookup, dnsquery, or dig.

$ nslookup
$ nslookup
$ nslookup 

The forward and reverse lookup should succeed and return consistent address/info.  

 Check nsswitch.conf

$ more nsswitch.conf
hosts:      files dns
Make sure host lookup is also done through the /etc/hosts file and not just dns.  It is recommended that FILES come first before DNS.
Also, check the resolv.conf. This makes sure that the DNS is working properly.
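The /etc/hosts check described in the note can be partly scripted. The sketch below is an assumption-laden illustration, not something from the note: the function name hosts_file_has_name is made up, and it only tests whether a name appears as a host name or alias on a non-comment line of a hosts-format file. It does not replace the nslookup or dig checks for DNS.

```shell
# Return 0 if NAME ($2) appears as a host name or alias (any field after
# the address) on a non-comment line of the hosts-format file FILE ($1).
hosts_file_has_name() {
    awk -v name="$2" '
        { sub(/#.*/, "") }                                   # drop comments
        { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$1"
}
```

A typical use would be: hosts_file_has_name /etc/hosts "$(uname -n)" || echo "current host name is missing from /etc/hosts"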

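Clearly, using the hostname command on a production host is dangerous: you can hardly guarantee that a colleague's slap on the back will not make you mistype while you are at the keyboard. Some people say the rm command should be disabled in production, and the same special treatment applies to hostname. So what can we use instead of hostname to look up the host name? There are plenty of options; here is the one I recommend: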

-bash-3.00$ oslevel -r 
5300-07

-bash-3.00$ hostname
askmac.cn

-bash-3.00$ uname -n
askmac.cn

/* uname -n fully covers the need! */
That's great!
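One way to put this habit into practice is a small read-only wrapper for interactive shells. This is only a sketch under stated assumptions: the name hostname_readonly is hypothetical, and it deliberately rejects every argument, including harmless read-only options that some hostname implementations accept.

```shell
# Read-only replacement for interactive use: prints the node name via
# "uname -n" and refuses any argument, so a typo can never set a new name.
hostname_readonly() {
    if [ "$#" -gt 0 ]; then
        echo "refusing arguments: use 'uname -n' to read the host name" >&2
        return 1
    fi
    uname -n
}
```

Operators could source this from a shared profile and get used to calling it instead of hostname; whether to go further and shadow the hostname command itself is a site policy decision.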

