Understanding How ocssd.bin Controls RAC Node Reboots

ocssd.bin is a key background process of RAC Clusterware. This post does not cover its full range of functions; it looks only at the details of how ocssd.bin reboots a node.

Note that in an 11gR2 standalone environment, neither an ocssd.bin crash/panic nor a manual kill of the process causes the node to reboot:

 

[oracle@mlab1 ~]$ crsctl  stat res  -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS       
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
               ONLINE  ONLINE       mlab1                                        
ora.FRA.dg
               ONLINE  ONLINE       mlab1                                        
ora.LISTENER.lsnr
               ONLINE  ONLINE       mlab1                                        
ora.asm
               ONLINE  ONLINE       mlab1                    Started             
ora.ons
               OFFLINE OFFLINE      mlab1                                        
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.cssd
      1        ONLINE  ONLINE       mlab1                                        
ora.diskmon
      1        OFFLINE OFFLINE                                                   
ora.evmd
      1        ONLINE  ONLINE       mlab1                                        
ora.proda.db
      1        ONLINE  ONLINE       mlab1                    Open                

First, raise the CSSD log level to 2 so that the CSSD log contains more detail:

[oracle@mlab1 ~]$ crsctl debug log css CSSD:2
CRS-4151: DEPRECATED: use crsctl set log {css|crs|evm}
Set CSSD Module: CSSD  Log Level: 2

In 11g, the crsctl set log css syntax can be used in place of crsctl debug log:

[oracle@mlab1 ~]$ crsctl set log css CSSD:2
Set CSSD Module: CSSD  Log Level: 2

[oracle@mlab1 ~]$ crsctl get log css CSSD
Get CSSD Module: CSSD  Log Level: 2
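
If the level needs to be checked from a script, the one-line response can be parsed. The following is a minimal sketch that assumes the response always has the shape shown above ("... Module: NAME  Log Level: N"); the function name parse_log_level is made up for this illustration and is not part of any Oracle tool.

```python
import re

# Assumed format, taken from the crsctl output shown above:
#   "Get CSSD Module: CSSD  Log Level: 2"
LEVEL_RE = re.compile(r"Module:\s*(\S+)\s+Log Level:\s*(\d+)")


def parse_log_level(line):
    """Return (module, level) from a crsctl get/set log response, or None."""
    m = LEVEL_RE.search(line)
    if m is None:
        return None
    return m.group(1), int(m.group(2))


if __name__ == "__main__":
    print(parse_log_level("Get CSSD Module: CSSD  Log Level: 2"))
```

The same pattern also matches the "Set CSSD Module: ..." response, so one helper covers both commands.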

[oracle@mlab1 ~]$ ps -ef|grep cssd.bin
oracle   17797     1  0 Oct19 ?        00:00:11 /g01/oracle/app/oracle/product/11.2.0/grid/bin/ocssd.bin 
oracle   29016 28865  0 21:47 pts/1    00:00:00 grep cssd.bin

[oracle@mlab1 ~]$ kill -9 17797

[oracle@mlab1 ~]$ ps -ef|grep cssd.bin
oracle   29128     1  0 21:48 ?        00:00:00 /g01/oracle/app/oracle/product/11.2.0/grid/bin/ocssd.bin 
oracle   29144 28865  0 21:49 pts/1    00:00:00 grep cssd.bin

[oracle@mlab1 ~]$ uptime 
 21:49:13 up 28 days, 22:24,  3 users,  load average: 0.16, 0.06, 0.01

tail -f ocssd.log 

2012-10-21 09:45:06.853: [    CSSD][1105594688]clssgmClientConnectMsg: properties of cmProc 0x7f270c1617e0 - 1,2,3,4,5
2012-10-21 09:45:06.853: [    CSSD][1105594688]clssgmClientConnectMsg: Connect from con(0x20b8) proc(0x7f270c1617e0) pid(28935) version 11:2:1:4, properties: 1,2,3,4,5
2012-10-21 09:45:06.853: [    CSSD][1105594688]clssgmClientConnectMsg: msg flags 0x0000
2012-10-21 09:45:06.856: [    CSSD][1105594688]clssgmDeadProc: proc 0x7f270c1617e0
2012-10-21 09:45:06.856: [    CSSD][1105594688]clssgmDestroyProc: cleaning up proc(0x7f270c1617e0) con(0x20b8) skgpid  ospid 28935 with 0 clients, refcount 0
2012-10-21 09:45:06.856: [    CSSD][1105594688]clssgmDiscEndpcl: gipcDestroy 0x20b8
2012-10-21 09:48:57.641: [    CSSD][2632525536]clsu_load_ENV_levels: Module = CSSD, LogLevel = 2, TraceLevel = 0
2012-10-21 09:48:57.641: [    CSSD][2632525536]clsu_load_ENV_levels: Module = GIPCCM, LogLevel = 2, TraceLevel = 0
2012-10-21 09:48:57.642: [    CSSD][2632525536]clsu_load_ENV_levels: Module = GIPCGM, LogLevel = 2, TraceLevel = 0
2012-10-21 09:48:57.642: [    CSSD][2632525536]clsu_load_ENV_levels: Module = GIPCNM, LogLevel = 2, TraceLevel = 0
2012-10-21 09:48:57.642: [    CSSD][2632525536]clsu_load_ENV_levels: Module = GPNP, LogLevel = 1, TraceLevel = 0
2012-10-21 09:48:57.642: [    CSSD][2632525536]clsu_load_ENV_levels: Module = OLR, LogLevel = 0, TraceLevel = 0
2012-10-21 09:48:57.642: [    CSSD][2632525536]clsu_load_ENV_levels: Module = SKGFD, LogLevel = 0, TraceLevel = 0
[    CSSD][2632525536]clsugetconf : Configuration type [3].
2012-10-21 09:48:57.642: [    CSSD][2632525536]clssscmain: Starting CSS daemon, version 11.2.0.3.0, in (local-only) mode with uniqueness value 1350827337
2012-10-21 09:48:57.642: [    CSSD][2632525536]clssscmain: Environment is production
2012-10-21 09:48:57.642: [    CSSD][2632525536]clssscmain: Core file size limit extended
2012-10-21 09:48:57.654: [    CSSD][2632525536]clssscmain: GIPCHA down 0
2012-10-21 09:48:57.655: [    CSSD][2632525536]clssscGetParameterOLR: OLR fetch for parameter logsize (8) failed with rc 21
2012-10-21 09:48:57.655: [    CSSD][2632525536]clssscExtendLimits: The current soft limit for file descriptors is 65536, hard limit is 65536
2012-10-21 09:48:57.655: [    CSSD][2632525536]clssscExtendLimits: The current soft limit for locked memory is 3955359744, hard limit is 3955359744
2012-10-21 09:48:57.655: [    CSSD][2632525536]clssscmain: Running as user oracle
2012-10-21 09:48:57.656: [    CSSD][2632525536]clssscmain: RT queue setting is at default value
2012-10-21 09:48:57.657: [    CSSD][2632525536]clssscGetParameterOLR: OLR fetch for parameter auth rep (9) failed with rc 21
2012-10-21 09:48:57.657: [    CSSD][2632525536]clssgmInitCMInfoMin: clsmonJoined set via localonly
[  clsdmt][1097894208]Listening to (ADDRESS=(PROTOCOL=ipc)(KEY=mlab1DBG_CSSD))
2012-10-21 09:48:57.658: [  clsdmt][1097894208]PID for the Process [29128], connkey 4
2012-10-21 09:48:57.658: [    CSSD][2632525536]clssscGetParameterOLR: OLR fetch for parameter diagwait (14) failed with rc 21
2012-10-21 09:48:57.662: [    CSSD][2632525536]clssnmInitNMInfoMin: Initializing first-reconfig to (0)
2012-10-21 09:48:57.662: [    CSSD][2632525536]clssscmain: initgminfo done
2012-10-21 09:48:57.662: [    CSSD][1082157376]clssgmclientlsnr: Spawned
2012-10-21 09:48:57.662: [    CSSD][1082157376]clssgmEvtInformation: reqtype (13) cmProc ((nil)) client ((nil))
2012-10-21 09:48:57.662: [    CSSD][1082157376]clssgmEvtInformation: reqtype (13) req (0x12e9900)
2012-10-21 09:48:57.663: [    CSSD][1082157376]clssgmclientlsnr: listening on clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_mlab1_)(GIPCID=00000000-00000000-29128))
2012-10-21 09:48:57.665: [    CSSD][2632525536]clssscmain:read clusterguid 5f0de5b55b586f17bfc26fd1c7c638a0 from OLR
2012-10-21 09:48:57.665: [    CSSD][2632525536]clssscmain: Cluster GUID is 5f0de5b55b586f17bfc26fd1c7c638a0
2012-10-21 09:48:57.665: [    CSSD][2632525536]clssnmNotifyReq: type (12)
2012-10-21 09:48:57.665: [    CSSD][2632525536]clssscmain: Skipping voting device init for local_only
2012-10-21 09:48:57.665: [    CSSD][2632525536]clssnmInitNodeDB: Initializing with OCR id 0
2012-10-21 09:48:58.612: [    CSSD][1082157376]clssscSelect: cookie accept request 0x7fe79802a2d0
2012-10-21 09:48:58.612: [    CSSD][1082157376]clssgmAllocProc: (0x139f4f0) allocated
2012-10-21 09:48:58.612: [    CSSD][1082157376]clssgmClientConnectMsg: properties of cmProc 0x139f4f0 - 1,2,3,4,5
2012-10-21 09:48:58.613: [    CSSD][1082157376]clssgmClientConnectMsg: Connect from con(0xd2) proc(0x139f4f0) pid(29114) version 11:2:1:4, properties: 1,2,3,4,5
2012-10-21 09:48:58.613: [    CSSD][1082157376]clssgmClientConnectMsg: The CSSD agent is process (0x139f4f0), number 1
2012-10-21 09:48:58.613: [    CSSD][1082157376]clssgmEvtInformation: reqtype (11) cmProc (0x139f4f0) client ((nil))
2012-10-21 09:48:58.613: [    CSSD][1082157376]clssgmEvtInformation: reqtype (11) req (0x139a730)
2012-10-21 09:48:58.613: [    CSSD][1082157376]clssnmQueueNotification: type (11) 0x139a730
2012-10-21 09:48:58.665: [    CSSD][1078692160]clssnm_skgxnmon: Compatible vendor clusterware not in use
2012-10-21 09:48:58.665: [    CSSD][2632525536]clssnmNotifyReq: type (20)
2012-10-21 09:48:58.665: [    CSSD][2632525536][INFO]clssnmInitNodeDB: local only, no IPMI allowed
2012-10-21 09:48:58.665: [    CSSD][1078692160]clssgmDeathChkThread: Spawned
2012-10-21 09:48:58.666: [    CSSD][2632525536]clssnmNotifyReq: type (11)
2012-10-21 09:48:58.666: [    CSSD][2632525536]clssnmNotifyReq: found 1 for type (11) 0x139a730
2012-10-21 09:48:58.666: [    CSSD][2632525536]clssnmCompleteGMReq: Completed request type 11 with status 1
2012-10-21 09:48:58.666: [    CSSD][2632525536]clssgmDoneQEle: re-queueing req 0x12e5390 status 1
2012-10-21 09:48:58.666: [    CSSD][2632525536]clssnmNotifyReq: type (13)
2012-10-21 09:48:58.666: [    CSSD][2632525536]clssnmNotifyReq: found 1 for type (13) 0x12e9900
2012-10-21 09:48:58.666: [    CSSD][2632525536]clssnmCompleteGMReq: Completed request type 13 with status 1
2012-10-21 09:48:58.666: [    CSSD][2632525536]clssgmDoneQEle: re-queueing req 0x12cdb80 status 1
2012-10-21 09:48:58.666: [    CSSD][1083734336]clssgmPeerListener: Spawned for node
2012-10-21 09:48:58.667: [    CSSD][1083734336]clssgmPeerListener: physical hostname mlab1 privname mlab1
2012-10-21 09:48:58.667: [    CSSD][1083734336]clssgmPeerListener: gipc addr gipc://mlab1:gm_
2012-10-21 09:48:58.667: [    CSSD][1083734336]clssgmPeerListener: gipc addr gipcha://mlab1:gm2_
2012-10-21 09:48:58.667: [    CSSD][1082157376]clssgmclientOpenEndp: listening on clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=Oracle_CSS_LclLstnr_localhost_1)(GIPCID=00000000-00000000-29128))
2012-10-21 09:48:58.667: [    CSSD][1100257600]clssnmPollingThread: Spawned, poll interval 1000
2012-10-21 09:48:58.667: [    CSSD][1107290432]clssnmRcfgMgrThread: Spawned
2012-10-21 09:48:58.667: [    CSSD][1108867392]clssnmClusterListener: Spawned
2012-10-21 09:48:58.667: [    CSSD][1108867392]clssnmOpenEndp: Not opening endp for localonly mode
2012-10-21 09:48:58.668: [    CSSD][1082157376]clssgmclientOpenEndp: listening on clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_mlab1_localhost)(GIPCID=00000000-00000000-29128))
2012-10-21 09:48:58.668: [    CSSD][1082157376]clssgmCheckReqNMCompletion: Completing request type 11 for proc (0x139f4f0), operation status 1, client status 0
2012-10-21 09:48:58.668: [    CSSD][1091864896]clssnmSendingThread: Spawned
2012-10-21 09:48:58.669: [    CSSD][1082157376]clssgmEvtInformation: reqtype (22) cmProc (0x139f4f0) client ((nil))
2012-10-21 09:48:58.669: [    CSSD][1082157376]clssgmEvtInformation: reqtype (22) req (0x7fe798207480)
2012-10-21 09:48:58.669: [    CSSD][1082157376]clssnmQueueNotification: type (22) 0x7fe798207480
2012-10-21 09:48:59.667: [    CSSD][1083734336]clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0
2012-10-21 09:49:00.667: [    CSSD][1083734336]clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0
2012-10-21 09:49:01.667: [    CSSD][1083734336]clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0
2012-10-21 09:49:02.667: [    CSSD][1083734336]clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0
2012-10-21 09:49:03.667: [    CSSD][1083734336]clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0
2012-10-21 09:49:04.668: [    CSSD][1083734336]clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0
2012-10-21 09:49:05.668: [    CSSD][1083734336]clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0
2012-10-21 09:49:06.168: [    CSSD][1107290432]clssnmRcfgMgrThread: Local Join
2012-10-21 09:49:06.168: [    CSSD][1107290432]clssnmLocalJoinEvent: begin on node(1), waittime 193000
2012-10-21 09:49:06.168: [    CSSD][1107290432]clssnmLocalJoinEvent: scanning 2 nodes
2012-10-21 09:49:06.168: [    CSSD][1107290432]clssnmLocalJoinEvent: Starting initial cluster reconfig
2012-10-21 09:49:06.168: [    CSSD][1107290432]clssnmDoSyncUpdate: Initiating sync 0
2012-10-21 09:49:06.168: [    CSSD][1107290432]clssscCompareSwapEventValue: changed NMReconfigInProgress  val 1, from -1, changes 1
2012-10-21 09:49:06.168: [    CSSD][1107290432]clssnmDoSyncUpdate: local disk timeout set to 200000 ms, remote disk timeout set to 200000
2012-10-21 09:49:06.168: [    CSSD][1107290432]clssnmDoSyncUpdate: new values for local disk timeout and remote disk timeout will take effect when the sync is completed.
2012-10-21 09:49:06.170: [    CSSD][1107290432]clssnmSetFirstIncarn: got incarnation 243778226 from OLR
2012-10-21 09:49:06.174: [    CSSD][1107290432]clssnmSetFirstIncarn: Incarnation set to 243778227
2012-10-21 09:49:06.174: [    CSSD][1107290432]clssnmDoSyncUpdate: Starting cluster reconfig with incarnation 243778227
2012-10-21 09:49:06.174: [    CSSD][1107290432]clssnmSetupAckWait: Ack message type (11)
2012-10-21 09:49:06.174: [    CSSD][1107290432]clssnmSetupAckWait: node(1) is ALIVE
2012-10-21 09:49:06.174: [    CSSD][1107290432]clssnmSendSync: syncSeqNo(243778227), indicating EXADATA fence initialization incomplete
2012-10-21 09:49:06.174: [    CSSD][1107290432]List of nodes that have ACKed my sync: NULL
2012-10-21 09:49:06.174: [    CSSD][1107290432]clssnmSendSync: syncSeqNo(243778227)
2012-10-21 09:49:06.174: [    CSSD][1107290432]clssnmWaitForAcks: Ack message type(11), ackCount(1)
2012-10-21 09:49:06.174: [    CSSD][1108867392]clssscUpdateEventValue: NMReconfigInProgress  val 1, changes 2
2012-10-21 09:49:06.174: [    CSSD][1108867392]clssnmHandleSync: local disk timeout set to 200000 ms, remote disk timeout set to 200000
2012-10-21 09:49:06.174: [    CSSD][1108867392]clssnmHandleSync: initleader 1 newleader 1
2012-10-21 09:49:06.174: [    CSSD][1108867392]clssnmQueueClientEvent:  Sending Event(2), type 2, incarn 0
2012-10-21 09:49:06.174: [    CSSD][1108867392]clssnmQueueClientEvent: Node[1] state = 1, birth = 0, unique = 1350827337
2012-10-21 09:49:06.174: [    CSSD][1108867392]clssnmHandleSync: Acknowledging sync: src[1] srcName[mlab1] seq[1] sync[243778227]
2012-10-21 09:49:06.174: [    CSSD][1108867392]clssnmSendAck: node 1, mlab1, syncSeqNo(243778227) type(11)
2012-10-21 09:49:06.174: [    CSSD][2632525536]NMEVENT_SUSPEND [00][00][00][00]
2012-10-21 09:49:06.175: [    CSSD][2632525536]clssgmUpdateEventValue: CmInfo State  val 5, changes 1
2012-10-21 09:49:06.175: [    CSSD][2632525536]clssgmSuspendAllGrocks: Issue SUSPEND
2012-10-21 09:49:06.175: [    CSSD][2632525536]clssgmSuspendAllGrocks: done
2012-10-21 09:49:06.175: [    CSSD][1108867392]clssnmHandleAck: src[1] dest[1] dom[0] seq[0] sync[243778227] type[11] ackCount(0)
2012-10-21 09:49:06.175: [    CSSD][2632525536]clssgmUpdateEventValue: CmInfo State  val 2, changes 2
2012-10-21 09:49:06.175: [    CSSD][2632525536]clssgmUpdateEventValue: ConnectedNodes  val 0, changes 1
2012-10-21 09:49:06.175: [    CSSD][2632525536]clssgmCleanupNodeContexts():  cleaning up nodes, rcfg(0)
2012-10-21 09:49:06.175: [    CSSD][2632525536]clssgmCleanupNodeContexts():  successful cleanup of nodes rcfg(0)
2012-10-21 09:49:06.175: [    CSSD][2632525536]clssgmStartNMMon:  completed node cleanup
2012-10-21 09:49:06.175: [    CSSD][1107290432]clssnmSendSync: syncSeqNo(243778227), indicating EXADATA fence initialization incomplete
2012-10-21 09:49:06.175: [    CSSD][1107290432]List of nodes that have ACKed my sync: 1
2012-10-21 09:49:06.175: [    CSSD][1107290432]clssnmWaitForAcks: done, syncseq(243778227), msg type(11)
2012-10-21 09:49:06.175: [    CSSD][1107290432]clssnmSetMinMaxVersion:node1  product/protocol (11.2/1.4)
2012-10-21 09:49:06.175: [    CSSD][1107290432]clssnmSetMinMaxVersion: properties common to all nodes: 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17
2012-10-21 09:49:06.175: [    CSSD][1107290432]clssnmSetMinMaxVersion: min product/protocol (11.2/1.4)
2012-10-21 09:49:06.175: [    CSSD][1107290432]clssnmSetMinMaxVersion: max product/protocol (11.2/1.4)
2012-10-21 09:49:06.175: [    CSSD][1107290432]clssnmNeedConfReq: No configuration to change
2012-10-21 09:49:06.175: [    CSSD][1107290432]clssnmDoSyncUpdate: node(1) is transitioning from joining state to active state
2012-10-21 09:49:06.175: [    CSSD][1107290432]clssnmDoSyncUpdate: Wait for 0 vote ack(s)
2012-10-21 09:49:06.175: [    CSSD][1107290432]clssnmCheckDskInfo: Checking disk info...
2012-10-21 09:49:06.175: [    CSSD][1107290432]clssnmRemove: Start
2012-10-21 09:49:06.175: [    CSSD][1107290432]clssnmWaitOnEvictions: Start
2012-10-21 09:49:06.175: [    CSSD][1107290432]clssnmBldSendUpdate: syncSeqNo(243778227)
2012-10-21 09:49:06.175: [    CSSD][1107290432]clssnmBldSendUpdate: using msg version 4
2012-10-21 09:49:06.175: [    CSSD][1107290432]clssnmDoSyncUpdate: Sync 243778227 complete!
2012-10-21 09:49:06.175: [    CSSD][1108867392]clssnmHandleUpdate: sync[243778227] src[1], msgvers 4 icin 243778227
2012-10-21 09:49:06.175: [    CSSD][1108867392]clssnmHandleUpdate: common properties are 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17
2012-10-21 09:49:06.175: [    CSSD][1108867392]clssscSAGEInitFencing: kgzf fence initialization starting with GUID 5f0de5b55b586f17bfc26fd1c7c638a0, icin 243778227, node number 1, uniqueness 243778227
2012-10-21 09:49:06.176: [ default][1108867392]CELL communication is configured to use 0 interface(s):

2012-10-21 09:49:06.176: [ default][1108867392]Kgzf_ini_begin: diskmon is disabled

2012-10-21 09:49:06.176: [    CSSD][1108867392]clssscSAGEInitFencing: kgzf fence initialization successfully started
2012-10-21 09:49:06.176: [    CSSD][1108867392]clssnmUpdateNodeState: node mlab1, number 1, current state 2, proposed state 3, current unique 1350827337, proposed unique 1350827337, prevConuni 0, birth 243778227
2012-10-21 09:49:06.176: [    CSSD][1108867392]clssnmSendAck: node 1, mlab1, syncSeqNo(243778227) type(15)
2012-10-21 09:49:06.176: [    CSSD][1108867392]clssnmQueueClientEvent:  Sending Event(1), type 1, incarn 243778227
2012-10-21 09:49:06.176: [    CSSD][1108867392]clssnmQueueClientEvent: Node[1] state = 3, birth = 243778227, unique = 1350827337
2012-10-21 09:49:06.176: [    CSSD][1108867392]clssnmHandleUpdate: SYNC(243778227) from node(1) completed
2012-10-21 09:49:06.177: [    CSSD][2632525536]clssgmStartNMMon: node 1 active, birth 243778227
2012-10-21 09:49:06.177: [    CSSD][1108867392]clssnmHandleUpdate: NODE 1 (mlab1) IS ACTIVE MEMBER OF CLUSTER
2012-10-21 09:49:06.177: [    CSSD][2632525536]clssgmUpdateEventValue: Reconfig Event  val 1, changes 1
2012-10-21 09:49:06.177: [    CSSD][1108867392]clssscUpdateEventValue: NMReconfigInProgress  val -1, changes 3
2012-10-21 09:49:06.177: [    CSSD][1108867392]clssnmHandleUpdate: local disk timeout set to 200000 ms, remote disk timeout set to 200000
2012-10-21 09:49:06.177: [    CSSD][2632525536]clssgmUpdateEventValue: CmInfo State  val 3, changes 3
2012-10-21 09:49:06.177: [    CSSD][1083734336]clssgmWaitOnEventValue: after CmInfo State  val 3, eval 3 waited 510
2012-10-21 09:49:06.177: [    CSSD][1083734336]clssgmUpdateEventValue: HoldRequest  val 1, changes 1
2012-10-21 09:49:06.177: [    CSSD][1085311296]clssgmReconfigThread:  started for reconfig (243778227)
2012-10-21 09:49:06.177: [    CSSD][1085311296]NMEVENT_RECONFIG [00][00][00][02]

 

 

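As the output shows, in a standalone environment a crashed or killed ocssd.bin does not cause the node to reboot. Because this is not a cluster, a reboot is not needed; only the ocssd.bin process is restarted (the PID changes from 17797 to 29128, while uptime still reports 28 days).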
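
The before/after comparison can be automated. Below is a minimal sketch that pulls the ocssd.bin PID out of `ps -ef` output and ignores the `grep` line itself; it assumes the standard eight-column `ps -ef` layout seen in this post, and the helper name ocssd_pids is hypothetical.

```python
def ocssd_pids(ps_output):
    """Return the PIDs of ocssd.bin processes found in `ps -ef` output."""
    pids = []
    for line in ps_output.splitlines():
        # Columns: UID PID PPID C STIME TTY TIME CMD (CMD may contain spaces)
        fields = line.split(None, 7)
        if len(fields) < 8:
            continue
        executable = fields[7].split()[0]
        # Skip "grep cssd.bin": its executable is grep, not ocssd.bin.
        if executable.endswith("/ocssd.bin"):
            pids.append(int(fields[1]))
    return pids


# Sample data copied from the two ps listings in this post.
BEFORE = """\
oracle   17797     1  0 Oct19 ?        00:00:11 /g01/oracle/app/oracle/product/11.2.0/grid/bin/ocssd.bin
oracle   29016 28865  0 21:47 pts/1    00:00:00 grep cssd.bin
"""
AFTER = """\
oracle   29128     1  0 21:48 ?        00:00:00 /g01/oracle/app/oracle/product/11.2.0/grid/bin/ocssd.bin
oracle   29144 28865  0 21:49 pts/1    00:00:00 grep cssd.bin
"""

if __name__ == "__main__":
    old, new = ocssd_pids(BEFORE), ocssd_pids(AFTER)
    print("before:", old, "after:", new, "process restarted:", old != new)
```

A changed PID together with an unchanged uptime is what distinguishes a process restart from a node reboot.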

 

 

In a RAC cluster environment, the behavior is different:

 

[root@vrh1 ~]# crsctl set log css CSSD:2
Set CSSD Module: CSSD  Log Level: 2

[root@vrh1 ~]# 
[root@vrh1 ~]# ps -ef|grep ocssd.bin
grid      3929     1  1 Oct19 ?        00:57:01 /g01/11.2.0/grid/bin/ocssd.bin 
root     27297 26827  0 10:00 pts/0    00:00:00 grep ocssd.bin

[root@vrh1 ~]# kill -19 3929

Signal 19 is SIGSTOP (on Linux x86), which suspends the process instead of terminating it.
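
Signal numbers are platform dependent, so it can be worth confirming what a given number means on the host before using it with kill. The sketch below uses Python's standard signal module for that lookup; describe_signal is a name chosen for this example only.

```python
import signal


def describe_signal(number):
    """Return the symbolic name of a signal number on this platform, or None."""
    try:
        return signal.Signals(number).name
    except ValueError:
        # The number does not correspond to a signal on this platform.
        return None


if __name__ == "__main__":
    # The result for 19 depends on the operating system and architecture.
    print(19, describe_signal(19))
    print(int(signal.SIGSTOP), describe_signal(int(signal.SIGSTOP)))
```

Using the symbolic form (kill -STOP) avoids the dependency on the numeric value altogether.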

ocssd.log on node vrh1, where ocssd.bin received kill -19:

2012-10-21 10:01:31.068: [    CSSD][1086265664]clssnmvDiskPing: Writing with status 0x3, timestamp 199229204/1350828091
2012-10-21 10:01:31.068: [    CSSD][1103239488]clssnmvDiskPing: Writing with status 0x3, timestamp 199229204/1350828091
2012-10-21 10:01:31.068: [    CSSD][1092921664]clssnmvDiskPing: Writing with status 0x3, timestamp 199229204/1350828091
2012-10-21 10:01:31.195: [    CSSD][1096075584]clssnmvDiskPing: Writing with status 0x3, timestamp 199229324/1350828091
2012-10-21 10:01:31.196: [    CSSD][1111820608]clssnmvDiskPing: Writing with status 0x3, timestamp 199229324/1350828091
2012-10-21 10:01:31.220: [    CSSD][1113397568]clssnmvDiskKillCheck: not evicted, file /dev/asm-diskh flags 0x00000000, kill block unique 0, my unique 1350627944
2012-10-21 10:01:31.220: [    CSSD][1099512128]clssnmvDiskKillCheck: not evicted, file /dev/asm-diskg flags 0x00000000, kill block unique 0, my unique 1350627944
2012-10-21 10:01:31.220: [    CSSD][1077381440]clssnmvDiskKillCheck: not evicted, file /dev/asm-diski flags 0x00000000, kill block unique 0, my unique 1350627944

Because the ocssd.bin process is suspended, the log has no entries after 10:01:31.220.
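
To quantify how long the daemon has been silent, the last timestamp in the log can be compared with a reference time. The sketch below assumes every dated ocssd.log line starts with "YYYY-MM-DD HH:MM:SS.mmm:" as in the excerpts here, and it assumes the clocks of both nodes are synchronized; the reference time 10:02:00.560 is the moment the other node reports "Removal started" in its own log. The helper names are hypothetical.

```python
import re
from datetime import datetime

TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}):")
TS_FMT = "%Y-%m-%d %H:%M:%S.%f"


def last_timestamp(lines):
    """Return the timestamp of the last dated log line, or None if there is none."""
    last = None
    for line in lines:
        m = TS_RE.match(line)
        if m:  # continuation lines without a timestamp are skipped
            last = datetime.strptime(m.group(1), TS_FMT)
    return last


def silence_seconds(lines, reference):
    """Seconds between the last dated log line and a reference time."""
    last = last_timestamp(lines)
    if last is None:
        return None
    return (reference - last).total_seconds()


# Last two lines of the vrh1 excerpt above.
VRH1_TAIL = [
    "2012-10-21 10:01:31.220: [    CSSD][1099512128]clssnmvDiskKillCheck: not evicted, file /dev/asm-diskg flags 0x00000000, kill block unique 0, my unique 1350627944",
    "2012-10-21 10:01:31.220: [    CSSD][1077381440]clssnmvDiskKillCheck: not evicted, file /dev/asm-diski flags 0x00000000, kill block unique 0, my unique 1350627944",
]

if __name__ == "__main__":
    removal_started = datetime(2012, 10, 21, 10, 2, 0, 560000)
    print(silence_seconds(VRH1_TAIL, removal_started))
```

Under those assumptions the gap comes out slightly under 30 seconds, which is in line with the misstime(30000) value that the surviving node logs.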

ocssd.log on the other node (vrh2):

2012-10-21 10:01:45.623: [    CSSD][1124575552]clssnmPollingThread: node vrh1 (1) at 50% heartbeat fatal, removal in 14.940 seconds
2012-10-21 10:01:45.623: [    CSSD][1124575552]clssnmPollingThread: node vrh1 (1) is impending reconfig, flag 2491404, misstime 15060
2012-10-21 10:01:45.625: [    CSSD][1085704512]clssnmvDHBValidateNCopy: node 1, vrh1, has a disk HB, but no network HB, DHB has rcfg 239944581, wrtcnt, 9550714, LATS 344798954, lastSeqNo 7825639, uniqueness 1350627944, timestamp 1350828091/199229204
2012-10-21 10:01:45.625: [    CSSD][1089829184]clssnmvDHBValidateNCopy: node 1, vrh1, has a disk HB, but no network HB, DHB has rcfg 239944581, wrtcnt, 9550715, LATS 344798954, lastSeqNo 6103632, uniqueness 1350627944, timestamp 1350828091/199229324
2012-10-21 10:01:45.684: [    CSSD][1107228992]clssnmvDiskKillCheck: not evicted, file /dev/asm-diskk flags 0x00000000, kill block unique 0, my unique 1350482491
2012-10-21 10:01:46.105: [    CSSD][1111959872]clssnmvDiskKillCheck: not evicted, file /dev/asm-diskh flags 0x00000000, kill block unique 0, my unique 1350482491
2012-10-21 10:01:46.105: [    CSSD][1118267712]clssnmvDiskKillCheck: not evicted, file /dev/asm-diskj flags 0x00000000, kill block unique 0, my unique 1350482491
2012-10-21 10:01:46.105: [    CSSD][1108805952]clssnmvDiskKillCheck: not evicted, file /dev/asm-diski flags 0x00000000, kill block unique 0, my unique 1350482491
2012-10-21 10:01:46.105: [    CSSD][1116690752]clssnmvDiskKillCheck: not evicted, file /dev/asm-diskg flags 0x00000000, kill block unique 0, my unique 1350482491
2012-10-21 10:01:46.587: [    CSSD][1126152512]clssnmSendingThread: sending status msg to all nodes
2012-10-21 10:01:46.587: [    CSSD][1126152512]clssnmSendingThread: sent 8 status msgs to all nodes

2012-10-21 10:01:53.627: [    CSSD][1124575552]clssnmPollingThread: node vrh1 (1) at 75% heartbeat fatal, removal in 6.930 seconds

2012-10-21 10:01:55.102: [    CSSD][1126152512]clssnmSendingThread: sending status msg to all nodes
2012-10-21 10:01:55.102: [    CSSD][1126152512]clssnmSendingThread: sent 8 status msgs to all nodes

2012-10-21 10:01:57.628: [    CSSD][1124575552]clssnmPollingThread: node vrh1 (1) at 90% heartbeat fatal, removal in 2.930 seconds, seedhbimpd 1

2012-10-21 10:01:59.608: [    CSSD][1126152512]clssnmSendingThread: sending status msg to all nodes
2012-10-21 10:01:59.608: [    CSSD][1126152512]clssnmSendingThread: sent 9 status msgs to all nodes

2012-10-21 10:02:00.560: [    CSSD][1124575552]clssnmPollingThread: Removal started for node vrh1 (1), flags 0x26040c, state 3, wt4c 0
2012-10-21 10:02:00.560: [    CSSD][1124575552]clssnmMarkNodeForRemoval: node 1, vrh1 marked for removal
2012-10-21 10:02:00.560: [    CSSD][1124575552]clssnmDiscHelper: vrh1, node(1) connection failed, endp (0x98f947), probe((nil)), ninf->endp 0x98f947
2012-10-21 10:02:00.560: [    CSSD][1124575552]clssnmDiscHelper: node 1 clean up, endp (0x98f947), init state 5, cur state 5
2012-10-21 10:02:00.560: [GIPCXCPT][1124575552] gipcInternalDissociate: obj 0x7f3bc80d8020 [000000000098f947] { gipcEndpoint : localAddr 'gipcha://vrh2:nm2_vrh-cluster/9dc0-9546-c12b-e74', remoteAddr 'gipcha://vrh1:2cf2-b3ca-7399-111', numPend 1, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, flags 0x138606, usrFlags 0x0 } not associated with any container, ret gipcretFail (1)
2012-10-21 10:02:00.560: [GIPCXCPT][1124575552] gipcDissociateF [clssnmDiscHelper : clssnm.c : 3436]: EXCEPTION[ ret gipcretFail (1) ]  failed to dissociate obj 0x7f3bc80d8020 [000000000098f947] { gipcEndpoint : localAddr 'gipcha://vrh2:nm2_vrh-cluster/9dc0-9546-c12b-e74', remoteAddr 'gipcha://vrh1:2cf2-b3ca-7399-111', numPend 1, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, flags 0x138606, usrFlags 0x0 }, flags 0x0
2012-10-21 10:02:00.560: [    CSSD][1127729472]clssnmRcfgMgrThread: Reconfig in progress...
2012-10-21 10:02:00.560: [    CSSD][1127729472]clssnmRcfgMgrThread: sync leader(1) failed, misstime(30000) unique(1350627944)
2012-10-21 10:02:00.560: [    CSSD][1127729472]clssscCompareSwapEventValue: changed NMReconfigInProgress  val -1, from 1, changes 18
2012-10-21 10:02:00.560: [    CSSD][1127729472]clssnmDoSyncUpdate: Initiating sync 239944581
2012-10-21 10:02:00.560: [    CSSD][1129306432]clssnmDiscEndp: gipcDestroy 0x98f947
2012-10-21 10:02:00.560: [    CSSD][1127729472]clssscCompareSwapEventValue: changed NMReconfigInProgress  val 2, from -1, changes 19
2012-10-21 10:02:00.560: [    CSSD][1127729472]clssnmDoSyncUpdate: local disk timeout set to 27000 ms, remote disk timeout set to 27000
2012-10-21 10:02:00.560: [    CSSD][1127729472]clssnmDoSyncUpdate: new values for local disk timeout and remote disk timeout will take effect when the sync is completed.
2012-10-21 10:02:00.560: [    CSSD][1127729472]clssnmDoSyncUpdate: Starting cluster reconfig with incarnation 239944581
2012-10-21 10:02:00.560: [    CSSD][1127729472]clssnmSetupAckWait: Ack message type (11)
2012-10-21 10:02:00.560: [    CSSD][1127729472]clssnmSetupAckWait: node(2) is ALIVE
2012-10-21 10:02:00.560: [    CSSD][1127729472]clssnmSendSync: syncSeqNo(239944581), indicating EXADATA fence initialization complete
2012-10-21 10:02:00.560: [    CSSD][1127729472]List of nodes that have ACKed my sync: NULL
2012-10-21 10:02:00.560: [    CSSD][1127729472]clssnmSendSync: syncSeqNo(239944581)
2012-10-21 10:02:00.560: [    CSSD][1127729472]clssnmWaitForAcks: Ack message type(11), ackCount(1)
2012-10-21 10:02:00.560: [    CSSD][1129306432]clssnmHandleSync: Node vrh2, number 2, is EXADATA fence capable
2012-10-21 10:02:00.561: [    CSSD][1129306432]clssscUpdateEventValue: NMReconfigInProgress  val 2, changes 20
2012-10-21 10:02:00.561: [    CSSD][1129306432]clssnmHandleSync: local disk timeout set to 27000 ms, remote disk timeout set to 27000
2012-10-21 10:02:00.561: [    CSSD][1129306432]clssnmHandleSync: initleader 2 newleader 2
2012-10-21 10:02:00.561: [    CSSD][1129306432]clssnmQueueClientEvent:  Sending Event(2), type 2, incarn 239944580
2012-10-21 10:02:00.561: [    CSSD][1129306432]clssnmQueueClientEvent: Node[1] state = 5, birth = 239944580, unique = 1350627944
2012-10-21 10:02:00.561: [    CSSD][1129306432]clssnmQueueClientEvent: Node[2] state = 3, birth = 239944577, unique = 1350482491
2012-10-21 10:02:00.561: [    CSSD][1129306432]clssnmHandleSync: Acknowledging sync: src[2] srcName[vrh2] seq[16] sync[239944581]
2012-10-21 10:02:00.561: [    CSSD][1129306432]clssnmSendAck: node 2, vrh2, syncSeqNo(239944581) type(11)
2012-10-21 10:02:00.561: [    CSSD][1129306432]clssnmHandleAck: src[2] dest[2] dom[0] seq[0] sync[239944581] type[11] ackCount(0)
2012-10-21 10:02:00.561: [    CSSD][3611592416]clssgmStartNMMon: node 1 active, birth 239944580
2012-10-21 10:02:00.561: [    CSSD][3611592416]clssgmStartNMMon: node 2 active, birth 239944577
2012-10-21 10:02:00.561: [    CSSD][3611592416]NMEVENT_SUSPEND [00][00][00][06]
2012-10-21 10:02:00.561: [    CSSD][3611592416]clssgmUpdateEventValue: CmInfo State  val 5, changes 50
2012-10-21 10:02:00.561: [    CSSD][3611592416]clssgmSuspendAllGrocks: Issue SUSPEND
2012-10-21 10:02:00.561: [    CSSD][1127729472]clssnmSendSync: syncSeqNo(239944581), indicating EXADATA fence initialization complete
2012-10-21 10:02:00.561: [    CSSD][1127729472]List of nodes that have ACKed my sync: 2
2012-10-21 10:02:00.561: [    CSSD][3611592416]clssgmQueueGrockEvent: groupName(IG+ASMSYS$USERS) count(2) master(2) event(2), incarn 4, mbrc 2, to member 2, events 0x0, state 0x0
2012-10-21 10:02:00.561: [    CSSD][1127729472]clssnmWaitForAcks: done, syncseq(239944581), msg type(11)
2012-10-21 10:02:00.561: [    CSSD][1127729472]clssnmSetMinMaxVersion:node2  product/protocol (11.2/1.4)
2012-10-21 10:02:00.561: [    CSSD][1127729472]clssnmSetMinMaxVersion: properties common to all nodes: 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17
2012-10-21 10:02:00.561: [    CSSD][3611592416]clssgmQueueGrockEvent: groupName(crs_version) count(2) master(2) event(2), incarn 6, mbrc 2, to member 2, events 0x0, state 0x0
2012-10-21 10:02:00.561: [    CSSD][1127729472]clssnmSetMinMaxVersion: min product/protocol (11.2/1.4)
2012-10-21 10:02:00.561: [    CSSD][1127729472]clssnmSetMinMaxVersion: max product/protocol (11.2/1.4)
2012-10-21 10:02:00.561: [    CSSD][1127729472]clssnmNeedConfReq: No configuration to change
2012-10-21 10:02:00.561: [    CSSD][3611592416]clssgmQueueGrockEvent: groupName(crs_version) count(2) master(2) event(2), incarn 6, mbrc 2, to member 0, events 0x20, state 0x0
2012-10-21 10:02:00.561: [    CSSD][1127729472]clssnmDoSyncUpdate: Terminating node 1, vrh1, misstime(30000) state(5)
2012-10-21 10:02:00.561: [    CSSD][1127729472]clssnmDoSyncUpdate: Wait for 0 vote ack(s)
2012-10-21 10:02:00.561: [    CSSD][1127729472]clssnmCheckDskInfo: Checking disk info...
2012-10-21 10:02:00.561: [    CSSD][1127729472]clssnmRemove: Start
2012-10-21 10:02:00.561: [    CSSD][3611592416]clssgmQueueGrockEvent: groupName(CRF-) count(3) master(2) event(2), incarn 11, mbrc 3, to member 2, events 0x38, state 0x0
2012-10-21 10:02:00.561: [    CSSD][1127729472]clssnmrRemoveNode: Removing node 1, vrh1, from the cluster in incarnation 239944581, node birth incarnation 239944580, death incarnation 239944581, stateflags 0x260000 uniqueness value 1350627944
2012-10-21 10:02:00.561: [    CSSD][3611592416]clssgmQueueGrockEvent: groupName(CLSN.ONSPROC.MASTER) count(1) master(2) event(2), incarn 1, mbrc 1, to member 2, events 0xa0, state 0x0
2012-10-21 10:02:00.561: [ default][1127729472]kgzf_gen_node_reid2: generated reid cid=58a8249042c37f94bf844767ea0ae255,icin=239944576,nmn=1,lnid=239944580,gid=0,gin=0,gmn=0,umemid=0,opid=0,opsn=0,lvl=node hdr=0xfece0100

2012-10-21 10:02:00.561: [    CSSD][1127729472]clssnmrFenceSage: Fenced node vrh1, number 1, with EXADATA, handle 0
2012-10-21 10:02:00.561: [    CSSD][3611592416]clssgmQueueGrockEvent: groupName(DB+ASM) count(2) master(1) event(2), incarn 4, mbrc 2, to member 1, events 0x68, state 0x0
2012-10-21 10:02:00.561: [    CSSD][1127729472]clssnmSendShutdown: req to node 1, kill time 344813894
2012-10-21 10:02:00.561: [    CSSD][1127729472]clssnmsendmsg: not connected to node 1

2012-10-21 10:02:00.562: [    CSSD][1127729472]clssnmSendShutdown: Send to node 1 failed
2012-10-21 10:02:00.562: [    CSSD][1127729472]clssnmWaitOnEvictions: Start
2012-10-21 10:02:00.562: [    CSSD][3611592416]clssgmQueueGrockEvent: groupName(DG+ASM) count(2) master(1) event(2), incarn 4, mbrc 2, to member 1, events 0x0, state 0x0
2012-10-21 10:02:00.562: [    CSSD][1127729472]clssnmWaitOnEvictions: node 1, undead 1, EXADATA fence handle 0 kill reqest id 0, last DHB (1350828091, 199229324, 399572), seedhbimpd TRUE
2012-10-21 10:02:00.562: [    CSSD][1127729472]clssnmCheckKillStatus: Node 1, vrh1, down, LATS(344798954),timeout(14940)
2012-10-21 10:02:00.562: [    CSSD][1127729472]clssnmBldSendUpdate: syncSeqNo(239944581)
2012-10-21 10:02:00.562: [    CSSD][1127729472]clssnmBldSendUpdate: using msg version 4
2012-10-21 10:02:00.562: [    CSSD][1127729472]clssnmDoSyncUpdate: Sync 239944581 complete!
2012-10-21 10:02:00.563: [    CSSD][1129306432]clssnmHandleUpdate: sync[239944581] src[2], msgvers 4 icin 239944576
2012-10-21 10:02:00.563: [    CSSD][1129306432]clssnmHandleUpdate: common properties are 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17
2012-10-21 10:02:00.563: [    CSSD][1129306432]clssnmUpdateNodeState: node vrh1, number 1, current state 5, proposed state 0, current unique 1350627944, proposed unique 1350627944, prevConuni 1350627944, birth 239944580
2012-10-21 10:02:00.563: [    CSSD][1129306432]clssnmDeactivateNode: node 1, state 5
2012-10-21 10:02:00.563: [    CSSD][1129306432]clssnmDeactivateNode: node 1 (vrh1) left cluster
2012-10-21 10:02:00.563: [    CSSD][3611592416]clssgmQueueGrockEvent: groupName(IG+ASMSYS$BACKGROUND) count(2) master(2) event(2), incarn 4, mbrc 2, to member 2, events 0x0, state 0x0
2012-10-21 10:02:00.563: [    CSSD][1129306432]clssnmUpdateNodeState: node vrh2, number 2, current state 3, proposed state 3, current unique 1350482491, proposed unique 1350482491, prevConuni 0, birth 239944577
2012-10-21 10:02:00.563: [    CSSD][3611592416]clssgmQueueGrockEvent: groupName(VT+ASM) count(2) master(2) event(2), incarn 12, mbrc 2, to member 2, events 0x60, state 0x0
2012-10-21 10:02:00.563: [    CSSD][1129306432]clssnmSendAck: node 2, vrh2, syncSeqNo(239944581) type(15)
2012-10-21 10:02:00.563: [    CSSD][1129306432]clssnmQueueClientEvent:  Sending Event(1), type 1, incarn 239944581
2012-10-21 10:02:00.563: [    CSSD][3611592416]clssgmQueueGrockEvent: groupName(DG+ASM0) count(2) master(1) event(2), incarn 4, mbrc 2, to member 1, events 0x0, state 0x0
2012-10-21 10:02:00.563: [    CSSD][1129306432]clssnmQueueClientEvent: Node[1] state = 0, birth = 0, unique = 0
2012-10-21 10:02:00.563: [    CSSD][1129306432]clssnmQueueClientEvent: Node[2] state = 3, birth = 239944577, unique = 1350482491
2012-10-21 10:02:00.563: [    CSSD][1129306432]clssnmHandleUpdate: SYNC(239944581) from node(2) completed
2012-10-21 10:02:00.563: [    CSSD][1129306432]clssnmHandleUpdate: NODE 2 (vrh2) IS ACTIVE MEMBER OF CLUSTER
2012-10-21 10:02:00.563: [    CSSD][1129306432]clssscUpdateEventValue: NMReconfigInProgress  val -1, changes 21
2012-10-21 10:02:00.563: [    CSSD][1129306432]clssnmHandleUpdate: local disk timeout set to 200000 ms, remote disk timeout set to 200000
2012-10-21 10:02:00.563: [    CSSD][3611592416]clssgmQueueGrockEvent: groupName(GR+GCR1) count(2) master(1) event(2), incarn 6, mbrc 2, to member 1, events 0x280, state 0x0
2012-10-21 10:02:00.563: [    CSSD][1129306432]clssnmStartPendingConfigChange: New configuration request for CIN 0:1350628006:0
2012-10-21 10:02:00.563: [    CSSD][1129306432]  misscount          30    reboot latency      3
2012-10-21 10:02:00.563: [    CSSD][1129306432]  long I/O timeout  200    short I/O timeout  27
2012-10-21 10:02:00.563: [    CSSD][1129306432]  diagnostic wait    13  active version 11.2.0.3.0
2012-10-21 10:02:00.563: [    CSSD][1129306432]  Listing unique IDs for 5 voting files:
2012-10-21 10:02:00.563: [    CSSD][3611592416]clssgmQueueGrockEvent: groupName(DG_DATA) count(1) master(1) event(2), incarn 3, mbrc 1, to member 1, events 0x4, state 0x0
2012-10-21 10:02:00.563: [    CSSD][1129306432]    voting file 1: 85edc0e8-2d274f78-bfc58cdc-73b8c68a
2012-10-21 10:02:00.563: [    CSSD][1129306432]    voting file 2: 201ffffc-8ba44faa-bfe2efec-2aa75840
2012-10-21 10:02:00.563: [    CSSD][1129306432]    voting file 3: 6f2a25c5-89964faa-bf6980f7-c5f621ce
2012-10-21 10:02:00.563: [    CSSD][3611592416]clssgmQueueGrockEvent: groupName(ocr_vrh-cluster) count(1) master(2) event(2), incarn 3, mbrc 1, to member 2, events 0x78, state 0x0
2012-10-21 10:02:00.563: [    CSSD][1129306432]    voting file 4: 93eb3156-48454f25-bf3717df-1a2c73d5
2012-10-21 10:02:00.563: [    CSSD][1129306432]    voting file 5: 37372406-78964f88-bfbfbd31-d8b3829f
2012-10-21 10:02:00.563: [    CSSD][1129306432]clssnmStartPendingConfigChange: Initiating configuration change reconfig for CIN 1350628006
2012-10-21 10:02:00.563: [    CSSD][1129306432]clssnmStartCINUpdate: Starting CIN update for type 8
2012-10-21 10:02:00.563: [    CSSD][3611592416]clssgmQueueGrockEvent: groupName(CLSFRAME) count(1) master(2) event(2), incarn 3, mbrc 1, to member 2, events 0x8, state 0x0
2012-10-21 10:02:00.563: [    CSSD][3611592416]clssgmQueueGrockEvent: groupName(CRSDMAIN) count(1) master(2) event(2), incarn 3, mbrc 1, to member 2, events 0x8, state 0x0
2012-10-21 10:02:00.563: [    CSSD][3611592416]clssgmQueueGrockEvent: groupName(EVMDMAIN) count(1) master(2) event(2), incarn 3, mbrc 1, to member 2, events 0x8, state 0x0
2012-10-21 10:02:00.564: [    CSSD][1119844672]clssnmvDiskEvict: preconuni is NULL skipping the eviction write for node 1, state 0
2012-10-21 10:02:00.564: [    CSSD][3611592416]clssgmQueueGrockEvent: groupName(EVMDMAIN2) count(1) master(2) event(2), incarn 3, mbrc 1, to member 2, events 0x8, state 0x0
2012-10-21 10:02:00.564: [    CSSD][3611592416]clssgmQueueGrockEvent: groupName(CTSSGROUP) count(2) master(2) event(2), incarn 4, mbrc 2, to member 2, events 0x8, state 0x0
2012-10-21 10:02:00.564: [    CSSD][3611592416]clssgmQueueGrockEvent: groupName(DG_BACKUPDG) count(1) master(1) event(2), incarn 3, mbrc 1, to member 1, events 0x4, state 0x0
2012-10-21 10:02:00.564: [    CSSD][3611592416]clssgmQueueGrockEvent: groupName(DG_SYSTEMDG) count(1) master(1) event(2), incarn 3, mbrc 1, to member 1, events 0x4, state 0x0
2012-10-21 10:02:00.564: [    CSSD][3611592416]clssgmSuspendAllGrocks: done
2012-10-21 10:02:00.564: [    CSSD][3611592416]clssgmUpdateEventValue: CmInfo State  val 2, changes 51
2012-10-21 10:02:00.564: [    CSSD][3611592416]clssgmUpdateEventValue: ConnectedNodes  val 239944580, changes 18
2012-10-21 10:02:00.564: [    CSSD][3611592416]clssgmCleanupNodeContexts():  cleaning up nodes, rcfg(239944580)
2012-10-21 10:02:00.564: [    CSSD][1096816960]clssnmvDiskEvict: preconuni is NULL skipping the eviction write for node 1, state 0
2012-10-21 10:02:00.564: [    CSSD][3611592416]clssgmCleanupNodeContexts():  successful cleanup of nodes rcfg(239944580)
2012-10-21 10:02:00.564: [    CSSD][3611592416]clssgmStartNMMon:  completed node cleanup
2012-10-21 10:02:00.564: [    CSSD][3611592416]clssgmStartNMMon: node 1 failed, birth (239944580, 0) (old/new)
2012-10-21 10:02:00.564: [    CSSD][3611592416]clssgmStartNMMon: node 2 active, birth 239944577
2012-10-21 10:02:00.564: [    CSSD][3611592416]clssgmUpdateEventValue: Reconfig Event  val 1, changes 13
2012-10-21 10:02:00.564: [    CSSD][3611592416]clssgmUpdateEventValue: CmInfo State  val 3, changes 52
2012-10-21 10:02:00.564: [    CSSD][1113536832]clssnmvDiskEvict: preconuni is NULL skipping the eviction write for node 1, state 0
2012-10-21 10:02:00.564: [    CSSD][1085704512]clssnmvDiskEvict: preconuni is NULL skipping the eviction write for node 1, state 0
2012-10-21 10:02:00.565: [    CSSD][1089829184]clssnmvDiskEvict: preconuni is NULL skipping the eviction write for node 1, state 0
2012-10-21 10:02:00.565: [    CSSD][1130883392]clssgmReconfigThread:  started for reconfig (239944581)
2012-10-21 10:02:00.565: [    CSSD][1130883392]NMEVENT_RECONFIG [00][00][00][04]
2012-10-21 10:02:00.565: [    CSSD][1130883392]clssgmWaitOnEventValue: after HoldRequest  val 1, eval 1 waited 0
2012-10-21 10:02:00.565: [    CSSD][1130883392]clssgmCompareSwapEventValue: changed CmInfo State  val 4, from 3, changes 53
2012-10-21 10:02:00.565: [    CSSD][1130883392]clssgmCleanupNodeContexts():  cleaning up nodes, rcfg(239944580)
2012-10-21 10:02:00.565: [    CSSD][1130883392]clssgmDisconnectNodes: Closing connection 0x98fa69 for node vrh1, number 1, in incarnation 239944581; state flags 0x80000001, conn state flags 0x000a
2012-10-21 10:02:00.565: [    CSSD][1130883392]clssgmCleanupNodeContexts():  successful cleanup of nodes rcfg(239944581)
2012-10-21 10:02:00.565: [    CSSD][1130883392]clssgmUpdateEventValue: ReadyPeers  val 1, changes 7
2012-10-21 10:02:00.565: [    CSSD][1130883392]clssgmCompareSwapEventValue: changed CmInfo State  val 6, from 4, changes 54
2012-10-21 10:02:00.565: [    CSSD][1130883392]clssgmEstablishConnections: 1 nodes in cluster incarn 239944581
2012-10-21 10:02:00.565: [    CSSD][1130883392]clssgmUpdateEventValue: ConnectedNodes  val 0, changes 19
2012-10-21 10:02:00.565: [    CSSD][1085704512]clssnmCINUpdateComplete: Pending CIN update completed, config state 1
2012-10-21 10:02:00.566: [    CSSD][1122998592]clssgmPeerListener: new incarn 239944581. old 239944580
2012-10-21 10:02:00.566: [    CSSD][1124575552]clssnmPollingThread: signaling reconfig for config change
2012-10-21 10:02:00.566: [    CSSD][1122998592]clssgmPeerDeactivate: node 1 (vrh1), death 239944581, state 0x80000000 connstate 0xa
2012-10-21 10:02:00.566: [ GIPCLIB][1122998592] gipclibMapSearch: gipcMapSearch() -> gipcMapGetNodeAddr() failed: ret:gipcretKeyNotFound (36), ht:0x183d130, idxPtr:0x7f3bd6f6f8c0, key:0x42ef7140, flags:0x0
2012-10-21 10:02:00.566: [GIPCXCPT][1122998592] gipcObjectLookupF [gipcDissociateF : gipc.c : 2175]: search found no matching oid 0000000000000000, ret gipcretKeyNotFound (36), ret gipcretInvalidObject (3)
2012-10-21 10:02:00.566: [GIPCXCPT][1122998592] gipcDissociateF [clssgmPeerDeactivate : clssgmp.c : 3525]: EXCEPTION[ ret gipcretInvalidObject (3) ]  failed to dissociate obj 0000000000000000, flags 0x0
2012-10-21 10:02:00.566: [    CSSD][1122998592]clssgmCleanFuture: discarded 0 future msgs for 1
2012-10-21 10:02:00.566: [    CSSD][1127729472]clssnmDoSyncUpdate: Initiating sync 239944582
2012-10-21 10:02:00.566: [    CSSD][1122998592]clssgmPeerListener: connects done (1/1)
2012-10-21 10:02:00.566: [    CSSD][1127729472]clssscCompareSwapEventValue: changed NMReconfigInProgress  val 2, from -1, changes 22
2012-10-21 10:02:00.566: [    CSSD][1127729472]clssscCompareSwapEventValue: changed NMReconfigInProgress  val 2, from -1, changes 22
2012-10-21 10:02:00.566: [    CSSD][1122998592]clssgmUpdateEventValue: ConnectedNodes  val 239944581, changes 20
2012-10-21 10:02:00.566: [    CSSD][1127729472]clssnmDoSyncUpdate: local disk timeout set to 200000 ms, remote disk timeout set to 200000
2012-10-21 10:02:00.566: [    CSSD][1127729472]clssnmDoSyncUpdate: new values for local disk timeout and remote disk timeout will take effect when the sync is completed.
2012-10-21 10:02:00.566: [    CSSD][1127729472]clssnmDoSyncUpdate: Starting cluster reconfig with incarnation 239944582
2012-10-21 10:02:00.566: [    CSSD][1127729472]clssnmSetupAckWait: Ack message type (11)
2012-10-21 10:02:00.566: [    CSSD][1127729472]clssnmSetupAckWait: node(2) is ALIVE
2012-10-21 10:02:00.566: [    CSSD][1127729472]clssnmSendSync: syncSeqNo(239944582), indicating EXADATA fence initialization complete
2012-10-21 10:02:00.566: [    CSSD][1127729472]List of nodes that have ACKed my sync: NULL
2012-10-21 10:02:00.566: [    CSSD][1127729472]clssnmSendSync: syncSeqNo(239944582)
2012-10-21 10:02:00.566: [    CSSD][1127729472]clssnmWaitForAcks: Ack message type(11), ackCount(1)
2012-10-21 10:02:00.566: [    CSSD][1130883392]clssgmWaitChangeEventValue: ev(ConnectedNodes) changed to 239944581
2012-10-21 10:02:00.566: [    CSSD][1130883392]clssgmEstablishConnections: Sending STATUS message to all nodes for incarnation 239944581
2012-10-21 10:02:00.566: [    CSSD][1130883392]clssgmEstablishConnections: (1/1) connected, incarn(239944581)
2012-10-21 10:02:00.566: [    CSSD][1130883392]clssgmCompareSwapEventValue: changed CmInfo State  val 7, from 6, changes 55
2012-10-21 10:02:00.566: [    CSSD][1129306432]clssnmHandleSync: Node vrh2, number 2, is EXADATA fence capable
2012-10-21 10:02:00.566: [    CSSD][1129306432]clssscUpdateEventValue: NMReconfigInProgress  val 2, changes 23
2012-10-21 10:02:00.566: [    CSSD][1129306432]clssnmHandleSync: local disk timeout set to 200000 ms, remote disk timeout set to 200000
2012-10-21 10:02:00.566: [    CSSD][1130883392]clssgmSetVersions: properties common to all peers: 1,2,3,4,5,6,7,8,9,10,11
2012-10-21 10:02:00.566: [    CSSD][1129306432]clssnmHandleSync: initleader 2 newleader 2
2012-10-21 10:02:00.566: [    CSSD][1130883392]clssgmEstablishMasterNode: MASTER for 239944581 is node(2) birth(239944577)
2012-10-21 10:02:00.566: [    CSSD][1129306432]clssnmQueueClientEvent:  Sending Event(2), type 2, incarn 239944581
2012-10-21 10:02:00.566: [    CSSD][1130883392]clssgmCompareSwapEventValue: changed CmInfo State  val 8, from 7, changes 56
2012-10-21 10:02:00.566: [    CSSD][1129306432]clssnmQueueClientEvent: Node[1] state = 0, birth = 0, unique = 0
2012-10-21 10:02:00.566: [    CSSD][1130883392]clssgmMasterCMSync: Synchronizing group/lock status, replay-mode=0
2012-10-21 10:02:00.566: [    CSSD][1129306432]clssnmQueueClientEvent: Node[2] state = 3, birth = 239944577, unique = 1350482491
2012-10-21 10:02:00.566: [    CSSD][1130883392]clssgmCompareSwapEventValue: changed CmInfo State  val 9, from 8, changes 57
2012-10-21 10:02:00.566: [    CSSD][1129306432]clssnmHandleSync: Acknowledging sync: src[2] srcName[vrh2] seq[20] sync[239944582]
2012-10-21 10:02:00.566: [    CSSD][1130883392]clssgmMasterCMSync: processing grock(IG+ASMSYS$USERS) type(2)
2012-10-21 10:02:00.566: [    CSSD][1130883392]clssgmCleanupOrphanMembers: orphan member(1/IG+ASMSYS$USERS), birth(239944580) on node(1), birth(0/239944581)
2012-10-21 10:02:00.566: [    CSSD][1129306432]clssnmSendAck: node 2, vrh2, syncSeqNo(239944582) type(11)
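
Two numeric relationships can be read off the excerpt above. The sketch below only re-checks the arithmetic with values copied from the log; it is an observation about this particular trace, not a documented definition of the fields, and the variable names are hypothetical.

```shell
# Values copied from the log excerpt above (variable names are illustrative only).
lats=344798954          # LATS from clssnmCheckKillStatus
timeout=14940           # timeout from clssnmCheckKillStatus
kill_time=344813894     # kill time from clssnmSendShutdown
misscount=30
reboot_latency=3
short_io_timeout=27

# In this trace, kill time equals LATS plus the timeout.
[ $((lats + timeout)) -eq "$kill_time" ] && echo "kill time = LATS + timeout"

# In this trace, short I/O timeout equals misscount minus reboot latency.
[ $((misscount - reboot_latency)) -eq "$short_io_timeout" ] && echo "short I/O timeout = misscount - reboot latency"
```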

 

As the log above shows, suspending the ocssd.bin process with kill -19 (SIGSTOP) also causes the node to reboot.
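
At log level 2 the eviction steps are buried among many clssgmQueueGrockEvent lines. A minimal sketch for pulling out just the eviction milestones follows; the function name `css_evict_summary` is hypothetical, and the pattern only covers function names that appear in the excerpt above, so other versions or scenarios may log different ones.

```shell
# Hypothetical helper: print the node-eviction milestones from a CSSD log file.
# The function names in the pattern are copied from the log excerpt above.
css_evict_summary() {
    grep -E 'clssnm(SendShutdown|WaitOnEvictions|CheckKillStatus|DeactivateNode):' "$1"
}
```

Run from the directory that holds the log, for example `css_evict_summary ocssd.log`, this should leave only the shutdown request, the eviction wait, the kill status check, and the "left cluster" line.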

 

Next we try kill -9 on ocssd.bin; the node reboots in the same way:

 

[root@vrh1 ~]# ps -ef|grep ocssd
grid      3900     1  1 10:03 ?        00:00:37 /g01/11.2.0/grid/bin/ocssd.bin 
grid      6019  4287  0 10:39 pts/1    00:00:00 tail -f ocssd.log
root      6028  4331  0 10:39 pts/0    00:00:00 grep ocssd

[root@vrh1 ~]# kill -9 3900

2012-10-21 10:39:22.075: [    CSSD][1121757504]clssnmPollingThread: signaling reconfig for config change
2012-10-21 10:40:49.822: [    CSSD][1441830624]clsu_load_ENV_levels: Module = CSSD, LogLevel = 2, TraceLevel = 0
2012-10-21 10:40:49.822: [    CSSD][1441830624]clsu_load_ENV_levels: Module = GIPCCM, LogLevel = 2, TraceLevel = 0
2012-10-21 10:40:49.822: [    CSSD][1441830624]clsu_load_ENV_levels: Module = GIPCGM, LogLevel = 2, TraceLevel = 0
2012-10-21 10:40:49.822: [    CSSD][1441830624]clsu_load_ENV_levels: Module = GIPCNM, LogLevel = 2, TraceLevel = 0
2012-10-21 10:40:49.822: [    CSSD][1441830624]clsu_load_ENV_levels: Module = GPNP, LogLevel = 1, TraceLevel = 0
2012-10-21 10:40:49.822: [    CSSD][1441830624]clsu_load_ENV_levels: Module = OLR, LogLevel = 0, TraceLevel = 0
2012-10-21 10:40:49.822: [    CSSD][1441830624]clsu_load_ENV_levels: Module = SKGFD, LogLevel = 0, TraceLevel = 0
[    CSSD][1441830624]clsugetconf : Configuration type [4].
2012-10-21 10:40:49.823: [    CSSD][1441830624]clssscmain: Starting CSS daemon, version 11.2.0.3.0, in (clustered) mode with uniqueness value 1350830449
2012-10-21 10:40:49.823: [    CSSD][1441830624]clssscmain: Environment is production
2012-10-21 10:40:49.823: [    CSSD][1441830624]clssscmain: Core file size limit extended
2012-10-21 10:40:49.835: [    CSSD][1441830624]clssscmain: GIPCHA down 0
