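ocssd.bin is a key background process of RAC Clusterware. This note does not go into its many functions and covers only some details of how ocssd.bin can reboot a node.
Note that in an 11gR2 standalone environment, a crash or panic of ocssd.bin, or a manual kill of the process, does not cause the node to restart: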
[oracle@mlab1 ~]$ crsctl stat res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
               ONLINE  ONLINE       mlab1
ora.FRA.dg
               ONLINE  ONLINE       mlab1
ora.LISTENER.lsnr
               ONLINE  ONLINE       mlab1
ora.asm
               ONLINE  ONLINE       mlab1                    Started
ora.ons
               OFFLINE OFFLINE      mlab1
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.cssd
      1        ONLINE  ONLINE       mlab1
ora.diskmon
      1        OFFLINE OFFLINE
ora.evmd
      1        ONLINE  ONLINE       mlab1
ora.proda.db
      1        ONLINE  ONLINE       mlab1                    Open

First raise the CSSD log level to 2 so that the CSSD log records more detail:

[oracle@mlab1 ~]$ crsctl debug log css CSSD:2
CRS-4151: DEPRECATED: use crsctl set log {css|crs|evm}
Set CSSD Module: CSSD Log Level: 2

In 11g the crsctl set log css syntax can be used in place of crsctl debug log:

[oracle@mlab1 ~]$ crsctl set log css CSSD:2
Set CSSD Module: CSSD Log Level: 2
[oracle@mlab1 ~]$ crsctl get log css CSSD
Get CSSD Module: CSSD Log Level: 2

oracle 17797 1 0 Oct19 ? 00:00:11 /g01/oracle/app/oracle/product/11.2.0/grid/bin/ocssd.bin
oracle 29016 28865 0 21:47 pts/1 00:00:00 grep cssd.bin
[oracle@mlab1 ~]$ kill -9 17797
[oracle@mlab1 ~]$ ps -ef|grep cssd.bin
oracle 29128 1 0 21:48 ? 00:00:00 /g01/oracle/app/oracle/product/11.2.0/grid/bin/ocssd.bin
oracle 29144 28865 0 21:49 pts/1 00:00:00 grep cssd.bin
[oracle@mlab1 ~]$ uptime
21:49:13 up 28 days, 22:24, 3 users, load average: 0.16, 0.06, 0.01

tail -f ocssd.log

2012-10-21 09:45:06.853: [ CSSD][1105594688]clssgmClientConnectMsg: properties of cmProc 0x7f270c1617e0 - 1,2,3,4,5
2012-10-21 09:45:06.853: [ CSSD][1105594688]clssgmClientConnectMsg: Connect from con(0x20b8) proc(0x7f270c1617e0) pid(28935) version 11:2:1:4, properties: 1,2,3,4,5
2012-10-21 09:45:06.853: [ CSSD][1105594688]clssgmClientConnectMsg: msg flags 0x0000
2012-10-21 09:45:06.856: [ CSSD][1105594688]clssgmDeadProc: proc 0x7f270c1617e0
2012-10-21 09:45:06.856: [ CSSD][1105594688]clssgmDestroyProc: cleaning up proc(0x7f270c1617e0) con(0x20b8) skgpid ospid 28935 with 0 clients, refcount 0
2012-10-21 09:45:06.856: [ CSSD][1105594688]clssgmDiscEndpcl: gipcDestroy 0x20b8
2012-10-21 09:48:57.641: [ CSSD][2632525536]clsu_load_ENV_levels: Module = CSSD, LogLevel = 2, TraceLevel = 0
2012-10-21 09:48:57.641: [ CSSD][2632525536]clsu_load_ENV_levels: Module = GIPCCM, LogLevel = 2, TraceLevel = 0
2012-10-21 09:48:57.642: [ CSSD][2632525536]clsu_load_ENV_levels: Module = GIPCGM, LogLevel = 2, TraceLevel = 0
2012-10-21 09:48:57.642: [ CSSD][2632525536]clsu_load_ENV_levels: Module = GIPCNM, LogLevel = 2, TraceLevel = 0
2012-10-21 09:48:57.642: [ CSSD][2632525536]clsu_load_ENV_levels: Module = GPNP, LogLevel = 1, TraceLevel = 0
2012-10-21 09:48:57.642: [ CSSD][2632525536]clsu_load_ENV_levels: Module = OLR, LogLevel = 0, TraceLevel = 0
2012-10-21 09:48:57.642: [ CSSD][2632525536]clsu_load_ENV_levels: Module = SKGFD, LogLevel = 0, TraceLevel = 0
[ CSSD][2632525536]clsugetconf : Configuration type [3].
2012-10-21 09:48:57.642: [ CSSD][2632525536]clssscmain: Starting CSS daemon, version 11.2.0.3.0, in (local-only) mode with uniqueness value 1350827337 2012-10-21 09:48:57.642: [ CSSD][2632525536]clssscmain: Environment is production 2012-10-21 09:48:57.642: [ CSSD][2632525536]clssscmain: Core file size limit extended 2012-10-21 09:48:57.654: [ CSSD][2632525536]clssscmain: GIPCHA down 0 2012-10-21 09:48:57.655: [ CSSD][2632525536]clssscGetParameterOLR: OLR fetch for parameter logsize (8) failed with rc 21 2012-10-21 09:48:57.655: [ CSSD][2632525536]clssscExtendLimits: The current soft limit for file descriptors is 65536, hard limit is 65536 2012-10-21 09:48:57.655: [ CSSD][2632525536]clssscExtendLimits: The current soft limit for locked memory is 3955359744, hard limit is 3955359744 2012-10-21 09:48:57.655: [ CSSD][2632525536]clssscmain: Running as user oracle 2012-10-21 09:48:57.656: [ CSSD][2632525536]clssscmain: RT queue setting is at default value 2012-10-21 09:48:57.657: [ CSSD][2632525536]clssscGetParameterOLR: OLR fetch for parameter auth rep (9) failed with rc 21 2012-10-21 09:48:57.657: [ CSSD][2632525536]clssgmInitCMInfoMin: clsmonJoined set via localonly [ clsdmt][1097894208]Listening to (ADDRESS=(PROTOCOL=ipc)(KEY=mlab1DBG_CSSD)) 2012-10-21 09:48:57.658: [ clsdmt][1097894208]PID for the Process [29128], connkey 4 2012-10-21 09:48:57.658: [ CSSD][2632525536]clssscGetParameterOLR: OLR fetch for parameter diagwait (14) failed with rc 21 2012-10-21 09:48:57.662: [ CSSD][2632525536]clssnmInitNMInfoMin: Initializing first-reconfig to (0) 2012-10-21 09:48:57.662: [ CSSD][2632525536]clssscmain: initgminfo done 2012-10-21 09:48:57.662: [ CSSD][1082157376]clssgmclientlsnr: Spawned 2012-10-21 09:48:57.662: [ CSSD][1082157376]clssgmEvtInformation: reqtype (13) cmProc ((nil)) client ((nil)) 2012-10-21 09:48:57.662: [ CSSD][1082157376]clssgmEvtInformation: reqtype (13) req (0x12e9900) 2012-10-21 09:48:57.663: [ CSSD][1082157376]clssgmclientlsnr: listening on 
clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_mlab1_)(GIPCID=00000000-00000000-29128)) 2012-10-21 09:48:57.665: [ CSSD][2632525536]clssscmain:read clusterguid 5f0de5b55b586f17bfc26fd1c7c638a0 from OLR 2012-10-21 09:48:57.665: [ CSSD][2632525536]clssscmain: Cluster GUID is 5f0de5b55b586f17bfc26fd1c7c638a0 2012-10-21 09:48:57.665: [ CSSD][2632525536]clssnmNotifyReq: type (12) 2012-10-21 09:48:57.665: [ CSSD][2632525536]clssscmain: Skipping voting device init for local_only 2012-10-21 09:48:57.665: [ CSSD][2632525536]clssnmInitNodeDB: Initializing with OCR id 0 2012-10-21 09:48:58.612: [ CSSD][1082157376]clssscSelect: cookie accept request 0x7fe79802a2d0 2012-10-21 09:48:58.612: [ CSSD][1082157376]clssgmAllocProc: (0x139f4f0) allocated 2012-10-21 09:48:58.612: [ CSSD][1082157376]clssgmClientConnectMsg: properties of cmProc 0x139f4f0 - 1,2,3,4,5 2012-10-21 09:48:58.613: [ CSSD][1082157376]clssgmClientConnectMsg: Connect from con(0xd2) proc(0x139f4f0) pid(29114) version 11:2:1:4, properties: 1,2,3,4,5 2012-10-21 09:48:58.613: [ CSSD][1082157376]clssgmClientConnectMsg: The CSSD agent is process (0x139f4f0), number 1 2012-10-21 09:48:58.613: [ CSSD][1082157376]clssgmEvtInformation: reqtype (11) cmProc (0x139f4f0) client ((nil)) 2012-10-21 09:48:58.613: [ CSSD][1082157376]clssgmEvtInformation: reqtype (11) req (0x139a730) 2012-10-21 09:48:58.613: [ CSSD][1082157376]clssnmQueueNotification: type (11) 0x139a730 2012-10-21 09:48:58.665: [ CSSD][1078692160]clssnm_skgxnmon: Compatible vendor clusterware not in use 2012-10-21 09:48:58.665: [ CSSD][2632525536]clssnmNotifyReq: type (20) 2012-10-21 09:48:58.665: [ CSSD][2632525536][INFO]clssnmInitNodeDB: local only, no IPMI allowed 2012-10-21 09:48:58.665: [ CSSD][1078692160]clssgmDeathChkThread: Spawned 2012-10-21 09:48:58.666: [ CSSD][2632525536]clssnmNotifyReq: type (11) 2012-10-21 09:48:58.666: [ CSSD][2632525536]clssnmNotifyReq: found 1 for type (11) 0x139a730 2012-10-21 09:48:58.666: [ CSSD][2632525536]clssnmCompleteGMReq: 
Completed request type 11 with status 1 2012-10-21 09:48:58.666: [ CSSD][2632525536]clssgmDoneQEle: re-queueing req 0x12e5390 status 1 2012-10-21 09:48:58.666: [ CSSD][2632525536]clssnmNotifyReq: type (13) 2012-10-21 09:48:58.666: [ CSSD][2632525536]clssnmNotifyReq: found 1 for type (13) 0x12e9900 2012-10-21 09:48:58.666: [ CSSD][2632525536]clssnmCompleteGMReq: Completed request type 13 with status 1 2012-10-21 09:48:58.666: [ CSSD][2632525536]clssgmDoneQEle: re-queueing req 0x12cdb80 status 1 2012-10-21 09:48:58.666: [ CSSD][1083734336]clssgmPeerListener: Spawned for node 2012-10-21 09:48:58.667: [ CSSD][1083734336]clssgmPeerListener: physical hostname mlab1 privname mlab1 2012-10-21 09:48:58.667: [ CSSD][1083734336]clssgmPeerListener: gipc addr gipc://mlab1:gm_ 2012-10-21 09:48:58.667: [ CSSD][1083734336]clssgmPeerListener: gipc addr gipcha://mlab1:gm2_ 2012-10-21 09:48:58.667: [ CSSD][1082157376]clssgmclientOpenEndp: listening on clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=Oracle_CSS_LclLstnr_localhost_1)(GIPCID=00000000-00000000-29128)) 2012-10-21 09:48:58.667: [ CSSD][1100257600]clssnmPollingThread: Spawned, poll interval 1000 2012-10-21 09:48:58.667: [ CSSD][1107290432]clssnmRcfgMgrThread: Spawned 2012-10-21 09:48:58.667: [ CSSD][1108867392]clssnmClusterListener: Spawned 2012-10-21 09:48:58.667: [ CSSD][1108867392]clssnmOpenEndp: Not opening endp for localonly mode 2012-10-21 09:48:58.668: [ CSSD][1082157376]clssgmclientOpenEndp: listening on clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_mlab1_localhost)(GIPCID=00000000-00000000-29128)) 2012-10-21 09:48:58.668: [ CSSD][1082157376]clssgmCheckReqNMCompletion: Completing request type 11 for proc (0x139f4f0), operation status 1, client status 0 2012-10-21 09:48:58.668: [ CSSD][1091864896]clssnmSendingThread: Spawned 2012-10-21 09:48:58.669: [ CSSD][1082157376]clssgmEvtInformation: reqtype (22) cmProc (0x139f4f0) client ((nil)) 2012-10-21 09:48:58.669: [ CSSD][1082157376]clssgmEvtInformation: reqtype (22) req 
(0x7fe798207480) 2012-10-21 09:48:58.669: [ CSSD][1082157376]clssnmQueueNotification: type (22) 0x7fe798207480 2012-10-21 09:48:59.667: [ CSSD][1083734336]clssgmWaitOnEventValue: after CmInfo State val 3, eval 1 waited 0 2012-10-21 09:49:00.667: [ CSSD][1083734336]clssgmWaitOnEventValue: after CmInfo State val 3, eval 1 waited 0 2012-10-21 09:49:01.667: [ CSSD][1083734336]clssgmWaitOnEventValue: after CmInfo State val 3, eval 1 waited 0 2012-10-21 09:49:02.667: [ CSSD][1083734336]clssgmWaitOnEventValue: after CmInfo State val 3, eval 1 waited 0 2012-10-21 09:49:03.667: [ CSSD][1083734336]clssgmWaitOnEventValue: after CmInfo State val 3, eval 1 waited 0 2012-10-21 09:49:04.668: [ CSSD][1083734336]clssgmWaitOnEventValue: after CmInfo State val 3, eval 1 waited 0 2012-10-21 09:49:05.668: [ CSSD][1083734336]clssgmWaitOnEventValue: after CmInfo State val 3, eval 1 waited 0 2012-10-21 09:49:06.168: [ CSSD][1107290432]clssnmRcfgMgrThread: Local Join 2012-10-21 09:49:06.168: [ CSSD][1107290432]clssnmLocalJoinEvent: begin on node(1), waittime 193000 2012-10-21 09:49:06.168: [ CSSD][1107290432]clssnmLocalJoinEvent: scanning 2 nodes 2012-10-21 09:49:06.168: [ CSSD][1107290432]clssnmLocalJoinEvent: Starting initial cluster reconfig 2012-10-21 09:49:06.168: [ CSSD][1107290432]clssnmDoSyncUpdate: Initiating sync 0 2012-10-21 09:49:06.168: [ CSSD][1107290432]clssscCompareSwapEventValue: changed NMReconfigInProgress val 1, from -1, changes 1 2012-10-21 09:49:06.168: [ CSSD][1107290432]clssnmDoSyncUpdate: local disk timeout set to 200000 ms, remote disk timeout set to 200000 2012-10-21 09:49:06.168: [ CSSD][1107290432]clssnmDoSyncUpdate: new values for local disk timeout and remote disk timeout will take effect when the sync is completed. 
2012-10-21 09:49:06.170: [ CSSD][1107290432]clssnmSetFirstIncarn: got incarnation 243778226 from OLR 2012-10-21 09:49:06.174: [ CSSD][1107290432]clssnmSetFirstIncarn: Incarnation set to 243778227 2012-10-21 09:49:06.174: [ CSSD][1107290432]clssnmDoSyncUpdate: Starting cluster reconfig with incarnation 243778227 2012-10-21 09:49:06.174: [ CSSD][1107290432]clssnmSetupAckWait: Ack message type (11) 2012-10-21 09:49:06.174: [ CSSD][1107290432]clssnmSetupAckWait: node(1) is ALIVE 2012-10-21 09:49:06.174: [ CSSD][1107290432]clssnmSendSync: syncSeqNo(243778227), indicating EXADATA fence initialization incomplete 2012-10-21 09:49:06.174: [ CSSD][1107290432]List of nodes that have ACKed my sync: NULL 2012-10-21 09:49:06.174: [ CSSD][1107290432]clssnmSendSync: syncSeqNo(243778227) 2012-10-21 09:49:06.174: [ CSSD][1107290432]clssnmWaitForAcks: Ack message type(11), ackCount(1) 2012-10-21 09:49:06.174: [ CSSD][1108867392]clssscUpdateEventValue: NMReconfigInProgress val 1, changes 2 2012-10-21 09:49:06.174: [ CSSD][1108867392]clssnmHandleSync: local disk timeout set to 200000 ms, remote disk timeout set to 200000 2012-10-21 09:49:06.174: [ CSSD][1108867392]clssnmHandleSync: initleader 1 newleader 1 2012-10-21 09:49:06.174: [ CSSD][1108867392]clssnmQueueClientEvent: Sending Event(2), type 2, incarn 0 2012-10-21 09:49:06.174: [ CSSD][1108867392]clssnmQueueClientEvent: Node[1] state = 1, birth = 0, unique = 1350827337 2012-10-21 09:49:06.174: [ CSSD][1108867392]clssnmHandleSync: Acknowledging sync: src[1] srcName[mlab1] seq[1] sync[243778227] 2012-10-21 09:49:06.174: [ CSSD][1108867392]clssnmSendAck: node 1, mlab1, syncSeqNo(243778227) type(11) 2012-10-21 09:49:06.174: [ CSSD][2632525536]NMEVENT_SUSPEND [00][00][00][00] 2012-10-21 09:49:06.175: [ CSSD][2632525536]clssgmUpdateEventValue: CmInfo State val 5, changes 1 2012-10-21 09:49:06.175: [ CSSD][2632525536]clssgmSuspendAllGrocks: Issue SUSPEND 2012-10-21 09:49:06.175: [ CSSD][2632525536]clssgmSuspendAllGrocks: done 2012-10-21 
09:49:06.175: [ CSSD][1108867392]clssnmHandleAck: src[1] dest[1] dom[0] seq[0] sync[243778227] type[11] ackCount(0) 2012-10-21 09:49:06.175: [ CSSD][2632525536]clssgmUpdateEventValue: CmInfo State val 2, changes 2 2012-10-21 09:49:06.175: [ CSSD][2632525536]clssgmUpdateEventValue: ConnectedNodes val 0, changes 1 2012-10-21 09:49:06.175: [ CSSD][2632525536]clssgmCleanupNodeContexts(): cleaning up nodes, rcfg(0) 2012-10-21 09:49:06.175: [ CSSD][2632525536]clssgmCleanupNodeContexts(): successful cleanup of nodes rcfg(0) 2012-10-21 09:49:06.175: [ CSSD][2632525536]clssgmStartNMMon: completed node cleanup 2012-10-21 09:49:06.175: [ CSSD][1107290432]clssnmSendSync: syncSeqNo(243778227), indicating EXADATA fence initialization incomplete 2012-10-21 09:49:06.175: [ CSSD][1107290432]List of nodes that have ACKed my sync: 1 2012-10-21 09:49:06.175: [ CSSD][1107290432]clssnmWaitForAcks: done, syncseq(243778227), msg type(11) 2012-10-21 09:49:06.175: [ CSSD][1107290432]clssnmSetMinMaxVersion:node1 product/protocol (11.2/1.4) 2012-10-21 09:49:06.175: [ CSSD][1107290432]clssnmSetMinMaxVersion: properties common to all nodes: 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17 2012-10-21 09:49:06.175: [ CSSD][1107290432]clssnmSetMinMaxVersion: min product/protocol (11.2/1.4) 2012-10-21 09:49:06.175: [ CSSD][1107290432]clssnmSetMinMaxVersion: max product/protocol (11.2/1.4) 2012-10-21 09:49:06.175: [ CSSD][1107290432]clssnmNeedConfReq: No configuration to change 2012-10-21 09:49:06.175: [ CSSD][1107290432]clssnmDoSyncUpdate: node(1) is transitioning from joining state to active state 2012-10-21 09:49:06.175: [ CSSD][1107290432]clssnmDoSyncUpdate: Wait for 0 vote ack(s) 2012-10-21 09:49:06.175: [ CSSD][1107290432]clssnmCheckDskInfo: Checking disk info... 
2012-10-21 09:49:06.175: [ CSSD][1107290432]clssnmRemove: Start 2012-10-21 09:49:06.175: [ CSSD][1107290432]clssnmWaitOnEvictions: Start 2012-10-21 09:49:06.175: [ CSSD][1107290432]clssnmBldSendUpdate: syncSeqNo(243778227) 2012-10-21 09:49:06.175: [ CSSD][1107290432]clssnmBldSendUpdate: using msg version 4 2012-10-21 09:49:06.175: [ CSSD][1107290432]clssnmDoSyncUpdate: Sync 243778227 complete! 2012-10-21 09:49:06.175: [ CSSD][1108867392]clssnmHandleUpdate: sync[243778227] src[1], msgvers 4 icin 243778227 2012-10-21 09:49:06.175: [ CSSD][1108867392]clssnmHandleUpdate: common properties are 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17 2012-10-21 09:49:06.175: [ CSSD][1108867392]clssscSAGEInitFencing: kgzf fence initialization starting with GUID 5f0de5b55b586f17bfc26fd1c7c638a0, icin 243778227, node number 1, uniqueness 243778227 2012-10-21 09:49:06.176: [ default][1108867392]CELL communication is configured to use 0 interface(s): 2012-10-21 09:49:06.176: [ default][1108867392]Kgzf_ini_begin: diskmon is disabled 2012-10-21 09:49:06.176: [ CSSD][1108867392]clssscSAGEInitFencing: kgzf fence initialization successfully started 2012-10-21 09:49:06.176: [ CSSD][1108867392]clssnmUpdateNodeState: node mlab1, number 1, current state 2, proposed state 3, current unique 1350827337, proposed unique 1350827337, prevConuni 0, birth 243778227 2012-10-21 09:49:06.176: [ CSSD][1108867392]clssnmSendAck: node 1, mlab1, syncSeqNo(243778227) type(15) 2012-10-21 09:49:06.176: [ CSSD][1108867392]clssnmQueueClientEvent: Sending Event(1), type 1, incarn 243778227 2012-10-21 09:49:06.176: [ CSSD][1108867392]clssnmQueueClientEvent: Node[1] state = 3, birth = 243778227, unique = 1350827337 2012-10-21 09:49:06.176: [ CSSD][1108867392]clssnmHandleUpdate: SYNC(243778227) from node(1) completed 2012-10-21 09:49:06.177: [ CSSD][2632525536]clssgmStartNMMon: node 1 active, birth 243778227 2012-10-21 09:49:06.177: [ CSSD][1108867392]clssnmHandleUpdate: NODE 1 (mlab1) IS ACTIVE MEMBER OF CLUSTER 2012-10-21 
09:49:06.177: [ CSSD][2632525536]clssgmUpdateEventValue: Reconfig Event val 1, changes 1 2012-10-21 09:49:06.177: [ CSSD][1108867392]clssscUpdateEventValue: NMReconfigInProgress val -1, changes 3 2012-10-21 09:49:06.177: [ CSSD][1108867392]clssnmHandleUpdate: local disk timeout set to 200000 ms, remote disk timeout set to 200000 2012-10-21 09:49:06.177: [ CSSD][2632525536]clssgmUpdateEventValue: CmInfo State val 3, changes 3 2012-10-21 09:49:06.177: [ CSSD][1083734336]clssgmWaitOnEventValue: after CmInfo State val 3, eval 3 waited 510 2012-10-21 09:49:06.177: [ CSSD][1083734336]clssgmUpdateEventValue: HoldRequest val 1, changes 1 2012-10-21 09:49:06.177: [ CSSD][1085311296]clssgmReconfigThread: started for reconfig (243778227) 2012-10-21 09:49:06.177: [ CSSD][1085311296]NMEVENT_RECONFIG [00][00][00][02]
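As the output shows, in a standalone environment a crashed or killed ocssd.bin does not reboot the node: the ocssd.bin PID changes from 17797 to 29128 while uptime still reports 28 days. Because this is not a cluster, a reboot is not needed, and only the ocssd.bin process itself is restarted.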
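The check above (new PID, unchanged uptime) can be sketched in Python. This is only an illustration under stated assumptions: the helper names `ocssd_pids` and `uptime_days` are hypothetical, the inputs are the `ps -ef` and `uptime` lines captured in the transcript, and treating "uptime shows zero days" as a recent boot is a crude heuristic rather than anything Clusterware provides.

```python
import re

def ocssd_pids(ps_output):
    """Return the PIDs of ocssd.bin processes in `ps -ef` text (hypothetical helper)."""
    pids = set()
    for line in ps_output.splitlines():
        fields = line.split()
        # `ps -ef` columns: UID PID PPID C STIME TTY TIME CMD...
        if len(fields) >= 8 and fields[7].endswith("/ocssd.bin"):
            pids.add(int(fields[1]))
    return pids

def uptime_days(uptime_output):
    """Return the whole days reported by `uptime`, or 0 if none are shown."""
    m = re.search(r"up\s+(\d+)\s+days?", uptime_output)
    return int(m.group(1)) if m else 0

# Lines copied from the transcript, before and after `kill -9 17797`.
before = (
    "oracle 17797 1 0 Oct19 ? 00:00:11 /g01/oracle/app/oracle/product/11.2.0/grid/bin/ocssd.bin\n"
    "oracle 29016 28865 0 21:47 pts/1 00:00:00 grep cssd.bin"
)
after = (
    "oracle 29128 1 0 21:48 ? 00:00:00 /g01/oracle/app/oracle/product/11.2.0/grid/bin/ocssd.bin\n"
    "oracle 29144 28865 0 21:49 pts/1 00:00:00 grep cssd.bin"
)
uptime = "21:49:13 up 28 days, 22:24, 3 users, load average: 0.16, 0.06, 0.01"

respawned = ocssd_pids(before) != ocssd_pids(after)   # the daemon got a new PID
recently_booted = uptime_days(uptime) == 0            # crude heuristic, an assumption
print("ocssd.bin respawned:", respawned)
print("node recently booted:", recently_booted)
```

The `grep cssd.bin` line is skipped because its command column is `grep`, so only the daemon itself is counted.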
In a RAC cluster environment the behavior is different:
[root@vrh1 ~]# crsctl set log css CSSD:2
Set CSSD Module: CSSD Log Level: 2
[root@vrh1 ~]#
[root@vrh1 ~]# ps -ef|grep ocssd.bin
grid 3929 1 1 Oct19 ? 00:57:01 /g01/11.2.0/grid/bin/ocssd.bin
root 27297 26827 0 10:00 pts/0 00:00:00 grep ocssd.bin
[root@vrh1 ~]# kill -19 3929

Signal 19 is SIGSTOP, which suspends the process rather than terminating it.

ocssd.log on vrh1, the node whose ocssd.bin received kill -19:

2012-10-21 10:01:31.068: [ CSSD][1086265664]clssnmvDiskPing: Writing with status 0x3, timestamp 199229204/1350828091
2012-10-21 10:01:31.068: [ CSSD][1103239488]clssnmvDiskPing: Writing with status 0x3, timestamp 199229204/1350828091
2012-10-21 10:01:31.068: [ CSSD][1092921664]clssnmvDiskPing: Writing with status 0x3, timestamp 199229204/1350828091
2012-10-21 10:01:31.195: [ CSSD][1096075584]clssnmvDiskPing: Writing with status 0x3, timestamp 199229324/1350828091
2012-10-21 10:01:31.196: [ CSSD][1111820608]clssnmvDiskPing: Writing with status 0x3, timestamp 199229324/1350828091
2012-10-21 10:01:31.220: [ CSSD][1113397568]clssnmvDiskKillCheck: not evicted, file /dev/asm-diskh flags 0x00000000, kill block unique 0, my unique 1350627944
2012-10-21 10:01:31.220: [ CSSD][1099512128]clssnmvDiskKillCheck: not evicted, file /dev/asm-diskg flags 0x00000000, kill block unique 0, my unique 1350627944
2012-10-21 10:01:31.220: [ CSSD][1077381440]clssnmvDiskKillCheck: not evicted, file /dev/asm-diski flags 0x00000000, kill block unique 0, my unique 1350627944

Because the ocssd.bin process is suspended, its log is not updated after 10:01:31.220.

ocssd.log on the other node:

2012-10-21 10:01:45.623: [ CSSD][1124575552]clssnmPollingThread: node vrh1 (1) at 50% heartbeat fatal, removal in 14.940 seconds
2012-10-21 10:01:45.623: [ CSSD][1124575552]clssnmPollingThread: node vrh1 (1) is impending reconfig, flag 2491404, misstime 15060
2012-10-21 10:01:45.625: [ CSSD][1085704512]clssnmvDHBValidateNCopy: node 1, vrh1, has a disk HB, but no network HB, DHB has rcfg 239944581, wrtcnt, 9550714, LATS 344798954, lastSeqNo 7825639, uniqueness 1350627944, timestamp 1350828091/199229204
2012-10-21 10:01:45.625: [ CSSD][1089829184]clssnmvDHBValidateNCopy: node 1, vrh1, has a disk HB, but no network HB, DHB has rcfg 239944581, wrtcnt, 9550715, LATS 344798954, lastSeqNo 6103632, uniqueness 1350627944, timestamp 1350828091/199229324
2012-10-21 10:01:45.684: [ CSSD][1107228992]clssnmvDiskKillCheck: not evicted, file /dev/asm-diskk flags 0x00000000, kill block unique 0, my unique 1350482491
2012-10-21 10:01:46.105: [ CSSD][1111959872]clssnmvDiskKillCheck: not evicted, file /dev/asm-diskh flags 0x00000000, kill block unique 0, my unique 1350482491
2012-10-21 10:01:46.105: [ CSSD][1118267712]clssnmvDiskKillCheck: not evicted, file /dev/asm-diskj flags 0x00000000, kill block unique 0, my unique 1350482491
2012-10-21 10:01:46.105: [ CSSD][1108805952]clssnmvDiskKillCheck: not evicted, file /dev/asm-diski flags 0x00000000, kill block unique 0, my unique 1350482491
2012-10-21 10:01:46.105: [ CSSD][1116690752]clssnmvDiskKillCheck: not evicted, file /dev/asm-diskg flags 0x00000000, kill block unique 0, my unique 1350482491
2012-10-21 10:01:46.587: [ CSSD][1126152512]clssnmSendingThread: sending status msg to all nodes
2012-10-21 10:01:46.587: [ CSSD][1126152512]clssnmSendingThread: sent 8 status msgs to all nodes
2012-10-21 10:01:53.627: [ CSSD][1124575552]clssnmPollingThread: node vrh1 (1) at 75% heartbeat fatal, removal in 6.930 seconds
2012-10-21 10:01:55.102: [ CSSD][1126152512]clssnmSendingThread: sending status msg to all nodes
2012-10-21 10:01:55.102: [ CSSD][1126152512]clssnmSendingThread: sent 8 status msgs to all nodes
2012-10-21 10:01:57.628: [ CSSD][1124575552]clssnmPollingThread: node vrh1 (1) at 90% heartbeat fatal, removal in 2.930 seconds, seedhbimpd 1
2012-10-21 10:01:59.608: [ CSSD][1126152512]clssnmSendingThread: sending status msg to all nodes
2012-10-21 10:01:59.608: [ CSSD][1126152512]clssnmSendingThread: sent 9 status msgs to all nodes
2012-10-21 10:02:00.560: [ CSSD][1124575552]clssnmPollingThread: Removal started for node vrh1 (1), flags 0x26040c, state 3, wt4c 0
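As an aside before the reconfiguration log continues, the clssnmPollingThread countdown above can be checked with a short Python sketch. It is a rough illustration, not an Oracle tool: the regular expressions and the helper names `parse_ts` and `predicted_removals` are assumptions made here, the sample lines are copied from the log, and reading `misstime` as milliseconds elapsed since the last network heartbeat is an interpretation rather than documented behavior.

```python
import re
from datetime import datetime, timedelta

# clssnmPollingThread lines copied from the surviving node's ocssd.log.
LOG = """\
2012-10-21 10:01:45.623: [ CSSD][1124575552]clssnmPollingThread: node vrh1 (1) at 50% heartbeat fatal, removal in 14.940 seconds
2012-10-21 10:01:45.623: [ CSSD][1124575552]clssnmPollingThread: node vrh1 (1) is impending reconfig, flag 2491404, misstime 15060
2012-10-21 10:01:53.627: [ CSSD][1124575552]clssnmPollingThread: node vrh1 (1) at 75% heartbeat fatal, removal in 6.930 seconds
2012-10-21 10:01:57.628: [ CSSD][1124575552]clssnmPollingThread: node vrh1 (1) at 90% heartbeat fatal, removal in 2.930 seconds, seedhbimpd 1
2012-10-21 10:02:00.560: [ CSSD][1124575552]clssnmPollingThread: Removal started for node vrh1 (1), flags 0x26040c, state 3, wt4c 0
"""

TS = r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})"
WARN_RE = re.compile(TS + r": .*at (\d+)% heartbeat fatal, removal in ([\d.]+) seconds")
START_RE = re.compile(TS + r": .*Removal started for node")
MISS_RE = re.compile(r"misstime (\d+)")

def parse_ts(text):
    """Parse an ocssd.log timestamp with millisecond precision."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S.%f")

def predicted_removals(log_text):
    """Return (percent, predicted removal time) for each heartbeat warning."""
    return [
        (int(pct), parse_ts(ts) + timedelta(seconds=float(left)))
        for ts, pct, left in WARN_RE.findall(log_text)
    ]

actual = parse_ts(START_RE.search(LOG).group(1))
for pct, when in predicted_removals(LOG):
    drift_ms = (when - actual).total_seconds() * 1000
    print(f"{pct}% warning: predicted removal is {drift_ms:+.0f} ms from the actual start")

# Time already missed plus time remaining, both taken at the 50% warning.
misstime_ms = int(MISS_RE.search(LOG).group(1))
remaining_ms = round(float(WARN_RE.search(LOG).group(3)) * 1000)
total_ms = misstime_ms + remaining_ms
print("implied network heartbeat timeout (ms):", total_ms)
```

Each warning's timestamp plus its remaining seconds lands within a few milliseconds of the 10:02:00.560 removal, and 15060 ms + 14940 ms gives 30000 ms, which agrees with the `misstime(30000)` and `misscount 30` values that appear later in the same log.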
2012-10-21 10:02:00.560: [ CSSD][1124575552]clssnmMarkNodeForRemoval: node 1, vrh1 marked for removal 2012-10-21 10:02:00.560: [ CSSD][1124575552]clssnmDiscHelper: vrh1, node(1) connection failed, endp (0x98f947), probe((nil)), ninf->endp 0x98f947 2012-10-21 10:02:00.560: [ CSSD][1124575552]clssnmDiscHelper: node 1 clean up, endp (0x98f947), init state 5, cur state 5 2012-10-21 10:02:00.560: [GIPCXCPT][1124575552] gipcInternalDissociate: obj 0x7f3bc80d8020 [000000000098f947] { gipcEndpoint : localAddr 'gipcha://vrh2:nm2_vrh-cluster/9dc0-9546-c12b-e74', remoteAddr 'gipcha://vrh1:2cf2-b3ca-7399-111', numPend 1, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, flags 0x138606, usrFlags 0x0 } not associated with any container, ret gipcretFail (1) 2012-10-21 10:02:00.560: [GIPCXCPT][1124575552] gipcDissociateF [clssnmDiscHelper : clssnm.c : 3436]: EXCEPTION[ ret gipcretFail (1) ] failed to dissociate obj 0x7f3bc80d8020 [000000000098f947] { gipcEndpoint : localAddr 'gipcha://vrh2:nm2_vrh-cluster/9dc0-9546-c12b-e74', remoteAddr 'gipcha://vrh1:2cf2-b3ca-7399-111', numPend 1, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, flags 0x138606, usrFlags 0x0 }, flags 0x0 2012-10-21 10:02:00.560: [ CSSD][1127729472]clssnmRcfgMgrThread: Reconfig in progress... 
2012-10-21 10:02:00.560: [ CSSD][1127729472]clssnmRcfgMgrThread: sync leader(1) failed, misstime(30000) unique(1350627944) 2012-10-21 10:02:00.560: [ CSSD][1127729472]clssscCompareSwapEventValue: changed NMReconfigInProgress val -1, from 1, changes 18 2012-10-21 10:02:00.560: [ CSSD][1127729472]clssnmDoSyncUpdate: Initiating sync 239944581 2012-10-21 10:02:00.560: [ CSSD][1129306432]clssnmDiscEndp: gipcDestroy 0x98f947 2012-10-21 10:02:00.560: [ CSSD][1127729472]clssscCompareSwapEventValue: changed NMReconfigInProgress val 2, from -1, changes 19 2012-10-21 10:02:00.560: [ CSSD][1127729472]clssnmDoSyncUpdate: local disk timeout set to 27000 ms, remote disk timeout set to 27000 2012-10-21 10:02:00.560: [ CSSD][1127729472]clssnmDoSyncUpdate: new values for local disk timeout and remote disk timeout will take effect when the sync is completed. 2012-10-21 10:02:00.560: [ CSSD][1127729472]clssnmDoSyncUpdate: Starting cluster reconfig with incarnation 239944581 2012-10-21 10:02:00.560: [ CSSD][1127729472]clssnmSetupAckWait: Ack message type (11) 2012-10-21 10:02:00.560: [ CSSD][1127729472]clssnmSetupAckWait: node(2) is ALIVE 2012-10-21 10:02:00.560: [ CSSD][1127729472]clssnmSendSync: syncSeqNo(239944581), indicating EXADATA fence initialization complete 2012-10-21 10:02:00.560: [ CSSD][1127729472]List of nodes that have ACKed my sync: NULL 2012-10-21 10:02:00.560: [ CSSD][1127729472]clssnmSendSync: syncSeqNo(239944581) 2012-10-21 10:02:00.560: [ CSSD][1127729472]clssnmWaitForAcks: Ack message type(11), ackCount(1) 2012-10-21 10:02:00.560: [ CSSD][1129306432]clssnmHandleSync: Node vrh2, number 2, is EXADATA fence capable 2012-10-21 10:02:00.561: [ CSSD][1129306432]clssscUpdateEventValue: NMReconfigInProgress val 2, changes 20 2012-10-21 10:02:00.561: [ CSSD][1129306432]clssnmHandleSync: local disk timeout set to 27000 ms, remote disk timeout set to 27000 2012-10-21 10:02:00.561: [ CSSD][1129306432]clssnmHandleSync: initleader 2 newleader 2 2012-10-21 10:02:00.561: [ 
CSSD][1129306432]clssnmQueueClientEvent: Sending Event(2), type 2, incarn 239944580 2012-10-21 10:02:00.561: [ CSSD][1129306432]clssnmQueueClientEvent: Node[1] state = 5, birth = 239944580, unique = 1350627944 2012-10-21 10:02:00.561: [ CSSD][1129306432]clssnmQueueClientEvent: Node[2] state = 3, birth = 239944577, unique = 1350482491 2012-10-21 10:02:00.561: [ CSSD][1129306432]clssnmHandleSync: Acknowledging sync: src[2] srcName[vrh2] seq[16] sync[239944581] 2012-10-21 10:02:00.561: [ CSSD][1129306432]clssnmSendAck: node 2, vrh2, syncSeqNo(239944581) type(11) 2012-10-21 10:02:00.561: [ CSSD][1129306432]clssnmHandleAck: src[2] dest[2] dom[0] seq[0] sync[239944581] type[11] ackCount(0) 2012-10-21 10:02:00.561: [ CSSD][3611592416]clssgmStartNMMon: node 1 active, birth 239944580 2012-10-21 10:02:00.561: [ CSSD][3611592416]clssgmStartNMMon: node 2 active, birth 239944577 2012-10-21 10:02:00.561: [ CSSD][3611592416]NMEVENT_SUSPEND [00][00][00][06] 2012-10-21 10:02:00.561: [ CSSD][3611592416]clssgmUpdateEventValue: CmInfo State val 5, changes 50 2012-10-21 10:02:00.561: [ CSSD][3611592416]clssgmSuspendAllGrocks: Issue SUSPEND 2012-10-21 10:02:00.561: [ CSSD][1127729472]clssnmSendSync: syncSeqNo(239944581), indicating EXADATA fence initialization complete 2012-10-21 10:02:00.561: [ CSSD][1127729472]List of nodes that have ACKed my sync: 2 2012-10-21 10:02:00.561: [ CSSD][3611592416]clssgmQueueGrockEvent: groupName(IG+ASMSYS$USERS) count(2) master(2) event(2), incarn 4, mbrc 2, to member 2, events 0x0, state 0x0 2012-10-21 10:02:00.561: [ CSSD][1127729472]clssnmWaitForAcks: done, syncseq(239944581), msg type(11) 2012-10-21 10:02:00.561: [ CSSD][1127729472]clssnmSetMinMaxVersion:node2 product/protocol (11.2/1.4) 2012-10-21 10:02:00.561: [ CSSD][1127729472]clssnmSetMinMaxVersion: properties common to all nodes: 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17 2012-10-21 10:02:00.561: [ CSSD][3611592416]clssgmQueueGrockEvent: groupName(crs_version) count(2) master(2) event(2), incarn 
6, mbrc 2, to member 2, events 0x0, state 0x0 2012-10-21 10:02:00.561: [ CSSD][1127729472]clssnmSetMinMaxVersion: min product/protocol (11.2/1.4) 2012-10-21 10:02:00.561: [ CSSD][1127729472]clssnmSetMinMaxVersion: max product/protocol (11.2/1.4) 2012-10-21 10:02:00.561: [ CSSD][1127729472]clssnmNeedConfReq: No configuration to change 2012-10-21 10:02:00.561: [ CSSD][3611592416]clssgmQueueGrockEvent: groupName(crs_version) count(2) master(2) event(2), incarn 6, mbrc 2, to member 0, events 0x20, state 0x0 2012-10-21 10:02:00.561: [ CSSD][1127729472]clssnmDoSyncUpdate: Terminating node 1, vrh1, misstime(30000) state(5) 2012-10-21 10:02:00.561: [ CSSD][1127729472]clssnmDoSyncUpdate: Wait for 0 vote ack(s) 2012-10-21 10:02:00.561: [ CSSD][1127729472]clssnmCheckDskInfo: Checking disk info... 2012-10-21 10:02:00.561: [ CSSD][1127729472]clssnmRemove: Start 2012-10-21 10:02:00.561: [ CSSD][3611592416]clssgmQueueGrockEvent: groupName(CRF-) count(3) master(2) event(2), incarn 11, mbrc 3, to member 2, events 0x38, state 0x0 2012-10-21 10:02:00.561: [ CSSD][1127729472]clssnmrRemoveNode: Removing node 1, vrh1, from the cluster in incarnation 239944581, node birth incarnation 239944580, death incarnation 239944581, stateflags 0x260000 uniqueness value 1350627944 2012-10-21 10:02:00.561: [ CSSD][3611592416]clssgmQueueGrockEvent: groupName(CLSN.ONSPROC.MASTER) count(1) master(2) event(2), incarn 1, mbrc 1, to member 2, events 0xa0, state 0x0 2012-10-21 10:02:00.561: [ default][1127729472]kgzf_gen_node_reid2: generated reid cid=58a8249042c37f94bf844767ea0ae255,icin=239944576,nmn=1,lnid=239944580,gid=0,gin=0,gmn=0,umemid=0,opid=0,opsn=0,lvl=node hdr=0xfece0100 2012-10-21 10:02:00.561: [ CSSD][1127729472]clssnmrFenceSage: Fenced node vrh1, number 1, with EXADATA, handle 0 2012-10-21 10:02:00.561: [ CSSD][3611592416]clssgmQueueGrockEvent: groupName(DB+ASM) count(2) master(1) event(2), incarn 4, mbrc 2, to member 1, events 0x68, state 0x0 2012-10-21 10:02:00.561: [ 
CSSD][1127729472]clssnmSendShutdown: req to node 1, kill time 344813894 2012-10-21 10:02:00.561: [ CSSD][1127729472]clssnmsendmsg: not connected to node 1 2012-10-21 10:02:00.562: [ CSSD][1127729472]clssnmSendShutdown: Send to node 1 failed 2012-10-21 10:02:00.562: [ CSSD][1127729472]clssnmWaitOnEvictions: Start 2012-10-21 10:02:00.562: [ CSSD][3611592416]clssgmQueueGrockEvent: groupName(DG+ASM) count(2) master(1) event(2), incarn 4, mbrc 2, to member 1, events 0x0, state 0x0 2012-10-21 10:02:00.562: [ CSSD][1127729472]clssnmWaitOnEvictions: node 1, undead 1, EXADATA fence handle 0 kill reqest id 0, last DHB (1350828091, 199229324, 399572), seedhbimpd TRUE 2012-10-21 10:02:00.562: [ CSSD][1127729472]clssnmCheckKillStatus: Node 1, vrh1, down, LATS(344798954),timeout(14940) 2012-10-21 10:02:00.562: [ CSSD][1127729472]clssnmBldSendUpdate: syncSeqNo(239944581) 2012-10-21 10:02:00.562: [ CSSD][1127729472]clssnmBldSendUpdate: using msg version 4 2012-10-21 10:02:00.562: [ CSSD][1127729472]clssnmDoSyncUpdate: Sync 239944581 complete! 
2012-10-21 10:02:00.563: [ CSSD][1129306432]clssnmHandleUpdate: sync[239944581] src[2], msgvers 4 icin 239944576
2012-10-21 10:02:00.563: [ CSSD][1129306432]clssnmHandleUpdate: common properties are 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17
2012-10-21 10:02:00.563: [ CSSD][1129306432]clssnmUpdateNodeState: node vrh1, number 1, current state 5, proposed state 0, current unique 1350627944, proposed unique 1350627944, prevConuni 1350627944, birth 239944580
2012-10-21 10:02:00.563: [ CSSD][1129306432]clssnmDeactivateNode: node 1, state 5
2012-10-21 10:02:00.563: [ CSSD][1129306432]clssnmDeactivateNode: node 1 (vrh1) left cluster
2012-10-21 10:02:00.563: [ CSSD][3611592416]clssgmQueueGrockEvent: groupName(IG+ASMSYS$BACKGROUND) count(2) master(2) event(2), incarn 4, mbrc 2, to member 2, events 0x0, state 0x0
2012-10-21 10:02:00.563: [ CSSD][1129306432]clssnmUpdateNodeState: node vrh2, number 2, current state 3, proposed state 3, current unique 1350482491, proposed unique 1350482491, prevConuni 0, birth 239944577
2012-10-21 10:02:00.563: [ CSSD][3611592416]clssgmQueueGrockEvent: groupName(VT+ASM) count(2) master(2) event(2), incarn 12, mbrc 2, to member 2, events 0x60, state 0x0
2012-10-21 10:02:00.563: [ CSSD][1129306432]clssnmSendAck: node 2, vrh2, syncSeqNo(239944581) type(15)
2012-10-21 10:02:00.563: [ CSSD][1129306432]clssnmQueueClientEvent: Sending Event(1), type 1, incarn 239944581
2012-10-21 10:02:00.563: [ CSSD][3611592416]clssgmQueueGrockEvent: groupName(DG+ASM0) count(2) master(1) event(2), incarn 4, mbrc 2, to member 1, events 0x0, state 0x0
2012-10-21 10:02:00.563: [ CSSD][1129306432]clssnmQueueClientEvent: Node[1] state = 0, birth = 0, unique = 0
2012-10-21 10:02:00.563: [ CSSD][1129306432]clssnmQueueClientEvent: Node[2] state = 3, birth = 239944577, unique = 1350482491
2012-10-21 10:02:00.563: [ CSSD][1129306432]clssnmHandleUpdate: SYNC(239944581) from node(2) completed
2012-10-21 10:02:00.563: [ CSSD][1129306432]clssnmHandleUpdate: NODE 2 (vrh2) IS ACTIVE MEMBER OF CLUSTER
2012-10-21 10:02:00.563: [ CSSD][1129306432]clssscUpdateEventValue: NMReconfigInProgress val -1, changes 21
2012-10-21 10:02:00.563: [ CSSD][1129306432]clssnmHandleUpdate: local disk timeout set to 200000 ms, remote disk timeout set to 200000
2012-10-21 10:02:00.563: [ CSSD][3611592416]clssgmQueueGrockEvent: groupName(GR+GCR1) count(2) master(1) event(2), incarn 6, mbrc 2, to member 1, events 0x280, state 0x0
2012-10-21 10:02:00.563: [ CSSD][1129306432]clssnmStartPendingConfigChange: New configuration request for CIN 0:1350628006:0
2012-10-21 10:02:00.563: [ CSSD][1129306432] misscount 30 reboot latency 3
2012-10-21 10:02:00.563: [ CSSD][1129306432] long I/O timeout 200 short I/O timeout 27
2012-10-21 10:02:00.563: [ CSSD][1129306432] diagnostic wait 13 active version 11.2.0.3.0
2012-10-21 10:02:00.563: [ CSSD][1129306432] Listing unique IDs for 5 voting files:
2012-10-21 10:02:00.563: [ CSSD][3611592416]clssgmQueueGrockEvent: groupName(DG_DATA) count(1) master(1) event(2), incarn 3, mbrc 1, to member 1, events 0x4, state 0x0 x0
2012-10-21 10:02:00.563: [ CSSD][1129306432] voting file 1: 85edc0e8-2d274f78-bfc58cdc-73b8c68a
2012-10-21 10:02:00.563: [ CSSD][1129306432] voting file 2: 201ffffc-8ba44faa-bfe2efec-2aa75840
2012-10-21 10:02:00.563: [ CSSD][1129306432] voting file 3: 6f2a25c5-89964faa-bf6980f7-c5f621ce
2012-10-21 10:02:00.563: [ CSSD][3611592416]clssgmQueueGrockEvent: groupName(ocr_vrh-cluster) count(1) master(2) event(2), incarn 3, mbrc 1, to member 2, events 0x78, state 0x0
2012-10-21 10:02:00.563: [ CSSD][1129306432] voting file 4: 93eb3156-48454f25-bf3717df-1a2c73d5
2012-10-21 10:02:00.563: [ CSSD][1129306432] voting file 5: 37372406-78964f88-bfbfbd31-d8b3829f
2012-10-21 10:02:00.563: [ CSSD][1129306432]clssnmStartPendingConfigChange: Initiating configuration change reconfig for CIN 1350628006
2012-10-21 10:02:00.563: [ CSSD][1129306432]clssnmStartCINUpdate: Starting CIN update for type 8
2012-10-21 10:02:00.563: [ CSSD][3611592416]clssgmQueueGrockEvent: groupName(CLSFRAME) count(1) master(2) event(2), incarn 3, mbrc 1, to member 2, events 0x8, state 0x0
2012-10-21 10:02:00.563: [ CSSD][3611592416]clssgmQueueGrockEvent: groupName(CRSDMAIN) count(1) master(2) event(2), incarn 3, mbrc 1, to member 2, events 0x8, state 0x0
2012-10-21 10:02:00.563: [ CSSD][3611592416]clssgmQueueGrockEvent: groupName(EVMDMAIN) count(1) master(2) event(2), incarn 3, mbrc 1, to member 2, events 0x8, state 0x0
2012-10-21 10:02:00.564: [ CSSD][1119844672]clssnmvDiskEvict: preconuni is NULL skipping the eviction write for node 1, state 0
2012-10-21 10:02:00.564: [ CSSD][3611592416]clssgmQueueGrockEvent: groupName(EVMDMAIN2) count(1) master(2) event(2), incarn 3, mbrc 1, to member 2, events 0x8, state 0x0
2012-10-21 10:02:00.564: [ CSSD][3611592416]clssgmQueueGrockEvent: groupName(CTSSGROUP) count(2) master(2) event(2), incarn 4, mbrc 2, to member 2, events 0x8, state 0x0
2012-10-21 10:02:00.564: [ CSSD][3611592416]clssgmQueueGrockEvent: groupName(DG_BACKUPDG) count(1) master(1) event(2), incarn 3, mbrc 1, to member 1, events 0x4, state 0x0
2012-10-21 10:02:00.564: [ CSSD][3611592416]clssgmQueueGrockEvent: groupName(DG_SYSTEMDG) count(1) master(1) event(2), incarn 3, mbrc 1, to member 1, events 0x4, state 0x0
2012-10-21 10:02:00.564: [ CSSD][3611592416]clssgmSuspendAllGrocks: done
2012-10-21 10:02:00.564: [ CSSD][3611592416]clssgmUpdateEventValue: CmInfo State val 2, changes 51
2012-10-21 10:02:00.564: [ CSSD][3611592416]clssgmUpdateEventValue: ConnectedNodes val 239944580, changes 18
2012-10-21 10:02:00.564: [ CSSD][3611592416]clssgmCleanupNodeContexts(): cleaning up nodes, rcfg(239944580)
2012-10-21 10:02:00.564: [ CSSD][1096816960]clssnmvDiskEvict: preconuni is NULL skipping the eviction write for node 1, state 0
2012-10-21 10:02:00.564: [ CSSD][3611592416]clssgmCleanupNodeContexts(): successful cleanup of nodes rcfg(239944580)
2012-10-21 10:02:00.564: [ CSSD][3611592416]clssgmStartNMMon: completed node cleanup
2012-10-21 10:02:00.564: [ CSSD][3611592416]clssgmStartNMMon: node 1 failed, birth (239944580, 0) (old/new)
2012-10-21 10:02:00.564: [ CSSD][3611592416]clssgmStartNMMon: node 2 active, birth 239944577
2012-10-21 10:02:00.564: [ CSSD][3611592416]clssgmUpdateEventValue: Reconfig Event val 1, changes 13
2012-10-21 10:02:00.564: [ CSSD][3611592416]clssgmUpdateEventValue: CmInfo State val 3, changes 52
2012-10-21 10:02:00.564: [ CSSD][1113536832]clssnmvDiskEvict: preconuni is NULL skipping the eviction write for node 1, state 0
2012-10-21 10:02:00.564: [ CSSD][1085704512]clssnmvDiskEvict: preconuni is NULL skipping the eviction write for node 1, state 0
2012-10-21 10:02:00.565: [ CSSD][1089829184]clssnmvDiskEvict: preconuni is NULL skipping the eviction write for node 1, state 0
2012-10-21 10:02:00.565: [ CSSD][1130883392]clssgmReconfigThread: started for reconfig (239944581)
2012-10-21 10:02:00.565: [ CSSD][1130883392]NMEVENT_RECONFIG [00][00][00][04]
2012-10-21 10:02:00.565: [ CSSD][1130883392]clssgmWaitOnEventValue: after HoldRequest val 1, eval 1 waited 0
2012-10-21 10:02:00.565: [ CSSD][1130883392]clssgmCompareSwapEventValue: changed CmInfo State val 4, from 3, changes 53
2012-10-21 10:02:00.565: [ CSSD][1130883392]clssgmCleanupNodeContexts(): cleaning up nodes, rcfg(239944580)
2012-10-21 10:02:00.565: [ CSSD][1130883392]clssgmDisconnectNodes: Closing connection 0x98fa69 for node vrh1, number 1, in incarnation 239944581; state flags 0x80000001, conn state flags 0x000a
2012-10-21 10:02:00.565: [ CSSD][1130883392]clssgmCleanupNodeContexts(): successful cleanup of nodes rcfg(239944581)
2012-10-21 10:02:00.565: [ CSSD][1130883392]clssgmUpdateEventValue: ReadyPeers val 1, changes 7
2012-10-21 10:02:00.565: [ CSSD][1130883392]clssgmCompareSwapEventValue: changed CmInfo State val 6, from 4, changes 54
2012-10-21 10:02:00.565: [ CSSD][1130883392]clssgmEstablishConnections: 1 nodes in cluster incarn 239944581
2012-10-21 10:02:00.565: [ CSSD][1130883392]clssgmUpdateEventValue: ConnectedNodes val 0, changes 19
2012-10-21 10:02:00.565: [ CSSD][1085704512]clssnmCINUpdateComplete: Pending CIN update completed, config state 1
2012-10-21 10:02:00.566: [ CSSD][1122998592]clssgmPeerListener: new incarn 239944581. old 239944580
2012-10-21 10:02:00.566: [ CSSD][1124575552]clssnmPollingThread: signaling reconfig for config change
2012-10-21 10:02:00.566: [ CSSD][1122998592]clssgmPeerDeactivate: node 1 (vrh1), death 239944581, state 0x80000000 connstate 0xa
2012-10-21 10:02:00.566: [ GIPCLIB][1122998592] gipclibMapSearch: gipcMapSearch() -> gipcMapGetNodeAddr() failed: ret:gipcretKeyNotFound (36), ht:0x183d130, idxPtr:0x7f3bd6f6f8c0, key:0x42ef7140, flags:0x0
2012-10-21 10:02:00.566: [GIPCXCPT][1122998592] gipcObjectLookupF [gipcDissociateF : gipc.c : 2175]: search found no matching oid 0000000000000000, ret gipcretKeyNotFound (36), ret gipcretInvalidObject (3)
2012-10-21 10:02:00.566: [GIPCXCPT][1122998592] gipcDissociateF [clssgmPeerDeactivate : clssgmp.c : 3525]: EXCEPTION[ ret gipcretInvalidObject (3) ] failed to dissociate obj 0000000000000000, flags 0x0
2012-10-21 10:02:00.566: [ CSSD][1122998592]clssgmCleanFuture: discarded 0 future msgs for 1
2012-10-21 10:02:00.566: [ CSSD][1127729472]clssnmDoSyncUpdate: Initiating sync 239944582
2012-10-21 10:02:00.566: [ CSSD][1122998592]clssgmPeerListener: connects done (1/1)
2012-10-21 10:02:00.566: [ CSSD][1127729472]clssscCompareSwapEventValue: changed NMReconfigInProgress val 2, from -1, changes 22
2012-10-21 10:02:00.566: [ CSSD][1127729472]clssscCompareSwapEventValue: changed NMReconfigInProgress val 2, from -1, changes 22
2012-10-21 10:02:00.566: [ CSSD][1122998592]clssgmUpdateEventValue: ConnectedNodes val 239944581, changes 20
2012-10-21 10:02:00.566: [ CSSD][1127729472]clssnmDoSyncUpdate: local disk timeout set to 200000 ms, remote disk timeout set to 200000
2012-10-21 10:02:00.566: [ CSSD][1127729472]clssnmDoSyncUpdate: new values for local disk timeout and remote disk timeout will take effect when the sync is completed.
2012-10-21 10:02:00.566: [ CSSD][1127729472]clssnmDoSyncUpdate: Starting cluster reconfig with incarnation 239944582
2012-10-21 10:02:00.566: [ CSSD][1127729472]clssnmSetupAckWait: Ack message type (11)
2012-10-21 10:02:00.566: [ CSSD][1127729472]clssnmSetupAckWait: node(2) is ALIVE
2012-10-21 10:02:00.566: [ CSSD][1127729472]clssnmSendSync: syncSeqNo(239944582), indicating EXADATA fence initialization complete
2012-10-21 10:02:00.566: [ CSSD][1127729472]List of nodes that have ACKed my sync: NULL
2012-10-21 10:02:00.566: [ CSSD][1127729472]clssnmSendSync: syncSeqNo(239944582)
2012-10-21 10:02:00.566: [ CSSD][1127729472]clssnmWaitForAcks: Ack message type(11), ackCount(1)
2012-10-21 10:02:00.566: [ CSSD][1130883392]clssgmWaitChangeEventValue: ev(ConnectedNodes) changed to 239944581
2012-10-21 10:02:00.566: [ CSSD][1130883392]clssgmEstablishConnections: Sending STATUS message to all nodes for incarnation 239944581
2012-10-21 10:02:00.566: [ CSSD][1130883392]clssgmEstablishConnections: (1/1) connected, incarn(239944581)
2012-10-21 10:02:00.566: [ CSSD][1130883392]clssgmCompareSwapEventValue: changed CmInfo State val 7, from 6, changes 55
2012-10-21 10:02:00.566: [ CSSD][1129306432]clssnmHandleSync: Node vrh2, number 2, is EXADATA fence capable
2012-10-21 10:02:00.566: [ CSSD][1129306432]clssscUpdateEventValue: NMReconfigInProgress val 2, changes 23
2012-10-21 10:02:00.566: [ CSSD][1129306432]clssnmHandleSync: local disk timeout set to 200000 ms, remote disk timeout set to 200000
2012-10-21 10:02:00.566: [ CSSD][1130883392]clssgmSetVersions: properties common to all peers: 1,2,3,4,5,6,7,8,9,10,11
2012-10-21 10:02:00.566: [ CSSD][1129306432]clssnmHandleSync: initleader 2 newleader 2
2012-10-21 10:02:00.566: [ CSSD][1130883392]clssgmEstablishMasterNode: MASTER for 239944581 is node(2) birth(239944577)
2012-10-21 10:02:00.566: [ CSSD][1129306432]clssnmQueueClientEvent: Sending Event(2), type 2, incarn 239944581
2012-10-21 10:02:00.566: [ CSSD][1130883392]clssgmCompareSwapEventValue: changed CmInfo State val 8, from 7, changes 56
2012-10-21 10:02:00.566: [ CSSD][1129306432]clssnmQueueClientEvent: Node[1] state = 0, birth = 0, unique = 0
2012-10-21 10:02:00.566: [ CSSD][1130883392]clssgmMasterCMSync: Synchronizing group/lock status, replay-mode=0
2012-10-21 10:02:00.566: [ CSSD][1129306432]clssnmQueueClientEvent: Node[2] state = 3, birth = 239944577, unique = 1350482491
2012-10-21 10:02:00.566: [ CSSD][1130883392]clssgmCompareSwapEventValue: changed CmInfo State val 9, from 8, changes 57
2012-10-21 10:02:00.566: [ CSSD][1129306432]clssnmHandleSync: Acknowledging sync: src[2] srcName[vrh2] seq[20] sync[239944582]
2012-10-21 10:02:00.566: [ CSSD][1130883392]clssgmMasterCMSync: processing grock(IG+ASMSYS$USERS) type(2)
2012-10-21 10:02:00.566: [ CSSD][1130883392]clssgmCleanupOrphanMembers: orphan member(1/IG+ASMSYS$USERS), birth(239944580) on node(1), birth(0/239944581)
2012-10-21 10:02:00.566: [ CSSD][1129306432]clssnmSendAck: node 2, vrh2, syncSeqNo(239944582) type(11)
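In the test above, sending kill -19 (SIGSTOP) to the ocssd.bin process also caused the node to reboot.
Next we try kill -9 against ocssd.bin; the node reboots in the same way: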
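When reading a reconfiguration log like this one, it can help to pull out just the membership changes and the CSS timing parameters. The following is a minimal Python sketch, not an Oracle tool: the function name `summarize_reconfig` and the regular expressions are assumptions written to match the message text seen in this excerpt, and other Clusterware versions may word these messages differently.

```python
import re

# Sample entries copied from the ocssd.log excerpt; a real run would read the log file.
SAMPLE_LOG = """\
2012-10-21 10:02:00.563: [ CSSD][1129306432]clssnmDeactivateNode: node 1, state 5
2012-10-21 10:02:00.563: [ CSSD][1129306432]clssnmDeactivateNode: node 1 (vrh1) left cluster
2012-10-21 10:02:00.563: [ CSSD][1129306432]clssnmHandleUpdate: NODE 2 (vrh2) IS ACTIVE MEMBER OF CLUSTER
2012-10-21 10:02:00.563: [ CSSD][1129306432] misscount 30 reboot latency 3
"""

# Patterns are assumptions based on the wording in this excerpt only.
LEFT_RE = re.compile(r"clssnmDeactivateNode: node (\d+) \((\w+)\) left cluster")
ACTIVE_RE = re.compile(r"NODE (\d+) \((\w+)\) IS ACTIVE MEMBER OF CLUSTER")
MISSCOUNT_RE = re.compile(r"misscount (\d+) reboot latency (\d+)")


def summarize_reconfig(text):
    """Collect departed nodes, active nodes and the logged misscount values."""
    summary = {"left": [], "active": [], "misscount": None, "reboot_latency": None}
    for line in text.splitlines():
        m = LEFT_RE.search(line)
        if m:
            summary["left"].append((int(m.group(1)), m.group(2)))
            continue
        m = ACTIVE_RE.search(line)
        if m:
            summary["active"].append((int(m.group(1)), m.group(2)))
            continue
        m = MISSCOUNT_RE.search(line)
        if m:
            summary["misscount"] = int(m.group(1))
            summary["reboot_latency"] = int(m.group(2))
    return summary


if __name__ == "__main__":
    print(summarize_reconfig(SAMPLE_LOG))
```

Run against the surviving node's log, this reports node 1 (vrh1) as having left the cluster and node 2 (vrh2) as the remaining active member, which matches the reboot of vrh1 observed in the test.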
[root@vrh1 ~]# ps -ef|grep ocssd
grid 3900 1 1 10:03 ? 00:00:37 /g01/11.2.0/grid/bin/ocssd.bin
grid 6019 4287 0 10:39 pts/1 00:00:00 tail -f ocssd.log
root 6028 4331 0 10:39 pts/0 00:00:00 grep ocssd
[root@vrh1 ~]# kill -9 3900
2012-10-21 10:39:22.075: [ CSSD][1121757504]clssnmPollingThread: signaling reconfig for config change
2012-10-21 10:40:49.822: [ CSSD][1441830624]clsu_load_ENV_levels: Module = CSSD, LogLevel = 2, TraceLevel = 0
2012-10-21 10:40:49.822: [ CSSD][1441830624]clsu_load_ENV_levels: Module = GIPCCM, LogLevel = 2, TraceLevel = 0
2012-10-21 10:40:49.822: [ CSSD][1441830624]clsu_load_ENV_levels: Module = GIPCGM, LogLevel = 2, TraceLevel = 0
2012-10-21 10:40:49.822: [ CSSD][1441830624]clsu_load_ENV_levels: Module = GIPCNM, LogLevel = 2, TraceLevel = 0
2012-10-21 10:40:49.822: [ CSSD][1441830624]clsu_load_ENV_levels: Module = GPNP, LogLevel = 1, TraceLevel = 0
2012-10-21 10:40:49.822: [ CSSD][1441830624]clsu_load_ENV_levels: Module = OLR, LogLevel = 0, TraceLevel = 0
2012-10-21 10:40:49.822: [ CSSD][1441830624]clsu_load_ENV_levels: Module = SKGFD, LogLevel = 0, TraceLevel = 0
[ CSSD][1441830624]clsugetconf : Configuration type [4].
2012-10-21 10:40:49.823: [ CSSD][1441830624]clssscmain: Starting CSS daemon, version 11.2.0.3.0, in (clustered) mode with uniqueness value 1350830449
2012-10-21 10:40:49.823: [ CSSD][1441830624]clssscmain: Environment is production
2012-10-21 10:40:49.823: [ CSSD][1441830624]clssscmain: Core file size limit extended
2012-10-21 10:40:49.835: [ CSSD][1441830624]clssscmain: GIPCHA down 0
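The silence in ocssd.log between the last entry before the kill and the first entry of the restarted daemon can be measured directly from the timestamps. Below is a small Python sketch under stated assumptions: the helper names `parse_timestamp` and `largest_gap` are made up for illustration, and the timestamp is assumed to occupy the first 23 characters of each entry, as it does in this excerpt. Lines without a timestamp are skipped.

```python
from datetime import datetime

TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"

# Entries copied from the excerpt around the kill -9.
SAMPLE_LOG = """\
2012-10-21 10:39:22.075: [ CSSD][1121757504]clssnmPollingThread: signaling reconfig for config change
2012-10-21 10:40:49.822: [ CSSD][1441830624]clsu_load_ENV_levels: Module = CSSD, LogLevel = 2, TraceLevel = 0
[ CSSD][1441830624]clsugetconf : Configuration type [4].
2012-10-21 10:40:49.823: [ CSSD][1441830624]clssscmain: Starting CSS daemon, version 11.2.0.3.0, in (clustered) mode with uniqueness value 1350830449
"""


def parse_timestamp(line):
    """Return the leading timestamp of a log entry, or None if there is none."""
    try:
        return datetime.strptime(line[:23], TS_FORMAT)
    except ValueError:
        return None


def largest_gap(text):
    """Return (seconds, before, after) for the longest pause between entries."""
    stamps = [ts for ts in map(parse_timestamp, text.splitlines()) if ts is not None]
    best = (0.0, None, None)
    for prev, cur in zip(stamps, stamps[1:]):
        seconds = (cur - prev).total_seconds()
        if seconds > best[0]:
            best = (seconds, prev, cur)
    return best


if __name__ == "__main__":
    seconds, before, after = largest_gap(SAMPLE_LOG)
    print(f"longest pause: {seconds:.3f}s between {before} and {after}")
```

For this excerpt the pause is about 87.7 seconds, from 10:39:22.075 to 10:40:49.822. A gap in the log only shows that the daemon stopped writing and later started again; whether the node itself rebooted should be confirmed separately, for example with uptime.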