Testing Exadata Single Cell Failure

Test a single-cell failure on Exadata by simulating the loss of one cell out of the three cell servers in an X2-2:
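
Before shutting the cell down it is worth snapshotting the ASM disk / failure-group state, so the before-and-after comparison is unambiguous. A minimal sketch, run against the +ASM instance, using standard v$asm_disk / v$asm_diskgroup columns (on Exadata each failure group normally corresponds to one cell):

-- Baseline: count disks per disk group / failure group / mode before the test.
SELECT g.name     AS diskgroup,
       d.failgroup,
       d.mode_status,
       COUNT(*)   AS disks
FROM   v$asm_disk d
       JOIN v$asm_diskgroup g ON g.group_number = d.group_number
GROUP  BY g.name, d.failgroup, d.mode_status
ORDER  BY g.name, d.failgroup;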

Cell server:


CellCLI> alter cell shutdown services all

Stopping the RS, CELLSRV, and MS services...
The SHUTDOWN of services was successful.




ASM ALERT.LOG:


Sun Sep 02 09:08:48 2012
Errors in file /u01/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_gmon_30756.trc:
ORA-27603: Cell storage I/O error, I/O failed on disk o/192.168.64.131/DATA_DM01_CD_00_dm01cel01 at offset 8384512 for data length 4096
ORA-27626: Exadata error: 12 (Network error)
ORA-27300: OS system dependent operation:rpc update timed out failed with status: 0
ORA-27301: OS failure message: Error 0
ORA-27302: failure occurred at: skgxp_path
WARNING: Write Failed. group:1 disk:0 AU:1 offset:4190208 size:4096
WARNING: Hbeat write to PST disk 0.3940753198 (DATA_DM01_CD_00_DM01CEL01) in group 1 failed. 
WARNING: Write Failed. group:2 disk:0 AU:1 offset:4190208 size:4096
WARNING: Hbeat write to PST disk 0.3940753233 (DBFS_DG_CD_02_DM01CEL01) in group 2 failed. 
WARNING: Write Failed. group:3 disk:0 AU:1 offset:4190208 size:4096
WARNING: Hbeat write to PST disk 0.3940753265 (RECO_DM01_CD_00_DM01CEL01) in group 3 failed. 
Sun Sep 02 09:08:48 2012
NOTE: process _b000_+asm1 (21302) initiating offline of disk 0.3940753198 (DATA_DM01_CD_00_DM01CEL01) with mask 0x7e in group 1
NOTE: checking PST: grp = 1
Sun Sep 02 09:08:48 2012
NOTE: process _b001_+asm1 (21304) initiating offline of disk 0.3940753233 (DBFS_DG_CD_02_DM01CEL01) with mask 0x7e in group 2
NOTE: checking PST: grp = 2
GMON checking disk modes for group 1 at 10 for pid 30, osid 21302
Sun Sep 02 09:08:48 2012
NOTE: process _b002_+asm1 (21306) initiating offline of disk 0.3940753265 (RECO_DM01_CD_00_DM01CEL01) with mask 0x7e in group 3
NOTE: checking PST: grp = 3
WARNING: Read Failed. group:1 disk:11 AU:1 offset:4096 size:4096
WARNING: Read Failed. group:1 disk:11 AU:1 offset:0 size:4096
WARNING: Read Failed. group:1 disk:10 AU:1 offset:4096 size:4096
WARNING: Read Failed. group:1 disk:10 AU:1 offset:0 size:4096
WARNING: Read Failed. group:1 disk:9 AU:1 offset:4096 size:4096
WARNING: Read Failed. group:1 disk:9 AU:1 offset:0 size:4096
WARNING: Read Failed. group:1 disk:8 AU:1 offset:4096 size:4096
WARNING: Read Failed. group:1 disk:8 AU:1 offset:0 size:4096
WARNING: Read Failed. group:1 disk:7 AU:1 offset:4096 size:4096
WARNING: Read Failed. group:1 disk:7 AU:1 offset:0 size:4096
WARNING: Read Failed. group:1 disk:6 AU:1 offset:4096 size:4096
WARNING: Read Failed. group:1 disk:6 AU:1 offset:0 size:4096
WARNING: Read Failed. group:1 disk:5 AU:1 offset:4096 size:4096
WARNING: Read Failed. group:1 disk:5 AU:1 offset:0 size:4096
WARNING: Read Failed. group:1 disk:4 AU:1 offset:4096 size:4096
WARNING: Read Failed. group:1 disk:4 AU:1 offset:0 size:4096
WARNING: Read Failed. group:1 disk:3 AU:1 offset:4096 size:4096
WARNING: Read Failed. group:1 disk:3 AU:1 offset:0 size:4096
WARNING: Read Failed. group:1 disk:2 AU:1 offset:4096 size:4096
WARNING: Read Failed. group:1 disk:2 AU:1 offset:0 size:4096
WARNING: Read Failed. group:1 disk:1 AU:1 offset:4096 size:4096
WARNING: Read Failed. group:1 disk:1 AU:1 offset:0 size:4096
WARNING: Write Failed. group:1 disk:1 AU:1 offset:4096 size:4096
WARNING: Write Failed. group:1 disk:2 AU:1 offset:4096 size:4096
WARNING: Write Failed. group:1 disk:3 AU:1 offset:4096 size:4096
WARNING: Write Failed. group:1 disk:4 AU:1 offset:4096 size:4096
WARNING: Write Failed. group:1 disk:5 AU:1 offset:4096 size:4096
WARNING: Write Failed. group:1 disk:6 AU:1 offset:4096 size:4096
WARNING: Write Failed. group:1 disk:7 AU:1 offset:4096 size:4096
WARNING: Write Failed. group:1 disk:8 AU:1 offset:4096 size:4096
WARNING: Write Failed. group:1 disk:9 AU:1 offset:4096 size:4096
WARNING: Write Failed. group:1 disk:10 AU:1 offset:4096 size:4096
WARNING: Write Failed. group:1 disk:11 AU:1 offset:4096 size:4096
WARNING: GMON has insufficient disks to maintain consensus. minimum required is 3
NOTE: group DATA_DM01: updated PST location: disk 0012 (PST copy 0)
NOTE: group DATA_DM01: updated PST location: disk 0024 (PST copy 1)
GMON checking disk modes for group 2 at 11 for pid 35, osid 21304
NOTE: checking PST for grp 1 done.
WARNING: Read Failed. group:2 disk:9 AU:1 offset:4096 size:4096
WARNING: Read Failed. group:2 disk:9 AU:1 offset:0 size:4096
WARNING: Read Failed. group:2 disk:8 AU:1 offset:4096 size:4096
WARNING: Read Failed. group:2 disk:8 AU:1 offset:0 size:4096
WARNING: Read Failed. group:2 disk:7 AU:1 offset:4096 size:4096
WARNING: Read Failed. group:2 disk:7 AU:1 offset:0 size:4096
WARNING: Read Failed. group:2 disk:6 AU:1 offset:4096 size:4096
WARNING: Read Failed. group:2 disk:6 AU:1 offset:0 size:4096
WARNING: Read Failed. group:2 disk:5 AU:1 offset:4096 size:4096
WARNING: Read Failed. group:2 disk:5 AU:1 offset:0 size:4096
WARNING: Read Failed. group:2 disk:4 AU:1 offset:4096 size:4096
WARNING: Read Failed. group:2 disk:4 AU:1 offset:0 size:4096
WARNING: Read Failed. group:2 disk:3 AU:1 offset:4096 size:4096
WARNING: Read Failed. group:2 disk:3 AU:1 offset:0 size:4096
WARNING: Read Failed. group:2 disk:2 AU:1 offset:4096 size:4096
WARNING: Read Failed. group:2 disk:2 AU:1 offset:0 size:4096
WARNING: Read Failed. group:2 disk:1 AU:1 offset:4096 size:4096
WARNING: Read Failed. group:2 disk:1 AU:1 offset:0 size:4096
WARNING: Write Failed. group:2 disk:1 AU:1 offset:4096 size:4096
WARNING: Write Failed. group:2 disk:2 AU:1 offset:4096 size:4096
WARNING: Write Failed. group:2 disk:3 AU:1 offset:4096 size:4096
WARNING: Write Failed. group:2 disk:4 AU:1 offset:4096 size:4096
WARNING: Write Failed. group:2 disk:5 AU:1 offset:4096 size:4096
WARNING: Write Failed. group:2 disk:6 AU:1 offset:4096 size:4096
WARNING: Write Failed. group:2 disk:7 AU:1 offset:4096 size:4096
WARNING: Write Failed. group:2 disk:8 AU:1 offset:4096 size:4096
WARNING: Write Failed. group:2 disk:9 AU:1 offset:4096 size:4096
WARNING: GMON has insufficient disks to maintain consensus. minimum required is 3
NOTE: group DBFS_DG: updated PST location: disk 0010 (PST copy 0)
NOTE: group DBFS_DG: updated PST location: disk 0020 (PST copy 1)
WARNING: Disk 0 (DATA_DM01_CD_00_DM01CEL01) in group 1 mode 0x7f is now being offlined
NOTE: checking PST for grp 2 done.
GMON checking disk modes for group 3 at 12 for pid 36, osid 21306
WARNING: Disk 0 (DATA_DM01_CD_00_DM01CEL01) in group 1 in mode 0x7f is now being taken offline on ASM inst 1
WARNING: Read Failed. group:3 disk:11 AU:1 offset:4096 size:4096
WARNING: Read Failed. group:3 disk:11 AU:1 offset:0 size:4096
WARNING: Read Failed. group:3 disk:10 AU:1 offset:4096 size:4096
WARNING: Read Failed. group:3 disk:10 AU:1 offset:0 size:4096
WARNING: Read Failed. group:3 disk:9 AU:1 offset:4096 size:4096
WARNING: Read Failed. group:3 disk:9 AU:1 offset:0 size:4096
WARNING: Read Failed. group:3 disk:8 AU:1 offset:4096 size:4096
WARNING: Read Failed. group:3 disk:8 AU:1 offset:0 size:4096
WARNING: Read Failed. group:3 disk:7 AU:1 offset:4096 size:4096
WARNING: Read Failed. group:3 disk:7 AU:1 offset:0 size:4096
WARNING: Read Failed. group:3 disk:6 AU:1 offset:4096 size:4096
WARNING: Read Failed. group:3 disk:6 AU:1 offset:0 size:4096
WARNING: Read Failed. group:3 disk:5 AU:1 offset:4096 size:4096
WARNING: Read Failed. group:3 disk:5 AU:1 offset:0 size:4096
WARNING: Read Failed. group:3 disk:4 AU:1 offset:4096 size:4096
WARNING: Read Failed. group:3 disk:4 AU:1 offset:0 size:4096
WARNING: Read Failed. group:3 disk:3 AU:1 offset:4096 size:4096
WARNING: Read Failed. group:3 disk:3 AU:1 offset:0 size:4096
WARNING: Read Failed. group:3 disk:2 AU:1 offset:4096 size:4096
WARNING: Read Failed. group:3 disk:2 AU:1 offset:0 size:4096
WARNING: Read Failed. group:3 disk:1 AU:1 offset:4096 size:4096
WARNING: Read Failed. group:3 disk:1 AU:1 offset:0 size:4096
WARNING: Write Failed. group:3 disk:1 AU:1 offset:4096 size:4096
WARNING: Write Failed. group:3 disk:2 AU:1 offset:4096 size:4096
WARNING: Write Failed. group:3 disk:3 AU:1 offset:4096 size:4096
WARNING: Write Failed. group:3 disk:4 AU:1 offset:4096 size:4096
WARNING: Write Failed. group:3 disk:5 AU:1 offset:4096 size:4096
WARNING: Write Failed. group:3 disk:6 AU:1 offset:4096 size:4096
WARNING: Write Failed. group:3 disk:7 AU:1 offset:4096 size:4096
WARNING: Write Failed. group:3 disk:8 AU:1 offset:4096 size:4096
WARNING: Write Failed. group:3 disk:9 AU:1 offset:4096 size:4096
WARNING: Write Failed. group:3 disk:10 AU:1 offset:4096 size:4096
WARNING: Write Failed. group:3 disk:11 AU:1 offset:4096 size:4096
NOTE: group RECO_DM01: updated PST location: disk 0012 (PST copy 0)
NOTE: group RECO_DM01: updated PST location: disk 0024 (PST copy 1)
NOTE: checking PST for grp 3 done.
WARNING: Disk 1 (DATA_DM01_CD_01_DM01CEL01) in group 1 mode 0x7f is now being offlined
WARNING: Disk 1 (DATA_DM01_CD_01_DM01CEL01) in group 1 in mode 0x7f is now being taken offline on ASM inst 1
WARNING: Disk 0 (DBFS_DG_CD_02_DM01CEL01) in group 2 mode 0x7f is now being offlined
WARNING: Disk 0 (DBFS_DG_CD_02_DM01CEL01) in group 2 in mode 0x7f is now being taken offline on ASM inst 1
WARNING: Disk 0 (RECO_DM01_CD_00_DM01CEL01) in group 3 mode 0x7f is now being offlined
WARNING: Disk 0 (RECO_DM01_CD_00_DM01CEL01) in group 3 in mode 0x7f is now being taken offline on ASM inst 1
WARNING: Disk 2 (DATA_DM01_CD_02_DM01CEL01) in group 1 mode 0x7f is now being offlined
WARNING: Disk 2 (DATA_DM01_CD_02_DM01CEL01) in group 1 in mode 0x7f is now being taken offline on ASM inst 1
WARNING: Disk 1 (DBFS_DG_CD_03_DM01CEL01) in group 2 mode 0x7f is now being offlined
WARNING: Disk 1 (DBFS_DG_CD_03_DM01CEL01) in group 2 in mode 0x7f is now being taken offline on ASM inst 1
WARNING: Disk 1 (RECO_DM01_CD_01_DM01CEL01) in group 3 mode 0x7f is now being offlined
WARNING: Disk 1 (RECO_DM01_CD_01_DM01CEL01) in group 3 in mode 0x7f is now being taken offline on ASM inst 1
WARNING: Disk 3 (DATA_DM01_CD_03_DM01CEL01) in group 1 mode 0x7f is now being offlined
WARNING: Disk 3 (DATA_DM01_CD_03_DM01CEL01) in group 1 in mode 0x7f is now being taken offline on ASM inst 1
WARNING: Disk 2 (DBFS_DG_CD_04_DM01CEL01) in group 2 mode 0x7f is now being offlined
WARNING: Disk 2 (DBFS_DG_CD_04_DM01CEL01) in group 2 in mode 0x7f is now being taken offline on ASM inst 1
WARNING: Disk 2 (RECO_DM01_CD_02_DM01CEL01) in group 3 mode 0x7f is now being offlined
WARNING: Disk 2 (RECO_DM01_CD_02_DM01CEL01) in group 3 in mode 0x7f is now being taken offline on ASM inst 1
WARNING: Disk 4 (DATA_DM01_CD_04_DM01CEL01) in group 1 mode 0x7f is now being offlined
WARNING: Disk 4 (DATA_DM01_CD_04_DM01CEL01) in group 1 in mode 0x7f is now being taken offline on ASM inst 1
WARNING: Disk 3 (DBFS_DG_CD_05_DM01CEL01) in group 2 mode 0x7f is now being offlined
WARNING: Disk 3 (DBFS_DG_CD_05_DM01CEL01) in group 2 in mode 0x7f is now being taken offline on ASM inst 1
WARNING: Disk 3 (RECO_DM01_CD_03_DM01CEL01) in group 3 mode 0x7f is now being offlined
WARNING: Disk 3 (RECO_DM01_CD_03_DM01CEL01) in group 3 in mode 0x7f is now being taken offline on ASM inst 1
WARNING: Disk 5 (DATA_DM01_CD_05_DM01CEL01) in group 1 mode 0x7f is now being offlined
WARNING: Disk 5 (DATA_DM01_CD_05_DM01CEL01) in group 1 in mode 0x7f is now being taken offline on ASM inst 1
WARNING: Disk 4 (DBFS_DG_CD_06_DM01CEL01) in group 2 mode 0x7f is now being offlined
WARNING: Disk 4 (DBFS_DG_CD_06_DM01CEL01) in group 2 in mode 0x7f is now being taken offline on ASM inst 1
WARNING: Disk 4 (RECO_DM01_CD_04_DM01CEL01) in group 3 mode 0x7f is now being offlined
WARNING: Disk 4 (RECO_DM01_CD_04_DM01CEL01) in group 3 in mode 0x7f is now being taken offline on ASM inst 1
WARNING: Disk 6 (DATA_DM01_CD_06_DM01CEL01) in group 1 mode 0x7f is now being offlined
WARNING: Disk 6 (DATA_DM01_CD_06_DM01CEL01) in group 1 in mode 0x7f is now being taken offline on ASM inst 1
WARNING: Disk 5 (DBFS_DG_CD_07_DM01CEL01) in group 2 mode 0x7f is now being offlined
WARNING: Disk 5 (DBFS_DG_CD_07_DM01CEL01) in group 2 in mode 0x7f is now being taken offline on ASM inst 1
WARNING: Disk 5 (RECO_DM01_CD_05_DM01CEL01) in group 3 mode 0x7f is now being offlined
WARNING: Disk 5 (RECO_DM01_CD_05_DM01CEL01) in group 3 in mode 0x7f is now being taken offline on ASM inst 1
WARNING: Disk 7 (DATA_DM01_CD_07_DM01CEL01) in group 1 mode 0x7f is now being offlined
WARNING: Disk 7 (DATA_DM01_CD_07_DM01CEL01) in group 1 in mode 0x7f is now being taken offline on ASM inst 1
WARNING: Disk 6 (DBFS_DG_CD_08_DM01CEL01) in group 2 mode 0x7f is now being offlined
WARNING: Disk 6 (DBFS_DG_CD_08_DM01CEL01) in group 2 in mode 0x7f is now being taken offline on ASM inst 1
WARNING: Disk 6 (RECO_DM01_CD_06_DM01CEL01) in group 3 mode 0x7f is now being offlined
WARNING: Disk 6 (RECO_DM01_CD_06_DM01CEL01) in group 3 in mode 0x7f is now being taken offline on ASM inst 1
WARNING: Disk 8 (DATA_DM01_CD_08_DM01CEL01) in group 1 mode 0x7f is now being offlined
WARNING: Disk 8 (DATA_DM01_CD_08_DM01CEL01) in group 1 in mode 0x7f is now being taken offline on ASM inst 1
WARNING: Disk 7 (DBFS_DG_CD_09_DM01CEL01) in group 2 mode 0x7f is now being offlined
WARNING: Disk 7 (DBFS_DG_CD_09_DM01CEL01) in group 2 in mode 0x7f is now being taken offline on ASM inst 1
WARNING: Disk 7 (RECO_DM01_CD_07_DM01CEL01) in group 3 mode 0x7f is now being offlined
WARNING: Disk 7 (RECO_DM01_CD_07_DM01CEL01) in group 3 in mode 0x7f is now being taken offline on ASM inst 1
WARNING: Disk 9 (DATA_DM01_CD_09_DM01CEL01) in group 1 mode 0x7f is now being offlined
WARNING: Disk 9 (DATA_DM01_CD_09_DM01CEL01) in group 1 in mode 0x7f is now being taken offline on ASM inst 1
WARNING: Disk 8 (DBFS_DG_CD_10_DM01CEL01) in group 2 mode 0x7f is now being offlined
WARNING: Disk 8 (DBFS_DG_CD_10_DM01CEL01) in group 2 in mode 0x7f is now being taken offline on ASM inst 1
WARNING: Disk 8 (RECO_DM01_CD_08_DM01CEL01) in group 3 mode 0x7f is now being offlined
WARNING: Disk 8 (RECO_DM01_CD_08_DM01CEL01) in group 3 in mode 0x7f is now being taken offline on ASM inst 1
WARNING: Disk 10 (DATA_DM01_CD_10_DM01CEL01) in group 1 mode 0x7f is now being offlined
WARNING: Disk 10 (DATA_DM01_CD_10_DM01CEL01) in group 1 in mode 0x7f is now being taken offline on ASM inst 1
WARNING: Disk 9 (DBFS_DG_CD_11_DM01CEL01) in group 2 mode 0x7f is now being offlined
WARNING: Disk 9 (DBFS_DG_CD_11_DM01CEL01) in group 2 in mode 0x7f is now being taken offline on ASM inst 1
NOTE: initiating PST update: grp = 2, dsk = 0/0xeae31f51, mask = 0x6a, op = clear
NOTE: initiating PST update: grp = 2, dsk = 1/0xeae31f4a, mask = 0x6a, op = clear
NOTE: initiating PST update: grp = 2, dsk = 2/0xeae31f4c, mask = 0x6a, op = clear
NOTE: initiating PST update: grp = 2, dsk = 3/0xeae31f4f, mask = 0x6a, op = clear
NOTE: initiating PST update: grp = 2, dsk = 4/0xeae31f49, mask = 0x6a, op = clear
NOTE: initiating PST update: grp = 2, dsk = 5/0xeae31f4e, mask = 0x6a, op = clear
NOTE: initiating PST update: grp = 2, dsk = 6/0xeae31f50, mask = 0x6a, op = clear
NOTE: initiating PST update: grp = 2, dsk = 7/0xeae31f48, mask = 0x6a, op = clear
NOTE: initiating PST update: grp = 2, dsk = 8/0xeae31f4b, mask = 0x6a, op = clear
NOTE: initiating PST update: grp = 2, dsk = 9/0xeae31f4d, mask = 0x6a, op = clear
WARNING: Disk 9 (RECO_DM01_CD_09_DM01CEL01) in group 3 mode 0x7f is now being offlined
WARNING: Disk 9 (RECO_DM01_CD_09_DM01CEL01) in group 3 in mode 0x7f is now being taken offline on ASM inst 1
WARNING: Disk 11 (DATA_DM01_CD_11_DM01CEL01) in group 1 mode 0x7f is now being offlined
WARNING: Disk 11 (DATA_DM01_CD_11_DM01CEL01) in group 1 in mode 0x7f is now being taken offline on ASM inst 1
NOTE: initiating PST update: grp = 1, dsk = 0/0xeae31f2e, mask = 0x6a, op = clear
NOTE: initiating PST update: grp = 1, dsk = 1/0xeae31f29, mask = 0x6a, op = clear
NOTE: initiating PST update: grp = 1, dsk = 2/0xeae31f2f, mask = 0x6a, op = clear
NOTE: initiating PST update: grp = 1, dsk = 3/0xeae31f28, mask = 0x6a, op = clear
NOTE: initiating PST update: grp = 1, dsk = 4/0xeae31f33, mask = 0x6a, op = clear
NOTE: initiating PST update: grp = 1, dsk = 5/0xeae31f2d, mask = 0x6a, op = clear
NOTE: initiating PST update: grp = 1, dsk = 6/0xeae31f2c, mask = 0x6a, op = clear
NOTE: initiating PST update: grp = 1, dsk = 7/0xeae31f30, mask = 0x6a, op = clear
NOTE: initiating PST update: grp = 1, dsk = 8/0xeae31f31, mask = 0x6a, op = clear
NOTE: initiating PST update: grp = 1, dsk = 9/0xeae31f2b, mask = 0x6a, op = clear
NOTE: initiating PST update: grp = 1, dsk = 10/0xeae31f2a, mask = 0x6a, op = clear
NOTE: initiating PST update: grp = 1, dsk = 11/0xeae31f32, mask = 0x6a, op = clear
WARNING: Disk 10 (RECO_DM01_CD_10_DM01CEL01) in group 3 mode 0x7f is now being offlined
WARNING: Disk 10 (RECO_DM01_CD_10_DM01CEL01) in group 3 in mode 0x7f is now being taken offline on ASM inst 1
WARNING: Disk 11 (RECO_DM01_CD_11_DM01CEL01) in group 3 mode 0x7f is now being offlined
WARNING: Disk 11 (RECO_DM01_CD_11_DM01CEL01) in group 3 in mode 0x7f is now being taken offline on ASM inst 1
NOTE: initiating PST update: grp = 3, dsk = 0/0xeae31f71, mask = 0x6a, op = clear
NOTE: initiating PST update: grp = 3, dsk = 1/0xeae31f6b, mask = 0x6a, op = clear
NOTE: initiating PST update: grp = 3, dsk = 2/0xeae31f6c, mask = 0x6a, op = clear
NOTE: initiating PST update: grp = 3, dsk = 3/0xeae31f74, mask = 0x6a, op = clear
NOTE: initiating PST update: grp = 3, dsk = 4/0xeae31f6d, mask = 0x6a, op = clear
NOTE: initiating PST update: grp = 3, dsk = 5/0xeae31f73, mask = 0x6a, op = clear
NOTE: initiating PST update: grp = 3, dsk = 6/0xeae31f70, mask = 0x6a, op = clear
NOTE: initiating PST update: grp = 3, dsk = 7/0xeae31f72, mask = 0x6a, op = clear
NOTE: initiating PST update: grp = 3, dsk = 8/0xeae31f6e, mask = 0x6a, op = clear
NOTE: initiating PST update: grp = 3, dsk = 9/0xeae31f6a, mask = 0x6a, op = clear
NOTE: initiating PST update: grp = 3, dsk = 10/0xeae31f6f, mask = 0x6a, op = clear
NOTE: initiating PST update: grp = 3, dsk = 11/0xeae31f75, mask = 0x6a, op = clear
GMON updating disk modes for group 1 at 13 for pid 30, osid 21302
NOTE: group DATA_DM01: updated PST location: disk 0012 (PST copy 0)
NOTE: group DATA_DM01: updated PST location: disk 0024 (PST copy 1)
NOTE: group DATA_DM01: updated PST location: disk 0012 (PST copy 0)
NOTE: group DATA_DM01: updated PST location: disk 0024 (PST copy 1)
GMON updating disk modes for group 2 at 14 for pid 35, osid 21304
NOTE: group DBFS_DG: updated PST location: disk 0010 (PST copy 0)
NOTE: group DBFS_DG: updated PST location: disk 0020 (PST copy 1)
NOTE: group DBFS_DG: updated PST location: disk 0010 (PST copy 0)
NOTE: group DBFS_DG: updated PST location: disk 0020 (PST copy 1)
GMON updating disk modes for group 3 at 15 for pid 36, osid 21306
NOTE: PST update grp = 1 completed successfully 
NOTE: initiating PST update: grp = 1, dsk = 0/0xeae31f2e, mask = 0x7e, op = clear
NOTE: initiating PST update: grp = 1, dsk = 1/0xeae31f29, mask = 0x7e, op = clear
NOTE: initiating PST update: grp = 1, dsk = 2/0xeae31f2f, mask = 0x7e, op = clear
NOTE: initiating PST update: grp = 1, dsk = 3/0xeae31f28, mask = 0x7e, op = clear
NOTE: initiating PST update: grp = 1, dsk = 4/0xeae31f33, mask = 0x7e, op = clear
NOTE: initiating PST update: grp = 1, dsk = 5/0xeae31f2d, mask = 0x7e, op = clear
NOTE: initiating PST update: grp = 1, dsk = 6/0xeae31f2c, mask = 0x7e, op = clear
NOTE: initiating PST update: grp = 1, dsk = 7/0xeae31f30, mask = 0x7e, op = clear
NOTE: initiating PST update: grp = 1, dsk = 8/0xeae31f31, mask = 0x7e, op = clear
NOTE: initiating PST update: grp = 1, dsk = 9/0xeae31f2b, mask = 0x7e, op = clear
NOTE: initiating PST update: grp = 1, dsk = 10/0xeae31f2a, mask = 0x7e, op = clear
NOTE: initiating PST update: grp = 1, dsk = 11/0xeae31f32, mask = 0x7e, op = clear
NOTE: group RECO_DM01: updated PST location: disk 0012 (PST copy 0)
NOTE: group RECO_DM01: updated PST location: disk 0024 (PST copy 1)
NOTE: group RECO_DM01: updated PST location: disk 0012 (PST copy 0)
NOTE: group RECO_DM01: updated PST location: disk 0024 (PST copy 1)
NOTE: PST update grp = 2 completed successfully 
NOTE: initiating PST update: grp = 2, dsk = 0/0xeae31f51, mask = 0x7e, op = clear
NOTE: initiating PST update: grp = 2, dsk = 1/0xeae31f4a, mask = 0x7e, op = clear
NOTE: initiating PST update: grp = 2, dsk = 2/0xeae31f4c, mask = 0x7e, op = clear
NOTE: initiating PST update: grp = 2, dsk = 3/0xeae31f4f, mask = 0x7e, op = clear
NOTE: initiating PST update: grp = 2, dsk = 4/0xeae31f49, mask = 0x7e, op = clear
NOTE: initiating PST update: grp = 2, dsk = 5/0xeae31f4e, mask = 0x7e, op = clear
NOTE: initiating PST update: grp = 2, dsk = 6/0xeae31f50, mask = 0x7e, op = clear
NOTE: initiating PST update: grp = 2, dsk = 7/0xeae31f48, mask = 0x7e, op = clear
NOTE: initiating PST update: grp = 2, dsk = 8/0xeae31f4b, mask = 0x7e, op = clear
NOTE: initiating PST update: grp = 2, dsk = 9/0xeae31f4d, mask = 0x7e, op = clear
NOTE: PST update grp = 3 completed successfully 
NOTE: initiating PST update: grp = 3, dsk = 0/0xeae31f71, mask = 0x7e, op = clear
NOTE: initiating PST update: grp = 3, dsk = 1/0xeae31f6b, mask = 0x7e, op = clear
NOTE: initiating PST update: grp = 3, dsk = 2/0xeae31f6c, mask = 0x7e, op = clear
NOTE: initiating PST update: grp = 3, dsk = 3/0xeae31f74, mask = 0x7e, op = clear
NOTE: initiating PST update: grp = 3, dsk = 4/0xeae31f6d, mask = 0x7e, op = clear
GMON updating disk modes for group 1 at 16 for pid 30, osid 21302
NOTE: initiating PST update: grp = 3, dsk = 5/0xeae31f73, mask = 0x7e, op = clear
NOTE: initiating PST update: grp = 3, dsk = 6/0xeae31f70, mask = 0x7e, op = clear
NOTE: initiating PST update: grp = 3, dsk = 7/0xeae31f72, mask = 0x7e, op = clear
NOTE: initiating PST update: grp = 3, dsk = 8/0xeae31f6e, mask = 0x7e, op = clear
NOTE: initiating PST update: grp = 3, dsk = 9/0xeae31f6a, mask = 0x7e, op = clear
NOTE: initiating PST update: grp = 3, dsk = 10/0xeae31f6f, mask = 0x7e, op = clear
NOTE: initiating PST update: grp = 3, dsk = 11/0xeae31f75, mask = 0x7e, op = clear
NOTE: group DATA_DM01: updated PST location: disk 0012 (PST copy 0)
NOTE: group DATA_DM01: updated PST location: disk 0024 (PST copy 1)
NOTE: group DATA_DM01: updated PST location: disk 0012 (PST copy 0)
NOTE: group DATA_DM01: updated PST location: disk 0024 (PST copy 1)
NOTE: cache closing disk 0 of grp 1: DATA_DM01_CD_00_DM01CEL01
NOTE: cache closing disk 1 of grp 1: DATA_DM01_CD_01_DM01CEL01
NOTE: cache closing disk 2 of grp 1: DATA_DM01_CD_02_DM01CEL01
NOTE: cache closing disk 3 of grp 1: DATA_DM01_CD_03_DM01CEL01
NOTE: cache closing disk 4 of grp 1: DATA_DM01_CD_04_DM01CEL01
NOTE: cache closing disk 5 of grp 1: DATA_DM01_CD_05_DM01CEL01
NOTE: cache closing disk 6 of grp 1: DATA_DM01_CD_06_DM01CEL01
NOTE: cache closing disk 7 of grp 1: DATA_DM01_CD_07_DM01CEL01
NOTE: cache closing disk 8 of grp 1: DATA_DM01_CD_08_DM01CEL01
NOTE: cache closing disk 9 of grp 1: DATA_DM01_CD_09_DM01CEL01
NOTE: cache closing disk 10 of grp 1: DATA_DM01_CD_10_DM01CEL01
NOTE: cache closing disk 11 of grp 1: DATA_DM01_CD_11_DM01CEL01
GMON updating disk modes for group 2 at 17 for pid 35, osid 21304
NOTE: group DBFS_DG: updated PST location: disk 0010 (PST copy 0)
NOTE: group DBFS_DG: updated PST location: disk 0020 (PST copy 1)
NOTE: group DBFS_DG: updated PST location: disk 0010 (PST copy 0)
NOTE: group DBFS_DG: updated PST location: disk 0020 (PST copy 1)
NOTE: cache closing disk 0 of grp 2: DBFS_DG_CD_02_DM01CEL01
NOTE: cache closing disk 1 of grp 2: DBFS_DG_CD_03_DM01CEL01
NOTE: cache closing disk 2 of grp 2: DBFS_DG_CD_04_DM01CEL01
NOTE: cache closing disk 3 of grp 2: DBFS_DG_CD_05_DM01CEL01
NOTE: cache closing disk 4 of grp 2: DBFS_DG_CD_06_DM01CEL01
NOTE: cache closing disk 5 of grp 2: DBFS_DG_CD_07_DM01CEL01
NOTE: cache closing disk 6 of grp 2: DBFS_DG_CD_08_DM01CEL01
NOTE: cache closing disk 7 of grp 2: DBFS_DG_CD_09_DM01CEL01
NOTE: cache closing disk 8 of grp 2: DBFS_DG_CD_10_DM01CEL01
NOTE: cache closing disk 9 of grp 2: DBFS_DG_CD_11_DM01CEL01
NOTE: PST update grp = 1 completed successfully 
GMON updating disk modes for group 3 at 18 for pid 36, osid 21306
NOTE: group RECO_DM01: updated PST location: disk 0012 (PST copy 0)
NOTE: group RECO_DM01: updated PST location: disk 0024 (PST copy 1)
NOTE: group RECO_DM01: updated PST location: disk 0012 (PST copy 0)
NOTE: group RECO_DM01: updated PST location: disk 0024 (PST copy 1)
NOTE: cache closing disk 0 of grp 3: RECO_DM01_CD_00_DM01CEL01
NOTE: cache closing disk 1 of grp 3: RECO_DM01_CD_01_DM01CEL01
NOTE: cache closing disk 2 of grp 3: RECO_DM01_CD_02_DM01CEL01
NOTE: cache closing disk 3 of grp 3: RECO_DM01_CD_03_DM01CEL01
NOTE: cache closing disk 4 of grp 3: RECO_DM01_CD_04_DM01CEL01
NOTE: cache closing disk 5 of grp 3: RECO_DM01_CD_05_DM01CEL01
NOTE: cache closing disk 6 of grp 3: RECO_DM01_CD_06_DM01CEL01
NOTE: cache closing disk 7 of grp 3: RECO_DM01_CD_07_DM01CEL01
NOTE: cache closing disk 8 of grp 3: RECO_DM01_CD_08_DM01CEL01
NOTE: cache closing disk 9 of grp 3: RECO_DM01_CD_09_DM01CEL01
NOTE: cache closing disk 10 of grp 3: RECO_DM01_CD_10_DM01CEL01
NOTE: cache closing disk 11 of grp 3: RECO_DM01_CD_11_DM01CEL01
NOTE: PST update grp = 2 completed successfully 
NOTE: PST update grp = 3 completed successfully 
Sun Sep 02 09:08:49 2012
NOTE: Attempting voting file refresh on diskgroup DBFS_DG
NOTE: Voting file relocation is required in diskgroup DBFS_DG
Sun Sep 02 09:08:51 2012
NOTE: successfully read ACD block gn=3 blk=11264 via retry read
Errors in file /u01/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_lgwr_30748.trc:
ORA-15062: ASM disk is globally closed
NOTE: Attempting voting file refresh on diskgroup DBFS_DG
NOTE: Voting file relocation is required in diskgroup DBFS_DG
NOTE: Attempting voting file relocation on diskgroup DBFS_DG
NOTE: successfully read ACD block gn=3 blk=11264 via retry read
Errors in file /u01/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_lgwr_30748.trc:
ORA-15062: ASM disk is globally closed
Sun Sep 02 09:10:06 2012
NOTE: successfully read ACD block gn=3 blk=11264 via retry read
Errors in file /u01/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_lgwr_30748.trc:
ORA-15062: ASM disk is globally closed
NOTE: successfully read ACD block gn=3 blk=11264 via retry read
Errors in file /u01/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_lgwr_30748.trc:
ORA-15062: ASM disk is globally closed
Sun Sep 02 09:10:39 2012
NOTE: successfully read ACD block gn=1 blk=11264 via retry read
Errors in file /u01/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_lgwr_30748.trc:
ORA-15062: ASM disk is globally closed
NOTE: successfully read ACD block gn=1 blk=11264 via retry read
Errors in file /u01/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_lgwr_30748.trc:
ORA-15062: ASM disk is globally closed
NOTE: successfully read ACD block gn=1 blk=11264 via retry read
Errors in file /u01/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_lgwr_30748.trc:
ORA-15062: ASM disk is globally closed

Trace-file output from the same window (SKGXP reconnection attempts to the two surviving cells):

*** 2012-09-02 09:08:48.881
Box name 0 - 192.168.64.132
OSS OS Pid - 12253
Reconnect: Attempts: 1 Last TS: 5693385570
Dumping SKGXP connection state: Band 0: port ID - 0x1267e4a8, connection - 0x126d98d0
Dumping SKGXP connection state: Band 1: port ID - 0x12714de8, connection - 0x126aab90
Dumping SKGXP connection state: Band 2: port ID - 0x126a20d8, connection - 0x126865a0
Dumping SKGXP connection state: Band 3: port ID - 0x126662b8, connection - 0x1269c680
Dumping SKGXP connection state: Band 4: port ID - 0x1271b048, connection - 0x126b2130
Dumping SKGXP connection state: Band 5: port ID - 0x12715188, connection - 0x126d2330
Dumping SKGXP connection state: Band 6: port ID - 0x126a26d8, connection - 0x12691610
Dumping SKGXP connection state: Band 7: port ID - 0x126a2728, connection - 0x126a35f0
Dumping SKGXP connection state: Band 8: port ID - 0x126a2778, connection - 0x126b96d0
Reconnecting to box 0x126c70a0 ...
Storage box 0x126c70a0 Inc: 1 with the source id 3379307313
Box name 0 - 192.168.64.133
OSS OS Pid - 12308
Reconnect: Attempts: 1 Last TS: 5693385570
Dumping SKGXP connection state: Band 0: port ID - 0x126a2128, connection - 0x126a70c0
Dumping SKGXP connection state: Band 1: port ID - 0x126a2598, connection - 0x126950e0
Dumping SKGXP connection state: Band 2: port ID - 0x12666308, connection - 0x126e0e70
Dumping SKGXP connection state: Band 3: port ID - 0x1271b0a8, connection - 0x126d5e00
Dumping SKGXP connection state: Band 4: port ID - 0x126666c8, connection - 0x1270ed60
Dumping SKGXP connection state: Band 5: port ID - 0x12714ed8, connection - 0x1268a070
Dumping SKGXP connection state: Band 6: port ID - 0x1271afe8, connection - 0x12682ad0
Dumping SKGXP connection state: Band 7: port ID - 0x126661c8, connection - 0x126c0c70
Dumping SKGXP connection state: Band 8: port ID - 0x12666218, connection - 0x126c72c0
Reissuing requests for the box 0x126c70a0
Reissuing requests for the box 0x126a33d0

*** 2012-09-02 09:08:51.185
NOTE: successfully read ACD block gn=3 blk=11264 via retry read
ORA-15062: ASM disk is globally closed

RDBMS ALERT.LOG:



Sun Sep 02 05:59:59 2012
Setting Resource Manager plan DEFAULT_MAINTENANCE_PLAN via parameter
Sun Sep 02 05:59:59 2012
Starting background process VKRM
Sun Sep 02 05:59:59 2012
VKRM started with pid=83, OS id=5062 
Sun Sep 02 09:08:48 2012
NOTE: disk 0 (DATA_DM01_CD_00_DM01CEL01) in group 1 (DATA_DM01) is offline for reads
NOTE: disk 1 (DATA_DM01_CD_01_DM01CEL01) in group 1 (DATA_DM01) is offline for reads
NOTE: disk 2 (DATA_DM01_CD_02_DM01CEL01) in group 1 (DATA_DM01) is offline for reads
NOTE: disk 3 (DATA_DM01_CD_03_DM01CEL01) in group 1 (DATA_DM01) is offline for reads
NOTE: disk 4 (DATA_DM01_CD_04_DM01CEL01) in group 1 (DATA_DM01) is offline for reads
NOTE: disk 5 (DATA_DM01_CD_05_DM01CEL01) in group 1 (DATA_DM01) is offline for reads
NOTE: disk 6 (DATA_DM01_CD_06_DM01CEL01) in group 1 (DATA_DM01) is offline for reads
NOTE: disk 7 (DATA_DM01_CD_07_DM01CEL01) in group 1 (DATA_DM01) is offline for reads
NOTE: disk 8 (DATA_DM01_CD_08_DM01CEL01) in group 1 (DATA_DM01) is offline for reads
NOTE: disk 9 (DATA_DM01_CD_09_DM01CEL01) in group 1 (DATA_DM01) is offline for reads
NOTE: disk 10 (DATA_DM01_CD_10_DM01CEL01) in group 1 (DATA_DM01) is offline for reads
NOTE: disk 11 (DATA_DM01_CD_11_DM01CEL01) in group 1 (DATA_DM01) is offline for reads
NOTE: disk 0 (RECO_DM01_CD_00_DM01CEL01) in group 3 (RECO_DM01) is offline for reads
NOTE: disk 1 (RECO_DM01_CD_01_DM01CEL01) in group 3 (RECO_DM01) is offline for reads
NOTE: disk 2 (RECO_DM01_CD_02_DM01CEL01) in group 3 (RECO_DM01) is offline for reads
NOTE: disk 3 (RECO_DM01_CD_03_DM01CEL01) in group 3 (RECO_DM01) is offline for reads
NOTE: disk 4 (RECO_DM01_CD_04_DM01CEL01) in group 3 (RECO_DM01) is offline for reads
NOTE: disk 5 (RECO_DM01_CD_05_DM01CEL01) in group 3 (RECO_DM01) is offline for reads
NOTE: disk 6 (RECO_DM01_CD_06_DM01CEL01) in group 3 (RECO_DM01) is offline for reads
NOTE: disk 7 (RECO_DM01_CD_07_DM01CEL01) in group 3 (RECO_DM01) is offline for reads
NOTE: disk 8 (RECO_DM01_CD_08_DM01CEL01) in group 3 (RECO_DM01) is offline for reads
NOTE: disk 9 (RECO_DM01_CD_09_DM01CEL01) in group 3 (RECO_DM01) is offline for reads
NOTE: disk 10 (RECO_DM01_CD_10_DM01CEL01) in group 3 (RECO_DM01) is offline for reads
NOTE: disk 11 (RECO_DM01_CD_11_DM01CEL01) in group 3 (RECO_DM01) is offline for reads
NOTE: disk 0 (DATA_DM01_CD_00_DM01CEL01) in group 1 (DATA_DM01) is offline for writes
NOTE: disk 1 (DATA_DM01_CD_01_DM01CEL01) in group 1 (DATA_DM01) is offline for writes
NOTE: disk 2 (DATA_DM01_CD_02_DM01CEL01) in group 1 (DATA_DM01) is offline for writes
NOTE: disk 3 (DATA_DM01_CD_03_DM01CEL01) in group 1 (DATA_DM01) is offline for writes
NOTE: disk 4 (DATA_DM01_CD_04_DM01CEL01) in group 1 (DATA_DM01) is offline for writes
NOTE: disk 5 (DATA_DM01_CD_05_DM01CEL01) in group 1 (DATA_DM01) is offline for writes
NOTE: disk 6 (DATA_DM01_CD_06_DM01CEL01) in group 1 (DATA_DM01) is offline for writes
NOTE: disk 7 (DATA_DM01_CD_07_DM01CEL01) in group 1 (DATA_DM01) is offline for writes
NOTE: disk 8 (DATA_DM01_CD_08_DM01CEL01) in group 1 (DATA_DM01) is offline for writes
NOTE: disk 9 (DATA_DM01_CD_09_DM01CEL01) in group 1 (DATA_DM01) is offline for writes
NOTE: disk 10 (DATA_DM01_CD_10_DM01CEL01) in group 1 (DATA_DM01) is offline for writes
NOTE: disk 11 (DATA_DM01_CD_11_DM01CEL01) in group 1 (DATA_DM01) is offline for writes
NOTE: disk 0 (RECO_DM01_CD_00_DM01CEL01) in group 3 (RECO_DM01) is offline for writes
NOTE: disk 1 (RECO_DM01_CD_01_DM01CEL01) in group 3 (RECO_DM01) is offline for writes
NOTE: disk 2 (RECO_DM01_CD_02_DM01CEL01) in group 3 (RECO_DM01) is offline for writes
NOTE: disk 3 (RECO_DM01_CD_03_DM01CEL01) in group 3 (RECO_DM01) is offline for writes
NOTE: disk 4 (RECO_DM01_CD_04_DM01CEL01) in group 3 (RECO_DM01) is offline for writes
NOTE: disk 5 (RECO_DM01_CD_05_DM01CEL01) in group 3 (RECO_DM01) is offline for writes
NOTE: disk 6 (RECO_DM01_CD_06_DM01CEL01) in group 3 (RECO_DM01) is offline for writes
NOTE: disk 7 (RECO_DM01_CD_07_DM01CEL01) in group 3 (RECO_DM01) is offline for writes
NOTE: disk 8 (RECO_DM01_CD_08_DM01CEL01) in group 3 (RECO_DM01) is offline for writes
NOTE: disk 9 (RECO_DM01_CD_09_DM01CEL01) in group 3 (RECO_DM01) is offline for writes
NOTE: disk 10 (RECO_DM01_CD_10_DM01CEL01) in group 3 (RECO_DM01) is offline for writes
NOTE: disk 11 (RECO_DM01_CD_11_DM01CEL01) in group 3 (RECO_DM01) is offline for writes
Sun Sep 02 09:08:48 2012
Errors in file /u01/app/oracle/diag/rdbms/dbm/dbm1/trace/dbm1_ckpt_31547.trc:
ORA-27603: Cell storage I/O error, I/O failed on disk o/192.168.64.131/RECO_DM01_CD_03_dm01cel01 at offset 16826368 for data length 16384
ORA-27626: Exadata error: 12 (Network error)
ORA-27300: OS system dependent operation:rpc update timed out failed with status: 0
ORA-27301: OS failure message: Error 0
ORA-27302: failure occurred at: skgxp_path
WARNING: Write Failed. group:3 disk:3 AU:4 offset:49152 size:16384





Trace-file detail for the failed write above (apparently from the dbm1_ckpt_31547.trc referenced in the alert log):

*** 2012-09-02 09:08:48.865
ORA-27626: Exadata error: 12 (Network error)
ORA-27300: OS system dependent operation:rpc update timed out failed with status: 0
ORA-27301: OS failure message: Error 0
ORA-27302: failure occurred at: skgxp_path
WARNING: Write Failed. group:3 disk:3 AU:4 offset:49152 size:16384
path:o/192.168.64.131/RECO_DM01_CD_03_dm01cel01
         incarnation:0xeae31f74 asynchronous result:'I/O error'
         subsys:OSS iop:0x2adfeda7d130 bufp:0x2adfed997e00 osderr:0xc osderr1:0x0
         Exadata error:'Network error'
WARNING: (post-reap) disk offline and rejecting I/O
  dsk: 3, au: 4, fn: 256, ext: 0


SQL> select status from v$datafile; 

STATUS
-------
SYSTEM
ONLINE
ONLINE
ONLINE
ONLINE
ONLINE

6 rows selected.

With one cell lost, every data file in the database remains ONLINE and fully operational (the first row, SYSTEM, is the normal status reported for the SYSTEM tablespace's data file).
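
Because each cell is a separate failure group and the disk groups keep redundant copies on the other cells, ASM only offlines the failed cell's disks and keeps the disk groups mounted; the offlined disks are retained for the disk_repair_time window before they would be force-dropped. A hedged check from the +ASM instance (disk_repair_time shows up in v$asm_attribute only when compatible.asm is 11.1 or higher):

-- Offline disk count per disk group, plus the repair window within which the
-- cell must come back before ASM drops its disks.
SELECT g.name,
       g.type,
       g.offline_disks,
       a.value AS disk_repair_time
FROM   v$asm_diskgroup g
       LEFT JOIN v$asm_attribute a
              ON  a.group_number = g.group_number
              AND a.name = 'disk_repair_time'
ORDER  BY g.name;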

Restart the cell and observe the recovery:



CellCLI> alter cell startup services all

Starting the RS, CELLSRV, and MS services...
Getting the state of RS services... 
 running
Starting CELLSRV services...
The STARTUP of CELLSRV services was successful.
Starting MS services...
The STARTUP of MS services was successful.



CellCLI> list griddisk
         DATA_DM01_CD_00_dm01cel01       active
         DATA_DM01_CD_01_dm01cel01       active
         DATA_DM01_CD_02_dm01cel01       active
         DATA_DM01_CD_03_dm01cel01       active
         DATA_DM01_CD_04_dm01cel01       active
         DATA_DM01_CD_05_dm01cel01       active
         DATA_DM01_CD_06_dm01cel01       active
         DATA_DM01_CD_07_dm01cel01       active
         DATA_DM01_CD_08_dm01cel01       active
         DATA_DM01_CD_09_dm01cel01       active
         DATA_DM01_CD_10_dm01cel01       active
         DATA_DM01_CD_11_dm01cel01       active
         DBFS_DG_CD_02_dm01cel01         active
         DBFS_DG_CD_03_dm01cel01         active
         DBFS_DG_CD_04_dm01cel01         active
         DBFS_DG_CD_05_dm01cel01         active
         DBFS_DG_CD_06_dm01cel01         active
         DBFS_DG_CD_07_dm01cel01         active
         DBFS_DG_CD_08_dm01cel01         active
         DBFS_DG_CD_09_dm01cel01         active
         DBFS_DG_CD_10_dm01cel01         active
         DBFS_DG_CD_11_dm01cel01         active
         RECO_DM01_CD_00_dm01cel01       active
         RECO_DM01_CD_01_dm01cel01       active
         RECO_DM01_CD_02_dm01cel01       active
         RECO_DM01_CD_03_dm01cel01       active
         RECO_DM01_CD_04_dm01cel01       active
         RECO_DM01_CD_05_dm01cel01       active
         RECO_DM01_CD_06_dm01cel01       active
         RECO_DM01_CD_07_dm01cel01       active
         RECO_DM01_CD_08_dm01cel01       active
         RECO_DM01_CD_09_dm01cel01       active
         RECO_DM01_CD_10_dm01cel01       active
         RECO_DM01_CD_11_dm01cel01       active
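
Once the grid disks are active again, ASM validates and onlines them automatically (via the XDWK background process visible in the alert log below) and resyncs only the extents that went stale while the cell was down. A sketch for watching that from the +ASM instance; the LIKE pattern assumes the cell-1 disk naming used in this environment:

-- Watch the restarted cell's disks come back: mode_status returns to ONLINE and
-- repair_timer drops back to 0 once resync completes.
SELECT group_number, name, mode_status, state, repair_timer
FROM   v$asm_disk
WHERE  name LIKE '%DM01CEL01'
ORDER  BY group_number, name;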



ASM ALERT.LOG

Sun Sep 02 09:15:18 2012
WARNING: Disk 0 (DATA_DM01_CD_00_DM01CEL01) in group 1 mode 0x1 is now being offlined
WARNING: Disk 1 (DATA_DM01_CD_01_DM01CEL01) in group 1 mode 0x1 is now being offlined
WARNING: Disk 2 (DATA_DM01_CD_02_DM01CEL01) in group 1 mode 0x1 is now being offlined
WARNING: Disk 3 (DATA_DM01_CD_03_DM01CEL01) in group 1 mode 0x1 is now being offlined
WARNING: Disk 4 (DATA_DM01_CD_04_DM01CEL01) in group 1 mode 0x1 is now being offlined
WARNING: Disk 5 (DATA_DM01_CD_05_DM01CEL01) in group 1 mode 0x1 is now being offlined
WARNING: Disk 6 (DATA_DM01_CD_06_DM01CEL01) in group 1 mode 0x1 is now being offlined
WARNING: Disk 7 (DATA_DM01_CD_07_DM01CEL01) in group 1 mode 0x1 is now being offlined
WARNING: Disk 8 (DATA_DM01_CD_08_DM01CEL01) in group 1 mode 0x1 is now being offlined
WARNING: Disk 9 (DATA_DM01_CD_09_DM01CEL01) in group 1 mode 0x1 is now being offlined
WARNING: Disk 10 (DATA_DM01_CD_10_DM01CEL01) in group 1 mode 0x1 is now being offlined
WARNING: Disk 11 (DATA_DM01_CD_11_DM01CEL01) in group 1 mode 0x1 is now being offlined
Sun Sep 02 09:15:19 2012
Starting background process XDWK
Sun Sep 02 09:15:19 2012
XDWK started with pid=35, OS id=26059 
Sun Sep 02 09:15:19 2012
NOTE: disk validation pending for group 1/0x34e3ee5a (DATA_DM01)
NOTE: Found o/192.168.64.131/DATA_DM01_CD_03_dm01cel01 for disk DATA_DM01_CD_03_DM01CEL01
WARNING: ignoring disk  in deep discovery
NOTE: Found o/192.168.64.131/DATA_DM01_CD_01_dm01cel01 for disk DATA_DM01_CD_01_DM01CEL01
WARNING: ignoring disk  in deep discovery
NOTE: Found o/192.168.64.131/DATA_DM01_CD_10_dm01cel01 for disk DATA_DM01_CD_10_DM01CEL01
WARNING: ignoring disk  in deep discovery
NOTE: Found o/192.168.64.131/DATA_DM01_CD_09_dm01cel01 for disk DATA_DM01_CD_09_DM01CEL01
WARNING: ignoring disk  in deep discovery
NOTE: Found o/192.168.64.131/DATA_DM01_CD_06_dm01cel01 for disk DATA_DM01_CD_06_DM01CEL01
WARNING: ignoring disk  in deep discovery
NOTE: Found o/192.168.64.131/DATA_DM01_CD_05_dm01cel01 for disk DATA_DM01_CD_05_DM01CEL01
WARNING: ignoring disk  in deep discovery
NOTE: Found o/192.168.64.131/DATA_DM01_CD_00_dm01cel01 for disk DATA_DM01_CD_00_DM01CEL01
WARNING: ignoring disk  in deep discovery
NOTE: Found o/192.168.64.131/DATA_DM01_CD_02_dm01cel01 for disk DATA_DM01_CD_02_DM01CEL01
WARNING: ignoring disk  in deep discovery
NOTE: Found o/192.168.64.131/DATA_DM01_CD_07_dm01cel01 for disk DATA_DM01_CD_07_DM01CEL01
WARNING: ignoring disk  in deep discovery
NOTE: Found o/192.168.64.131/DATA_DM01_CD_08_dm01cel01 for disk DATA_DM01_CD_08_DM01CEL01
WARNING: ignoring disk  in deep discovery
NOTE: Found o/192.168.64.131/DATA_DM01_CD_11_dm01cel01 for disk DATA_DM01_CD_11_DM01CEL01
WARNING: ignoring disk  in deep discovery
NOTE: Found o/192.168.64.131/DATA_DM01_CD_04_dm01cel01 for disk DATA_DM01_CD_04_DM01CEL01
WARNING: ignoring disk  in deep discovery
SUCCESS: validated disks for 1/0x34e3ee5a (DATA_DM01)
NOTE: membership refresh pending for group 1/0x34e3ee5a (DATA_DM01)
Sun Sep 02 09:15:24 2012
NOTE: successfully read ACD block gn=1 blk=11264 via retry read
Errors in file /u01/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_lgwr_30748.trc:
ORA-15062: ASM disk is globally closed
GMON querying group 1 at 19 for pid 19, osid 30754
NOTE: cache opening disk 0 of grp 1: DATA_DM01_CD_00_DM01CEL01 path:o/192.168.64.131/DATA_DM01_CD_00_dm01cel01
NOTE: cache opening disk 1 of grp 1: DATA_DM01_CD_01_DM01CEL01 path:o/192.168.64.131/DATA_DM01_CD_01_dm01cel01
NOTE: cache opening disk 2 of grp 1: DATA_DM01_CD_02_DM01CEL01 path:o/192.168.64.131/DATA_DM01_CD_02_dm01cel01
NOTE: cache opening disk 3 of grp 1: DATA_DM01_CD_03_DM01CEL01 path:o/192.168.64.131/DATA_DM01_CD_03_dm01cel01
NOTE: cache opening disk 4 of grp 1: DATA_DM01_CD_04_DM01CEL01 path:o/192.168.64.131/DATA_DM01_CD_04_dm01cel01
NOTE: cache opening disk 5 of grp 1: DATA_DM01_CD_05_DM01CEL01 path:o/192.168.64.131/DATA_DM01_CD_05_dm01cel01
NOTE: cache opening disk 6 of grp 1: DATA_DM01_CD_06_DM01CEL01 path:o/192.168.64.131/DATA_DM01_CD_06_dm01cel01
NOTE: cache opening disk 7 of grp 1: DATA_DM01_CD_07_DM01CEL01 path:o/192.168.64.131/DATA_DM01_CD_07_dm01cel01
NOTE: cache opening disk 8 of grp 1: DATA_DM01_CD_08_DM01CEL01 path:o/192.168.64.131/DATA_DM01_CD_08_dm01cel01
NOTE: cache opening disk 9 of grp 1: DATA_DM01_CD_09_DM01CEL01 path:o/192.168.64.131/DATA_DM01_CD_09_dm01cel01
NOTE: cache opening disk 10 of grp 1: DATA_DM01_CD_10_DM01CEL01 path:o/192.168.64.131/DATA_DM01_CD_10_dm01cel01
NOTE: cache opening disk 11 of grp 1: DATA_DM01_CD_11_DM01CEL01 path:o/192.168.64.131/DATA_DM01_CD_11_dm01cel01
SUCCESS: refreshed membership for 1/0x34e3ee5a (DATA_DM01)
WARNING: Disk 0 (DBFS_DG_CD_02_DM01CEL01) in group 2 mode 0x1 is now being offlined
WARNING: Disk 1 (DBFS_DG_CD_03_DM01CEL01) in group 2 mode 0x1 is now being offlined
WARNING: Disk 2 (DBFS_DG_CD_04_DM01CEL01) in group 2 mode 0x1 is now being offlined
WARNING: Disk 3 (DBFS_DG_CD_05_DM01CEL01) in group 2 mode 0x1 is now being offlined
WARNING: Disk 4 (DBFS_DG_CD_06_DM01CEL01) in group 2 mode 0x1 is now being offlined
WARNING: Disk 5 (DBFS_DG_CD_07_DM01CEL01) in group 2 mode 0x1 is now being offlined
WARNING: Disk 6 (DBFS_DG_CD_08_DM01CEL01) in group 2 mode 0x1 is now being offlined
WARNING: Disk 7 (DBFS_DG_CD_09_DM01CEL01) in group 2 mode 0x1 is now being offlined
WARNING: Disk 8 (DBFS_DG_CD_10_DM01CEL01) in group 2 mode 0x1 is now being offlined
WARNING: Disk 9 (DBFS_DG_CD_11_DM01CEL01) in group 2 mode 0x1 is now being offlined
NOTE: Voting File refresh pending for group 1/0x34e3ee5a (DATA_DM01)
NOTE: Attempting voting file refresh on diskgroup DATA_DM01
NOTE: disk validation pending for group 2/0x34f3ee5b (DBFS_DG)
Sun Sep 02 09:15:31 2012
NOTE: Attempting voting file refresh on diskgroup DBFS_DG
NOTE: Voting file relocation is required in diskgroup DBFS_DG
NOTE: Attempting voting file relocation on diskgroup DBFS_DG
NOTE: Found o/192.168.64.131/DBFS_DG_CD_09_dm01cel01 for disk DBFS_DG_CD_09_DM01CEL01
WARNING: ignoring disk  in deep discovery
NOTE: Found o/192.168.64.131/DBFS_DG_CD_06_dm01cel01 for disk DBFS_DG_CD_06_DM01CEL01
WARNING: ignoring disk  in deep discovery
NOTE: Found o/192.168.64.131/DBFS_DG_CD_03_dm01cel01 for disk DBFS_DG_CD_03_DM01CEL01
WARNING: ignoring disk  in deep discovery
NOTE: Found o/192.168.64.131/DBFS_DG_CD_10_dm01cel01 for disk DBFS_DG_CD_10_DM01CEL01
WARNING: ignoring disk  in deep discovery
NOTE: Found o/192.168.64.131/DBFS_DG_CD_04_dm01cel01 for disk DBFS_DG_CD_04_DM01CEL01
WARNING: ignoring disk  in deep discovery
NOTE: Found o/192.168.64.131/DBFS_DG_CD_11_dm01cel01 for disk DBFS_DG_CD_11_DM01CEL01
WARNING: ignoring disk  in deep discovery
NOTE: Found o/192.168.64.131/DBFS_DG_CD_07_dm01cel01 for disk DBFS_DG_CD_07_DM01CEL01
WARNING: ignoring disk  in deep discovery
NOTE: Found o/192.168.64.131/DBFS_DG_CD_05_dm01cel01 for disk DBFS_DG_CD_05_DM01CEL01
WARNING: ignoring disk  in deep discovery
NOTE: Found o/192.168.64.131/DBFS_DG_CD_08_dm01cel01 for disk DBFS_DG_CD_08_DM01CEL01
WARNING: ignoring disk  in deep discovery
NOTE: Found o/192.168.64.131/DBFS_DG_CD_02_dm01cel01 for disk DBFS_DG_CD_02_DM01CEL01
WARNING: ignoring disk  in deep discovery
SUCCESS: validated disks for 2/0x34f3ee5b (DBFS_DG)
NOTE: membership refresh pending for group 2/0x34f3ee5b (DBFS_DG)
NOTE: successfully read ACD block gn=2 blk=11264 via retry read
Errors in file /u01/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_lgwr_30748.trc:
ORA-15062: ASM disk is globally closed
NOTE: Attempting voting file refresh on diskgroup DBFS_DG
NOTE: Voting file relocation is required in diskgroup DBFS_DG
NOTE: Attempting voting file relocation on diskgroup DBFS_DG
Sun Sep 02 09:15:35 2012
GMON querying group 2 at 20 for pid 19, osid 30754
NOTE: cache opening disk 0 of grp 2: DBFS_DG_CD_02_DM01CEL01 path:o/192.168.64.131/DBFS_DG_CD_02_dm01cel01
NOTE: cache opening disk 1 of grp 2: DBFS_DG_CD_03_DM01CEL01 path:o/192.168.64.131/DBFS_DG_CD_03_dm01cel01
NOTE: cache opening disk 2 of grp 2: DBFS_DG_CD_04_DM01CEL01 path:o/192.168.64.131/DBFS_DG_CD_04_dm01cel01
NOTE: cache opening disk 3 of grp 2: DBFS_DG_CD_05_DM01CEL01 path:o/192.168.64.131/DBFS_DG_CD_05_dm01cel01
NOTE: cache opening disk 4 of grp 2: DBFS_DG_CD_06_DM01CEL01 path:o/192.168.64.131/DBFS_DG_CD_06_dm01cel01
NOTE: cache opening disk 5 of grp 2: DBFS_DG_CD_07_DM01CEL01 path:o/192.168.64.131/DBFS_DG_CD_07_dm01cel01
NOTE: cache opening disk 6 of grp 2: DBFS_DG_CD_08_DM01CEL01 path:o/192.168.64.131/DBFS_DG_CD_08_dm01cel01
NOTE: cache opening disk 7 of grp 2: DBFS_DG_CD_09_DM01CEL01 path:o/192.168.64.131/DBFS_DG_CD_09_dm01cel01
NOTE: cache opening disk 8 of grp 2: DBFS_DG_CD_10_DM01CEL01 path:o/192.168.64.131/DBFS_DG_CD_10_dm01cel01
NOTE: cache opening disk 9 of grp 2: DBFS_DG_CD_11_DM01CEL01 path:o/192.168.64.131/DBFS_DG_CD_11_dm01cel01
SUCCESS: refreshed membership for 2/0x34f3ee5b (DBFS_DG)
WARNING: Disk 0 (RECO_DM01_CD_00_DM01CEL01) in group 3 mode 0x1 is now being offlined
WARNING: Disk 1 (RECO_DM01_CD_01_DM01CEL01) in group 3 mode 0x1 is now being offlined
WARNING: Disk 2 (RECO_DM01_CD_02_DM01CEL01) in group 3 mode 0x1 is now being offlined
WARNING: Disk 3 (RECO_DM01_CD_03_DM01CEL01) in group 3 mode 0x1 is now being offlined
WARNING: Disk 4 (RECO_DM01_CD_04_DM01CEL01) in group 3 mode 0x1 is now being offlined
WARNING: Disk 5 (RECO_DM01_CD_05_DM01CEL01) in group 3 mode 0x1 is now being offlined
WARNING: Disk 6 (RECO_DM01_CD_06_DM01CEL01) in group 3 mode 0x1 is now being offlined
WARNING: Disk 7 (RECO_DM01_CD_07_DM01CEL01) in group 3 mode 0x1 is now being offlined
WARNING: Disk 8 (RECO_DM01_CD_08_DM01CEL01) in group 3 mode 0x1 is now being offlined
WARNING: Disk 9 (RECO_DM01_CD_09_DM01CEL01) in group 3 mode 0x1 is now being offlined
WARNING: Disk 10 (RECO_DM01_CD_10_DM01CEL01) in group 3 mode 0x1 is now being offlined
WARNING: Disk 11 (RECO_DM01_CD_11_DM01CEL01) in group 3 mode 0x1 is now being offlined
NOTE: Voting File refresh pending for group 2/0x34f3ee5b (DBFS_DG)
NOTE: Attempting voting file refresh on diskgroup DBFS_DG
NOTE: Voting file relocation is required in diskgroup DBFS_DG
NOTE: Attempting voting file relocation on diskgroup DBFS_DG
NOTE: voting file allocation on grp 2 disk DBFS_DG_CD_02_DM01CEL01
Sun Sep 02 09:16:09 2012
NOTE: disk validation pending for group 3/0x34f3ee5c (RECO_DM01)
NOTE: Found o/192.168.64.131/RECO_DM01_CD_09_dm01cel01 for disk RECO_DM01_CD_09_DM01CEL01
WARNING: ignoring disk  in deep discovery
NOTE: Found o/192.168.64.131/RECO_DM01_CD_01_dm01cel01 for disk RECO_DM01_CD_01_DM01CEL01
WARNING: ignoring disk  in deep discovery
NOTE: Found o/192.168.64.131/RECO_DM01_CD_02_dm01cel01 for disk RECO_DM01_CD_02_DM01CEL01
WARNING: ignoring disk  in deep discovery
NOTE: Found o/192.168.64.131/RECO_DM01_CD_04_dm01cel01 for disk RECO_DM01_CD_04_DM01CEL01
WARNING: ignoring disk  in deep discovery
NOTE: Found o/192.168.64.131/RECO_DM01_CD_08_dm01cel01 for disk RECO_DM01_CD_08_DM01CEL01
WARNING: ignoring disk  in deep discovery
NOTE: Found o/192.168.64.131/RECO_DM01_CD_10_dm01cel01 for disk RECO_DM01_CD_10_DM01CEL01
WARNING: ignoring disk  in deep discovery
NOTE: Found o/192.168.64.131/RECO_DM01_CD_06_dm01cel01 for disk RECO_DM01_CD_06_DM01CEL01
WARNING: ignoring disk  in deep discovery
NOTE: Found o/192.168.64.131/RECO_DM01_CD_00_dm01cel01 for disk RECO_DM01_CD_00_DM01CEL01
WARNING: ignoring disk  in deep discovery
NOTE: Found o/192.168.64.131/RECO_DM01_CD_07_dm01cel01 for disk RECO_DM01_CD_07_DM01CEL01
WARNING: ignoring disk  in deep discovery
NOTE: Found o/192.168.64.131/RECO_DM01_CD_05_dm01cel01 for disk RECO_DM01_CD_05_DM01CEL01
WARNING: ignoring disk  in deep discovery
NOTE: Found o/192.168.64.131/RECO_DM01_CD_03_dm01cel01 for disk RECO_DM01_CD_03_DM01CEL01
WARNING: ignoring disk  in deep discovery
NOTE: Found o/192.168.64.131/RECO_DM01_CD_11_dm01cel01 for disk RECO_DM01_CD_11_DM01CEL01
WARNING: ignoring disk  in deep discovery
SUCCESS: validated disks for 3/0x34f3ee5c (RECO_DM01)
Sun Sep 02 09:16:12 2012
NOTE: group RECO_DM01: updated PST location: disk 0012 (PST copy 0)
NOTE: group RECO_DM01: updated PST location: disk 0024 (PST copy 1)
NOTE: group RECO_DM01: updated PST location: disk 0000 (PST copy 2)
NOTE: membership refresh pending for group 3/0x34f3ee5c (RECO_DM01)
WARNING: GMON found an alien heartbeat (grp 3)
Sun Sep 02 09:16:15 2012
NOTE: successfully read ACD block gn=3 blk=11264 via retry read
Errors in file /u01/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_lgwr_30748.trc:
ORA-15062: ASM disk is globally closed
GMON querying group 3 at 21 for pid 19, osid 30754
NOTE: group RECO_DM01: updated PST location: disk 0012 (PST copy 0)
NOTE: group RECO_DM01: updated PST location: disk 0024 (PST copy 1)
NOTE: group RECO_DM01: updated PST location: disk 0000 (PST copy 2)
NOTE: cache opening disk 0 of grp 3: RECO_DM01_CD_00_DM01CEL01 path:o/192.168.64.131/RECO_DM01_CD_00_dm01cel01
NOTE: cache opening disk 1 of grp 3: RECO_DM01_CD_01_DM01CEL01 path:o/192.168.64.131/RECO_DM01_CD_01_dm01cel01
NOTE: cache opening disk 2 of grp 3: RECO_DM01_CD_02_DM01CEL01 path:o/192.168.64.131/RECO_DM01_CD_02_dm01cel01
NOTE: cache opening disk 3 of grp 3: RECO_DM01_CD_03_DM01CEL01 path:o/192.168.64.131/RECO_DM01_CD_03_dm01cel01
NOTE: cache opening disk 4 of grp 3: RECO_DM01_CD_04_DM01CEL01 path:o/192.168.64.131/RECO_DM01_CD_04_dm01cel01
NOTE: cache opening disk 5 of grp 3: RECO_DM01_CD_05_DM01CEL01 path:o/192.168.64.131/RECO_DM01_CD_05_dm01cel01
NOTE: cache opening disk 6 of grp 3: RECO_DM01_CD_06_DM01CEL01 path:o/192.168.64.131/RECO_DM01_CD_06_dm01cel01
NOTE: cache opening disk 7 of grp 3: RECO_DM01_CD_07_DM01CEL01 path:o/192.168.64.131/RECO_DM01_CD_07_dm01cel01
NOTE: cache opening disk 8 of grp 3: RECO_DM01_CD_08_DM01CEL01 path:o/192.168.64.131/RECO_DM01_CD_08_dm01cel01
NOTE: cache opening disk 9 of grp 3: RECO_DM01_CD_09_DM01CEL01 path:o/192.168.64.131/RECO_DM01_CD_09_dm01cel01
NOTE: cache opening disk 10 of grp 3: RECO_DM01_CD_10_DM01CEL01 path:o/192.168.64.131/RECO_DM01_CD_10_dm01cel01
NOTE: cache opening disk 11 of grp 3: RECO_DM01_CD_11_DM01CEL01 path:o/192.168.64.131/RECO_DM01_CD_11_dm01cel01
SUCCESS: refreshed membership for 3/0x34f3ee5c (RECO_DM01)
NOTE: Voting File refresh pending for group 3/0x34f3ee5c (RECO_DM01)
NOTE: Attempting voting file refresh on diskgroup RECO_DM01


RDBMS ALERT.LOG:


Sun Sep 02 09:15:23 2012
NOTE: Found o/192.168.64.131/DATA_DM01_CD_00_dm01cel01 for disk DATA_DM01_CD_00_DM01CEL01
SUCCESS: disk DATA_DM01_CD_00_DM01CEL01 (0.3940753198) replaced in diskgroup DATA_DM01
NOTE: Found o/192.168.64.131/DATA_DM01_CD_01_dm01cel01 for disk DATA_DM01_CD_01_DM01CEL01
SUCCESS: disk DATA_DM01_CD_01_DM01CEL01 (1.3940753193) replaced in diskgroup DATA_DM01
NOTE: Found o/192.168.64.131/DATA_DM01_CD_02_dm01cel01 for disk DATA_DM01_CD_02_DM01CEL01
SUCCESS: disk DATA_DM01_CD_02_DM01CEL01 (2.3940753199) replaced in diskgroup DATA_DM01
NOTE: Found o/192.168.64.131/DATA_DM01_CD_03_dm01cel01 for disk DATA_DM01_CD_03_DM01CEL01
SUCCESS: disk DATA_DM01_CD_03_DM01CEL01 (3.3940753192) replaced in diskgroup DATA_DM01
NOTE: Found o/192.168.64.131/DATA_DM01_CD_04_dm01cel01 for disk DATA_DM01_CD_04_DM01CEL01
SUCCESS: disk DATA_DM01_CD_04_DM01CEL01 (4.3940753203) replaced in diskgroup DATA_DM01
NOTE: Found o/192.168.64.131/DATA_DM01_CD_05_dm01cel01 for disk DATA_DM01_CD_05_DM01CEL01
SUCCESS: disk DATA_DM01_CD_05_DM01CEL01 (5.3940753197) replaced in diskgroup DATA_DM01
NOTE: Found o/192.168.64.131/DATA_DM01_CD_06_dm01cel01 for disk DATA_DM01_CD_06_DM01CEL01
SUCCESS: disk DATA_DM01_CD_06_DM01CEL01 (6.3940753196) replaced in diskgroup DATA_DM01
NOTE: Found o/192.168.64.131/DATA_DM01_CD_07_dm01cel01 for disk DATA_DM01_CD_07_DM01CEL01
SUCCESS: disk DATA_DM01_CD_07_DM01CEL01 (7.3940753200) replaced in diskgroup DATA_DM01
NOTE: Found o/192.168.64.131/DATA_DM01_CD_08_dm01cel01 for disk DATA_DM01_CD_08_DM01CEL01
SUCCESS: disk DATA_DM01_CD_08_DM01CEL01 (8.3940753201) replaced in diskgroup DATA_DM01
NOTE: Found o/192.168.64.131/DATA_DM01_CD_09_dm01cel01 for disk DATA_DM01_CD_09_DM01CEL01
SUCCESS: disk DATA_DM01_CD_09_DM01CEL01 (9.3940753195) replaced in diskgroup DATA_DM01
NOTE: Found o/192.168.64.131/DATA_DM01_CD_10_dm01cel01 for disk DATA_DM01_CD_10_DM01CEL01
SUCCESS: disk DATA_DM01_CD_10_DM01CEL01 (10.3940753194) replaced in diskgroup DATA_DM01
NOTE: Found o/192.168.64.131/DATA_DM01_CD_11_dm01cel01 for disk DATA_DM01_CD_11_DM01CEL01
SUCCESS: disk DATA_DM01_CD_11_DM01CEL01 (11.3940753202) replaced in diskgroup DATA_DM01
NOTE: disk 0 (DATA_DM01_CD_00_DM01CEL01) in group 1 (DATA_DM01) is online for writes
NOTE: disk 1 (DATA_DM01_CD_01_DM01CEL01) in group 1 (DATA_DM01) is online for writes
NOTE: disk 2 (DATA_DM01_CD_02_DM01CEL01) in group 1 (DATA_DM01) is online for writes
NOTE: disk 3 (DATA_DM01_CD_03_DM01CEL01) in group 1 (DATA_DM01) is online for writes
NOTE: disk 4 (DATA_DM01_CD_04_DM01CEL01) in group 1 (DATA_DM01) is online for writes
NOTE: disk 5 (DATA_DM01_CD_05_DM01CEL01) in group 1 (DATA_DM01) is online for writes
NOTE: disk 6 (DATA_DM01_CD_06_DM01CEL01) in group 1 (DATA_DM01) is online for writes
NOTE: disk 7 (DATA_DM01_CD_07_DM01CEL01) in group 1 (DATA_DM01) is online for writes
NOTE: disk 8 (DATA_DM01_CD_08_DM01CEL01) in group 1 (DATA_DM01) is online for writes
NOTE: disk 9 (DATA_DM01_CD_09_DM01CEL01) in group 1 (DATA_DM01) is online for writes
NOTE: disk 10 (DATA_DM01_CD_10_DM01CEL01) in group 1 (DATA_DM01) is online for writes
NOTE: disk 11 (DATA_DM01_CD_11_DM01CEL01) in group 1 (DATA_DM01) is online for writes
NOTE: disk 0 (DATA_DM01_CD_00_DM01CEL01) in group 1 (DATA_DM01) is online for reads
NOTE: disk 1 (DATA_DM01_CD_01_DM01CEL01) in group 1 (DATA_DM01) is online for reads
NOTE: disk 2 (DATA_DM01_CD_02_DM01CEL01) in group 1 (DATA_DM01) is online for reads
NOTE: disk 3 (DATA_DM01_CD_03_DM01CEL01) in group 1 (DATA_DM01) is online for reads
NOTE: disk 4 (DATA_DM01_CD_04_DM01CEL01) in group 1 (DATA_DM01) is online for reads
NOTE: disk 5 (DATA_DM01_CD_05_DM01CEL01) in group 1 (DATA_DM01) is online for reads
NOTE: disk 6 (DATA_DM01_CD_06_DM01CEL01) in group 1 (DATA_DM01) is online for reads
NOTE: disk 7 (DATA_DM01_CD_07_DM01CEL01) in group 1 (DATA_DM01) is online for reads
NOTE: disk 8 (DATA_DM01_CD_08_DM01CEL01) in group 1 (DATA_DM01) is online for reads
NOTE: disk 9 (DATA_DM01_CD_09_DM01CEL01) in group 1 (DATA_DM01) is online for reads
NOTE: disk 10 (DATA_DM01_CD_10_DM01CEL01) in group 1 (DATA_DM01) is online for reads
NOTE: disk 11 (DATA_DM01_CD_11_DM01CEL01) in group 1 (DATA_DM01) is online for reads
Sun Sep 02 09:16:12 2012
NOTE: Found o/192.168.64.131/RECO_DM01_CD_00_dm01cel01 for disk RECO_DM01_CD_00_DM01CEL01
SUCCESS: disk RECO_DM01_CD_00_DM01CEL01 (0.3940753265) replaced in diskgroup RECO_DM01
NOTE: Found o/192.168.64.131/RECO_DM01_CD_01_dm01cel01 for disk RECO_DM01_CD_01_DM01CEL01
SUCCESS: disk RECO_DM01_CD_01_DM01CEL01 (1.3940753259) replaced in diskgroup RECO_DM01
NOTE: Found o/192.168.64.131/RECO_DM01_CD_02_dm01cel01 for disk RECO_DM01_CD_02_DM01CEL01
SUCCESS: disk RECO_DM01_CD_02_DM01CEL01 (2.3940753260) replaced in diskgroup RECO_DM01
NOTE: Found o/192.168.64.131/RECO_DM01_CD_03_dm01cel01 for disk RECO_DM01_CD_03_DM01CEL01
SUCCESS: disk RECO_DM01_CD_03_DM01CEL01 (3.3940753268) replaced in diskgroup RECO_DM01
NOTE: Found o/192.168.64.131/RECO_DM01_CD_04_dm01cel01 for disk RECO_DM01_CD_04_DM01CEL01
SUCCESS: disk RECO_DM01_CD_04_DM01CEL01 (4.3940753261) replaced in diskgroup RECO_DM01
NOTE: Found o/192.168.64.131/RECO_DM01_CD_05_dm01cel01 for disk RECO_DM01_CD_05_DM01CEL01
SUCCESS: disk RECO_DM01_CD_05_DM01CEL01 (5.3940753267) replaced in diskgroup RECO_DM01
NOTE: Found o/192.168.64.131/RECO_DM01_CD_06_dm01cel01 for disk RECO_DM01_CD_06_DM01CEL01
SUCCESS: disk RECO_DM01_CD_06_DM01CEL01 (6.3940753264) replaced in diskgroup RECO_DM01
NOTE: Found o/192.168.64.131/RECO_DM01_CD_07_dm01cel01 for disk RECO_DM01_CD_07_DM01CEL01
SUCCESS: disk RECO_DM01_CD_07_DM01CEL01 (7.3940753266) replaced in diskgroup RECO_DM01
NOTE: Found o/192.168.64.131/RECO_DM01_CD_08_dm01cel01 for disk RECO_DM01_CD_08_DM01CEL01
SUCCESS: disk RECO_DM01_CD_08_DM01CEL01 (8.3940753262) replaced in diskgroup RECO_DM01
NOTE: Found o/192.168.64.131/RECO_DM01_CD_09_dm01cel01 for disk RECO_DM01_CD_09_DM01CEL01
SUCCESS: disk RECO_DM01_CD_09_DM01CEL01 (9.3940753258) replaced in diskgroup RECO_DM01
NOTE: Found o/192.168.64.131/RECO_DM01_CD_10_dm01cel01 for disk RECO_DM01_CD_10_DM01CEL01
SUCCESS: disk RECO_DM01_CD_10_DM01CEL01 (10.3940753263) replaced in diskgroup RECO_DM01
NOTE: Found o/192.168.64.131/RECO_DM01_CD_11_dm01cel01 for disk RECO_DM01_CD_11_DM01CEL01
SUCCESS: disk RECO_DM01_CD_11_DM01CEL01 (11.3940753269) replaced in diskgroup RECO_DM01
NOTE: disk 0 (RECO_DM01_CD_00_DM01CEL01) in group 3 (RECO_DM01) is online for writes
NOTE: disk 1 (RECO_DM01_CD_01_DM01CEL01) in group 3 (RECO_DM01) is online for writes
NOTE: disk 2 (RECO_DM01_CD_02_DM01CEL01) in group 3 (RECO_DM01) is online for writes
NOTE: disk 3 (RECO_DM01_CD_03_DM01CEL01) in group 3 (RECO_DM01) is online for writes
NOTE: disk 4 (RECO_DM01_CD_04_DM01CEL01) in group 3 (RECO_DM01) is online for writes
NOTE: disk 5 (RECO_DM01_CD_05_DM01CEL01) in group 3 (RECO_DM01) is online for writes
NOTE: disk 6 (RECO_DM01_CD_06_DM01CEL01) in group 3 (RECO_DM01) is online for writes
NOTE: disk 7 (RECO_DM01_CD_07_DM01CEL01) in group 3 (RECO_DM01) is online for writes
NOTE: disk 8 (RECO_DM01_CD_08_DM01CEL01) in group 3 (RECO_DM01) is online for writes
NOTE: disk 9 (RECO_DM01_CD_09_DM01CEL01) in group 3 (RECO_DM01) is online for writes
NOTE: disk 10 (RECO_DM01_CD_10_DM01CEL01) in group 3 (RECO_DM01) is online for writes
NOTE: disk 11 (RECO_DM01_CD_11_DM01CEL01) in group 3 (RECO_DM01) is online for writes
NOTE: disk 0 (RECO_DM01_CD_00_DM01CEL01) in group 3 (RECO_DM01) is online for reads
NOTE: disk 1 (RECO_DM01_CD_01_DM01CEL01) in group 3 (RECO_DM01) is online for reads
NOTE: disk 2 (RECO_DM01_CD_02_DM01CEL01) in group 3 (RECO_DM01) is online for reads
NOTE: disk 3 (RECO_DM01_CD_03_DM01CEL01) in group 3 (RECO_DM01) is online for reads
NOTE: disk 4 (RECO_DM01_CD_04_DM01CEL01) in group 3 (RECO_DM01) is online for reads
NOTE: disk 5 (RECO_DM01_CD_05_DM01CEL01) in group 3 (RECO_DM01) is online for reads
NOTE: disk 6 (RECO_DM01_CD_06_DM01CEL01) in group 3 (RECO_DM01) is online for reads
NOTE: disk 7 (RECO_DM01_CD_07_DM01CEL01) in group 3 (RECO_DM01) is online for reads
NOTE: disk 8 (RECO_DM01_CD_08_DM01CEL01) in group 3 (RECO_DM01) is online for reads
NOTE: disk 9 (RECO_DM01_CD_09_DM01CEL01) in group 3 (RECO_DM01) is online for reads
NOTE: disk 10 (RECO_DM01_CD_10_DM01CEL01) in group 3 (RECO_DM01) is online for reads
NOTE: disk 11 (RECO_DM01_CD_11_DM01CEL01) in group 3 (RECO_DM01) is online for reads


[root@dm01cel01 trace]# pwd
/opt/oracle/cell11.2.3.1.1_LINUX.X64_120607/log/diag/asm/cell/dm01cel01/trace

The cell's alert log shows the following:

Sun Sep 02 09:07:00 2012
[RS] Process /opt/oracle/cell11.2.3.1.1_LINUX.X64_120607/cellsrv/bin/cellrsmmt (pid: 10473) received clean shutdown signal from pid: 9334, uid: 0
Sun Sep 02 09:07:04 2012
[RS] Stopped Service MS with pid 10474
Sun Sep 02 09:07:04 2012
[RS] Process /opt/oracle/cell11.2.3.1.1_LINUX.X64_120607/cellsrv/bin/cellrsomt (pid: 12307) received clean shutdown signal from pid: 9334, uid: 0
Sun Sep 02 09:07:04 2012
Clean shutdown signal delivered to OSS<12308>
[RS] Stopped Service CELLSRV with pid 12308
Sun Sep 02 09:07:07 2012
[RS] Process /opt/oracle/cell11.2.3.1.1_LINUX.X64_120607/cellsrv/bin/cellrsbmt (pid: 9340) received clean shutdown signal from pid: 9334, uid: 0
[RS] Stopped Service RS_BACKUP
[RS] Stopped Service RS_MAIN
Sun Sep 02 09:07:08 2012


[RS] Process /opt/oracle/cell11.2.3.1.1_LINUX.X64_120607/cellsrv/bin/cellrsbkm (pid: 9342) received clean shutdown signal from pid: 9334, uid: 0
Sun Sep 02 09:07:09 2012
[RS] Process /opt/oracle/cell11.2.3.1.1_LINUX.X64_120607/cellsrv/bin/cellrssmt (pid: 9346) received clean shutdown signal from pid: 9334, uid: 0
Sun Sep 02 09:07:10 2012
[RS] Process /opt/oracle/cell11.2.3.1.1_LINUX.X64_120607/cellsrv/bin/cellrssrm (pid: 9334) received clean shutdown signal from pid: 9334, uid: 0
Sun Sep 02 09:13:40 2012
RS version=11.2.3.1.1,label=OSS_11.2.3.1.1_LINUX.X64_120607,Fri_Jun__8_12:49:44_PDT_2012
[RS] Started Service RS_MAIN with pid 18806
Sun Sep 02 09:13:40 2012
[RS] Started monitoring process /opt/oracle/cell11.2.3.1.1_LINUX.X64_120607/cellsrv/bin/cellrsbmt with pid 18812
Sun Sep 02 09:13:40 2012
[RS] Started monitoring process /opt/oracle/cell11.2.3.1.1_LINUX.X64_120607/cellsrv/bin/cellrsmmt with pid 18813
Sun Sep 02 09:13:40 2012
[RS] Started monitoring process /opt/oracle/cell11.2.3.1.1_LINUX.X64_120607/cellsrv/bin/cellrsomt with pid 18815
Sun Sep 02 09:13:40 2012
RSBK version=11.2.3.1.1,label=OSS_11.2.3.1.1_LINUX.X64_120607,Fri_Jun__8_12:49:44_PDT_2012
[RS] Started Service RS_BACKUP with pid 18814
Sun Sep 02 09:13:40 2012
[RS] Started monitoring process /opt/oracle/cell11.2.3.1.1_LINUX.X64_120607/cellsrv/bin/cellrssmt with pid 18844
Sun Sep 02 09:13:40 2012
Successfully setting event parameter -
Sun Sep 02 09:13:40 2012
Successfully setting event parameter -
CELLSRV process id=18817
CELLSRV cell host name=dm01cel01.acs.oracle.com
CELLSRV version=11.2.3.1.1,label=OSS_11.2.3.1.1_LINUX.X64_120607,Fri_Jun__8_12:49:44_PDT_2012
OS Hugepage status:
   Total/free hugepages available=4001/155; hugepage size=2048KB
MS_ALERT HUGEPAGE CLEAR
Cache Allocation: Num 1MB hugepage buffers: 8000 Num 1MB non-hugepage buffers: 0
Cache Allocation: BufferSize: 512. Num buffers: 5000. Start Address: 2AACA2E00000
Cache Allocation: BufferSize: 2048. Num buffers: 5000. Start Address: 2AACA3072000
Cache Allocation: BufferSize: 4096. Num buffers: 5000. Start Address: 2AACA3A37000
Cache Allocation: BufferSize: 8192. Num buffers: 10000. Start Address: 2AACA4DC0000
Cache Allocation: BufferSize: 16384. Num buffers: 5000. Start Address: 2AACA9BE1000
Cache Allocation: BufferSize: 32768. Num buffers: 5000. Start Address: 2AACAEA02000
Cache Allocation: BufferSize: 65536. Num buffers: 5000. Start Address: 2AACB8643000
Cache Allocation: BufferSize: 10485760. Num buffers: 23. Start Address: 2AACCBEC4000
CELL communication is configured to use 1 interface(s):
    192.168.64.131
IPC version: Oracle RDS/IP (generic)
IPC Vendor 1 Protocol 3
  Version 4.1
CellDisk v0.6 name=CD_00_dm01cel01 status=NORMAL guid=d8227f04-4b2a-4a2b-b320-a73fe3211671 found on dev=/dev/sda3
  GridDisk name=RECO_DM01_CD_00_dm01cel01 guid=27bc5e71-1662-419f-84ac-ac7936d051f9 (3428420588)
  GridDisk name=DATA_DM01_CD_00_dm01cel01 guid=1215af58-90d9-4a74-a51b-3ddc6d54aede (3447961868)
CellDisk v0.6 name=FD_05_dm01cel01 status=NORMAL guid=a3718f03-8112-4643-932e-0837ea9d445f found on dev=/dev/sdaa
CellDisk v0.6 name=FD_06_dm01cel01 status=NORMAL guid=309f226c-8255-4c71-9991-ad7eab8e934b found on dev=/dev/sdab
CellDisk v0.6 name=FD_07_dm01cel01 status=NORMAL guid=af4b7aa9-486a-4458-a4e7-0ecfb17536ab found on dev=/dev/sdac
CellDisk v0.6 name=FD_13_dm01cel01 status=NORMAL guid=7bad74e5-d125-4702-882e-a11db0154588 found on dev=/dev/sdw
CellDisk v0.6 name=FD_14_dm01cel01 status=NORMAL guid=7588df10-a2e2-497d-967f-0e1ef5d07d74 found on dev=/dev/sdx
CellDisk v0.6 name=FD_15_dm01cel01 status=NORMAL guid=d70c7598-82e8-496e-ae8a-8f7c888374b8 found on dev=/dev/sdy
CellDisk v0.6 name=FD_04_dm01cel01 status=NORMAL guid=95662b6a-2eeb-40e4-81ca-d880b364fbac found on dev=/dev/sdz
FlashCache: allowing client IOs
Smart Flash Logging disabled due to lack of active Flash Log Stores
Cellsrv Incarnation is set: 2

CELLSRV Server startup complete
Sun Sep  2 09:13:43 2012
[RS] Started Service CELLSRV with pid 18817
[RS] Started Service MS with pid 18816
Sun Sep 02 09:13:47 2012
Heartbeat with diskmon started on dm01db02.acs.oracle.com
Heartbeat with diskmon started on dm01db01.acs.oracle.com
Sun Sep 02 09:13:47 2012
I/O Resource Manager enabled
Sun Sep 02 09:13:48 2012
Caching enabled on FlashCache dm01cel01_FLASHCACHE (1818411676), size=22GB, cdisk=FD_13_dm01cel01
Caching enabled on FlashCache dm01cel01_FLASHCACHE (3896713996), size=22GB, cdisk=FD_05_dm01cel01
Caching enabled on FlashCache dm01cel01_FLASHCACHE (4258309268), size=22GB, cdisk=FD_11_dm01cel01
Caching enabled on FlashCache dm01cel01_FLASHCACHE (2295236356), size=22GB, cdisk=FD_09_dm01cel01
Caching enabled on FlashCache dm01cel01_FLASHCACHE (2066597948), size=22GB, cdisk=FD_06_dm01cel01
Caching enabled on FlashCache dm01cel01_FLASHCACHE (3538503772), size=22GB, cdisk=FD_15_dm01cel01
Caching enabled on FlashCache dm01cel01_FLASHCACHE (1832946924), size=22GB, cdisk=FD_10_dm01cel01
Caching enabled on FlashCache dm01cel01_FLASHCACHE (542355212), size=22GB, cdisk=FD_00_dm01cel01
Caching enabled on FlashCache dm01cel01_FLASHCACHE (1736192156), size=22GB, cdisk=FD_04_dm01cel01
Caching enabled on FlashCache dm01cel01_FLASHCACHE (2921083820), size=22GB, cdisk=FD_14_dm01cel01
Caching enabled on FlashCache dm01cel01_FLASHCACHE (3957806660), size=22GB, cdisk=FD_12_dm01cel01
Caching enabled on FlashCache dm01cel01_FLASHCACHE (767858284), size=22GB, cdisk=FD_01_dm01cel01
Sun Sep 02 09:14:00 2012
Info: Assigning flash CD FD_00_dm01cel01 to group#1
Sun Sep 02 09:14:00 2012
Info: Assigning flash CD FD_01_dm01cel01 to group#1
Sun Sep 02 09:14:00 2012
Info: Assigning flash CD FD_02_dm01cel01 to group#1
Sun Sep 02 09:14:00 2012
Info: Assigning flash CD FD_03_dm01cel01 to group#1
Sun Sep 02 09:14:00 2012
Info: Assigning flash CD FD_04_dm01cel01 to group#2
Sun Sep 02 09:14:00 2012
Info: Assigning flash CD FD_05_dm01cel01 to group#2
Sun Sep 02 09:14:00 2012
Info: Assigning flash CD FD_06_dm01cel01 to group#2
Sun Sep 02 09:14:00 2012
Info: Assigning flash CD FD_07_dm01cel01 to group#2
Sun Sep 02 09:14:00 2012
Info: Assigning flash CD FD_08_dm01cel01 to group#4
Sun Sep 02 09:14:00 2012

How to list storage alert information on an Exadata cell server

Storage alert information on an Exadata cell server can be obtained by running the following commands from the CellCLI command line:

The LIST ALERTDEFINITION command displays the definitions of every alert that can be generated on the cell server. The example below lists the alert name, metric name, and description; the metric name identifies the metric the alert is based on. ADRAlert and HardwareAlert are not based on metrics and therefore have no metric name.

The LIST ALERTHISTORY command displays the alert history of a cell server. The example below lists only alerts of critical severity, filtered to those that have not yet been reviewed by an administrator (examinedBy is empty).

The CREATE THRESHOLD command defines a threshold, that is, the condition under which a metric alert is generated.


CellCLI> LIST ALERTDEFINITION ATTRIBUTES name,metricname,description
         ADRAlert                                                                        "Incident Alert"
         HardwareAlert                                                                   "Hardware Alert"
         StatefulAlert_CD_IO_BY_R_LG                     CD_IO_BY_R_LG                   "Threshold Alert"
         StatefulAlert_CD_IO_BY_R_LG_SEC                 CD_IO_BY_R_LG_SEC               "Threshold Alert"
         StatefulAlert_CD_IO_BY_R_SM                     CD_IO_BY_R_SM                   "Threshold Alert"
         StatefulAlert_CD_IO_BY_R_SM_SEC                 CD_IO_BY_R_SM_SEC               "Threshold Alert"
         StatefulAlert_CD_IO_BY_W_LG                     CD_IO_BY_W_LG                   "Threshold Alert"
         StatefulAlert_CD_IO_BY_W_LG_SEC                 CD_IO_BY_W_LG_SEC               "Threshold Alert"
         StatefulAlert_CD_IO_BY_W_SM                     CD_IO_BY_W_SM                   "Threshold Alert"
         StatefulAlert_CD_IO_BY_W_SM_SEC                 CD_IO_BY_W_SM_SEC               "Threshold Alert"
         StatefulAlert_CD_IO_ERRS                        CD_IO_ERRS                      "Threshold Alert"
         StatefulAlert_CD_IO_ERRS_MIN                    CD_IO_ERRS_MIN                  "Threshold Alert"
         StatefulAlert_CD_IO_LOAD                        CD_IO_LOAD                      "Threshold Alert"
         StatefulAlert_CD_IO_RQ_R_LG                     CD_IO_RQ_R_LG                   "Threshold Alert"
         StatefulAlert_CD_IO_RQ_R_LG_SEC                 CD_IO_RQ_R_LG_SEC               "Threshold Alert"
         StatefulAlert_CD_IO_RQ_R_SM                     CD_IO_RQ_R_SM                   "Threshold Alert"
         StatefulAlert_CD_IO_RQ_R_SM_SEC                 CD_IO_RQ_R_SM_SEC               "Threshold Alert"
         StatefulAlert_CD_IO_RQ_W_LG                     CD_IO_RQ_W_LG                   "Threshold Alert"
         StatefulAlert_CD_IO_RQ_W_LG_SEC                 CD_IO_RQ_W_LG_SEC               "Threshold Alert"
         StatefulAlert_CD_IO_RQ_W_SM                     CD_IO_RQ_W_SM                   "Threshold Alert"
         StatefulAlert_CD_IO_RQ_W_SM_SEC                 CD_IO_RQ_W_SM_SEC               "Threshold Alert"
         StatefulAlert_CD_IO_ST_RQ                       CD_IO_ST_RQ                     "Threshold Alert"
         StatefulAlert_CD_IO_TM_R_LG                     CD_IO_TM_R_LG                   "Threshold Alert"
         StatefulAlert_CD_IO_TM_R_LG_RQ                  CD_IO_TM_R_LG_RQ                "Threshold Alert"
         StatefulAlert_CD_IO_TM_R_SM                     CD_IO_TM_R_SM                   "Threshold Alert"
         StatefulAlert_CD_IO_TM_R_SM_RQ                  CD_IO_TM_R_SM_RQ                "Threshold Alert"
         StatefulAlert_CD_IO_TM_W_LG                     CD_IO_TM_W_LG                   "Threshold Alert"
         StatefulAlert_CD_IO_TM_W_LG_RQ                  CD_IO_TM_W_LG_RQ                "Threshold Alert"
         StatefulAlert_CD_IO_TM_W_SM                     CD_IO_TM_W_SM                   "Threshold Alert"
         StatefulAlert_CD_IO_TM_W_SM_RQ                  CD_IO_TM_W_SM_RQ                "Threshold Alert"
         StatefulAlert_CG_FC_IO_BY_SEC                   CG_FC_IO_BY_SEC                 "Threshold Alert"
         StatefulAlert_CG_FC_IO_RQ                       CG_FC_IO_RQ                     "Threshold Alert"
         StatefulAlert_CG_FC_IO_RQ_SEC                   CG_FC_IO_RQ_SEC                 "Threshold Alert"
         StatefulAlert_CG_FD_IO_BY_SEC                   CG_FD_IO_BY_SEC                 "Threshold Alert"
         StatefulAlert_CG_FD_IO_LOAD                     CG_FD_IO_LOAD                   "Threshold Alert"
         StatefulAlert_CG_FD_IO_RQ_LG                    CG_FD_IO_RQ_LG                  "Threshold Alert"
         StatefulAlert_CG_FD_IO_RQ_LG_SEC                CG_FD_IO_RQ_LG_SEC              "Threshold Alert"
         StatefulAlert_CG_FD_IO_RQ_SM                    CG_FD_IO_RQ_SM                  "Threshold Alert"
         StatefulAlert_CG_FD_IO_RQ_SM_SEC                CG_FD_IO_RQ_SM_SEC              "Threshold Alert"
         StatefulAlert_CG_IO_BY_SEC                      CG_IO_BY_SEC                    "Threshold Alert"
         StatefulAlert_CG_IO_LOAD                        CG_IO_LOAD                      "Threshold Alert"
         StatefulAlert_CG_IO_RQ_LG                       CG_IO_RQ_LG                     "Threshold Alert"
         StatefulAlert_CG_IO_RQ_LG_SEC                   CG_IO_RQ_LG_SEC                 "Threshold Alert"
         StatefulAlert_CG_IO_RQ_SM                       CG_IO_RQ_SM                     "Threshold Alert"
         StatefulAlert_CG_IO_RQ_SM_SEC                   CG_IO_RQ_SM_SEC                 "Threshold Alert"
         StatefulAlert_CG_IO_UTIL_LG                     CG_IO_UTIL_LG                   "Threshold Alert"
         StatefulAlert_CG_IO_UTIL_SM                     CG_IO_UTIL_SM                   "Threshold Alert"
         StatefulAlert_CG_IO_WT_LG                       CG_IO_WT_LG                     "Threshold Alert"
         StatefulAlert_CG_IO_WT_LG_RQ                    CG_IO_WT_LG_RQ                  "Threshold Alert"
         StatefulAlert_CG_IO_WT_SM                       CG_IO_WT_SM                     "Threshold Alert"
         StatefulAlert_CG_IO_WT_SM_RQ                    CG_IO_WT_SM_RQ                  "Threshold Alert"
         StatefulAlert_CL_BBU_CHARGE                     CL_BBU_CHARGE                   "Threshold Alert"
         StatefulAlert_CL_BBU_TEMP                       CL_BBU_TEMP                     "Threshold Alert"
         StatefulAlert_CL_CPUT                           CL_CPUT                         "Threshold Alert"
         StatefulAlert_CL_CPUT_CS                        CL_CPUT_CS                      "Threshold Alert"
         StatefulAlert_CL_CPUT_MS                        CL_CPUT_MS                      "Threshold Alert"
         StatefulAlert_CL_FANS                           CL_FANS                         "Threshold Alert"
         StatefulAlert_CL_FSUT                           CL_FSUT                         "Threshold Alert"
         StatefulAlert_CL_MEMUT                          CL_MEMUT                        "Threshold Alert"
         StatefulAlert_CL_MEMUT_CS                       CL_MEMUT_CS                     "Threshold Alert"
         StatefulAlert_CL_MEMUT_MS                       CL_MEMUT_MS                     "Threshold Alert"
         StatefulAlert_CL_RUNQ                           CL_RUNQ                         "Threshold Alert"
         StatefulAlert_CL_SWAP_IN_BY_SEC                 CL_SWAP_IN_BY_SEC               "Threshold Alert"
         StatefulAlert_CL_SWAP_OUT_BY_SEC                CL_SWAP_OUT_BY_SEC              "Threshold Alert"
         StatefulAlert_CL_SWAP_USAGE                     CL_SWAP_USAGE                   "Threshold Alert"
         StatefulAlert_CL_TEMP                           CL_TEMP                         "Threshold Alert"
         StatefulAlert_CL_VIRTMEM_CS                     CL_VIRTMEM_CS                   "Threshold Alert"
         StatefulAlert_CL_VIRTMEM_MS                     CL_VIRTMEM_MS                   "Threshold Alert"
         StatefulAlert_CT_FC_IO_BY_SEC                   CT_FC_IO_BY_SEC                 "Threshold Alert"
         StatefulAlert_CT_FC_IO_RQ                       CT_FC_IO_RQ                     "Threshold Alert"
         StatefulAlert_CT_FC_IO_RQ_SEC                   CT_FC_IO_RQ_SEC                 "Threshold Alert"
         StatefulAlert_CT_FD_IO_BY_SEC                   CT_FD_IO_BY_SEC                 "Threshold Alert"
         StatefulAlert_CT_FD_IO_LOAD                     CT_FD_IO_LOAD                   "Threshold Alert"
         StatefulAlert_CT_FD_IO_RQ_LG                    CT_FD_IO_RQ_LG                  "Threshold Alert"
         StatefulAlert_CT_FD_IO_RQ_LG_SEC                CT_FD_IO_RQ_LG_SEC              "Threshold Alert"
         StatefulAlert_CT_FD_IO_RQ_SM                    CT_FD_IO_RQ_SM                  "Threshold Alert"
         StatefulAlert_CT_FD_IO_RQ_SM_SEC                CT_FD_IO_RQ_SM_SEC              "Threshold Alert"
         StatefulAlert_CT_IO_BY_SEC                      CT_IO_BY_SEC                    "Threshold Alert"
         StatefulAlert_CT_IO_LOAD                        CT_IO_LOAD                      "Threshold Alert"
         StatefulAlert_CT_IO_RQ_LG                       CT_IO_RQ_LG                     "Threshold Alert"
         StatefulAlert_CT_IO_RQ_LG_SEC                   CT_IO_RQ_LG_SEC                 "Threshold Alert"
         StatefulAlert_CT_IO_RQ_SM                       CT_IO_RQ_SM                     "Threshold Alert"
         StatefulAlert_CT_IO_RQ_SM_SEC                   CT_IO_RQ_SM_SEC                 "Threshold Alert"
         StatefulAlert_CT_IO_UTIL_LG                     CT_IO_UTIL_LG                   "Threshold Alert"
         StatefulAlert_CT_IO_UTIL_SM                     CT_IO_UTIL_SM                   "Threshold Alert"
         StatefulAlert_CT_IO_WT_LG                       CT_IO_WT_LG                     "Threshold Alert"
         StatefulAlert_CT_IO_WT_LG_RQ                    CT_IO_WT_LG_RQ                  "Threshold Alert"
         StatefulAlert_CT_IO_WT_SM                       CT_IO_WT_SM                     "Threshold Alert"
         StatefulAlert_CT_IO_WT_SM_RQ                    CT_IO_WT_SM_RQ                  "Threshold Alert"
         StatefulAlert_DB_FC_IO_BY_SEC                   DB_FC_IO_BY_SEC                 "Threshold Alert"
         StatefulAlert_DB_FC_IO_RQ                       DB_FC_IO_RQ                     "Threshold Alert"
         StatefulAlert_DB_FC_IO_RQ_SEC                   DB_FC_IO_RQ_SEC                 "Threshold Alert"
         StatefulAlert_DB_FD_IO_BY_SEC                   DB_FD_IO_BY_SEC                 "Threshold Alert"
         StatefulAlert_DB_FD_IO_LOAD                     DB_FD_IO_LOAD                   "Threshold Alert"
         StatefulAlert_DB_FD_IO_RQ_LG                    DB_FD_IO_RQ_LG                  "Threshold Alert"
         StatefulAlert_DB_FD_IO_RQ_LG_SEC                DB_FD_IO_RQ_LG_SEC              "Threshold Alert"
         StatefulAlert_DB_FD_IO_RQ_SM                    DB_FD_IO_RQ_SM                  "Threshold Alert"
         StatefulAlert_DB_FD_IO_RQ_SM_SEC                DB_FD_IO_RQ_SM_SEC              "Threshold Alert"
         StatefulAlert_DB_FL_IO_BY                       DB_FL_IO_BY                     "Threshold Alert"
         StatefulAlert_DB_FL_IO_BY_SEC                   DB_FL_IO_BY_SEC                 "Threshold Alert"
         StatefulAlert_DB_FL_IO_RQ                       DB_FL_IO_RQ                     "Threshold Alert"
         StatefulAlert_DB_FL_IO_RQ_SEC                   DB_FL_IO_RQ_SEC                 "Threshold Alert"
         StatefulAlert_DB_IO_BY_SEC                      DB_IO_BY_SEC                    "Threshold Alert"
         StatefulAlert_DB_IO_LOAD                        DB_IO_LOAD                      "Threshold Alert"
         StatefulAlert_DB_IO_RQ_LG                       DB_IO_RQ_LG                     "Threshold Alert"
         StatefulAlert_DB_IO_RQ_LG_SEC                   DB_IO_RQ_LG_SEC                 "Threshold Alert"
         StatefulAlert_DB_IO_RQ_SM                       DB_IO_RQ_SM                     "Threshold Alert"
         StatefulAlert_DB_IO_RQ_SM_SEC                   DB_IO_RQ_SM_SEC                 "Threshold Alert"
         StatefulAlert_DB_IO_UTIL_LG                     DB_IO_UTIL_LG                   "Threshold Alert"
         StatefulAlert_DB_IO_UTIL_SM                     DB_IO_UTIL_SM                   "Threshold Alert"
         StatefulAlert_DB_IO_WT_LG                       DB_IO_WT_LG                     "Threshold Alert"
         StatefulAlert_DB_IO_WT_LG_RQ                    DB_IO_WT_LG_RQ                  "Threshold Alert"
         StatefulAlert_DB_IO_WT_SM                       DB_IO_WT_SM                     "Threshold Alert"
         StatefulAlert_DB_IO_WT_SM_RQ                    DB_IO_WT_SM_RQ                  "Threshold Alert"
         StatefulAlert_FC_BYKEEP_OVERWR                  FC_BYKEEP_OVERWR                "Threshold Alert"
         StatefulAlert_FC_BYKEEP_OVERWR_SEC              FC_BYKEEP_OVERWR_SEC            "Threshold Alert"
         StatefulAlert_FC_BYKEEP_USED                    FC_BYKEEP_USED                  "Threshold Alert"
         StatefulAlert_FC_BY_USED                        FC_BY_USED                      "Threshold Alert"
         StatefulAlert_FC_IO_BYKEEP_R                    FC_IO_BYKEEP_R                  "Threshold Alert"
         StatefulAlert_FC_IO_BYKEEP_R_SEC                FC_IO_BYKEEP_R_SEC              "Threshold Alert"
         StatefulAlert_FC_IO_BYKEEP_W                    FC_IO_BYKEEP_W                  "Threshold Alert"
         StatefulAlert_FC_IO_BYKEEP_W_SEC                FC_IO_BYKEEP_W_SEC              "Threshold Alert"
         StatefulAlert_FC_IO_BY_R                        FC_IO_BY_R                      "Threshold Alert"
         StatefulAlert_FC_IO_BY_R_MISS                   FC_IO_BY_R_MISS                 "Threshold Alert"
         StatefulAlert_FC_IO_BY_R_MISS_SEC               FC_IO_BY_R_MISS_SEC             "Threshold Alert"
         StatefulAlert_FC_IO_BY_R_SEC                    FC_IO_BY_R_SEC                  "Threshold Alert"
         StatefulAlert_FC_IO_BY_R_SKIP                   FC_IO_BY_R_SKIP                 "Threshold Alert"
         StatefulAlert_FC_IO_BY_R_SKIP_SEC               FC_IO_BY_R_SKIP_SEC             "Threshold Alert"
         StatefulAlert_FC_IO_BY_W                        FC_IO_BY_W                      "Threshold Alert"
         StatefulAlert_FC_IO_BY_W_SEC                    FC_IO_BY_W_SEC                  "Threshold Alert"
         StatefulAlert_FC_IO_ERRS                        FC_IO_ERRS                      "Threshold Alert"
         StatefulAlert_FC_IO_RQKEEP_R                    FC_IO_RQKEEP_R                  "Threshold Alert"
         StatefulAlert_FC_IO_RQKEEP_R_MISS               FC_IO_RQKEEP_R_MISS             "Threshold Alert"
         StatefulAlert_FC_IO_RQKEEP_R_MISS_SEC           FC_IO_RQKEEP_R_MISS_SEC         "Threshold Alert"
         StatefulAlert_FC_IO_RQKEEP_R_SEC                FC_IO_RQKEEP_R_SEC              "Threshold Alert"
         StatefulAlert_FC_IO_RQKEEP_R_SKIP               FC_IO_RQKEEP_R_SKIP             "Threshold Alert"
         StatefulAlert_FC_IO_RQKEEP_R_SKIP_SEC           FC_IO_RQKEEP_R_SKIP_SEC         "Threshold Alert"
         StatefulAlert_FC_IO_RQKEEP_W                    FC_IO_RQKEEP_W                  "Threshold Alert"
         StatefulAlert_FC_IO_RQKEEP_W_SEC                FC_IO_RQKEEP_W_SEC              "Threshold Alert"
         StatefulAlert_FC_IO_RQ_R                        FC_IO_RQ_R                      "Threshold Alert"
         StatefulAlert_FC_IO_RQ_R_MISS                   FC_IO_RQ_R_MISS                 "Threshold Alert"
         StatefulAlert_FC_IO_RQ_R_MISS_SEC               FC_IO_RQ_R_MISS_SEC             "Threshold Alert"
         StatefulAlert_FC_IO_RQ_R_SEC                    FC_IO_RQ_R_SEC                  "Threshold Alert"
         StatefulAlert_FC_IO_RQ_R_SKIP                   FC_IO_RQ_R_SKIP                 "Threshold Alert"
         StatefulAlert_FC_IO_RQ_R_SKIP_SEC               FC_IO_RQ_R_SKIP_SEC             "Threshold Alert"
         StatefulAlert_FC_IO_RQ_W                        FC_IO_RQ_W                      "Threshold Alert"
         StatefulAlert_FC_IO_RQ_W_SEC                    FC_IO_RQ_W_SEC                  "Threshold Alert"
         StatefulAlert_FL_ACTUAL_OUTLIERS                FL_ACTUAL_OUTLIERS              "Threshold Alert"
         StatefulAlert_FL_BY_KEEP                        FL_BY_KEEP                      "Threshold Alert"
         StatefulAlert_FL_DISK_FIRST                     FL_DISK_FIRST                   "Threshold Alert"
         StatefulAlert_FL_DISK_IO_ERRS                   FL_DISK_IO_ERRS                 "Threshold Alert"
         StatefulAlert_FL_EFFICIENCY_PERCENTAGE          FL_EFFICIENCY_PERCENTAGE        "Threshold Alert"
         StatefulAlert_FL_EFFICIENCY_PERCENTAGE_HOUR     FL_EFFICIENCY_PERCENTAGE_HOUR   "Threshold Alert"
         StatefulAlert_FL_FLASH_FIRST                    FL_FLASH_FIRST                  "Threshold Alert"
         StatefulAlert_FL_FLASH_IO_ERRS                  FL_FLASH_IO_ERRS                "Threshold Alert"
         StatefulAlert_FL_FLASH_ONLY_OUTLIERS            FL_FLASH_ONLY_OUTLIERS          "Threshold Alert"
         StatefulAlert_FL_IO_DB_BY_W                     FL_IO_DB_BY_W                   "Threshold Alert"
         StatefulAlert_FL_IO_DB_BY_W_SEC                 FL_IO_DB_BY_W_SEC               "Threshold Alert"
         StatefulAlert_FL_IO_FL_BY_W                     FL_IO_FL_BY_W                   "Threshold Alert"
         StatefulAlert_FL_IO_FL_BY_W_SEC                 FL_IO_FL_BY_W_SEC               "Threshold Alert"
         StatefulAlert_FL_IO_W                           FL_IO_W                         "Threshold Alert"
         StatefulAlert_FL_IO_W_SKIP_BUSY                 FL_IO_W_SKIP_BUSY               "Threshold Alert"
         StatefulAlert_FL_IO_W_SKIP_BUSY_MIN             FL_IO_W_SKIP_BUSY_MIN           "Threshold Alert"
         StatefulAlert_FL_IO_W_SKIP_LARGE                FL_IO_W_SKIP_LARGE              "Threshold Alert"
         StatefulAlert_FL_PREVENTED_OUTLIERS             FL_PREVENTED_OUTLIERS           "Threshold Alert"
         StatefulAlert_GD_IO_BY_R_LG                     GD_IO_BY_R_LG                   "Threshold Alert"
         StatefulAlert_GD_IO_BY_R_LG_SEC                 GD_IO_BY_R_LG_SEC               "Threshold Alert"
         StatefulAlert_GD_IO_BY_R_SM                     GD_IO_BY_R_SM                   "Threshold Alert"
         StatefulAlert_GD_IO_BY_R_SM_SEC                 GD_IO_BY_R_SM_SEC               "Threshold Alert"
         StatefulAlert_GD_IO_BY_W_LG                     GD_IO_BY_W_LG                   "Threshold Alert"
         StatefulAlert_GD_IO_BY_W_LG_SEC                 GD_IO_BY_W_LG_SEC               "Threshold Alert"
         StatefulAlert_GD_IO_BY_W_SM                     GD_IO_BY_W_SM                   "Threshold Alert"
         StatefulAlert_GD_IO_BY_W_SM_SEC                 GD_IO_BY_W_SM_SEC               "Threshold Alert"
         StatefulAlert_GD_IO_ERRS                        GD_IO_ERRS                      "Threshold Alert"
         StatefulAlert_GD_IO_ERRS_MIN                    GD_IO_ERRS_MIN                  "Threshold Alert"
         StatefulAlert_GD_IO_RQ_R_LG                     GD_IO_RQ_R_LG                   "Threshold Alert"
         StatefulAlert_GD_IO_RQ_R_LG_SEC                 GD_IO_RQ_R_LG_SEC               "Threshold Alert"
         StatefulAlert_GD_IO_RQ_R_SM                     GD_IO_RQ_R_SM                   "Threshold Alert"
         StatefulAlert_GD_IO_RQ_R_SM_SEC                 GD_IO_RQ_R_SM_SEC               "Threshold Alert"
         StatefulAlert_GD_IO_RQ_W_LG                     GD_IO_RQ_W_LG                   "Threshold Alert"
         StatefulAlert_GD_IO_RQ_W_LG_SEC                 GD_IO_RQ_W_LG_SEC               "Threshold Alert"
         StatefulAlert_GD_IO_RQ_W_SM                     GD_IO_RQ_W_SM                   "Threshold Alert"
         StatefulAlert_GD_IO_RQ_W_SM_SEC                 GD_IO_RQ_W_SM_SEC               "Threshold Alert"
         StatefulAlert_IORM_MODE                         IORM_MODE                       "Threshold Alert"
         StatefulAlert_N_HCA_MB_RCV_SEC                  N_HCA_MB_RCV_SEC                "Threshold Alert"
         StatefulAlert_N_HCA_MB_TRANS_SEC                N_HCA_MB_TRANS_SEC              "Threshold Alert"
         StatefulAlert_N_MB_DROP                         N_MB_DROP                       "Threshold Alert"
         StatefulAlert_N_MB_DROP_SEC                     N_MB_DROP_SEC                   "Threshold Alert"
         StatefulAlert_N_MB_RDMA_DROP                    N_MB_RDMA_DROP                  "Threshold Alert"
         StatefulAlert_N_MB_RDMA_DROP_SEC                N_MB_RDMA_DROP_SEC              "Threshold Alert"
         StatefulAlert_N_MB_RECEIVED                     N_MB_RECEIVED                   "Threshold Alert"
         StatefulAlert_N_MB_RECEIVED_SEC                 N_MB_RECEIVED_SEC               "Threshold Alert"
         StatefulAlert_N_MB_RESENT                       N_MB_RESENT                     "Threshold Alert"
         StatefulAlert_N_MB_RESENT_SEC                   N_MB_RESENT_SEC                 "Threshold Alert"
         StatefulAlert_N_MB_SENT                         N_MB_SENT                       "Threshold Alert"
         StatefulAlert_N_MB_SENT_SEC                     N_MB_SENT_SEC                   "Threshold Alert"
         StatefulAlert_N_NIC_KB_RCV_SEC                  N_NIC_KB_RCV_SEC                "Threshold Alert"
         StatefulAlert_N_NIC_KB_TRANS_SEC                N_NIC_KB_TRANS_SEC              "Threshold Alert"
         StatefulAlert_N_NIC_NW                          N_NIC_NW                        "Threshold Alert"
         StatefulAlert_N_RDMA_RETRY_TM                   N_RDMA_RETRY_TM                 "Threshold Alert"
         Stateful_HardwareAlert                                                          "Hardware Stateful Alert"
         Stateful_SoftwareAlert                                                          "Software Stateful Alert"


CellCLI> list alerthistory where  severity='critical' and examinedBy='' detail
         name:                   1_1
         alertMessage:           "Cell configuration check discovered the following problems:   Check Exadata configuration via ipconf utility Verifying of Exadata configuration file /opt/oracle.cellos/cell.conf Error. Exadata configuration file not found /opt/oracle.cellos/cell.conf [INFO] The ipconf check may generate a failure for temporary inability to reach NTP or DNS server. You may ignore this alert, if the NTP or DNS servers are valid and available. [INFO] You may ignore this alert, if the NTP or DNS servers are valid and available. [INFO] As root user run /usr/local/bin/ipconf -verify -semantic to verify consistent network configurations."
         alertSequenceID:        1
         alertShortName:         Software
         alertType:              Stateful
         beginTime:              2011-08-09T15:16:39-04:00
         endTime:                2011-08-09T15:37:04-04:00
         examinedBy:             
         metricObjectName:       checkconfig
         notificationState:      0
         sequenceBeginTime:      2011-08-09T15:16:39-04:00
         severity:               critical
         alertAction:            "Correct the configuration problems. Then run cellcli command:   ALTER CELL VALIDATE CONFIGURATION   Verify that the new configuration is correct."

         name:                   2
         alertMessage:           "RS-7445 [Required IP parameters missing] [Check cellinit.ora] [] [] [] [] [] [] [] [] [] []"
         alertSequenceID:        2
         alertShortName:         ADR
         alertType:              Stateless
         beginTime:              2011-08-09T15:32:47-04:00
         endTime:                
         examinedBy:             
         notificationState:      0
         sequenceBeginTime:      2011-08-09T15:32:47-04:00
         severity:               critical
         alertAction:            "Errors in file /opt/oracle/cell11.2.2.4.2_LINUX.X64_111221/log/diag/asm/cell/dm01cel01/trace/rstrc_11798_4.trc  (incident=1).   Please create an incident package for incident 1 using ADRCI and upload the incident package to Oracle Support.  This can be done as shown below.  From a shell session on cell localhost, enter the following commands:   $ cd /opt/oracle/cell11.2.2.4.2_LINUX.X64_111221/log  $ adrci  adrci> set home diag/asm/cell/dm01cel01  adrci> ips pack incident 1 in /tmp   <<>>  Add this zip file as an attachment to an email message and send the message to Oracle Support."
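
Once an alert has been investigated, it can be marked as examined so that it no longer matches the severity='critical' and examinedBy='' filter used above. A minimal sketch (the reviewer name "jdoe" is only a placeholder; the first form updates the alert named 1_1 from the listing above, the second updates all alerts):

CellCLI> ALTER ALERTHISTORY 1_1 examinedBy="jdoe"
CellCLI> ALTER ALERTHISTORY ALL examinedBy="jdoe"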
		 
CellCLI> CREATE THRESHOLD db_io_rq_sm_sec.db123 comparison='>', critical=120
Threshold db_io_rq_sm_sec.db123 successfully created
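
The threshold just created can be verified with LIST THRESHOLD and removed with DROP THRESHOLD once it is no longer needed; a minimal sketch:

CellCLI> LIST THRESHOLD db_io_rq_sm_sec.db123 DETAIL
CellCLI> DROP THRESHOLD db_io_rq_sm_sec.db123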
		 
		 

How to reset the flash cache contents on an Exadata cell

How can the contents of the flash cache on an Exadata cell be reset?

This can be done with the following commands:


cellcli
CellCLI: Release 11.2.3.1.1 - Production on Sun Sep 02 07:29:08 EDT 2012

Copyright (c) 2007, 2011, Oracle.  All rights reserved.
Cell Efficiency Ratio: 527

CellCLI> LIST FLASHCACHECONTENT where objectnumber=17425 detail
         cachedKeepSize:         8755838976
         cachedSize:             8757706752
         dbID:                   2080757153
         dbUniqueName:           DBM
         hitCount:               12940
         hoursToExpiration:      21
         missCount:              78488
         objectNumber:           17425
         tableSpaceNumber:       7

All that is needed is to set the immediate cellsrv.cellsrv_flashcache(Reset,0,0,0) event; the event syntax is very similar to that of the ordinary RDBMS.

CellCLI> alter cell events = "immediate cellsrv.cellsrv_flashcache(Reset,0,0,0)"
Cell dm01cel01 successfully altered

CellCLI> LIST FLASHCACHECONTENT where objectnumber=17425 detail

(the command now returns no rows, indicating that the cached content for object 17425 has been flushed from the flash cache)

Some other useful commands:

Dump the flash cache statistics to a trace file:

CellCLI> alter cell events = "immediate cellsrv.cellsrv_flashcache(dumpStats,0,0,12345)"

Reset the statistics:

CellCLI> alter cell events = "immediate cellsrv.cellsrv_flashcache(resetStats,0,0,0)"
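
The resulting dump should appear as a CELLSRV trace file under the cell trace directory shown earlier in this post; a quick way to locate the newest trace on this cell (a sketch, run on the cell node):

# list the most recently written trace files for this cell
ls -lt /opt/oracle/cell11.2.3.1.1_LINUX.X64_120607/log/diag/asm/cell/dm01cel01/trace | head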

Testing CELL_FLASH_CACHE KEEP performance on Exadata

Testing Exadata Smart Flash Cache performance with the CELL_FLASH_CACHE KEEP storage attribute.

imageinfo

Kernel version: 2.6.18-274.18.1.0.1.el5 #1 SMP Thu Feb 9 19:07:16 EST 2012 x86_64
Cell version: OSS_11.2.3.1.1_LINUX.X64_120607
Cell rpm version: cell-11.2.3.1.1_LINUX.X64_120607-1

Active image version: 11.2.3.1.1.120607
Active image activated: 2012-08-13 18:00:09 -0400
Active image status: success
Active system partition on device: /dev/md6
Active software partition on device: /dev/md8

In partition rollback: Impossible

Cell boot usb partition: /dev/sdm1
Cell boot usb version: 11.2.3.1.1.120607

Inactive image version: 11.2.2.4.2.111221
Inactive image activated: 2012-08-09 15:36:25 -0400
Inactive image status: success
Inactive system partition on device: /dev/md5
Inactive software partition on device: /dev/md7

Boot area has rollback archive for the version: 11.2.2.4.2.111221
Rollback to the inactive partitions: Possible

CellCLI> list flashcache detail 
         name:                   dm01cel01_FLASHCACHE
         cellDisk:               FD_15_dm01cel01,FD_11_dm01cel01,FD_09_dm01cel01,FD_14_dm01cel01,FD_00_dm01cel01,FD_12_dm01cel01,FD_03_dm01cel01,FD_01_dm01cel01,FD_13_dm01cel01,FD_07_dm01cel01,FD_04_dm01cel01,FD_08_dm01cel01,FD_05_dm01cel01,FD_10_dm01cel01,FD_02_dm01cel01,FD_06_dm01cel01
         creationTime:           2012-08-13T17:58:02-04:00
         degradedCelldisks:      
         effectiveCacheSize:     365.25G
         id:                     f7118853-fd8d-4df4-917e-738c093530a7
         size:                   365.25G
         status:                 normal




CellCLI> LIST METRICCURRENT WHERE objecttype='FLASHCACHE';
         FC_BYKEEP_OVERWR                FLASHCACHE      0.000 MB
         FC_BYKEEP_OVERWR_SEC            FLASHCACHE      0.000 MB/sec
         FC_BYKEEP_USED                  FLASHCACHE      8,350 MB
         FC_BY_USED                      FLASHCACHE      8,518 MB
         FC_IO_BYKEEP_R                  FLASHCACHE      8,328 MB
         FC_IO_BYKEEP_R_SEC              FLASHCACHE      0.000 MB/sec
         FC_IO_BYKEEP_W                  FLASHCACHE      8,201 MB
         FC_IO_BYKEEP_W_SEC              FLASHCACHE      0.000 MB/sec
         FC_IO_BY_R                      FLASHCACHE      8,700 MB
         FC_IO_BY_R_MISS                 FLASHCACHE      8,704 MB
         FC_IO_BY_R_MISS_SEC             FLASHCACHE      0.000 MB/sec
         FC_IO_BY_R_SEC                  FLASHCACHE      0.000 MB/sec
         FC_IO_BY_R_SKIP                 FLASHCACHE      69,824 MB
         FC_IO_BY_R_SKIP_SEC             FLASHCACHE      0.001 MB/sec
         FC_IO_BY_W                      FLASHCACHE      9,783 MB
         FC_IO_BY_W_SEC                  FLASHCACHE      0.000 MB/sec
         FC_IO_ERRS                      FLASHCACHE      0
         FC_IO_RQKEEP_R                  FLASHCACHE      8,340 IO requests
         FC_IO_RQKEEP_R_MISS             FLASHCACHE      8,340 IO requests
         FC_IO_RQKEEP_R_MISS_SEC         FLASHCACHE      0.0 IO/sec
         FC_IO_RQKEEP_R_SEC              FLASHCACHE      0.0 IO/sec
         FC_IO_RQKEEP_R_SKIP             FLASHCACHE      15 IO requests
         FC_IO_RQKEEP_R_SKIP_SEC         FLASHCACHE      0.0 IO/sec
         FC_IO_RQKEEP_W                  FLASHCACHE      8,343 IO requests
         FC_IO_RQKEEP_W_SEC              FLASHCACHE      0.0 IO/sec
         FC_IO_RQ_R                      FLASHCACHE      38,219 IO requests
         FC_IO_RQ_R_MISS                 FLASHCACHE      19,694 IO requests
         FC_IO_RQ_R_MISS_SEC             FLASHCACHE      0.0 IO/sec
         FC_IO_RQ_R_SEC                  FLASHCACHE      0.0 IO/sec
         FC_IO_RQ_R_SKIP                 FLASHCACHE      246,344 IO requests
         FC_IO_RQ_R_SKIP_SEC             FLASHCACHE      0.1 IO/sec
         FC_IO_RQ_W                      FLASHCACHE      137,932 IO requests
         FC_IO_RQ_W_SEC                  FLASHCACHE      0.0 IO/sec


List the metric definition:

CellCLI> LIST METRICDEFINITION FC_BY_USED DETAIL
         name:                   FC_BY_USED
         description:            "Number of megabytes used on FlashCache"
         metricType:             Instantaneous
         objectType:             FLASHCACHE
         unit:                   MB
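
The history of the same metric can be listed as well (retention is governed by the cell's metricHistoryDays attribute); a minimal sketch, where the timestamp is only an example:

CellCLI> LIST METRICHISTORY FC_BY_USED WHERE collectionTime > '2012-09-01T00:00:00-04:00'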


SQL> alter table larget storage (cell_flash_cache keep);

Table altered.

SQL> 
SQL> select a.name,b.value 
  2      from v$sysstat a , v$mystat b
  3    where
a.statistic#=b.statistic#
and (a.name in ('physical read total bytes','physical write total bytes',
'cell IO uncompressed bytes') or a.name like 'cell phy%'
or a.name like '%flash cache read hits');   4    5    6    7  

NAME                                                                  VALUE
---------------------------------------------------------------- ----------
physical read total bytes                                            114688
physical write total bytes                                                0
cell physical IO interconnect bytes                                  114688
cell physical IO bytes pushed back due to excessive CPU on cell           0
cell physical IO bytes saved during optimized file creation               0
cell physical IO bytes saved during optimized RMAN file restore           0
cell physical IO bytes eligible for predicate offload                     0
cell physical IO bytes saved by storage index                             0
cell physical IO interconnect bytes returned by smart scan                0
cell IO uncompressed bytes                                                0
cell flash cache read hits                                                0

11 rows selected.

SQL> alter system flush buffer_cache;

System altered.

SQL> select count(*) from larget;

  COUNT(*)
----------
 242778112

SQL> set timing on;
SQL> select a.name,b.value 
  2      from v$sysstat a , v$mystat b
  3    where
a.statistic#=b.statistic#
and (a.name in ('physical read total bytes','physical write total bytes',
'cell IO uncompressed bytes') or a.name like 'cell phy%'
or a.name like '%flash cache read hits');   4    5    6    7  

NAME                                                                  VALUE
---------------------------------------------------------------- ----------
physical read total bytes                                        2.6262E+10
physical write total bytes                                                0
cell physical IO interconnect bytes                              3018270928
cell physical IO bytes pushed back due to excessive CPU on cell           0
cell physical IO bytes saved during optimized file creation               0
cell physical IO bytes saved during optimized RMAN file restore           0
cell physical IO bytes eligible for predicate offload            2.6262E+10
cell physical IO bytes saved by storage index                             0
cell physical IO interconnect bytes returned by smart scan       3018090704
cell IO uncompressed bytes                                       2.6284E+10
cell flash cache read hits                                               55

11 rows selected.

Elapsed: 00:00:00.01
SQL> select count(*) from larget;

  COUNT(*)
----------
 242778112

Elapsed: 00:00:06.83
SQL> select a.name,b.value 
  2      from v$sysstat a , v$mystat b
  3    where
a.statistic#=b.statistic#
and (a.name in ('physical read total bytes','physical write total bytes',
'cell IO uncompressed bytes') or a.name like 'cell phy%'
or a.name like '%flash cache read hits');   4    5    6    7  

NAME                                                                  VALUE
---------------------------------------------------------------- ----------
physical read total bytes                                        5.2525E+10
physical write total bytes                                                0
cell physical IO interconnect bytes                              6036394312
cell physical IO bytes pushed back due to excessive CPU on cell           0
cell physical IO bytes saved during optimized file creation               0
cell physical IO bytes saved during optimized RMAN file restore           0
cell physical IO bytes eligible for predicate offload            5.2524E+10
cell physical IO bytes saved by storage index                             0
cell physical IO interconnect bytes returned by smart scan       6036214088
cell IO uncompressed bytes                                       5.2570E+10
cell flash cache read hits                                            27999

11 rows selected.

Elapsed: 00:00:00.00

Cell server I/O calibration

CellCLI> calibrate force;
Calibration will take a few minutes...
Aggregate random read throughput across all hard disk LUNs: 1936 MBPS
Aggregate random read throughput across all flash disk LUNs: 4148.56 MBPS
Aggregate random read IOs per second (IOPS) across all hard disk LUNs: 4906
Aggregate random read IOs per second (IOPS) across all flash disk LUNs: 142303
Controller read throughput: 1939.98 MBPS
Calibrating hard disks (read only) ...
LUN 0_0  on drive [28:0     ] random read throughput: 168.39 MBPS, and 419 IOPS
LUN 0_1  on drive [28:1     ] random read throughput: 165.32 MBPS, and 412 IOPS
LUN 0_10 on drive [28:10    ] random read throughput: 170.72 MBPS, and 421 IOPS
LUN 0_11 on drive [28:11    ] random read throughput: 169.51 MBPS, and 412 IOPS
LUN 0_2  on drive [28:2     ] random read throughput: 171.15 MBPS, and 421 IOPS
LUN 0_3  on drive [28:3     ] random read throughput: 170.58 MBPS, and 413 IOPS
LUN 0_4  on drive [28:4     ] random read throughput: 166.37 MBPS, and 413 IOPS
LUN 0_5  on drive [28:5     ] random read throughput: 167.69 MBPS, and 424 IOPS
LUN 0_6  on drive [28:6     ] random read throughput: 171.89 MBPS, and 427 IOPS
LUN 0_7  on drive [28:7     ] random read throughput: 167.78 MBPS, and 425 IOPS
LUN 0_8  on drive [28:8     ] random read throughput: 170.74 MBPS, and 423 IOPS
LUN 0_9  on drive [28:9     ] random read throughput: 168.56 MBPS, and 420 IOPS
Calibrating flash disks (read only, note that writes will be significantly slower) ...
LUN 1_0  on drive [FLASH_1_0] random read throughput: 272.06 MBPS, and 19867 IOPS
LUN 1_1  on drive [FLASH_1_1] random read throughput: 272.06 MBPS, and 19892 IOPS
LUN 1_2  on drive [FLASH_1_2] random read throughput: 271.68 MBPS, and 19869 IOPS
LUN 1_3  on drive [FLASH_1_3] random read throughput: 272.40 MBPS, and 19875 IOPS
LUN 2_0  on drive [FLASH_2_0] random read throughput: 272.54 MBPS, and 20650 IOPS
LUN 2_1  on drive [FLASH_2_1] random read throughput: 272.67 MBPS, and 20683 IOPS
LUN 2_2  on drive [FLASH_2_2] random read throughput: 271.98 MBPS, and 20693 IOPS
LUN 2_3  on drive [FLASH_2_3] random read throughput: 272.48 MBPS, and 20683 IOPS
LUN 4_0  on drive [FLASH_4_0] random read throughput: 271.85 MBPS, and 19932 IOPS
LUN 4_1  on drive [FLASH_4_1] random read throughput: 272.22 MBPS, and 19924 IOPS
LUN 4_2  on drive [FLASH_4_2] random read throughput: 272.38 MBPS, and 19908 IOPS
LUN 4_3  on drive [FLASH_4_3] random read throughput: 271.73 MBPS, and 19901 IOPS
LUN 5_0  on drive [FLASH_5_0] random read throughput: 271.61 MBPS, and 19906 IOPS
LUN 5_1  on drive [FLASH_5_1] random read throughput: 271.39 MBPS, and 19897 IOPS
LUN 5_2  on drive [FLASH_5_2] random read throughput: 270.85 MBPS, and 19901 IOPS
LUN 5_3  on drive [FLASH_5_3] random read throughput: 270.99 MBPS, and 19884 IOPS
CALIBRATE results are within an acceptable range.
Calibration has finished.




SQL> Select data_object_id from dba_objects where  object_name='LARGET';

DATA_OBJECT_ID
--------------
         17425

SELECT statistic_name, value   
FROM V$SEGMENT_STATISTICS 
     WHERE dataobj#= 17425 AND ts#=7 AND
     statistic_name='optimized physical reads';

STATISTIC_NAME                                                        VALUE
---------------------------------------------------------------- ----------
optimized physical reads                                              43687



CellCLI> LIST FLASHCACHECONTENT where objectnumber=17425 detail
         cachedKeepSize:         8755838976
         cachedSize:             8757706752
         dbID:                   2080757153
         dbUniqueName:           DBM
         hitCount:               12940
         hoursToExpiration:      23
         missCount:              78488
         objectNumber:           17425
         tableSpaceNumber:       7

 

 

 

The V$SYSSTAT view keeps a cumulative count of the I/O requests that benefited from the flash cache, aggregated across all cell storage servers; the statistic is named 'cell flash cache read hits'. Similar statistics are available in V$SESSTAT and V$MYSTAT.

Another statistic, 'physical read requests optimized', reflects the number of disk I/O requests that benefited from the Exadata storage index together with the cell flash cache.
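
Both statistics can be read straight from V$SYSSTAT; a minimal sketch using the statistic names mentioned above:

-- cumulative, instance-wide counts aggregated across all cells
SELECT name, value
  FROM v$sysstat
 WHERE name IN ('cell flash cache read hits', 'physical read requests optimized');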

The 11g AWR report contains new sections that show which database objects and which SQL statements have high or low Smart Flash Cache hit ratios. These sections are:
Segments by UnOptimized Reads
Segments by Optimized Reads
SQL ordered by Physical Reads (UnOptimized)

In the AWR report, read requests that benefit from the Smart Flash Cache are called "Optimized Reads", while requests served only from the regular SAS disks are called "UnOptimized Reads".

 

Segments by UnOptimized Reads

  • Total UnOptimized Read Requests: 66,587
  • Captured Segments account for 86.9% of Total
Owner  Tablespace Name  Object Name              Subobject Name  Obj. Type        UnOptimized Reads  %Total
SYS    SYSTEM           AUD$                                     TABLE                       38,376   57.63
PIN    PIN02            PURCHASED_PRODUCT_T                      TABLE                        5,149    7.73
PIN    PINX02           I_PURCHASED_PRODUCT__ID                  INDEX                        3,617    5.43
PIN    PIN00            IDX_TRANS_LOG_MSISDN                     INDEX                        2,471    3.71
PIN    PIN02            BILLLOG_T                P_R_02292012    TABLE PARTITION              1,227    1.84

Segments by Optimized Reads

  • Total Optimized Read Requests: 207,547
  • Captured Segments account for 88.9% of Total
Owner  Tablespace Name  Object Name              Subobject Name  Obj. Type  Optimized Reads  %Total
SYS    SYSTEM           AUD$                                     TABLE               92,198   44.42
PIN    PIN02            PURCHASED_PRODUCT_T                      TABLE               23,142   11.15
PIN    PINX02           I_PURCHASED_PRODUCT__ID                  INDEX               10,781    5.19
PIN    PIN00            IDX_TRANS_LOG_MSISDN                     INDEX                9,354    4.51
PIN    PIN02            SERVICE_T                                TABLE                7,818    3.77
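
Outside AWR, a similar breakdown per SQL statement can be approximated from V$SQL, which in recent 11.2 releases exposes the PHYSICAL_READ_REQUESTS and OPTIMIZED_PHY_READ_REQUESTS columns (an assumption worth confirming against your version); a hedged sketch of a "top unoptimized SQL" query:

-- top 10 statements ordered by read requests that did NOT benefit from flash cache / storage index
SELECT *
  FROM (SELECT sql_id,
               physical_read_requests,
               optimized_phy_read_requests,
               physical_read_requests - optimized_phy_read_requests AS unoptimized_read_requests
          FROM v$sql
         ORDER BY physical_read_requests - optimized_phy_read_requests DESC)
 WHERE ROWNUM <= 10;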

 

Warning: Even Exadata has a wrong memlock setting

The story begins with an incident about two months earlier. A four-node 11.2.0.1 RAC database was deployed on an Oracle-Sun Exadata V2 Database Machine; on one of the nodes the critical RAC background process LMS raised an ORA-00600 [kjbmprlst:shadow] error, after which LMS terminated the instance on that node. Once the CRS software on the other nodes detected the unexpected termination, the database went through a global resource reconfiguration, which completed successfully on all remaining nodes.

Shortly afterwards, however, the alert log of one of the surviving nodes kept reporting "Process W000 died, see its trace file". The instance apparently could not obtain the resources needed to spawn new processes, and at the same time the application could no longer log in to the instance on that node. Of the original four-node RAC database, one node was already down because of the ORA-00600 and now a second node could not be logged into, leaving only half of the capacity.

I then went on site to continue the diagnosis and found the following symptoms, listed here one by one:

1. Attempts to log in to the instance remotely failed with "ORA-12516: TNS:listener could not find available handler with matching protocol stack". Repeated login attempts also produced the following:

Linux Error: 12: Cannot allocate memory
 Additional information: 1
 ORA-01034: ORACLE not available

 

2. After double-checking environment variables such as ORACLE_SID and ORACLE_HOME, logging in with "sqlplus / as sysdba" still returned "Connected to an idle instance." (the most puzzling symptom of all). Without a SYSDBA connection the necessary diagnostic information could not be collected; a systemstate dump could still have been obtained through gdb and similar means, but that was set aside for the moment.

 

3. The W000 background process is spawned by the SMCO process. The SMCO trace is shown below; the reported state is KSOSP_SPAWNED:

Process W000 is dead (pid=2648 req_ver=3812 cur_ver=3812 state=KSOSP_SPAWNED).
 *** 2011-07-08 02:44:32.971
 Process W000 is dead (pid=2650 req_ver=3813 cur_ver=3813 state=KSOSP_SPAWNED).

 

4. The shared memory segments and background processes that make up the instance were confirmed to be alive, and trace logs were still being produced:

[oracle@maclean04 trace]$ ipcs -m

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x00000000 0          root      644        72         2
0x00000000 32769      root      644        16384      2
0x00000000 65538      root      644        280        2
0xac5ffd78 491524     oracle    660        4096       0
0x96c5992c 1409029    oracle    660        4096       0  

[oracle@maclean04 trace]$ ls -l /dev/shm
total 34839780
-rw-r----- 1 oracle oinstall 268435456 Jun  7 07:19 ora_maclean4_1409029_0
-rw-r----- 1 oracle oinstall         0 Jun  7 07:19 ora_maclean4_1409029_1
-rw-r----- 1 oracle oinstall         0 Jun  7 07:19 ora_maclean4_1409029_10
-rw-r----- 1 oracle oinstall 268435456 Jun  7 07:19 ora_maclean4_1409029_100
-rw-r----- 1 oracle oinstall 268435456 Jun  7 07:19 ora_maclean4_1409029_101
-rw-r----- 1 oracle oinstall 268435456 Jun  7 07:19 ora_maclean4_1409029_102
-rw-r----- 1 oracle oinstall 268435456 Jun  7 07:19 ora_maclean4_1409029_103
-rw-r----- 1 oracle oinstall 268435456 Jun  7 07:19 ora_maclean4_1409029_104
-rw-r----- 1 oracle oinstall 268435456 Jun  7 07:19 ora_maclean4_1409029_105
-rw-r----- 1 oracle oinstall 268435456 Jun  7 07:19 ora_maclean4_1409029_106
-rw-r----- 1 oracle oinstall 268435456 Jun  7 07:19 ora_maclean4_1409029_107
-rw-r----- 1 oracle oinstall 268435456 Jun  7 07:19 ora_maclean4_1409029_108
-rw-r----- 1 oracle oinstall 268435456 Jun  7 07:19 ora_maclean4_1409029_109
-rw-r----- 1 oracle oinstall         0 Jun  7 07:19 ora_maclean4_1409029_11
-rw-r----- 1 oracle oinstall 268435456 Jun  7 07:19 ora_maclean4_1409029_110
-rw-r----- 1 oracle oinstall 268435456 Jun  7 07:19 ora_maclean4_1409029_111
-rw-r----- 1 oracle oinstall 268435456 Jun  7 07:19 ora_maclean4_1409029_112
-rw-r----- 1 oracle oinstall 268435456 Jun  7 07:19 ora_maclean4_1409029_113
-rw-r----- 1 oracle oinstall 268435456 Jun  7 07:19 ora_maclean4_1409029_114
-rw-r----- 1 oracle oinstall 268435456 Jun  7 07:19 ora_maclean4_1409029_115
-rw-r----- 1 oracle oinstall 268435456 Jun  7 07:19 ora_maclean4_1409029_116
-rw-r----- 1 oracle oinstall 268435456 Jun  7 07:19 ora_maclean4_1409029_117
-rw-r----- 1 oracle oinstall 268435456 Jun  7 07:19 ora_maclean4_1409029_118
-rw-r----- 1 oracle oinstall 268435456 Jun  7 07:19 ora_maclean4_1409029_119
-rw-r----- 1 oracle oinstall         0 Jun  7 07:19 ora_maclean4_1409029_12
.......................

[oracle@maclean04 trace]$ ps -ef|grep ora_
oracle    5466     1  0 Jul03 ?        00:00:18 ora_pz99_maclean4
oracle   14842 10564  0 19:54 pts/9    00:00:00 grep ora_
oracle   18641     1  0 Jun08 ?        00:00:02 ora_q002_maclean4
oracle   23932     1  0 Jun07 ?        00:04:26 ora_pmon_maclean4
oracle   23934     1  0 Jun07 ?        00:00:06 ora_vktm_maclean4
oracle   23938     1  0 Jun07 ?        00:00:00 ora_gen0_maclean4
oracle   23940     1  0 Jun07 ?        00:00:06 ora_diag_maclean4
oracle   23942     1  0 Jun07 ?        00:00:00 ora_dbrm_maclean4
oracle   23944     1  0 Jun07 ?        00:01:01 ora_ping_maclean4
oracle   23946     1  0 Jun07 ?        00:00:16 ora_psp0_maclean4
oracle   23948     1  0 Jun07 ?        00:00:00 ora_acms_maclean4
oracle   23950     1  0 Jun07 ?        02:27:29 ora_dia0_maclean4
oracle   23952     1  0 Jun07 ?        01:19:42 ora_lmon_maclean4
oracle   23954     1  0 Jun07 ?        02:23:59 ora_lmd0_maclean4
oracle   23956     1  5 Jun07 ?        1-13:50:36 ora_lms0_maclean4
oracle   23960     1  4 Jun07 ?        1-12:44:25 ora_lms1_maclean4
oracle   23964     1  0 Jun07 ?        00:00:00 ora_rms0_maclean4
oracle   23966     1  0 Jun07 ?        00:00:00 ora_lmhb_maclean4
oracle   23968     1  0 Jun07 ?        01:58:35 ora_mman_maclean4
oracle   23970     1  0 Jun07 ?        06:28:39 ora_dbw0_maclean4
oracle   23972     1  0 Jun07 ?        06:27:08 ora_dbw1_maclean4
oracle   23974     1  2 Jun07 ?        16:49:56 ora_lgwr_maclean4
oracle   23976     1  0 Jun07 ?        00:20:48 ora_ckpt_maclean4
oracle   23978     1  0 Jun07 ?        00:07:03 ora_smon_maclean4
oracle   23980     1  0 Jun07 ?        00:00:00 ora_reco_maclean4
oracle   23982     1  0 Jun07 ?        00:00:00 ora_rbal_maclean4
oracle   23984     1  0 Jun07 ?        00:01:00 ora_asmb_maclean4
oracle   23986     1  0 Jun07 ?        00:08:15 ora_mmon_maclean4
oracle   23988     1  0 Jun07 ?        00:18:19 ora_mmnl_maclean4
oracle   23992     1  0 Jun07 ?        00:00:00 ora_d000_maclean4
oracle   23994     1  0 Jun07 ?        00:00:00 ora_s000_maclean4
oracle   23996     1  0 Jun07 ?        00:00:00 ora_mark_maclean4
oracle   24065     1  0 Jun07 ?        01:16:54 ora_lck0_maclean4
oracle   24067     1  0 Jun07 ?        00:00:00 ora_rsmn_maclean4
oracle   24079     1  0 Jun07 ?        00:01:02 ora_dskm_maclean4
oracle   24174     1  0 Jun07 ?        00:08:18 ora_arc0_maclean4
oracle   24188     1  0 Jun07 ?        00:08:19 ora_arc1_maclean4
oracle   24190     1  0 Jun07 ?        00:00:59 ora_arc2_maclean4
oracle   24192     1  0 Jun07 ?        00:08:12 ora_arc3_maclean4
oracle   24235     1  0 Jun07 ?        00:00:00 ora_gtx0_maclean4
oracle   24237     1  0 Jun07 ?        00:00:00 ora_rcbg_maclean4
oracle   24241     1  0 Jun07 ?        00:00:00 ora_qmnc_maclean4
oracle   24245     1  0 Jun07 ?        00:00:00 ora_q001_maclean4
oracle   24264     1  0 Jun07 ?        00:08:28 ora_cjq0_maclean4
oracle   25782     1  0 Jun07 ?        00:00:00 ora_smco_maclean4

 

5. At the time of the problem the system still had plenty of free memory and was not swapping heavily; in addition, the /dev/shm shared memory directory still had 27 GB free.
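
A sketch of the standard OS commands behind this check, run on the affected node:

# overall memory and swap usage
free -m
# free space in the shared memory filesystem backing the AMM-managed SGA
df -h /dev/shm
# swap-in/swap-out activity over a few seconds (si/so columns)
vmstat 1 5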

 

6. Querying the global dynamic performance view gv$resource_limit from another node showed that the process high-water mark on the affected node was only 404, far below the configured limit:

SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
PL/SQL Release 11.2.0.1.0 - Production
CORE    11.2.0.1.0      Production
TNS for Linux: Version 11.2.0.1.0 - Production
NLSRTL Version 11.2.0.1.0 - Production

SQL> select * from global_name;

GLOBAL_NAME
--------------------------------------------------------------------------------
www.askmac.cn

SQL> select * from gv$resource_limit where inst_id=4;

RESOURCE_NAME                  CURRENT_UTILIZATION MAX_UTILIZATION INITIAL_ALLOCATION             LIMIT_VALUE
------------------------------ ------------------- --------------- ------------------------------ -------------
processes                                       50             404       1500                           1500
sessions                                        61             616       2272                           2272
enqueue_locks                                  849            1599      31062                          31062
enqueue_resources                              846            1007      15016                      UNLIMITED
ges_procs                                       47             399       1503                           1503
ges_ress                                     65943          109281      67416                      UNLIMITED
ges_locks                                    23448           37966      92350                      UNLIMITED
ges_cache_ress                                7347           14716          0                      UNLIMITED
ges_reg_msgs                                   337            5040       3730                      UNLIMITED
ges_big_msgs                                    26             502       3730                      UNLIMITED
ges_rsv_msgs                                     0               1       1000                           1000
gcs_resources                              2008435         2876561    3446548                        3446548
gcs_shadows                                1888276         2392064    3446548                        3446548
dml_locks                                        0               0       9996                      UNLIMITED
temporary_table_locks                            0              45  UNLIMITED                      UNLIMITED
transactions                                     0               0       2499                      UNLIMITED
branches                                         0               2       2499                      UNLIMITED
cmtcallbk                                        0               3       2499                      UNLIMITED
max_rollback_segments                          109             129       2499                          65535
sort_segment_locks                               0              14  UNLIMITED                      UNLIMITED
k2q_locks                                        0               2       4544                      UNLIMITED
max_shared_servers                               1               1  UNLIMITED                      UNLIMITED
parallel_max_servers                             1              19        160                           3600

 

7. The kernel parameters in /etc/sysctl.conf on the Exadata database node were configured correctly:

# Controls the maximum shared segment size, in bytes
kernel.shmmax = 68719476736

# Controls the maximum number of shared memory segments, in pages
kernel.shmall = 4294967296

########### BEGIN DO NOT REMOVE Added by Oracle Exadata ###########
kernel.shmmni = 4096
kernel.sem = 250 32000 100 128
# bug 8311668 file-max and aio-max-nr
fs.file-max = 6815744
# DB install guide says the above
fs.aio-max-nr = 1048576
# 8976963
net.ipv4.neigh.bond0.locktime=0
net.ipv4.ip_local_port_range = 9000 65500
# DB install guide says the above
net.core.rmem_default = 4194304
net.core.wmem_default = 262144
net.core.rmem_max = 4194304
net.core.wmem_max = 2097152
# The original DB deployment was net.core.wmem_max = 1048586 but IB works
# best for Exadata at the above net.core settings
# bug 8268393 remove vm.nr_hugepages = 2048
# bug 8778821 system reboots after 60 sec on panic
kernel.panic=60
########### END DO NOT REMOVE Added by Oracle Exadata ###########

########### BEGIN DO NOT REMOVE Added by Oracle Exadata ###########
kernel.shmmax = 64547735961
kernel.shmall = 15758724
########### END DO NOT REMOVE Added by Oracle Exadata ###########
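To be sure these are the values the running kernel actually uses (the file alone does not prove that sysctl -p was ever run; also note that kernel.shmmax and kernel.shmall appear twice in this file, so the last entry wins), a quick cross-check can be done. A small sketch:

for p in kernel.shmmax kernel.shmall kernel.shmmni fs.file-max fs.aio-max-nr; do
    printf '%-16s running=%-14s file=%s\n' "$p" "$(sysctl -n $p)" \
           "$(grep "^$p" /etc/sysctl.conf | tail -1)"
done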

 

8. At this point the problem still looked baffling: the main background processes and the SGA were intact, the operating system still had plenty of free memory, and none of the instance resources had hit a ceiling. What exactly was preventing new processes from being allocated?!

Out of caution I finally checked the /etc/security/limits.conf file on the system, which controls the various shell ulimit ceilings. Since an Exadata Database Machine is installed and configured by Oracle before being handed over, my initial assumption was that these configuration files would, without question, carry the optimal settings and follow Oracle's Best Practices.

But as soon as I actually opened the file I realized the configuration was wrong, as if something were missing. Here is the limits.conf on that Exadata:

########### BEGIN DO NOT REMOVE Added by Oracle Deployment Scripts ###########

oracle     soft    nproc       2047
oracle     hard    nproc       16384
oracle     soft    nofile      65536
oracle     hard    nofile      65536

########### END DO NOT REMOVE Added by Oracle Deployment Scripts ###########

Clearly the limits.conf above contains no memlock setting; when memlock is not set, the default value of 32 (KB) applies. Here is the ulimit output on the Exadata host:

[oracle@maclean4 shm]$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 606208
max locked memory (kbytes, -l) 32
max memory size         (kbytes, -m) unlimited
open files                      (-n) 65536
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 2047
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

As you can see, max locked memory here is indeed the default 32 KB, whereas the memlock value Oracle recommends is far larger than 32.
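A direct way to spot what is missing is to grep the file and ask what a fresh oracle login would receive; a small sketch (run as root), using the paths and user name shown above:

grep memlock /etc/security/limits.conf || echo "no memlock entry in limits.conf"
su - oracle -c 'ulimit -l'        # max locked memory a newly logged-in oracle shell gets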

The validated memlock recommendation in the Oracle Validated Configurations is 50000000; for more on Oracle Validated Configurations see my earlier article <Understand Oracle Validated Configurations>:

[oracle@rh2 ~]$ cat /etc/security/limits.conf

# Oracle-Validated setting for nofile soft limit is 131072
oracle   soft   nofile    131072

# Oracle-Validated setting for nofile hard limit is 131072
oracle   hard   nofile    131072

# Oracle-Validated setting for nproc soft limit is 131072
oracle   soft   nproc    131072

# Oracle-Validated setting for nproc hard limit is 131072
oracle   hard   nproc    131072

# Oracle-Validated setting for core soft limit is unlimited
oracle   soft   core    unlimited

# Oracle-Validated setting for core hard limit is unlimited
oracle   hard   core    unlimited

# Oracle-Validated setting for memlock soft limit is 50000000
oracle   soft   memlock    50000000 

# Oracle-Validated setting for memlock hard limit is 50000000
oracle   hard   memlock    50000000

A search on MOS turns up the note <Ora-27102: Out Of Memory: Linux Error: 12: Cannot Allocate Memory with LOCK_SGA=TRUE [ID 401077.1]>, which points out that an undersized max locked memory can trigger the "Linux Error: 12: Cannot Allocate Memory" allocation failure.

Because changes to the limits.conf file have no effect on an instance that is already running, simply correcting the parameter could not resolve the problem at hand.
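The reason is that pam_limits evaluates limits.conf only at login, so processes that are already running keep whatever values they inherited. On kernels that expose /proc/<pid>/limits (2.6.24 and later; backported into some enterprise kernels, so it may or may not be present on this 2.6.18 build) this can be confirmed directly. A sketch against the pmon of the maclean4 instance shown earlier:

PMON_PID=$(pgrep -f ora_pmon_maclean4)                   # pmon of the running instance
grep -i 'locked memory' /proc/${PMON_PID}/limits         # still shows the limit inherited at startup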

In practice I worked around the problem by freeing some resources: the following one-liner kills all foreground (LOCAL=NO) server processes of the instance to release them.

ps -ef|grep $SID|grep LOCAL=NO|grep -v grep| awk '{print $2}'|xargs kill -9

Right after running the command the terminal was a little sluggish, then everything returned to normal. Both local and remote SYSDBA logins to the instance succeeded, and application connections were restored as well.

Although the problem was fixed, the customer still deserved a detailed explanation. In my email I laid out the configuration-file problem on this Exadata Database Machine and made several recommendations:

1. Ask Oracle Support to confirm whether the settings in /etc/security/limits.conf are reasonable and whether they need to be changed.
2. Set the kernel parameter vm.min_free_kbytes = 51200 to avoid performance problems caused by running short of free memory (see the sketch after this list).
3. Install the OSWatcher monitoring tool to keep track of the essential system resources.
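For the second recommendation, this is roughly how vm.min_free_kbytes would be applied online and persisted; a minimal sketch with the value suggested above:

sysctl -w vm.min_free_kbytes=51200                           # takes effect immediately
grep -q '^vm.min_free_kbytes' /etc/sysctl.conf || \
    echo 'vm.min_free_kbytes = 51200' >> /etc/sysctl.conf    # persist across reboots
sysctl vm.min_free_kbytes                                    # verify the running value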

The customer found my explanation fairly convincing, but still copied the email to Oracle's own Exadata pre-sales staff.

The pre-sales engineers later contacted me and I gave them the same explanation. Their view, however, was that the Exadata machine had been configured and installed at Oracle's US factory and therefore had to be optimally configured; given the huge gap between the current memlock value (32) and the recommended value (50000000), they simply could not believe the deployment engineers in the US would make such an elementary mistake.

So in effect they remained skeptical of my explanation of the memlock parameter. My advice was to follow up with MOS about it to confirm the issue, but that was no longer my job. Because of the disagreement over memlock, the parameter was not changed in the short term.

And so the case was closed. Time flew, and two months went by. It happened that I recently had a project to upgrade the databases on an Exadata to 11.2.0.2, so I went through the readme of the relevant patches. Since upgrading RAC to 11.2.0.2 requires compatible Exadata Storage Server Software and InfiniBand Switch software versions, I checked the compatibility matrix:

Version Compatibility

The following table lists the Exadata Storage Server software versions that are compatible with each supported Oracle Database 11g Release 2 software version.

Oracle Database Software version           Required Exadata Storage Server Software version
11g Release 2 (11.2.0.2.0) Patch Set 1     11.2.2.x
11g Release 2 (11.2.0.1.0)                 11.2.2.x or 11.2.1.x

The following table lists the InfiniBand Switch software versions that are compatible with each supported Exadata Storage Server software version.

Exadata Storage Server Software version: 11.2.2.2.2 and later
  Required InfiniBand Switch software:
    Exadata Database Machine (Sun Datacenter InfiniBand Switch 36): switch software version 1.1.3-2 or later
    HP Oracle Database Machine (Voltaire ISR 9024D-M and ISR 9024D): switch software 5.1.1 build ID 872 (ISR 9024D-M only), switch firmware 1.0.0 or higher

Exadata Storage Server Software version: 11.2.2.2.0 or earlier
  Required InfiniBand Switch software:
    Exadata Database Machine (Sun Datacenter InfiniBand Switch 36): switch software version 1.0.1-1 or later
    HP Oracle Database Machine (Voltaire ISR 9024D-M and ISR 9024D): switch software 5.1.1 build ID 872 (ISR 9024D-M only), switch firmware 1.0.0 or higher

 

To upgrade the RAC databases on Exadata to 11.2.0.2, the Exadata Storage Server Software must first be upgraded to 11.2.2.x; the release Oracle currently recommends is 11.2.2.3.2.

So I then went through the update readme for Exadata Storage Server Software 11.2.2.3.2, i.e. <Oracle Exadata Database Machine README for patch 12577723 (Support note 1323958.1)>.

The patch is applied in two phases, "Applying the Patch to Exadata Cells" and "Applying the Patch to the Database Server": it must be installed not only on the Exadata cells, but a small patch also has to be applied on the database nodes.

 

Looking at the "Applying the Patch to the Database Server" section, the following step stands out:

Repeat the following steps for each database host. If you are taking deployment-wide downtime for the patching, then these steps may be performed in parallel on all database hosts.

  1. Update the resource limits for the database and the grid users:
    Note:

    • This step does not apply if you have customized the values for your specific deployment and database requirements.
    WARNING:

    • Do not run this step if you have specific customized values in use for your deployment.
    a. Calculate 75% of the physical memory on the machine using the following command:
      let -i x=($((`cat /proc/meminfo | grep 'MemTotal:' | awk '{print $2}'` * 3 / 4))); echo $x
    b. Edit the /etc/security/limits.conf file to update or add the following limits for the database owner (orauser) and the grid infrastructure user (griduser). Your deployment may use the same operating system user for both, and it may be named the oracle user. Adjust the following as needed.
      ########## BEGIN DO NOT REMOVE Added by Oracle ###########
      orauser     soft    core       unlimited
      orauser     hard    core       unlimited
      orauser     soft    nproc       131072
      orauser     hard    nproc       131072
      orauser     soft    nofile      131072
      orauser     hard    nofile      131072
      orauser     soft    memlock     <value of x from step 01.a>
      orauser     hard    memlock     <value of x from step 01.a>
      
      griduser     soft    core       unlimited
      griduser     hard    core       unlimited
      griduser     soft    nproc       131072
      griduser     hard    nproc       131072
      griduser     soft    nofile      131072
      griduser     hard    nofile      131072
      griduser     soft    memlock     <value of x from step 01.a>
      griduser     hard    memlock     <value of x from step 01.a>
      
      ########### END DO NOT REMOVE Added by Oracle ###########

 

As you can see, before the database-server part of the patch is actually applied there is a remedial step: adding the memlock parameter for the oracle and grid users, where memlock is derived by taking 75% of MemTotal from /proc/meminfo. In <Exadata Server Hardware Details> I listed some hardware details of the Exadata database hosts; MemTotal is typically about 70 GB (74027752 kB), and 74027752 * 75% = 55520814. In other words, the memlock value Oracle actually recommends on Exadata is 55520814, even higher than the validated value of 50000000 I mentioned earlier.
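Turning the readme's formula into a small check makes the gap explicit; this is only a sketch, assuming, as on this machine, that oracle and grid map to the same OS user:

x=$(( $(awk '/MemTotal:/ {print $2}' /proc/meminfo) * 3 / 4 ))   # 75% of MemTotal in kB, as in step 01.a
echo "memlock recommended by the 11.2.2.3.2 readme : ${x} kB"    # about 55520814 on a 70 GB X2-2 node
grep memlock /etc/security/limits.conf \
    || echo "no memlock entry in limits.conf (the 32 kB default applies)"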

At last the whole thing became clear! And there is a lot to learn from it:

1. First, my bold guess is that the people who actually deploy Sun Exadata machines belong to Oracle's hardware division, i.e. the former Sun organization. Thorough communication between departments is vital during deployment, and the Oracle-Sun Exadata V2, rushed to market in 2009, clearly did not get this right; the problem was only acknowledged and fixed in the Oracle Exadata Database Machine 11g Release 2 (11.2) 11.2.2.3.2 patch 12577723 released in May 2011.

2. IT is, in the end, about people. No matter how high-end the server or how advanced the technology, without a matching team to drive it you will get at most 50% of its value; when the staff are utterly unfamiliar with the technology, "intelligent" automation is nothing but empty talk!

A First Look at Exadata Sun InfiniBand

The default root password on the Exadata InfiniBand switches is usually welcome1.

Check the InfiniBand switch version:

 

# nm2version
(Note: The command 'nm2version' is deprecated and will be removed from future
releases. Please use 'version' instead)

SUN DCS 36p version: 1.3.3-2
Build time: Apr 4 2011 11:15:19
SP board info:
Manufacturing Date: 2010.04.24
Serial Number: “XXXXXX”
Hardware Revision: 0x0005
Firmware Revision: 0x0000
BIOS version: SUN0R100
BIOS date: 06/22/2010

 

NTP settings:

 

[root@dmibsw01 ~]# grep server  /etc/ntp.conf

 

[root@dmibsw01 ~]# ibswitches
Switch : 0x0021286cc7e2a0a0 ports 36 "SUN DCS 36P QDR dmibsw01uc.com" enhanced port 0 lid 1 lmc 0
Switch : 0x002128b7ac44c0a0 ports 36 "SUN IB QDR GW switch ibsw03 12.33.22.253" enhanced port 0 lid 63 lmc 0
Switch : 0x002128b7f744c0a0 ports 36 "SUN IB QDR GW switch ibsw02 12.33.22.252" enhanced port 0 lid 64 lmc 0
Switch : 0x0021284692d4a0a0 ports 36 "SUN DCS 36P QDR dmibsw02uc.com" enhanced port 0 lid 6 lmc 0
Switch : 0x00212846902ba0a0 ports 36 "SUN DCS 36P QDR dmibsw03uc.com" enhanced port 0 lid 3 lmc 0
Switch : 0x002128469c74a0a0 ports 36 "SUN DCS 36P QDR acslogicibsw01 12.33.22.251" enhanced port 0 lid 72 lmc 0

ibdiagnet -c 1000
Loading IBDIAGNET from: /usr/lib/ibdiagnet1.2
-W- Topology file is not specified.
Reports regarding cluster links will use direct routes.
Loading IBDM from: /usr/lib/ibdm1.2
-I- Using port 0 as the local port.
-I- Discovering ... 39 nodes (6 Switches & 33 CA-s) discovered.
-I---------------------------------------------------
-I- Bad Guids/LIDs Info
-I---------------------------------------------------
-I- skip option set. no report will be issued

-I---------------------------------------------------
-I- Links With Logical State = INIT
-I---------------------------------------------------
-I- No bad Links (with logical state = INIT) were found

-I---------------------------------------------------
-I- PM Counters Info
-I---------------------------------------------------

Checking the Exadata image version

The imagehistory and imageinfo commands can be used to check the image versions on the Exadata database servers and cell nodes, for example:

 

DBSERVER:

[root@db01 ~]# imagehistory
Version : 11.2.2.4.2.111221
Image activation date : 2012-08-09 15:08:29 -0400
Imaging mode : fresh
Imaging status : success

Version : 11.2.3.1.1.120607
Image activation date : 2012-08-14 19:16:01 -0400
Imaging mode : patch
Imaging status : success

[root@db01 ~]# imageinfo

Kernel version: 2.6.18-274.18.1.0.1.el5 #1 SMP Thu Feb 9 19:07:16 EST 2012 x86_64
Image version: 11.2.3.1.1.120607
Image activated: 2012-08-14 19:16:01 -0400
Image status: success
System partition on device: /dev/mapper/VGExaDb-LVDbSys1
[root@cel01 ~]# imagehistory
Version : 11.2.2.4.2.111221
Image activation date : 2012-08-09 15:36:25 -0400
Imaging mode : fresh
Imaging status : success

Version : 11.2.3.1.1.120607
Image activation date : 2012-08-13 18:00:09 -0400
Imaging mode : out of partition upgrade
Imaging status : success

[root@cel01 ~]# imageinfo

Kernel version: 2.6.18-274.18.1.0.1.el5 #1 SMP Thu Feb 9 19:07:16 EST 2012 x86_64
Cell version: OSS_11.2.3.1.1_LINUX.X64_120607
Cell rpm version: cell-11.2.3.1.1_LINUX.X64_120607-1

Active image version: 11.2.3.1.1.120607
Active image activated: 2012-08-13 18:00:09 -0400
Active image status: success
Active system partition on device: /dev/md6
Active software partition on device: /dev/md8

In partition rollback: Impossible

Cell boot usb partition: /dev/sdm1
Cell boot usb version: 11.2.3.1.1.120607

Inactive image version: 11.2.2.4.2.111221
Inactive image activated: 2012-08-09 15:36:25 -0400
Inactive image status: success
Inactive system partition on device: /dev/md5
Inactive software partition on device: /dev/md7

Boot area has rollback archive for the version: 11.2.2.4.2.111221
Rollback to the inactive partitions: Possible
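On a full rack the same check is usually run against all nodes at once with dcli; a sketch that assumes the customary cell_group/dbs_group host-list files and root SSH equivalence are already set up:

dcli -g cell_group -l root imageinfo -ver     # active image version on every cell
dcli -g dbs_group -l root imageinfo -ver      # and on every database server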

 

 

 

The DB and image versions on Exadata can also be diagnosed with the following script:

 

 

{ case `uname` in \
Linux ) ILOM="/usr/bin/ipmitool sunoem cli" ;; \
SunOS ) ILOM="/opt/ipmitool/bin/ipmitool sunoem cli" ;; \
esac ; ImageInfo="/opt/oracle.cellos/imageinfo" ; \
uname -srm ; head -1 /etc/*release ; uptime | cut -d, -f1 ; \
$ILOM "show /SP system_description system_identifier" | grep = ; \
$ImageInfo -activated -node -status -ver | grep -v ^$ ; \
} | tee /tmp/ExaInfo.log

Next, as oracle, run: 
$GRID_HOME/OPatch/opatch lsinv -all -oh $GRID_HOME | tee /tmp/OPatchInv.log
$ORACLE_HOME/OPatch/opatch lsinv -all | tee -a /tmp/OPatchInv.log

 

 

Linux 2.6.18-274.18.1.0.1.el5 x86_64
==> /etc/enterprise-release <==
Enterprise Linux Enterprise Linux Server release 5.7 (Carthage)

==> /etc/oracle-release <==
Oracle Linux Server release 5.7

==> /etc/redhat-release <==
Red Hat Enterprise Linux Server release 5.7 (Tikanga)
00:45:26 up 16 days
system_description = SUN FIRE X4270 M2 SERVER, ILOM v3.0.16.10, r65138
system_identifier = Exadata Database Machine X2-2 AK00012260
Active image version: 11.2.3.1.1.120607
Active image activated: 2012-08-13 18:00:09 -0400
Active image status: success
Active node type: STORAGE
Inactive image version: 11.2.2.4.2.111221
Inactive image activated: 2012-08-09 15:36:25 -0400
Inactive image status: success
Inactive node type: STORAGE

The cell statistics gather wait event

This is an Exadata wait event that occurs when a session is reading information from V$CELL, V$CELL_THREAD_HISTORY, and other associated views/tables in the same category.

Solutions

This is not a very common wait event so contact Oracle Support if there are excessive waits on this event in your system.
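Before contacting Support it is worth quantifying how much time the instance really spends on this event; a sketch using gv$system_event (run as SYSDBA):

sqlplus -s / as sysdba <<'EOF'
select inst_id, total_waits, round(time_waited_micro/1e6, 1) as seconds_waited
  from gv$system_event
 where event = 'cell statistics gather';
EOF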

The cell smart table scan wait event

This is an Exadata wait event typically seen during full table scans that have been offloaded to the storage cells. It replaces waits on "direct path read" in many cases. As with direct path reads, data is returned directly to the PGA rather than going through the buffer cache. When the storage cells process full table scans they can apply column filters and perform column projection, so only the blocks that are actually needed are returned.

Solutions

This event indicates that a full table scan is being performed. In some cases this could be faster than an index lookup, but is not a replacement for query tuning. If the query will return a small subset of the data, utilizing an index may be more efficient. Test the differences to understand any performance penalties incurred by doing a smart table scan vs. the index lookup.

Also, ensure the smart table scan is being done effectively by reviewing the Cell Smart Table Scan Latency metric on the Exadata tab under Resources in Ignite. The Objects tab in Ignite will also show response time information by object. This is critical for understanding which table is causing the majority of wait times.
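Outside of Ignite, a rough idea of how effective offloading is can also be taken straight from the 11.2 smart scan statistics in v$sysstat; a sketch (compare deltas over an interval rather than absolute values):

sqlplus -s / as sysdba <<'EOF'
select name, value
  from v$sysstat
 where name in ('cell physical IO bytes eligible for predicate offload',
                'cell physical IO interconnect bytes returned by smart scan');
EOF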

The cell smart restore from backup wait event

This is an Exadata wait event that occurs when doing a restore via RMAN. Exadata automatically offloads RMAN restore activities to the storage cells.

Solutions

Offloading RMAN restore activity to the storage cells of an Exadata machine is generally a good thing, so no tuning may be required. If RMAN restores are affecting performance, scheduling them at different times may be appropriate.
