如果自己搞不定可以找诗檀软件专业ORACLE数据库修复团队成员帮您恢复!
诗檀软件专业数据库修复团队
服务热线 : 13764045638 QQ号:47079569 邮箱:service@parnassusdata.com
本文中,我会解释我关于ASM磁盘头块的最后几个作品之一。
首先,我会在Oracle数据库中新建TEST表空间。TEST的数据文件由ASM进行管理。
[oracle@oel ~]$ . oraenv
ORACLE_SID = [+ASM] ? orcl
The Oracle base remains unchanged with value /u01/app/oracle
[oracle@oel ~]$ sqlplus / as sysdba
SQL> create tablespace test datafile ‘+data’ size 5M autoextend on maxsize unlimited;
Tablespace created.
SQL> select instance_name from v$instance;
INSTANCE_NAME
—————-
orcl
SQL> select file_id,tablespace_name,file_name,status,online_status from dba_data_files
2 /
FILE_ID TABLESPACE_NAME FILE_NAME STATUS ONLINE_
———- —————————— ————————————————– ——— ——-
4 USERS /u02/oradata/orcl/users01.dbf AVAILABLE ONLINE
3 UNDOTBS1 /u02/oradata/orcl/undotbs01.dbf AVAILABLE ONLINE
2 SYSAUX /u02/oradata/orcl/sysaux01.dbf AVAILABLE ONLINE
1 SYSTEM /u02/oradata/orcl/system01.dbf AVAILABLE SYSTEM
5 EXAMPLE /u02/oradata/orcl/example01.dbf AVAILABLE ONLINE
6 TEST +DATA/orcl/datafile/test.258.798578613 AVAILABLE ONLINE
6 rows selected.
我的ASM实例管理两个磁盘组DATA(外部冗余)和RL(正常冗余)。
DATA是由ASMLib管理的2个磁盘组成的磁盘组
SQL> select group_number,name,type,state from v$asm_diskgroup
2 /
GROUP_NUMBER NAME TYPE STATE
———— —- —— ———–
1 DATA EXTERN MOUNTED
2 RL NORMAL MOUNTED
SQL> select path from v$asm_disk where group_number=1;
PATH
———————————
ORCL:ASM1
ORCL:ASM2
现在,我们清除每个磁盘的头块。头块就是ASM磁盘设备中的第一个,其默认大小为4096字节(该尺寸在_asm_blksize未公开参数中可用)。
[oracle@oel ~]$ dd if=/dev/zero of=/dev/oracleasm/disks/ASM1 bs=4096 count=1
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 2.2e-05 seconds, 186 MB/s
[oracle@oel ~]$ dd if=/dev/zero of=/dev/oracleasm/disks/ASM2 bs=4096 count=1
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 0.000336 seconds, 12.2 MB/s
即使块头遭破坏,我还可以写在磁盘组中,并在其中分配区段。
SQL> alter database datafile 6 resize 10M;
Database altered.
SQL> create table t tablespace TEST as select * from dba_source;
Table created.
SQL> alter system checkpoint;
System altered.
SQL> select file#,checkpoint_time from v$datafile_header where file#=6;
FILE# CHECKPOINT_TIME
———- ——————-
6 05/11/2012 19:27:43
好,我们尝试重启rdbms实例:
SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup
ORACLE instance started.
Total System Global Area 1068937216 bytes
Fixed Size 2235208 bytes
Variable Size 377488568 bytes
Database Buffers 683671552 bytes
Redo Buffers 5541888 bytes
Database mounted.
Database opened.
SQL> select count(*) from t;
COUNT(*)
———-
702070
我仍然可以重启它并读取所有文件区。
这个演示中有点奇怪,ASM磁盘头无效但我仍然可以读取文件。我会用kfed验证我的头状态(下面仅展示了一个ASM磁盘):
oracle@oel ~]$ kfed read /dev/oracleasm/disks/ASM1
kfbh.endian: 0 ; 0x000: 0x00
kfbh.hard: 0 ; 0x001: 0x00
kfbh.type: 0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt: 0 ; 0x003: 0x00
kfbh.block.blk: 0 ; 0x004: blk=0
kfbh.block.obj: 0 ; 0x008: file=0
kfbh.check: 0 ; 0x00c: 0x00000000
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
2B469E002400 00000000 00000000 00000000 00000000 […………….]
Repeat 255 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]
[oracle@oel ~]$ sudo /u01/app/oracle/product/11.2.0.3/dbhome_1/bin/kfed read /dev/sdb1
kfbh.endian: 0 ; 0x000: 0x00
kfbh.hard: 0 ; 0x001: 0x00
kfbh.type: 0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt: 0 ; 0x003: 0x00
kfbh.block.blk: 0 ; 0x004: blk=0
kfbh.block.obj: 0 ; 0x008: file=0
kfbh.check: 0 ; 0x00c: 0x00000000
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
2B413559C400 00000000 00000000 00000000 00000000 […………….]
Repeat 255 times
KFED-00322: file not found; arguments: [kfbtTraverseBlock] [Invalid OSM block type] [] [0]
KFED已经确认我的头块无效。现在,我会尝试卸载并挂载磁盘组,看对磁盘组上的安装运行是否有任何影响。
[oracle@oel ~]$ sqlplus / as sysasm
SQL> alter diskgroup DATA dismount;
Diskgroup altered.
SQL> alter diskgroup DATA mount;
alter diskgroup DATA mount
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15017: diskgroup “DATA” cannot be mounted
ORA-15063: ASM discovered an insufficient number of disks for diskgroup “DATA”
所以,ASM磁盘头对于安装磁盘组似乎很重要,但对ASM扩展区分配等没有影响。
现在,我要用kfed和和修复操作来修复ASM磁盘,以复原ASM头块:
[oracle@oel ~]$ kfed repair /dev/oracleasm/disks/ASM1
[oracle@oel ~]$ kfed repair /dev/oracleasm/disks/ASM2
[oracle@oel ~]$ sqlplus / as sysasm
SQL> alter diskgroup DATA mount;
Diskgroup altered.
所以KFED 能够复原头块(用修复操作).
[oracle@oel ~]$ . oraenv
ORACLE_SID = [+ASM] ? orcl
The Oracle base remains unchanged with value /u01/app/oracle
[oracle@oel ~]$ sqlplus / as sysdba
Connected to an idle instance.
SQL> startup
…
Database opened.
SQL> select count(*) from T;
COUNT(*)
———-
702070
T表中的数据依然存在……没问题!
好,我们来总结一下……抹去我的磁盘组中两个ASM磁盘的头块,对文件的扩展区分配没有任何影响。唯一的影响是在挂载能力上。接下来,我用KFED修复操作复原了每个磁盘的头块……是的,但没有磁盘或磁盘头的任何备份。所以我想..我头块的备份在哪里?
首先,我要尝试一个很简单的方法。我会看着ASM磁盘设备中的所有元数据块。希望找到另一个头块。为此,我会使用一个基本的shell脚本来分析每个块。如果该脚本中找到一个KFBTYP_DISKHEAD类型的块,它将保持在块位置,并且在脚本执行的最后打印出来:
基本的脚本:
#!/bin/bash
#set -x
export ORAENV_ASK=NO
export ORACLE_SID=+ASM
. oraenv
i=0
ret=0
tab_cnt=0;
blk_typ=”;
while [ “$blk_typ” != “KFBTYP_INVALID” ];
do
blk_typ=`kfed read $1 blkn=$i | grep kfbh.type | awk ‘{print $5;}’ | sed ‘s/^ *//g’ | sed ‘s/ *$//g’ `
ret=$?
if [ “$blk_typ” = “KFBTYP_DISKHEAD” ]; then
t[$i]=$i
fi
let i=$i+1
done
echo “list of header block with KFBTYP_DISKHEAD type”
echo ${t[@]}
执行结果:
[oracle@oel ~]$ ./asm_surgery.sh /dev/oracleasm/disks/ASM1
The Oracle base remains unchanged with value /u01/app/oracle
list of header block with KFBTYP_DISKHEAD type
0 510
[oracle@oel ~]$ ./asm_surgery.sh /dev/oracleasm/disks/ASM2
The Oracle base remains unchanged with value /u01/app/oracle
list of header block with KFBTYP_DISKHEAD type
0 510
所以在我的磁盘上第510块有头块的副本。事实上,正如Bane Radulović 在博客中提到的,ASM磁盘头的备份副本位于分配单位1的倒数第二块。所以,如果我的AU为1MB,块大小为4096字节,头块的副本将在((1048576/4096)* 2- 1 =511)中可用,由于块#从0开始,它位于第510块。
要仔细检查这一点,我会清除第一个ASM磁盘的头块,并使用KFED修复和检查,以查看到底发生了什么:
[oracle@oel ~]$ dd if=/dev/zero of=/dev/oracleasm/disks/ASM1 bs=4096 count=1
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 2.3e-05 seconds, 178 MB/s
[oracle@oel ~]$ strace kfed repair /dev/oracleasm/disks/ASM1
…/…
stat(“/dev/oracleasm/disks/ASM1”, {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 17), …}) = 0
access(“/dev/oracleasm/disks/ASM1”, F_OK) = 0
statfs(“/dev/oracleasm/disks/ASM1”, {f_type=0x958459f6, f_bsize=4096, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={0, 0}, f_namelen=255, f_frsize=4096}) = 0
open(“/dev/oracleasm/disks/ASM1”, O_RDWR) = 7
lseek(7, 2088960, SEEK_SET) = 2088960
read(7, “\1\202\1\1\376\200<\206\371\7″…, 4096) = 4096
lseek(7, 0, SEEK_SET) = 0
read(7, “”…, 4096) = 4096
lseek(7, 0, SEEK_SET) = 0
write(7, “\1\202\1\1\200\302\206\371\7″…, 4096) = 4096
close(7) = 0
这很有意思.
第一次运行,打开设备,在2088960字节后读取块。但是 2088960
/4096=510,所以它读取盘的第510块。接下来,它在位置0(头块)读取块,并把它写入到第510块的内容。
现在我知道在秘密的位置每个头块都有副本! (块#510),Kfed使用该块修复主头块。
测试期间我注意到,有时ASM磁盘失去了ASM标签(不知道为什么)。因此,修复的磁盘不会再用oracleasm scandiskd重建(或重新启动后)。
[root@oel ~]# ls /dev/oracleasm/disks/*
/dev/oracleasm/disks/ASM1 /dev/oracleasm/disks/ASM2 /dev/oracleasm/disks/ASM3 /dev/oracleasm/disks/ASM4
[root@oel ~]# oracleasm scandisks
Reloading disk partitions: done
Cleaning any stale ASM disks…
Cleaning disk “ASM1”
Scanning system for ASM disks…
[root@oel ~]# ls /dev/oracleasm/disks/*
/dev/oracleasm/disks/ASM2 /dev/oracleasm/disks/ASM3 /dev/oracleasm/disks/ASM4
既然知道哪些设备与asmlib磁盘设备相对应,就可以用oracleasm renamedisk命令重新标记该磁盘(使用该命令一定要小心):
[root@oel ~]# oracleasm renamedisk -f /dev/sdb1 ASM1
Writing disk header: done
Instantiating disk “ASM1”: done
[root@oel ~]# oracleasm scandisks
Reloading disk partitions: done
Cleaning any stale ASM disks…
Scanning system for ASM disks…
[root@oel ~]# ls /dev/oracleasm/disks/*
/dev/oracleasm/disks/ASM1 /dev/oracleasm/disks/ASM2 /dev/oracleasm/disks/ASM3 /dev/oracleasm/disks/ASM4
Comment