Unix/Linux下删除Oracle控制文件Controlfile为什么实例Instance没有立即奔溃?

绝大多数教科书或网络教材都告诉初学者, Oracle中丢失Controlfile控制文件会造成不可挽回的损失, Oracle Instance实例会立即崩溃crash。

但是实际很多同学在实际操作时发现,在Unix/Linux上删除所有的Controlfile控制文件 后 实例并不会在第一时间crash, 相反居然还可以顺利完成FULL checkpoint 。

 

这是为什么呢?

 

注意这个问题仅适用于Unix/Linux, 在Windows平台上不允许删除 已经被其他进程打开的文件。

 

究其根本原因是在Linux/Unix上Read 或 Write一个文件,进程都会打开一个与此文件相关联的 打开文件描述符 Open File Descriptors (a file descriptor (FD) is an abstract indicator for accessing a file. The term is generally used in POSIX operating systems.In POSIX, a file descriptor is an integer, specifically of the C type int.)

在C语言里 这样的一个File Descriptor以整形的形式出现, 举例来说:

 

进程A 在运行过程中 用 open 函数打开了一个文件/etc/hosts:

 

int  open("/etc/hosts",O_RDONLY);      

此时若open成功则会返回一个 文件描述符, 例如是该进程打开的第一个文件 那么 就返回1
若出错则返回 -1

之后进程A 就可以利用如 open ,write 这样的读写函数 要求对 文件描述符 1相关的实际文件进行读写

当进程不再需要读写文件或 进程退出时 一般使用close函数关闭open File Descriptor

close(1);   -- 例如上面获得的 描述符是 1  ,这里代入1就可以关闭本进程对/etc/hosts的FD

 

 

而在Linux/Unix上当一个文件被删除时,它只是简单地被”unlink”即删除一个硬链接hard link; 实际包含  文件数据的inode 在所有与该文件关联的 打开文件描述符(Open File Descriptors)被关闭之前 都不会被实际删除, 已经获得 打开文件描述符的进程 只要不用close函数关闭掉这些描述符, 都可以继续正常地对文件进行读写。 当所有该文件 管理的 打开文件符都被 关闭时 文件才真正意义上被删除。

 

 “When a file is deleted in linux, it is simply ‘unlinked’. The inode, which contains the file’s data, is not deleted until all processes have finished with it. This is why processes can carry on writing to deleted files.

In Unix, when you remove a file, the file isn’t actually deleted until all open file descriptors to it are closed.”

 

我们来实际还原一下 这个问题现场:

 

SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - 64bi
PL/SQL Release 10.2.0.1.0 - Production
CORE    10.2.0.1.0      Production
TNS for Linux: Version 10.2.0.1.0 - Production
NLSRTL Version 10.2.0.1.0 - Production

SQL> select * from global_name;

GLOBAL_NAME
--------------------------------------------------------------------------------
www.askmac.cn 

SQL> show parameter control_files

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
control_files                        string      /s01/oradata/G10R21/controlfil
                                                 e/o1_mf_7ch80zvk_.ctl

SQL> select open_mode from v$database;

OPEN_MODE
----------
READ WRITE

 

以上环境版本为10.2.0.1,只使用一个控制文件, 数据库已打开到READ WRITE MODE。

 

我们找到 Oracle中持续对Controlfile 每3 s做一次 heartbeat心跳的CKPT checkpoint进程的 SPID ,并dump其进程信息:

 

[oracle@vrh8 ~]$ ps -ef|grep ckpt|grep -v grep
oracle    3617     1  0 21:35 ?        00:00:00 ora_ckpt_G10R21

SQL> oradebug setospid 3617;
Oracle pid: 7, Unix process pid: 3617, image: oracle@vrh8.oracle.com (CKPT)

SQL> oradebug dump errorstack 4;
Statement processed.

SQL> oradebug tracefile_name;
/s01/admin/G10R21/bdump/g10r21_ckpt_3617.trc

SQL> Disconnected from Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - 64bit Production
With the Partitioning, OLAP and Data Mining options

 

检查ckpt进程的errorstack dump可以发现一些有趣的内容

 

[oracle@vrh8 ~]$ vi /s01/admin/G10R21/bdump/g10r21_ckpt_3617.trc

Process global information:
     process: 0x84e58dd0, call: 0x84f96b18, xact: (nil), curses: 0x84f785b0, usrses: 0x84f785b0
  ----------------------------------------
  SO: 0x84e58dd0, type: 2, owner: (nil), flag: INIT/-/-/0x00
  (process) Oracle pid=7, calls cur/top: 0x84f96b18/0x84f96b18, flag: (6) SYSTEM
            int error: 0, call error: 0, sess error: 0, txn error 0
  (post info) last post received: 0 0 24
              last post received-location: ksasnd
              last process to post me: 84e5cd10 5 0
              last post sent: 0 0 25
              last post sent-location: ksasnr
              last process posted by me: 84e5cd10 5 0
    (latch info) wait_event=0 bits=0
    Process Group: DEFAULT, pseudo proc: 0x84ea05f8
    O/S info: user: oracle, term: UNKNOWN, ospid: 3617
    OSD pid info: Unix process pid: 3617, image: oracle@vrh8.oracle.com (CKPT)

    (FOB) flags=2 fib=0x83995e10 incno=0 pending i/o cnt=0
     fname=/s01/oradata/G10R21/datafile/o1_mf_system_7ch8004t_.dbf
     fno=1 lblksz=4 fsiz=86309888
    ----------------------------------------
    SO: 0x83995a70, type: 6, owner: 0x84e58dd0, flag: INIT/-/-/0x00
    (FIB) flags=0 reference cnt=1 incno=0 seqno=1
     fname=/s01/oradata/G10R21/controlfile/o1_mf_7ch80zvk_.ctl
     fno=0 lblksz=4 fsiz=86309888
      (FOB) flags=393226 fib=0x83995a70 incno=0 pending i/o cnt=0
 fname=/s01/oradata/G10R21/controlfile/o1_mf_7ch80zvk_.ctl
       fno=0 lblksz=6 fsiz=86309888

 

 

可以发现这里存在一个FOB 对象其 fname=/s01/oradata/G10R21/controlfile/o1_mf_7ch80zvk_.ctl ==>指向 控制文件

 

在Linux下我们可以通过 proc文件系统了解CKPT进程当前所有 已打开的文件描述符(FD):

 

[oracle@vrh8 ~]$ cd /proc/3617/fd

[oracle@vrh8 fd]$ ls -l
total 0
lr-x------ 1 oracle oinstall 64 Mar  3 21:44 0 -> /dev/null
lr-x------ 1 oracle oinstall 64 Mar  3 21:44 1 -> /dev/null
l-wx------ 1 oracle oinstall 64 Mar  3 21:44 10 -> /s01/admin/G10R21/bdump/alert_G10R21.log
lr-x------ 1 oracle oinstall 64 Mar  3 21:44 11 -> /dev/zero
lr-x------ 1 oracle oinstall 64 Mar  3 21:44 12 -> /dev/zero
lr-x------ 1 oracle oinstall 64 Mar  3 21:44 13 -> /s01/oracle/product/10.2.0/db_1/rdbms/mesg/oraus.msb
lrwx------ 1 oracle oinstall 64 Mar  3 21:44 14 -> /s01/oracle/product/10.2.0/db_1/dbs/hc_G10R21.dat
lrwx------ 1 oracle oinstall 64 Mar  3 21:44 15 -> /s01/oracle/product/10.2.0/db_1/dbs/lkG10R21
lrwx------ 1 oracle oinstall 64 Mar 3 21:44 16 -> /s01/oradata/G10R21/controlfile/o1_mf_7ch80zvk_.ctl
lr-x------ 1 oracle oinstall 64 Mar  3 21:44 17 -> /s01/oracle/product/10.2.0/db_1/rdbms/mesg/oraus.msb
lrwx------ 1 oracle oinstall 64 Mar  3 21:44 18 -> /s01/oradata/G10R21/datafile/o1_mf_system_7ch8004t_.dbf
l-wx------ 1 oracle oinstall 64 Mar  3 21:44 2 -> /s01/admin/G10R21/bdump/g10r21_ckpt_3617.trc
lr-x------ 1 oracle oinstall 64 Mar  3 21:44 3 -> /dev/null
lr-x------ 1 oracle oinstall 64 Mar  3 21:44 4 -> /dev/null
l-wx------ 1 oracle oinstall 64 Mar  3 21:44 5 -> /s01/admin/G10R21/udump/g10r21_ora_3605.trc
lrwx------ 1 oracle oinstall 64 Mar  3 21:44 6 -> /s01/oracle/product/10.2.0/db_1/dbs/lkinstG10R21 (deleted)
lrwx------ 1 oracle oinstall 64 Mar  3 21:44 7 -> /s01/admin/G10R21/adump/ora_3605.aud
l-wx------ 1 oracle oinstall 64 Mar  3 21:44 8 -> /s01/admin/G10R21/bdump/alert_G10R21.log
lrwx------ 1 oracle oinstall 64 Mar  3 21:44 9 -> /s01/oracle/product/10.2.0/db_1/dbs/hc_G10R21.dat

 

可以看到 o1_mf_7ch80zvk_.ctl 这个控制文件的 FD number是 16

 

我们开始利用strace 追踪 CKPT的 SYSTEM CALL , 并尝试删除唯一的一个控制文件

 

[root@vrh8 yum.repos.d]# strace -o ckpt.log -p 3617
Process 3617 attached - interrupt to quit

[oracle@vrh8 fd]$ rm /s01/oradata/G10R21/controlfile/o1_mf_7ch80zvk_.ctl

oracle@vrh8 fd]$ sqlplus / as sysdba

SQL*Plus: Release 10.2.0.1.0 - Production on Sat Mar 3 21:50:29 2012

Copyright (c) 1982, 2005, Oracle.  All rights reserved.

Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - 64bit Production
With the Partitioning, OLAP and Data Mining options

SQL> alter system checkpoint;

System altered.

SQL> /

System altered.

 

以上可以看到”ALTER SYSTEM CHECKPOINT” 正常完成 ,说明此时做FULL CHECKPOINT完全检查点也没问题。

 

此时用 lsof 命令可以 观察到有那几个进程已打开了o1_mf_7ch80zvk_.ctl这个控制文件这个控制文件的 文件描述符:

 

[root@vrh8 ~]# lsof|grep o1_mf_7ch80zvk_.ctl

oracle    3613    oracle   16u      REG              253,0    12402688    6547239 /s01/oradata/G10R21/controlfile/o1_mf_7ch80zvk_.ctl (deleted)
oracle    3615    oracle   16u      REG              253,0    12402688    6547239 /s01/oradata/G10R21/controlfile/o1_mf_7ch80zvk_.ctl (deleted)
oracle    3617    oracle   16uW     REG              253,0    12402688    6547239 /s01/oradata/G10R21/controlfile/o1_mf_7ch80zvk_.ctl (deleted)
oracle    3625    oracle   19u      REG              253,0    12402688    6547239 /s01/oradata/G10R21/controlfile/o1_mf_7ch80zvk_.ctl (deleted)
oracle    3644    oracle   19u      REG              253,0    12402688    6547239 /s01/oradata/G10R21/controlfile/o1_mf_7ch80zvk_.ctl (deleted)
oracle    3646    oracle   19u      REG              253,0    12402688    6547239 /s01/oradata/G10R21/controlfile/o1_mf_7ch80zvk_.ctl (deleted)

[oracle@vrh8 ~]$ cd /proc/3617/fd

[oracle@vrh8 fd]$ ls -l 16
lrwx------ 1 oracle oinstall 64 Mar  3 21:44 16 -> /s01/oradata/G10R21/controlfile/o1_mf_7ch80zvk_.ctl (deleted)

可以看到(deleted)的标记,说明该文件已经被unlinked,但是其数据仍完好

 

此时我们甚至还可以通过DD 命令将 已被”unlinked”的 控制文件的内容复制出来, 这也是很多 意外删除场景中 实例未crash或重启时一种常用的恢复手段, 很多朋友可能觉得这个方法不可思议, 其实只要了解了原理就可以明白它的局限性。

 

[oracle@vrh8 ~]$ cd /proc/3617/fd

[oracle@vrh8 fd]$ ls -l 16
lrwx------ 1 oracle oinstall 64 Mar  3 21:44 16 -> /s01/oradata/G10R21/controlfile/o1_mf_7ch80zvk_.ctl (deleted)

[oracle@vrh8 fd]$ dd if=16  of=/tmp/controlfile
24224+0 records in
24224+0 records out

12402688 bytes (12 MB) copied, 0.0592615 seconds, 209 MB/s

[oracle@vrh8 fd]$ file /tmp/controlfile
/tmp/controlfile: data

 

我们通过之前一直在做的strace log 可以看到 CKPT仍正常地针对控制文件Controlfiles做 3s一次的heartbeat:

 

gettimeofday({1330830450, 587535}, NULL) = 0
times(NULL)                             = 429642464
pwrite(16, "\25\302\0\0\3\0\0\0\0\0\0\0\0\0\1\4\211\273\0\0\2\0\0\0\0\0\0\0\f\0\0\0"..., 16384, 49152) = 16384
times(NULL)                             = 429642464
gettimeofday({1330830450, 588689}, NULL) = 0
gettimeofday({1330830450, 588905}, NULL) = 0

 

 

这里肯定有同学要问了: ” 那么到底何时Oracle的后台进程如CKPT才会释放OPEN FD 打开文件描述符 呢?”

 

答案是:” 视乎后台进程的实际需求,仅在需要时Close Open Descriptor; 实际上close Open Descriptor的机会并不多,甚至于在我们的测试里半个小时内都没有出现close , CKPT运行良好”。

 

实际上我们可以手动促成 后台进程释放指定的一个FD, 注意不要在产品数据库环境中做以下操作, 这仅仅是为了研究:

之前我们已经知道了在CKPT进程中 o1_mf_7ch80zvk_.ctl 控制文件的FD number是 16 , 而CKPT的SPID操作系统进程号是3617;

 

SQL> oradebug setospid 3617;
Oracle pid: 7, Unix process pid: 3617, image: oracle@vrh8.oracle.com (CKPT)

SQL> oradebug call close 16;
Function returned 0

 

以上使用oradebug call在CKPT进程中强制调用close(16),将关闭16 号文件描述符; 关闭后CKPT无法再使用原有的 16 FD,也没有办法再open已经被unlinked的 控制文件, 可以在 strace 日志和 ckpt trace 及 alert告警日志中看到 一连串反应:

 

ckpt strace :

close(16)                               = 0
gettimeofday({1330830800, 213316}, NULL) = 0
times(NULL)                             = 429677421
pwrite(16, "\25\302\0\0\3\0\0\0\0\0\0\0\0\0\1\4L\273\0\0\2\0\0\0\0\0\0\0\v\0\0\0"..., 16384, 49152) = -1 EBADF (Bad file descriptor)
gettimeofday({1330830800, 213988}, NULL) = 0
==========================================================================================

之后ckpt 会终止实例 , kill掉其他后台进程

times(NULL)                             = 429677424
open("/proc/3670/stat", O_RDONLY)       = 16
read(16, "3670 (oracle) S 1 3670 3670 0 -1"..., 999) = 239
close(16)                               = 0
kill(3670, SIGKILL)                     = 0
kill(3670, SIGCONT)                     = -1 ESRCH (No such process)
open("/proc/3673/stat", O_RDONLY)       = 16
read(16, "3673 (oracle) S 1 3673 3673 0 -1"..., 999) = 239
close(16)                               = 0
kill(3673, SIGKILL)                     = 0
kill(3673, SIGCONT)                     = -1 ESRCH (No such process)
open("/proc/3650/stat", O_RDONLY)       = 16
read(16, "3650 (oracle) S 1 3650 3650 0 -1"..., 999) = 239
..................................................................
open("/s01/admin/G10R21/bdump/alert_G10R21.log", O_WRONLY|O_CREAT|O_APPEND, 0664) = 8
writev(8, [{"Instance terminated by CKPT, pid"..., 39}, {"\n", 1}], 2) = 40
semctl(262145, 0, IPC_RMID, 0)          = 0
close(17)                               = 0
close(13)                               = 0
shmctl(2129924, IPC_RMID, 0)            = 0
shmdt(0x60000000)                       = 0
open("/proc/3617/stat", O_RDONLY)       = 13
read(13, "3617 (oracle) R 1 3617 3617 0 -1"..., 999) = 224
close(13)                               = 0
kill(3617, SIGKILL 

ckpt trace:

*** 2012-03-03 22:13:17.711
Received ORADEBUG command 'call close 16' from process Unix process pid: 3914, image:
ORA-00206: error in writing (block 3, # blocks 1) of control file
ORA-00202: control file: '/s01/oradata/G10R21/controlfile/o1_mf_7ch80zvk_.ctl'
ORA-27072: File I/O error
Linux-x86_64 Error: 9: Bad file descriptor
Additional information: 4
Additional information: 3
Additional information: -1
error 221 detected in background process
ORA-00221: error on write to control file
ORA-00206: error in writing (block 3, # blocks 1) of control file
ORA-00202: control file: '/s01/oradata/G10R21/controlfile/o1_mf_7ch80zvk_.ctl'
ORA-27072: File I/O error
Linux-x86_64 Error: 9: Bad file descriptor
Additional information: 4
Additional information: 3
Additional information: -1

alert log:

Errors in file /s01/admin/G10R21/bdump/g10r21_ckpt_3617.trc:
ORA-00206: error in writing (block 3, # blocks 1) of control file
ORA-00202: control file: '/s01/oradata/G10R21/controlfile/o1_mf_7ch80zvk_.ctl'
ORA-27072: File I/O error
Linux-x86_64 Error: 9: Bad file descriptor
Additional information: 4
Additional information: 3
Additional information: -1
Sat Mar  3 22:13:20 2012
Errors in file /s01/admin/G10R21/bdump/g10r21_ckpt_3617.trc:
ORA-00221: error on write to control file
ORA-00206: error in writing (block 3, # blocks 1) of control file
ORA-00202: control file: '/s01/oradata/G10R21/controlfile/o1_mf_7ch80zvk_.ctl'
ORA-27072: File I/O error
Linux-x86_64 Error: 9: Bad file descriptor
Additional information: 4
Additional information: 3
Additional information: -1
Sat Mar  3 22:13:20 2012
CKPT: terminating instance due to error 221
Sat Mar  3 22:13:20 2012
Errors in file /s01/admin/G10R21/bdump/g10r21_psp0_3609.trc:
ORA-00221: error on write to control file
Sat Mar  3 22:13:20 2012
Errors in file /s01/admin/G10R21/bdump/g10r21_mman_3611.trc:
ORA-00221: error on write to control file
Instance terminated by CKPT, pid = 3617

 

 

这个例子告诉我们 oracle数据库实例虽然很复杂, 有时候甚至”神奇” , 但是在Unix上它本质上和其他简单如shell的程序一样是作为用户向OS 申请资源,处理IO的。 要充分理解Oracle为什么这样设计,在运行时又为什么那样处理, 那么肯定要对Unix环境的原理有所了解。 包括: 文件I/O 、进程控制和环境、信号、进程间通信等等知识。 我本人推荐在学习Oracle之余 可以对Unix上的C语言编程有所涉猎, 推荐大家读一读 Richard Stevens的《Unix环境高级编程(第2版)》,这本书教会了我 Unix上的思考方式。

Comments

  1. comment test here !

  2. saup007 says

    测试库上不小心,把所有文件,全删除了,lsof都找回来了,Orz~!

  3. pengzhi says

    如果是CONTROLFILE 放在RAW呢?

  4. icerieve says

    你好!AIX 下进行数据文件删除,经过测试lsof不好使,类似的恢复方法有吗?谢谢

icerieve进行回复 取消回复

*

沪ICP备14014813号-2

沪公网安备 31010802001379号