Oracle等待事件kfk:async disk IO

kfk: async disk IO等待事件是ASM下异步的System I/O等待事件,kfk内核层面在disk_asynch_io=true时被激活。当rbal或其他ASM相关后台进程在维护ASM磁盘组时可能进入kfk: async disk IO等待。

SQL> col name for a20
SQL> col PARAMETER1 for a10
SQL> col PARAMETER2 for a10
SQL> col PARAMETER3 for a10
SQL> col WAIT_CLASS for a15

SQL> select name,parameter1,parameter2,parameter3,wait_class from v$event_name where name='kfk: async disk IO';

NAME                 PARAMETER1 PARAMETER2 PARAMETER3 WAIT_CLASS
-------------------- ---------- ---------- ---------- ---------------
kfk: async disk IO   count      intr       timeout    System I/O

SQL> select * from v$version;    

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
PL/SQL Release 11.2.0.2.0 - Production
CORE    11.2.0.2.0      Production
TNS for Linux: Version 11.2.0.2.0 - Production
NLSRTL Version 11.2.0.2.0 - Production

SQL> select name,value from v$system_parameter where name in ('instance_type','asm_power_limit');

NAME                 VALUE
-------------------- ----------
instance_type        asm
asm_power_limit      10

SQL> conn / as sysasm
Connected.

SQL> oradebug setmypid;
Statement processed.
SQL> oradebug event 10046 trace name context forever,level 8;
Statement processed.
SQL> alter diskgroup data check all;

Diskgroup altered.

SQL> oradebug event 10046 trace name context off;
Statement processed.

SQL> oradebug tracefile_name;
/s01/orabase/diag/asm/+asm/+ASM1/trace/+ASM1_ora_29405.trc

=====================trace=====================
PARSING IN CURSOR #140442102181424 len=30 dep=0 uid=0 oct=193 
lid=0 tim=1313673029551496 hv=2849532521 ad='6bd58b50' sqlid='ft5h7dunxhum9'
alter diskgroup data check all
END OF STMT
PARSE #140442102181424:c=1999,e=14171,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,plh=0,tim=1313673029551493
WAIT #140442102181424: nam='Disk file operations I/O' ela= 573 FileOperation=2 fileno=0 filetype=15 obj#=-1 
WAIT #140442102181424: nam='Disk file operations I/O' ela= 33 FileOperation=2 fileno=0 filetype=15 obj#=-1 
WAIT #140442102181424: nam='Disk file operations I/O' ela= 29 FileOperation=2 fileno=0 filetype=15 obj#=-1 
WAIT #140442102181424: nam='kfk: async disk IO' ela= 941 count=1 intr=0 timeout=4294967295 obj#=-1

fdp_checkDsk(): 20
----- Abridged Call Stack Trace -----
ksedsts()+461<-kfdp_checkDsk()+476<-kfdCheck()+1649<-kfgCheck()+477<-kfxdrvAl
terOne()+5976<-kfxdrvAlter()+2287<-kfxdrvEntry()+1306<-opiexe()+20028<-opiosq
0()+3993<-kpooprx()+274<-kpoal8()+800<-opiodr()+910<-ttcpip()+2289<-opitsk()+
1670<-opiino()+966<-opiodr()+910
<-opidrv()+570<-sou2o()+103<-opimai_real()+133<-ssthrdmain()+252<-main()+201<
-__libc_start_main()+244<-_start()+36
----- End of Abridged Call Stack Trace -----
WAIT #140442102181424: nam='rdbms ipc reply' ela= 1610 from_process=19 timeou
t=2147483647 p3=0 obj#=-1 tim=1313673029798048
kfdp_checkDsk(): 21
----- Abridged Call Stack Trace -----
ksedsts()+461<-kfdp_checkDsk()+476<-kfdCheck()+1649<-kfgCheck()+477<-kfxdrvAl
terOne()+5976<-kfxdrvAlter()+2287<-kfxdrvEntry()+1306<-opiexe()+20028<-opiosq
0()+3993<-kpooprx()+274<-kpoal8()+800<-opiodr()+910<-ttcpip()+2289<-opitsk()+
1670<-opiino()+966<-opiodr()+910
<-opidrv()+570<-sou2o()+103<-opimai_real()+133<-ssthrdmain()+252<-main()+201<
-__libc_start_main()+244<-_start()+36
----- End of Abridged Call Stack Trace -----
WAIT #140442102181424: nam='rdbms ipc reply' ela= 1677 from_process=19 timeou
t=2147483647 p3=0 obj#=-1 tim=1313673029885645
kfdp_checkDsk(): 22
----- Abridged Call Stack Trace -----
ksedsts()+461<-kfdp_checkDsk()+476<-kfdCheck()+1649<-kfgCheck()+477<-kfxdrvAl
terOne()+5976<-kfxdrvAlter()+2287<-kfxdrvEntry()+1306<-opiexe()+20028<-opiosq
0()+3993<-kpooprx()+274<-kpoal8()+800<-opiodr()+910<-ttcpip()+2289<-opitsk()+
1670<-opiino()+966<-opiodr()+910
<-opidrv()+570<-sou2o()+103<-opimai_real()+133<-ssthrdmain()+252<-main()+201<
-__libc_start_main()+244<-_start()+36
----- End of Abridged Call Stack Trace -----
WAIT #140442102181424: nam='rdbms ipc reply' ela= 1350 from_process=19 timeou
t=2147483647 p3=0 obj#=-1 tim=1313673029970397
kfdp_checkDsk(): 23
----- Abridged Call Stack Trace -----
kfdp_checkDsk(): 24
----- Abridged Call Stack Trace -----
ksedsts()+461<-kfdp_checkDsk()+476<-kfdCheck()+1649<-kfgCheck()+477<-kfxdrvAl
terOne()+5976<-kfxdrvAlter()+2287<-kfxdrvEntry()+1306<-opiexe()+20028<-opiosq
0()+3993<-kpooprx()+274<-kpoal8()+800<-opiodr()+910<-ttcpip()+2289<-opitsk()+
1670<-opiino()+966<-opiodr()+910
<-opidrv()+570<-sou2o()+103<-opimai_real()+133<-ssthrdmain()+252<-main()+201<
-__libc_start_main()+244<-_start()+36
----- End of Abridged Call Stack Trace -----

Script:收集ASM诊断信息

收集ASM诊断信息最佳的工具仍是RDA,对于不能使用RDA的环境可以采用如下脚本:

 

spool asm_diag1.txt
set pagesize 1000
set lines 500
col "Group Name"   form a25
col "Disk Name"    form a30
col "State"  form a15
col "Type"   form a7
col "Free GB"   form 9,999
alter session set nls_date_format='DD-MON-YYYY HH24:MI:SS';
select sysdate "Date and Time" from dual;

select * from v$asm_diskgroup order by 1;
select * from v$asm_disk order by 1, 2, 3;
select * from gv$asm_operation order by 1;
select * from v$version where banner like '%Database%' order by 1;
select * from gv$asm_client order by 1;

prompt

prompt ASM Disk Groups
prompt ===============

select group_number  "Group"
,      name          "Group Name"
,      state         "State"
,      type          "Type"
,      total_mb/1024 "Total GB"
,      free_mb/1024  "Free GB"
from   v$asm_diskgroup
/

prompt
prompt ASM Disks
prompt ==============

col "Group"          form 999
col "Disk"           form 999
col "Header"         form a9
col "Mode"           form a8
col "Redundancy"     form a10
col "Failure Group"  form a10
col "Path"           form a19

 select group_number  "Group"
,      disk_number   "Disk"
,      header_status "Header"
,      mode_status   "Mode"
,      state         "State"
,      redundancy    "Redundancy"
,      total_mb      "Total MB"
,      free_mb       "Free MB"
,      name          "Disk Name"
,      failgroup     "Failure Group"
,      path          "Path"
from   v$asm_disk
order by group_number
,        disk_number

/

prompt
prompt Instances currently accessing these diskgroups
prompt ==============================================

select c.group_number  "Group"
,      g.name          "Group Name"
,      c.instance_name "Instance"
from   v$asm_client c
,      v$asm_diskgroup g
where  g.group_number=c.group_number
/

prompt
prompt Report the Percentage of Imbalance in all Mounted Diskgroups
prompt ==============================================
select dfail, count(dfail) from
(
select disk, count(failgroup) as dfail
from x$kfdpartner, v$asm_disk where
number_kfdpartner=disk_number and grp=group_number
group by disk, failgroup
)
group by dfail; 

select g.name as "GROUP", d.name as "DISK", d.failgroup, fcnt, pcnt,
decode(pcnt - fcnt, 0, 'MUST', 'SHOULD') as action from
(select gnum, DISK1, failgroup, count(failgroup) as fcnt from
(select gnum, DISK1
from
(
select d.group_number as gnum, disk as disk1,
count(distinct failgroup) as dfail
from x$kfdpartner, v$asm_disk_stat d where
number_kfdpartner=disk_number and grp=d.group_number
and active_kfdpartner=1
group by d.group_number, disk
), v$asm_disk_stat
where dfail < 3
and disk1=disk_number
and gnum=group_number),
x$kfdpartner, v$asm_disk_stat d where
number_kfdpartner=disk_number and grp=d.group_number and grp=gnum
and disk1=disk
and active_kfdpartner=1
group by gnum, disk1, failgroup),
(select grp, disk, count(disk) as pcnt from x$kfdpartner where
active_kfdpartner=1 group by grp, disk),
v$asm_diskgroup_stat g, v$asm_disk_stat d
where gnum=grp and gnum=g.group_number and gnum=d.group_number and
disk=disk1 and disk=disk_number and
((fcnt = 1 and (pcnt - fcnt) > 3) or ((pcnt - fcnt) = 0))
/

col TYPE form a15
col FILE_NUMBER form 9999 head FILE_NUM
col GROUP_NUMBER form 9999 head GR_NUM
col GB for 9999.99

select GROUP_NUMBER   ,
 FILE_NUMBER          ,
 COMPOUND_INDEX       ,
 INCARNATION          ,
 BLOCK_SIZE           ,
 BLOCKS               ,
 BYTES/1024/1024/1024 GB ,
 TYPE                 ,
 STRIPED              ,
 CREATION_DATE        ,
 MODIFICATION_DATE
from v$asm_file
where TYPE != 'ARCHIVELOG'
/

prompt
prompt free ASM disks and their paths
prompt ===========================
select header_status , mode_status, path from V$asm_disk
where header_status in ('FORMER','CANDIDATE')
/

show parameter asm
show parameter size
show parameter proc
show parameter cluster
show parameter instance_type
show parameter instance_name

show parameter pfile

show sga

spool off

 

Code to be run on the ASM instance. Use file asmdebug.sql

 

set newpage none
set linesize 100
spool /tmp/asmdebug.out
--
-- Get a timestamp
select rpad('>', 10, '>'), to_char(sysdate, 'MON DD HH24:MM:SS') from dual;
--
-- Diskgroup information
set head off
select 'Diskgroup Information' from dual;
set head on
column name format a15
column DG# format 99
select group_number DG#, name, state, type, total_mb, free_mb from
v$asm_diskgroup;
--
-- Get the # of Allocation Units  per DG
set head off
select 'Number of AUs per diskgroup' from dual;
set head on
select count(number_kfdat) AU_count, group_kfdat DG# from x$kfdat
group by group_kfdat;
--
-- Get the # of Allocation Units  per DiskGroup and Disk
set head off
select 'Number of AUs per Diskgroup,Disk' from dual;
col "group#,disk#" for a30
set head on
select count(*)AU_count, GROUP_KFDAT||','||number_kfdat "group#,disk#" from x$kfdat group by GROUP_KFDAT,number_kfdat;
--
-- Get the # of allocated (V) and free (F) Allocation Units
set head off
select 'Number of allocated (V) and free (F) Allocation Units' from dual;
col "VF" for a2
set head on
select GROUP_KFDAT "group#", number_kfdat "disk#", v_kfdat "VF", count(*)
from x$kfdat
group by GROUP_KFDAT, number_kfdat, v_kfdat;
--
-- Get the # of Allocation Units per ASM file
set head off
select 'Number of AUs per ASM file ordered by AU count for metadata only'
from dual;
set head on
select count(XNUM_KFFXP) AU_count,  NUMBER_KFFXP file#, GROUP_KFFXP DG# from x$kffxp where NUMBER_KFFXP < 256
group by NUMBER_KFFXP, GROUP_KFFXP
order by count(XNUM_KFFXP) ;
--
-- Get the # of Allocation Units per ASM file by file alias.  Change the
-- system_created Y|N depending if you want the short or long ASM name
set head off
select 'Number of AUs per ASM file ordered by AU count.  This is for non
metadata' from dual;
col name format a60
set head on
select GROUP_KFFXP, NUMBER_KFFXP, name, count(*)
from x$kffxp, v$asm_alias
where GROUP_KFFXP=GROUP_NUMBER and NUMBER_KFFXP=FILE_NUMBER and
system_created='Y'
     group by GROUP_KFFXP, NUMBER_KFFXP, name
     order by GROUP_KFFXP, NUMBER_KFFXP;
--
-- Get partner information.  This is really only useful if redundancy is other than
-- external.
set head off
select 'The following shows the disk to partner relationship.  This is really only
useful if using normal or high redundancy.' from dual;
set head on
select grp DG#, disk, NUMBER_KFDPARTNER partner, PARITY_KFDPARTNER parity, ACTIVE_KFDPARTNER active
from x$kfdpartner;
--
-- Another look at file utilization.
set head off
set linesize 132
select 'bytes is the sum of AUs with data in them * 1024^2
space is the sum of all AUs allocated for this file * 1024^2'
from dual;
set head on
col Name format a60
select f.group_number, f.file_number, bytes, space, space/(1024*1024) "InMB", a.name "Name"
from v$asm_file f, v$asm_alias a
where f.group_number=a.group_number and f.file_number=a.file_number
    and system_created='Y'
    order by f.group_number, f.file_number;
--
-- Get robust disk information
set linesize 400
col failgroup format a20
col label format a20
col name format a40
col path format a40
set head off
select 'Robust disk information' from dual;
set head on
select GROUP_NUMBER, DISK_NUMBER, INCARNATION, MOUNT_STATUS, HEADER_STATUS, MODE_STATUS, STATE, LIBRARY, TOTAL_MB, FREE_MB,
NAME, FAILGROUP, LABEL, PATH, CREATE_DATE, MOUNT_DATE, READS,
WRITES, READ_ERRS, WRITE_ERRS, READ_TIME, WRITE_TIME, BYTES_READ, BYTES_WRITTEN
from v$asm_disk;
--
spool off

 

 

Code to be executed on the database instances using this ASM instance. Use file rdbmsdebug.sql

 

set newpage none
spool /tmp/rdbmsdebug.out
--
-- Get a timestamp
select rpad('>', 10, '>'), to_char(sysdate, 'MON DD HH24:MM:SS') from dual;
--
-- Get datafile information as the database sees it
set head off
select 'V$DATAFILE information' from dual;
set head on
set linesize 132
col name format a60
select file#, name, block_size, blocks, bytes, bytes/(1024*1024) "InMB",
status from v$datafile;
--
-- Get controlfile information as the database sees it
set head off
select 'V$CONTROLFILE information' from dual;
set head on
select * from v$controlfile;
--
-- Get archivelog information as the database sees it
set head off
select 'GV$ARCHIVED_LOG information' from dual;
set head on
select name, thread#, sequence#, blocks*block_size "size", status
status from gv$archived_log
order by thread#,sequence#;
--
-- Get redolog information as the database sees it
set head off
select 'v$log and v$logfile information' from dual;
set head on
col member format a60
select a.group#, member, thread#, sequence#, bytes, a.status
from v$log a, v$logfile b where a.group# = b.group#
order by thread#;
--
-- Get tempfiles information as the database sees it
set head off
select 'GV$TEMPFILE information' from dual;
set head on
col name format a60
select INST_ID,TS#,FILE#, RFILE#,NAME, CREATION_CHANGE# , CREATION_TIME,  STATUS, BYTES, BLOCKS, CREATE_BYTES, BLOCK_SIZE
from gv$tempfile order by inst_id, file#;
spool off

 

Other items to collect:

System error logs
Alert logs from all database and ASM instances
All recent trace files from all databases involved and all of the ASM instances
All “.trc” files from the ASM instances

 

check ASM disk space

 

1) Determine which (if any) disks contain no free space (ie are below the threshold)
select group_kfdat "group #",
       number_kfdat "disk #",
       count(*) "# AU's"
from x$kfdat a
where v_kfdat = 'V'
and not exists (select *
                from x$kfdat b
                where a.group_kfdat = b.group_kfdat
                and a.number_kfdat = b.number_kfdat
                and b.v_kfdat = 'F')
group by GROUP_KFDAT, number_kfdat;

If no rows are returned ... the following query can also be used

select disk_number "Disk #", free_mb
from v$asm_disk
where group_number = *** disk group number ***
order by 2;

If rows are returned from the first query ... or FREE_MB is less than 100mb in the second ... then there is probably insufficient disk space to allow a rebalance to occur ... Note the Disk #'s for later

2) Determine which files have allocation units on the disk(s) that are on exhausted disks

select name, file_number
from v$asm_alias
where group_number in (select group_kffxp
                                           from x$kffxp
                                           where group_kffxp=*** disk group number ***
                                           and disk_kffxp in (*** disk list from #1 above ***)
                                           and au_kffxp != 4294967294
                                           and number_kffxp >= 256)
and file_number in (select number_kffxp
                                  from x$kffxp
                                  where group_kffxp=*** disk group number ***
                                  and disk_kffxp in (*** disk list from #1 above ***)
                                  and au_kffxp != 4294967294
                                  and number_kffxp >= 256)
and system_created='Y';

3) Free up space so that the rebalance can occur

Using the file list from #2 above ... we will need to either drop or move tablespace(s)/datafile(s) such that all disks that are exhausted have at least 100mb free ...

NOTE ... the AU count above ... should relate to 1mb AU size ... so if a single file ... with at least 100 au's can be dropped or moved ... this
should be sufficient to free up enough space to allow the rebalance to occur

Droppable tablespaces may be things like:
    * temporary tablespaces
    * index tablespaces (assuming you know how to rebuild the indexes)

If none of the tablespaces are droppable then the tablespace(s)/datafile(s) will need to be
    * moved to another diskgroup (at least temporarily) ...
    * dropped using RMAN (with the database shutdown) and will be restored later

   Note 330103.1  How to Move Asm Database Files From one Diskgroup To Another ?

4) Check to see if there is sufficient FREE_MB on the problem disks

select disk_number "Disk #", free_mb
from v$asm_disk
where disk_group = *** disk group number ***
and disk_number in (*** disk list from #1 above ***)
order by 2;

 

Check Diskgroup Balance

If you want to perform an imbalance check for all mounted diskgroups, run the script in MOS Note 367445.1.

If you want to determine if you already have imbalance for a file in an existing diskgroup, use the following query:

select disk_kffxp, sum(size_kffxp) from x$kffxp where group_kffxp=AAA and number_kffxp=BBB and lxn_kffxp=0 group by disk_kffxp order by 2;

Breakdown of input/output is as follows:

  • AAA is the group_number in v$asm_alias
  • BBB is file_number in v$asm_alias
  • disk_kffxp gives us the disk number.
  • size_kffxp is used such that we account for variable sized extents.
  • sum(size_kffxp) provides the number of AUs that are on that disk.
  • lxn_kffxp is used in the query such that we go after only the primary extents, not secondary extents

If you want to check balance from an IO perspective, query the statistics in v$asm_disk_iostat before and after running a large SQL statement. For example, if the running a large query that does just reads, the reads and read_bytes columns should be roughly the same for all disks in the diskgroup.

Oracle内部错误:ORA-00600[kfioTranslateIO03]一例

一套Linux x86-64上的11.2.0.2 RAC+ASM系统,其中一个节点出现了ORA-00600[kfioTranslateIO03]内部错误,其具体日志如下:

=============================alert.log===============================
adrci> show alert -tail -f 

2011-05-30 20:29:12.657000 +08:00
Starting background process RSMN
RSMN started with pid=31, OS id=22084
ORACLE_BASE not set in environment. It is recommended
that ORACLE_BASE be set in the environment
Reusing ORACLE_BASE from an earlier startup = /s01/orabase
ALTER DATABASE MOUNT /* db agent *//* {0:7:3} */
This instance was first to mount
2011-05-30 20:29:15.026000 +08:00
Sweep [inc][100831]: completed
Sweep [inc2][100831]: completed
NOTE: Loaded library: System
ORA-15025: could not open disk "/dev/raw/raw1"
ORA-27041: unable to open file
Linux-x86_64 Error: 13: Permission denied
Additional information: 9
ORA-15025: could not open disk "/dev/raw/raw2"
ORA-27041: unable to open file
Linux-x86_64 Error: 13: Permission denied
Additional information: 9
ORA-15025: could not open disk "/dev/raw/raw3"
ORA-27041: unable to open file
Linux-x86_64 Error: 13: Permission denied
Additional information: 9
ORA-15025: could not open disk "/dev/raw/raw5"
ORA-27041: unable to open file
Linux-x86_64 Error: 13: Permission denied
Additional information: 9
SUCCESS: diskgroup DATA was mounted
NOTE: dependency between database PROD and diskgroup resource ora.DATA.dg is established
Errors in file /s01/orabase/diag/rdbms/prod/PROD1/trace/PROD1_ckpt_22056.trc  (incident=104831):

ORA-00600: internal error code, arguments: [kfioTranslateIO03], [], [], [], [], [], [], [], [], [], [], [] 

Incident details in: /s01/orabase/diag/rdbms/prod/PROD1/incident/incdir_104831/PROD1_ckpt_22056_i104831.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.

adrci> show problem 

ADR Home = /s01/orabase/diag/rdbms/prod/PROD1:
*************************************************************************
PROBLEM_ID           PROBLEM_KEY                                                 LAST_INCIDENT
-------------------- ----------------------------------------------------------- --------------------
2                    ORA 7445 [kghdmp_new()+1133]                                18387
3                    ORA 7445 [kghfnd()+2672]                                    20701
5                    ORA 7445 [kcldmp()+246]                                     28229
6                    ORA 7445 [kclxle()+311]                                     28230
1                    ORA 4031                                                    56918
4                    ORA 445                                                     90278
7                    ORA 600 [kfioTranslateIO03]                                 108831

adrci> show incident -mode detail -p "incident_id=108831"

ADR Home = /s01/orabase/diag/rdbms/prod/PROD1:
*************************************************************************

**********************************************************
INCIDENT INFO RECORD 1
**********************************************************
   INCIDENT_ID                   108831
   STATUS                        ready
   CREATE_TIME                   2011-05-30 20:31:55.484000 +08:00
   PROBLEM_ID                    7
   CLOSE_TIME
   FLOOD_CONTROLLED              none
   ERROR_FACILITY                ORA
   ERROR_NUMBER                  600
   ERROR_ARG1                    kfioTranslateIO03
   ERROR_ARG2
   ERROR_ARG3
   ERROR_ARG4
   ERROR_ARG5
   ERROR_ARG6
   ERROR_ARG7
   ERROR_ARG8
   ERROR_ARG9
   ERROR_ARG10
   ERROR_ARG11
   ERROR_ARG12
   SIGNALLING_COMPONENT          ASM
   SIGNALLING_SUBCOMPONENT
   SUSPECT_COMPONENT
   SUSPECT_SUBCOMPONENT
   ECID
   IMPACTS                       0
   PROBLEM_KEY                   ORA 600 [kfioTranslateIO03]
   FIRST_INCIDENT                96831
   FIRSTINC_TIME                 2011-05-30 20:24:40.372000 +08:00
   LAST_INCIDENT                 108831
   LASTINC_TIME                  2011-05-30 20:31:55.484000 +08:00
   IMPACT1                       0
   IMPACT2                       0
   IMPACT3                       0
   IMPACT4                       0
   KEY_NAME                      ProcId
   KEY_VALUE                     19.1
   KEY_NAME                      Client ProcId
   KEY_VALUE                     oracle@rh2.oracle.com.22504_139763918456544
   KEY_NAME                      SID
   KEY_VALUE                     397.1
   OWNER_ID                      1
   INCIDENT_FILE                 /s01/orabase/diag/rdbms/prod/PROD1/incident/incdir_108831/PROD1_ckpt_22504_i108831.trc
   OWNER_ID                      1
   INCIDENT_FILE                 /s01/orabase/diag/rdbms/prod/PROD1/trace/PROD1_ckpt_22504.trc
1 rows fetched

===================================trace===================================

adrci> view /s01/orabase/diag/rdbms/prod/PROD1/incident/incdir_108831/PROD1_ckpt_22504_i108831.trc

Dump continued from file: /s01/orabase/diag/rdbms/prod/PROD1/trace/PROD1_ckpt_22504.trc
ORA-00600: internal error code, arguments: [kfioTranslateIO03], [], [], [], [], [], [], [], [], [], [], []

========= Dump for incident 108831 (ORA 600 [kfioTranslateIO03]) ========
----- Beginning of Customized Incident Dump(s) -----
kfioRqSet=0x7f1d524151c0 parent=0x7fffb2642d30 gn=(64.0) cnt=0
  size=32768 vxn=0 byte offset=16384 buf offset=0
  tried[0]=0 tried[1]=0 tried[2]=0 tried[3]=0 tried[4]=0 tried[5]=0
  skipped[0]=0 skipped[1]=0 skipped[2]=0 skipped[3]=0 skipped[4]=0 skipped[5]=0
parent :
DDE: Ending a split invocation on error recording!
----- End of Customized Incident Dump(s) -----

*** 2011-05-30 20:31:55.548
dbkedDefDump(): Starting incident default dumps (flags=0x2, level=3, mask=0x0)
----- SQL Statement (None) -----
Current SQL information unavailable - no cursor.

----- Call Stack Trace -----
calling              call     entry                argument values in hex
location             type     point                (? means dubious value)
-------------------- -------- -------------------- ----------------------------
skdstdst()+36        call     kgdsdst()            000000000 ? 000000000 ?
                                                   7FFFB2634D58 ? 000000001 ?
                                                   000000001 ? 000000002 ?
ksedst1()+98         call     skdstdst()           000000000 ? 000000000 ?
                                                   7FFFB2634D58 ? 000000001 ?
                                                   000000000 ? 000000002 ?
ksedst()+34          call     ksedst1()            000000000 ? 000000001 ?
                                                   7FFFB2634D58 ? 000000001 ?
                                                   000000000 ? 000000002 ?
dbkedDefDump()+2741  call     ksedst()             000000000 ? 000000001 ?
                                                   7FFFB2634D58 ? 000000001 ?
                                                   000000000 ? 000000002 ?
ksedmp()+36          call     dbkedDefDump()       000000003 ? 000000002 ?
                                                   7FFFB2634D58 ? 000000001 ?
                                                   000000000 ? 000000002 ?
ksfdmp()+64          call     ksedmp()             000000003 ? 000000002 ?
                                                   7FFFB2634D58 ? 000000001 ?
                                                   000000000 ? 000000002 ?
dbgexPhaseII()+1764  call     ksfdmp()             000000003 ? 000000002 ?
                                                   7FFFB2634D58 ? 000000001 ?
                                                   000000000 ? 000000002 ?
dbgexExplicitEndInc  call     dbgexPhaseII()       7F1D5281F710 ? 7F1D52822500 ?
()+750                                             7FFFB2640890 ? 000000001 ?
                                                   000000000 ? 000000002 ?
dbgeEndDDEInvocatio  call     dbgexExplicitEndInc  7F1D5281F710 ? 7F1D52822500 ?
nImpl()+767                   ()                   7FFFB2640890 ? 000000001 ?
                                                   000000000 ? 000000002 ?
dbgeEndSpltInvokOnR  call     dbgeEndDDEInvocatio  7F1D5281F710 ? 7F1D52822500 ?
ec()+265                      nImpl()              7FFFB2640890 ? 000000001 ?
                                                   000000000 ? 000000002 ?
dbgePostErrorKGE()+  call     dbgeEndSpltInvokOnR  7F1D5281F710 ? 7F1D52822500 ?
248                           ec()                 7FFFB2640890 ? 000000001 ?
                                                   000000000 ? 000000002 ?
dbkePostKGE_kgsf()+  call     dbgePostErrorKGE()   000000000 ? 7F1D52830E40 ?
63                                                 000003AE9 ? 000000000 ?
                                                   100000000 ? 000000002 ?
kgeade()+351         call     dbkePostKGE_kgsf()   00B7C8EA0 ? 7F1D52830E40 ?
                                                   000003AE9 ? 000000000 ?
                                                   100000000 ? 000000002 ?
kgerelv()+135        call     kgeade()             00B7C8EA0 ? 00B7C9050 ?
                                                   7F1D52830E40 ? 000003AE9 ?
                                                   100000000 ? 000000002 ?
kserecl0()+157       call     kgerelv()            00B7C8EA0 ? 7F1D52830E40 ?
                                                   000003AE9 ? 00952980C ?
                                                   7FFFB2641C10 ? 000000000 ?
kfioErrorRecord()+7  call     kserecl0()           00B7C8EA0 ? 7F1D52830E40 ?
6                                                  000003AE9 ? 000000005 ?
                                                   7FFFB2641C60 ? 000000000 ?
kfiorq_dump()+129    call     kfioErrorRecord()    7FFFB2642D30 ? 7F1D52830E40 ?
                                                   000003AE9 ? 000000005 ?
                                                   7FFFB2641C60 ? 000000000 ?
kfioRqSetDump()+565  call     kfiorq_dump()        7FFFB2642D30 ? 7F1D52830E40 ?
                                                   000003AE9 ? 000000005 ?
                                                   7FFFB2641C60 ? 000000000 ?
kfioTranslateIO()+3  call     kfioRqSetDump()      7F1D524151C0 ? 7F1D52830E40 ?
079                                                000003AE9 ? 000000005 ?
kfioRqSetPrepare()+  call     kfioTranslateIO()    7F1D524151C0 ? 7F1D52415098 ?
1017                                               7FFFB26421D4 ? 7FFFB26421D0 ?
                                                   0D4F338B0 ? 000000000 ?
kfioSubmitIO()+2852  call     kfioRqSetPrepare()   7F1D524151C0 ? 7F1D52415098 ?
                                                   7FFFB26425D8 ? 7FFFB2642608 ?
                                                   0D4F338B0 ? 000000000 ?
kfioRequestPriv()+1  call     kfioSubmitIO()       7FFFB2642E10 ? 000000001 ?
94                                                 7FFFB26425D8 ? 7FFFB2642608 ?
                                                   0D4F338B0 ? 000000000 ?
kfioRequest()+701    call     kfioRequestPriv()    000000000 ? 000000001 ?
                                                   7FFFB2642E18 ? 000000001 ?
                                                   000000000 ? 000000000 ?
ksfd_kfioRequest()+  call     kfioRequest()        7FFFB2642E10 ? 000000001 ?
644                                                7FFFB2642E18 ? 000000001 ?
                                                   000000000 ? 7FFF00000000 ?
ksfd_osmio()+1050    call     ksfd_kfioRequest()   7FFFB2642E10 ? 000000001 ?
                                                   7FFFB2642E18 ? 000000001 ?
                                                   000000000 ? 000000000 ?
ksfd_io()+2717       call     ksfd_osmio()         000000001 ?
                                                   FFFFFFFFB2642D30 ?
                                                   FFFFFFFFB2642D30 ?
                                                   0D400A0B0 ? 000008000 ?
                                                   7FFFB2643170 ?
ksfdread()+576       call     ksfd_io()            0D400A0B0 ? 000000001 ?
                                                   7F1D52417E00 ? 000008000 ?
                                                   000000000 ? 000000703 ?
kcc_identify_file()  call     ksfdread()           0D400A0B0 ? 000000001 ?
+309                                               7F1D52417E00 ? 000008000 ?
                                                   000000000 ? 000000703 ?
kcc_identify()+225   call     kcc_identify_file()  0D400A0B0 ? 7F1D52417E00 ?
                                                   000000000 ? 060019450 ?
                                                   060019630 ? 0DAC34670 ?
kccida()+225         call     kcc_identify()       000000000 ? 7F1D52417E00 ?
                                                   060019630 ? 7FFFB26434A4 ?
                                                   000000000 ? 0DAC34670 ?
ksbabs()+771         call     kccida()             7FFFB2643B08 ? 7F1D52417E00 ?
                                                   060019630 ? 7FFFB26434A4 ?
                                                   000000000 ? 0DAC34670 ?
ksbrdp()+971         call     ksbabs()             7FFFB2643B08 ? 7F1D52417E00 ?
                                                   060019630 ? 7FFFB26434A4 ?
                                                   000000000 ? 0DAC34670 ?

adrci> view /s01/orabase/diag/rdbms/prod/PROD1/trace/PROD1_ckpt_22504.trc

NOTE: disk 4 is missing from group 1
Incident 108831 created, dump file: /s01/orabase/diag/rdbms/prod/PROD1/incident/incdir_108831/PROD1_ckpt_22504_i108831.trc
ORA-00600: internal error code, arguments: [kfioTranslateIO03], [], [], [], [], [], [], [], [], [], [], []

=========Start of 'kfiorq = [0x7fffb2642d30]' dumping =========
        Status            =  UNKWOWN
        Flags             =  READ |  SYNC
        Mirror side       = 0
        Fib               = 0xd4f338b0
        Offset            = 1
        buffer ptr        = 0x7f1d52417e00
        Rcount            = 32768
        err_kfiorq        = 15081
        Inflight disk IO  = 0
        Completed disk IO = 0
        Oracle error      = 0
        Intended zone     = 48
  ===Dump of all attached kfiodrq's===
=========End of 'kfiorq = [0x7fffb2642d30]' dumping =========

parent :
############# kfiofib = 0xd4f338b0 #################
Diskgroup Name     =
File number        = 261.747100215
File type          = 1
Flags              = 10
Blksize            = 16384
File size          = 1131 blocks
Blk one offset     = 1
Redundancy         = 17
Physical blocksz   = 512
Open name          = +DATA/prod/controlfile/current.261.747100215
Fully-qualified nm =+DATA/prod/controlfile/current.261.747100215
Mapid             = 2
Slave ID          = -1
Connection        = 0x(nil)
############################################
Error ORA-600 signaled at ksedsts()+461<-ksf_short_stack()+77<-kge_snap_callstack()+63<-kge_sigtrace_dump()+69<-kgepop()+712<-kgersel()+175<-kfioTranslateIO()+3138<-kfi
oRqSetPrepare()+1022<-kfioSubmitIO()+2857<-kfioRequestPriv()+199<-kfioRequest()+706<-ksfd_kfioRequest()+649<-ksfd_osmio()+1055<-ksfd_io()+2722<-ksfdread()+581<-kcc_iden
tify_file()+314<-kcc_identify()+230<-kccida()+230<-ksbabs()+771<-ksbrdp()+971<-opirip()+623<-opidrv()+603<-sou2o()+103<-opimai_real()+266<-ssthrdmain()+252<-main()+201<
-__libc_start_main()+244<-_start()+36
ERROR: unrecoverable error ORA-600 raised in ASM I/O path; terminating process 22504
----- Abridged Call Stack Trace -----
ksedsts()+461<-kfioRequest()+2157<-ksfd_kfioRequest()+649<-ksfd_osmio()+1055<-ksfd_io()+2722<-ksfdread()+581<-kcc_identify_file()+314<-kcc_identify()+230<-kccida()+230<
-ksbabs()+771<-ksbrdp()+971<-opirip()+623<-opidrv()+603<-sou2o()+103<-opimai_real()+266<-ssthrdmain()+252
<-main()+201<-__libc_start_main()+244<-_start()+36 ----- End of Abridged Call Stack Trace ----- *** 2011-05-30 20:31:56.271 KSU: Terminating fatal process 'oracle@rh2.oracle.com (CKPT)' adrci> ips create package
Created package 2 without any contents, correlation level typical

adrci> ips add problem 7 package 2
Added problem 7 to package 2

adrci> ips finalize package 2
Finalized package 2

adrci> ips generate package 2 in /tmp
Generated package 2 in file /tmp/IPSPKG_20110531224208_COM_1.zip, mode complete

诊断发现由于ASM diskgroup磁盘组中的磁盘设备文件/dev/raw/raw*的权限被修改成了0600,而这些裸设备的拥有者为grid用户,导致oracle用户无法读写这些裸设备,通过将设备文件的权限修改为0660,解决了该问题。

Why ASMLIB and why not?

ASMLIB是一种基于Linux module,专门为Oracle Automatic Storage Management特性设计的内核支持库(kernel support library)。

长久以来我们对ASMLIB的认识并不全面,这里我们来具体了解一下使用ASMLIB的优缺点。

理论上我们可以从ASMLIB API中得到的以下益处:

  1. 总是使用direct,async IO
  2. 解决了永久性设备名的问题,即便在重启后设备名已经改变的情况下
  3. 解决了文件权限、拥有者的问题
  4. 减少了I/O期间从用户模式到内核模式的上下文切换,从而可能降低cpu使用率
  5. 减少了文件句柄的使用量
  6. ASMLIB API提供了传递如I/O优先级等元信息到存储设备的可能

虽然从理论上我们可以从ASMLIB中得到性能收益,但实践过程中这种优势是几乎可以忽略的,没有任何性能报告显示ASMLIB对比Linux上原生态的udev设备管理服务有任何性能上的优势。在Oracle官方论坛上有一篇<ASMLib and Linux block devices>讨论ASMLIB性能收益的帖子,你可以从中看到”asmlib wouldn’t necessarily give you much of an io performance benefit, it’s mainly for ease of management as it will find/discover the right devices for you, the io effect of asmlib is large the same as doing async io to raw devices.”的评论,实际上使用ASMLIB和直接使用裸设备(raw device)在性能上没有什么差别。

ASMLIB可能带来的缺点:

  1. 对于多路径设备(multipathing)需要在/etc/sysconfig/oracleasm-_dev_oracleasm配置文件中设置ORACLEASM_SCANORDER及ORACLEASM_SCANEXCLUDE,以便ASMLIB能找到正确的设备文件,具体可以参考Metalink Note<How To Setup ASM & ASMLIB On Native Linux Multipath Mapper disks? [ID 602952.1]>
  2. 因为ASM INSTANCE使用ASMLIB提供的asm disk,所以增加了额外的层面
  3. 每次Linux Kernel更新,都需要替换新的ASMLIB包
  4. 增加了因人为错误造成宕机downtime的可能
  5. 使用ASMLIB意味着要花费更多时间去创建和维护
  6. 因为ASMLIB的存在,可能引入更多的bug,这是我们最不想看到的
  7. 使用ASMLIB创建的disk,其disk header并不会和普通的asm disk header有什么不同,仅仅是在头部多出了ASMLIB的属性空间。

结论:
我个人的观点是尽可能不要使用ASMLIB,当然这不是DBA个人所能决定的事情。另一方面这取决于个人习惯,在rhel 4的早期发行版本中没有提供udev这样的设备管理服务,这导致在rhel 4中大量的ASM+RAC组合的系统使用ASMLIB , 经网友指出udev 作为kernel 2.6的新特性被引入,在rhel4的初始版本中就已经加入了udev绑定服务,但是在rhel4时代实际udev的使用并不广泛(In Linux 2.6, a new feature was introduced to simplify device management and hot plug capabilities. This feature is called udev and is a standard package in RHEL4 or Oracle
Enterprise Linux 4 (OEL4) as well as Novell’s SLES9 and SLES10.)。如果是在RHEL/OEL 5中那么你已经有充分的理由利用udev而放弃ASMLIB。

Reference:
ASMLIB Performance vs Udev
RAC+ASM 3 years in production Stories to share
How To Setup ASM & ASMLIB On Native Linux Multipath Mapper disks? [ID 602952.1]
ASMLib and Linux block devices

解决ORA-27103:internal error错误一例

一套Linux x86-64上的11.2.0.2数据库在startup启动阶段遭遇了ORA-27103:internal error内部错误,其出错日志如下:

SQL> startup nomount;
ORA-27103: internal error
Linux-x86_64 Error: 2: No such file or directory
Additional information: 9404423
Additional information: 2

oerr 27103
Usage: oerr facility error

Facility is identified by the prefix string in the error message.
For example, if you get ORA-7300, "ora" is the facility and "7300"
is the error.  So you should type "oerr ora 7300".

If you get LCD-111, type "oerr lcd 111", and so on.

================= alert.log ====================
This instance was first to mount
2011-05-02 21:49:47.009000 +08:00
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /s01/orabase/diag/rdbms/prod/PROD1/trace/PROD1_asmb_14386.trc:
ORA-04031: unable to allocate 393240 bytes of shared memory 
("large pool","unknown object","large pool","ASM map operations hashtable")
ASMB (ospid: 14386): terminating the instance due to error 4031

System state dump requested by (instance=1, osid=14386 (ASMB)), summary=[abnormal instance termination].

System State dumped to trace file /s01/orabase/diag/rdbms/prod/PROD1/trace/PROD1_diag_14346.trc

Dumping diagnostic data in directory=[cdmp_20110502214947], requested by 
(instance=1, osid=14386 (ASMB)), summary=[abnormal instance termination].

Instance terminated by ASMB, pid = 14386

=============================system state dump============================

PROCESS 24: ASMB
  ----------------------------------------
  SO: 0x92c955c8, type: 2, owner: (nil), flag: INIT/-/-/0x00 if: 0x3 c: 0x3
   proc=0x92c955c8, name=process, file=ksu.h LINE:12451, pg=0
  (process) Oracle pid:24, ser:1, calls cur/top: 0x9288c778/0x9288c778
            flags : (0x6) SYSTEM
            flags2: (0x0),  flags3: (0x0)
            intr error: 0, call error: 0, sess error: 0, txn error 0
            intr queue: empty
    ksudlp FALSE at location: 0
  (post info) last post received: 2296 0 2
              last post received-location: ksl2.h LINE:2293 ID:kslpsr
              last process to post me: 92c8e248 1 6
              last post sent: 0 0 26
              last post sent-location: ksa2.h LINE:282 ID:ksasnd
              last process posted by me: 92c8e248 1 6
    (latch info) wait_event=0 bits=0
    Process Group: DEFAULT, pseudo proc: 0x92d24ae0
    O/S info: user: oracle, term: UNKNOWN, ospid: 14386
    OSD pid info: Unix process pid: 14386, image: oracle@rh2.oracle.com (ASMB)
    ----------------------------------------
    SO: 0x92e80a58, type: 4, owner: 0x92c955c8, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
     proc=0x92c955c8, name=session, file=ksu.h LINE:12459, pg=0
    (session) sid: 13 ser: 1 trans: (nil), creator: 0x92c955c8
              flags: (0x51) USR/- flags_idl: (0x1) BSY/-/-/-/-/-
              flags2: (0x408) -/-
              DID: , short-term DID:
              txn branch: (nil)
              oct: 0, prv: 0, sql: (nil), psql: (nil), user: 0/SYS
    ksuxds FALSE at location: 0
    service name: SYS$BACKGROUND
    Current Wait Stack:
      Not in wait; last wait ended 1.501399 sec ago
    Wait State:
      fixed_waits=0 flags=0x21 boundary=(nil)/-1
    Session Wait History:
        elapsed time of 1.501469 sec since last wait
     0: waited for 'SGA: allocation forcing component growth'
        =0x0, =0x0, =0x0
        wait_id=14 seq_num=27 snap_id=7
        wait times: snap=0.000000 sec, exc=0.305374 sec, total=0.305390 sec
        wait times: max=infinite
        wait counts: calls=6 os=6
        occurred after 0.000000 sec of elapsed time
     1: waited for 'SGA: allocation forcing component growth'
        =0x0, =0x0, =0x0
        wait_id=20 seq_num=26 snap_id=1
        wait times: snap=0.000001 sec, exc=0.000001 sec, total=0.000001 sec
        wait times: max=infinite
        wait counts: calls=1 os=1
        occurred after 0.000000 sec of elapsed time
     2: waited for 'SGA: allocation forcing component growth'
        =0x0, =0x0, =0x0
        wait_id=14 seq_num=25 snap_id=6

在没有阅读告警日志前我的第一反应可能是上次shutdown时Oracle进程没有被清理干净,导致shared memory segments一直没有释放,从而造成了以上ORA-27103错误。

不过其实这个问题告警日志里有明确的信息,即RDBMS Instance数据库实例在mount阶段asmb进程(负责db instance与asm instance的交互)试图从large pool大池中分配390k的空间,但遭遇了ORA-04031错误,如果asmb后台进程无法正常工作将直接导致db实例无法找到asm存储上的必要Extent,因此导致出出现了”ORA-27103: internal error:Linux-x86_64 Error: 2: No such file or directory”。

换而言之ORA-04031错误才是罪魁祸首,我们来是看该实例初始化的内存参数:

[oracle@rh2 dbs]$ strings spfilePROD1.ora |egrep "sga|memory|pool"
PROD1.__large_pool_size=16777216
*.memory_target=943718400
*.shared_pool_size=314572800
*.streams_pool_size=0

因为是11g的实例所以采用了automatic memory management特性管理直接设置了memory_target参数为900M,并设置了1号实例的large pool最小为16M,900M的大小对于10g的实例而言仍是绰绰有余的,但是显然在11gr2中设置memory_target为900M是不足以驱动这样一个”庞然大物”的。我们需要配置更多的内存,亦或者可以通过设置更大的large pool来解决令人郁闷的ORA-04031错误:

[oracle@rh2 dbs]$ strings spfilePROD1.ora > initPROD1.ora
[oracle@rh2 dbs]$ rm spfilePROD1.ora 
[oracle@rh2 dbs]$ vi initPROD1.ora

/* 修改memory_target为至少912M */

*.memory_target=1200M

/* 成功启动!  */
SQL> startup ;
ORACLE instance started.

Total System Global Area 1252663296 bytes
Fixed Size                  2226072 bytes
Variable Size             687868008 bytes
Database Buffers          553648128 bytes
Redo Buffers                8921088 bytes
Database mounted.
Database opened.

Private Interface 'eth1:1' configured from GPnP for use as a private interconnect.
  [name='eth1:1', type=1, ip=169.254.236.169, mac=94-0c-6d-71-8c-c2, net=169.254.0.0/16, mask=255.255.0.0, use=haip:cluster_interconnect/62]
Public Interface 'eth0' configured from GPnP for use as a public interface.
  [name='eth0', type=1, ip=192.168.1.121, mac=6c-f0-49-03-5f-99, net=192.168.1.0/24, mask=255.255.255.0, use=public/1]
Public Interface 'eth0:1' configured from GPnP for use as a public interface.
  [name='eth0:1', type=1, ip=192.168.1.133, mac=6c-f0-49-03-5f-99, net=192.168.1.0/24, mask=255.255.255.0, use=public/1]
Public Interface 'eth0:2' configured from GPnP for use as a public interface.
  [name='eth0:2', type=1, ip=192.168.1.122, mac=6c-f0-49-03-5f-99, net=192.168.1.0/24, mask=255.255.255.0, use=public/1]
Picked latch-free SCN scheme 3
2011-05-02 22:28:04.408000 +08:00
WARNING: db_recovery_file_dest is same as db_create_file_dest
Autotune of undo retention is turned on. 
LICENSE_MAX_USERS = 0
SYS auditing is disabled
Starting up:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options.
Using parameter settings in server-side pfile /s01/oracle/product/11.2.0/dbhome_1/dbs/initPROD1.ora
System parameters with non-default values:
  processes                = 150
  shared_pool_size         = 304M
  streams_pool_size        = 0
  memory_target            = 1200M
  control_files            = "+DATA/prod/controlfile/current.261.747100215"
  control_files            = "+DATA/prod/controlfile/current.260.747100215"
  db_block_size            = 8192
  db_flash_cache_file      = "/flashcard/prod1cache.dsk"
  db_flash_cache_size      = 20G
  compatible               = "11.2.0.0.0"
  log_archive_dest_1       = "location=+DATA"
  cluster_database         = TRUE
  db_create_file_dest      = "+DATA"
  db_recovery_file_dest    = "+DATA"
  db_recovery_file_dest_size= 40320M
  thread                   = 1
  undo_tablespace          = "UNDOTBS1"
  instance_number          = 1
  db_domain                = ""
  dispatchers              = "(PROTOCOL=TCP) (SERVICE=PRODXDB)"
  remote_listener          = "rh-cluster-scan:1521"
  remote_listener          = "*.remote_login_pas"
  audit_file_dest          = "/s01/orabase/admin/PROD/adump"
  audit_trail              = "DB"
  db_name                  = "PROD"
  open_cursors             = 300
  diagnostic_dest          = "/s01/orabase"
Cluster communication is configured to use the following interface(s) for this instance
  169.254.236.169
cluster interconnect IPC version:Oracle UDP/IP (generic)
IPC Vendor 1 proto 2
2011-05-02 22:28:07.675000 +08:00
ORA-00132: syntax error or unresolved network name '*.remote_login_pas'
PMON started with pid=2, OS id=19807 
PSP0 started with pid=3, OS id=19809 
2011-05-02 22:28:08.754000 +08:00
VKTM started with pid=4, OS id=19811 at elevated priority
GEN0 started with pid=5, OS id=19815 
VKTM running at (1)millisec precision with DBRM quantum (100)ms
DIAG started with pid=6, OS id=19817 
DBRM started with pid=7, OS id=19819 
PING started with pid=8, OS id=19821 
ACMS started with pid=9, OS id=19823 
DIA0 started with pid=10, OS id=19825 
LMON started with pid=11, OS id=19827 
LMD0 started with pid=12, OS id=19829 
LMS0 started with pid=13, OS id=19831 at elevated priority
RMS0 started with pid=14, OS id=19835 
LMHB started with pid=15, OS id=19837 
MMAN started with pid=16, OS id=19839 
* Load Monitor used for high load check 
* New Low - High Load Threshold Range = [1920 - 2560] 
LGWR started with pid=18, OS id=19843 
DBW0 started with pid=17, OS id=19841 
CKPT started with pid=19, OS id=19845 
SMON started with pid=20, OS id=19847 
RECO started with pid=21, OS id=19849 
RBAL started with pid=22, OS id=19851 
ASMB started with pid=23, OS id=19853 
MMON started with pid=24, OS id=19855 
starting up 1 dispatcher(s) for network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'...
MMNL started with pid=25, OS id=19857 
starting up 1 shared server(s) ...
lmon registered with NM - instance number 1 (internal mem no 0)
2011-05-02 22:28:09.825000 +08:00
NOTE: initiating MARK startup 
Starting background process MARK
MARK started with pid=28, OS id=19866 
NOTE: MARK has subscribed 
Reconfiguration started (old inc 0, new inc 2)
List of instances:
 1 (myinst: 1) 
 Global Resource Directory frozen
* allocate domain 0, invalid = TRUE 
 Communication channels reestablished
 Master broadcasted resource hash value bitmaps
 Non-local Process blocks cleaned out
 LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
 Set master node info 
 Submitted all remote-enqueue requests
 Dwn-cvts replayed, VALBLKs dubious
 All grantable enqueues granted
 Post SMON to start 1st pass IR
 Submitted all GCS remote-cache requests
 Post SMON to start 1st pass IR
 Fix write in gcs resources
Reconfiguration complete
LCK0 started with pid=30, OS id=19872 
Starting background process RSMN
RSMN started with pid=31, OS id=19874 
ORACLE_BASE from environment = /s01/orabase
2011-05-02 22:28:12.112000 +08:00
ALTER DATABASE   MOUNT
This instance was first to mount
2011-05-02 22:28:13.202000 +08:00
NOTE: Loaded library: System 
ALTER SYSTEM SET 
local_listener='(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.1.122)(PORT=1521))))' 
SCOPE=MEMORY SID='PROD1';
SUCCESS: diskgroup DATA was mounted
NOTE: dependency between database PROD and diskgroup resource ora.DATA.dg is established


/* 也可以直接增大large_pool_size来解决上述问题 */

large_pool_size=30M
memory_target=912M

Comparation between ASM note [ID 373242.1] and note [ID 452924.1]

Question:
Oracle Support on the note “Lun Size And Performance Impact With Asm [ID 373242.1]” and on the note “How to Prepare Storage for ASM [ID 452924.1]” say that in other things:

– Maximize the number of disks in a disk group for maximum data distribution and higher I/O bandwidth
– Size alone should not affect the performance of a LUN. The underlying hardware is what counts.
There is no magic number for the LUN size
– For larger LUNs it is recommended using a large ALLOCATION UNIT.

The provieder of the Storage EMC-Vmax say that the LUNS size can be 200 GB for database OLTP.

We believe that the premise or EMC_Vmax contradicts to ORACLE so instead of maximize the disks they pretend minimize the disks
of the DISK GROUP.

You believe that this is to be true for Storage EMC-Vmax.

Answer:
As mentioned in the document 373242.1.LUN size alone should not affect the performance.

Size alone should not affect the performance of a LUN. The underlying hardware is what counts. There is no magic number for the LUN size. Seek the advice of the storage vendor to recommend the best configuration from a raid 1 or 5 perspective for performance and availability since this may vary between vendors. Given a database size and storage hardware you have available, our best practice recommends creating larger LUNs (if possible to reduce LUN management for the sys admins) and create LUNs from separate set of storage arrays (drives) so that LUNs are not sharing drives if possible.

For larger LUNs it is recommended using a large ALLOCATION UNIT.

Discover Your Missed ASM Disks

经常有网友在构建10g/11g中ASM存储环境的时候遇到ASM磁盘无法识别的问题,虽然已经为存储设备赋予了适当的权限,也为ASM实例修改了asm_diskstring初始化参数,可是在DBCA的ASM Diskgroup创建页面里就是无法显示候选的ASM Disk磁盘。

实际上因为ASM存储方式比起裸设备或GPFS来说更为黑盒,我们也无法利用ASM instance中的一些动态性能视图或内部视图来排查造成这一问题的原因,使得这类问题显得十分棘手。

下面我来介绍一种使用操作系统调用追踪工具来排查ASM无法找到磁盘问题的方法。

[oracle@vrh1 raw]$ cd /dev/rdsk 

/* 演示中我们要用到的三个裸设备位于/dev/rdsk下 */

[oracle@vrh1 rdsk]$ ls
hdisk1  hdisk2  hdisk3

[oracle@vrh1 rdsk]$ sqlplus / as sysdba

SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - Prod
PL/SQL Release 10.2.0.4.0 - Production
CORE    10.2.0.4.0      Production
TNS for Linux: Version 10.2.0.4.0 - Production
NLSRTL Version 10.2.0.4.0 - Production

SQL>  select path from v$asm_disk;
no rows selected

SQL> alter system set asm_diskstring='/dev/rdsk/hdisk*';
System altered.

SQL> select * from v$asm_disk;
no rows selected

/* 以上虽然设置了asm_diskstring参数,ASM instance依然无法在指定路径下找到合适的设备
    这是为什么呢?!
*/

[oracle@vrh1 rdsk]$ ps -ef|grep rbal|grep -v grep
oracle   31375     1  0 15:40 ?        00:00:00 asm_rbal_+ASM1

/* 找出当前ASM实例中的rbal后台进程  */

/* 针对该后台进程做系统调用的trace,一般Linux上使用strace,而Unix上使用truss */

strace -f -o /tmp/asm_rbal.trc -p $OS_PID_OF_RBAL_BGPROCESS
truss  -ef -o /tmp/asm_rbal.trc -p $OS_PID_OF_RBAL_BGPROCESS

[oracle@vrh1 rdsk]$ strace -f -o /tmp/asm_rbal.trc  -p 31375
Process 31375 attached - interrupt to quit

/* 在另外一个终端窗口中打开sqlplus登录ASM实例,执行对v$ASM_DISK或者X$KFDSK的查询 */

SQL> select * from x$kfdsk;

no rows selected

/* 完成以上查询后使用Ctrl+C中断strace命令,并分析生成的system call trace */

/* 因为在不同Unix平台上rbal后台进程可能使用不同的system call function函数以达到相同的目的,
    所以在你的平台上可能并不像在Linux上使用open和access2个关键词搜索就可以得到答案了!     */

[oracle@vrh1 rdsk]$ cat /tmp/asm_rbal.trc |egrep "open|access"
31375 open("/dev/rdsk", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = 15
31375 access("/dev/rdsk/hdisk3", R_OK|W_OK) = -1 EACCES (Permission denied)
31375 access("/dev/rdsk/hdisk2", R_OK|W_OK) = -1 EACCES (Permission denied)
31375 access("/dev/rdsk/hdisk1", R_OK|W_OK) = -1 EACCES (Permission denied)

可以看到以上系统调用中使用access函数访问候选的hdisk*ASM磁盘设备时出现了”Permission denied”的问题,很显然是因为权限不足导致了ASM实例无法利用这部分的磁盘设备,我们来纠正这一问题后再次尝试:

[root@vrh1 ~]# chown oracle:dba /dev/rdsk/hdisk*

SQL> select path_kfdsk from x$kfdsk;

PATH_KFDSK
--------------------------------------------------------------------------------
/dev/rdsk/hdisk3
/dev/rdsk/hdisk2
/dev/rdsk/hdisk1

SQL> select path from v$asm_disk;

PATH
--------------------------------------------------------------------------------
/dev/rdsk/hdisk3
/dev/rdsk/hdisk1
/dev/rdsk/hdisk2

以上丢失ASM Disk诊断方法的原理是在查询v$asm_disk或者x$kfdsk视图时Oracle会让RBAL这个ASM特别的后台进程去访问asm_diskstring参数指定路径下的所有可用设备,只要我们了解了在访问过程中RBAL遭遇的问题,那么一般来说都可以很简单地予以解决,如果strace/truss也无法给予你任何启示的话,也许你不得不去提交一个SR让Oracle Support来进一步协助你了,当然在正式提交SR之前你有必要再次确认一下你的ASM Disk是否都满足了以下的这些硬指标:

1.合理地设置ASM_DISKSTRING参数,若没有设置ASM_DISKSTRING参数那么ASM实例会尝试到默认的一些路径去搜索磁盘设备,这些默认路径在不同操作系统上略有不同:

操作系统 默认搜索路径
Solaris /dev/rdsk/*
Windows \\.\orcldisk*
Linux /dev/raw/*
Linux with ASMLIB ORCL:*
Linux with ASMLIB /dev/oracleasm/disks/*
HPUX /dev/rdsk/*
HP-UX(Tru 64) /dev/rdsk/*
AIX /dev/*

2.候选磁盘设备应当属于安装ASM的Oracle软件的用户,否则使用chown或卷管理软件修改磁盘拥有者,并保证磁盘被正确加载

3.候选磁盘设备应当设置合理的权限,一般为660,如果存在问题那么可以临时设为770以便测试

4.RAC环境中需要注意所有的磁盘设备应当在所有节点都可见,建议使用cluvfy工具验证

几个关于oracle 11g ASM的问题

Question:

1.11g Oracle Clusterware需要的OCR和Voting disk可以存储在ASM或者集群文件系统或者NFS中。对于全新安装,裸设备不再被支持(是否有办法使用裸设备?)。

2.使用ASM时,若相关存储上的磁盘路径(disk path)名前后不一致,是否仍然可以使用?需要什么调整?

Answer:

1.在11gr2 Grid Infrastructure全新安装时是没有办法使用裸设备的(You cannot install OCR or voting disk files on raw partitions. You can install only on Oracle ASM, or on supported network-attached storage or cluster file systems. The only use for raw devices is as Oracle ASM disks.);但可以通过后续的手段将OCR和VOTING DISK移动到裸设备上,如:

替换OCR:
ocrconfig -add rawdevice
ocrconfig -replace

替换voting disk
crsctl add votedisk css  -force
crsctl delete votedisk css  -force

具体可以参考Metalink文档<How to ADD/REMOVE/REPLACE/MOVE Oracle Cluster Registry (OCR) and Voting Disk>
实际上强烈不建议这样做。因为如果出现问题,Oracle GCS可以拒绝提供建议。

2.ASM是通过读取磁盘头部来了解磁盘内容的;磁盘路径名在安装时需要在所有节点一致,在安装完成后即便路径名改变也不会影响到ASM的使用。
需要注意的是在AIX操作平台上分配给ASM的磁盘(ASM DISK),如果直接是HDISK形式的LUN则该HDISK不应当具有PVID(If the disk device has a PVID, then ASM will fail to mount the diskgroup created on the disk device.)。如果是裸的逻辑卷,那么所建VG应当是scaleable volume group。

解决Oracle错误ORA-15061一例

一套Linux上的11.2.0.1系统,告警日志中出现以下错误:

ORA-00202: control file: '+DATA/controlfile/current.256.7446483424'
ORA-17505: ksfdrsz:1 Failed to resize file to size 612 blocks
ORA-15061: ASM operation not supported [41]
WARNING: Oracle Managed File +FRA in the recovery area is orphaned by the control file.
The control file can not keep all recovery area files due to space limitation.
krse.c
Archived Log entry 200 added for thread 1 sequence 200 ID 0x3739c2f0 dest 1:

RMAN backup failed with

Rman backup error:
RMAN-03009: failure of backup command on ORA_DISK_1 channel at 
ORA-19510: failed to set size of 6800 blocks for file "+FRA" (block size=8192)
ORA-17505: ksfdrsz:1 Failed to resize file to size 6800 blocks
ORA-15061: ASM operation not supported [41]

[oracle@rh2 ~]$ oerr ora 15061
15061, 00000, "ASM operation not supported [%s]"
// *Cause:  An ASM operation was attempted that is invalid or not supported
//          by this version of the ASM instance.
// *Action: This is an internal error code that is used for maintaining
//          compatibility between software versions and should never be
//          visible to the user; contact Oracle support Services.
//

提交SR后,根据Oracle GCS确认为BUG:9788316:

1.The following error indicates that it failed to resize the controlfile to 612 blocks. If the DB_BLOCK_SIZE is 8192, 
then 612 blocks is not more than 5MB. According to the results of the query on V$ASM_DISKGROUP in 'results01.txt' file, 
the ASM diskgroup +DATA has 108605 MB free space. So, the ASM diskgroup +DATA has enough space for the 612 blocks.

2. By the way, please confirm whether you have recently applied PSU #1. Anyway, 
please try to relink the Oracle executables, as shown here. Before you run the "relink" command, 
make sure to shutdown both the ASM instance and target database.

$ORACLE_HOME/bin/relink all

3 After relinking the Oracle executables, please confirm whether you are still 
experiencing the same ORA-15061 error.

ORA-15061: ASM Operation Not Supported [41] After Apply PSU #1 (Doc ID 1126113.1)
ORA-15061 reported while doing a file operation with 11.1 or 11.2 ASM after PSU applied in database home (Doc ID 1070880.1)

Hdr: 9788316 11.2.0.1.1 RDBMS 11.2.0.1.0 ASM PRODID-5 PORTID-267
Abstract: AFTER APPLY PSU 1 (11.2.0.1.1) ON RDBMS HOME UNABLE TO RESIZE ASM DATAFILE.

*** 06/07/10 02:09 pm ***
—-
3-1827355081

PROBLEM:
——–
1) After apply PSU 1 (11.1.0.2.1) on the 11.2.0.1.0 RDBMS Oracle Home
customer is unable to resize an ASM datafile:
============================================================
SQL> alter database datafile
‘+DATA/brg13ed/datafile/ts_brg_users.1279.711064345’ resize 560M;
alter database datafile ‘+DATA/brg13ed/datafile/ts_brg_users.1279.711064345’
resize 560M
*
ERROR at line 1:
ORA-1237: cannot extend datafile 4
ORA-1110: data file 4: ‘+DATA/brg13ed/datafile/ts_brg_users.1279.711064345’
ORA-17505: ksfdrsz:1 Failed to resize file to size 71680 blocks
ORA-15061: ASM operation not supported [41]

============================================================

2) DB alertlog reports (alert_brg13ed1.log):
============================================================

Mon Jun 07 11:27:57 2010
Stopping background process CJQ0
Mon Jun 07 12:30:28 2010
alter database datafile ‘+DATA/brg13ed/datafile/ts_brg_users.1279.711064345’
resize 536870915
ORA-1237 signalled during: alter database datafile
‘+DATA/brg13ed/datafile/ts_brg_users.1279.711064345’ resize 536870915…
Mon Jun 07 13:07:40 2010
alter database datafile ‘+DATA/brg13ed/datafile/ts_brg_users.1279.711064345’
resize 560m
ORA-1237 signalled during: alter database datafile
‘+DATA/brg13ed/datafile/ts_brg_users.1279.711064345’ resize 560m…
Mon Jun 07 13:49:50 2010
alter database datafile ‘+DATA/brg13ed/datafile/ts_brg_users.1279.711064345’
resize 800M
ORA-1237 signalled during: alter database datafile
‘+DATA/brg13ed/datafile/ts_brg_users.1279.711064345’ resize 800M…
Mon Jun 07 13:58:51 2010
alter database datafile ‘+DATA/brg13ed/datafile/ts_brg_users.1279.711064345’
resize 560M
ORA-1237 signalled during: alter database datafile
‘+DATA/brg13ed/datafile/ts_brg_users.1279.711064345’ resize 560M…
Mon Jun 07 14:25:43 2010

============================================================

3) ASM alert.log does not report any problem.

DIAGNOSTIC ANALYSIS:
——————–
(see below)

WORKAROUND:
———–
None

RELATED BUGS:
————-

REPRODUCIBILITY:
—————-

TEST CASE:
———-

STACK TRACE:
————

SUPPORTING INFORMATION:
———————–

24 HOUR CONTACT INFORMATION FOR P1 BUGS:
—————————————-

DIAL-IN INFORMATION:
——————–

IMPACT DATE:
————

*** 06/07/10 02:09 pm ***
4) I can confirm that at (Jun 07 10:55:59 EDT 2010) customer installed the
Patch: 9355126 (on the Grid OH), which includes the fix for Bug: 8898852:
============================================================

Invoking OPatch 11.2.0.1.2

Oracle Interim Patch Installer version 11.2.0.1.2

Oracle Home : /u01/grid/oracle/product/grid
Central Inventory : /u01/app/oracle/oraInventory
from : /var/opt/oracle/oraInst.loc
OPatch version : 11.2.0.1.2
OUI version : 11.2.0.1.0
OUI location : /u01/grid/oracle/product/grid/oui
——————————————————————————

Installed Top-level Products (1):

Oracle Grid Infrastructure
11.2.0.1.0
There are 1 products installed in this Oracle Home.

Installed Products (87):

There are 87 products installed in this Oracle Home.

Interim patches (1) :

Patch 9355126 : applied on Mon Jun 07 10:55:59 EDT 2010
Unique Patch ID: 12175902
Created on 5 Feb 2010, 07:38:30 hrs PST8PDT
Bugs fixed:
8974548, 8898852
============================================================

5) Also, I can confirm that at (Mon May 03 00:01:14 EDT 2010) customer
installed the 11.2.0.1.1 Patch Set Update (PSU #1) which include as well the
patch for Bug: 8898852 (on the RDBMS OH):
============================================================
Invoking OPatch 11.2.0.1.2

Oracle Interim Patch Installer version 11.2.0.1.2

Oracle Home : /u01/app/oracle/product/11.2.0/db1
Central Inventory : /u01/app/oracle/oraInventory
from : /var/opt/oracle/oraInst.loc
OPatch version : 11.2.0.1.2
OUI version : 11.2.0.1.0
OUI location : /u01/app/oracle/product/11.2.0/db1//oui

——————————————————————————

Installed Top-level Products (1):

Oracle Database 11g
11.2.0.1.0
There are 1 products installed in this Oracle Home.

Installed Products (134):

. 11.2.0.1.0
There are 134 products installed in this Oracle Home.

Interim patches (1) :

Patch 9352237 : applied on Mon May 03 00:01:14 EDT 2010
Unique Patch ID: 12366369
Created on 6 Apr 2010, 05:03:41 hrs PST8PDT
Bugs fixed:
8661168, 8769239, 8898852, 8801119, 9054253, 8706590, 8725286, 8974548
8778277, 8780372, 8769569, 9027691, 9454036, 9454037, 9454038, 8761974
7705591, 8496830, 8702892, 8639114, 8723477, 8729793, 8919682, 8818983
9001453, 8475069, 9328668, 8891929, 8798317, 8820324, 8733749, 8702535
8565708, 9036013, 8735201, 8684517, 8870559, 8773383, 8933870, 8812705
8405205, 8822365, 8813366, 8761260, 8790767, 8795418, 8913269, 8897784
8760714, 8717461, 8671349, 8775569, 8898589, 8861700, 8607693, 8642202
8780281, 9369797, 8780711, 8784929, 8834636, 9015983, 8891037, 8828328
8570322, 8832205, 8665189, 8717031, 8685253, 8718952, 8799099, 8633358
9032717, 9321701, 8588519, 8783738, 8796511, 8782971, 8756598, 9454385
8856497, 8703064, 9066116, 9007102, 8721315, 8818175, 8674263, 9352237
8753903, 8720447, 9057443, 8790561, 8733225, 9197917, 8928276, 8991997,
8837736
============================================================

6) But the problem persists:
============================================================
SQL> alter database datafile
‘+DATA/brg13ed/datafile/ts_brg_users.1279.711064345’ resize 560M;
alter database datafile ‘+DATA/brg13ed/datafile/ts_brg_users.1279.711064345’
resize 560M
*
ERROR at line 1:
ORA-1237: cannot extend datafile 4
ORA-1110: data file 4: ‘+DATA/brg13ed/datafile/ts_brg_users.1279.711064345’
ORA-17505: ksfdrsz:1 Failed to resize file to size 71680 blocks
ORA-15061: ASM operation not supported [41]

============================================================

7) 11.2.0.1.1 Patch Set Update (PSU #1) should include the fix for bug:
8898852 (ORA-15061: ASM operation not supported [41]).
This PSU contains a fix that adds an operation to the ASM protocol.
The problem is that the PSU has been applied correctly to the RDBMS
home, but the fix is not correctly applied to the ASM HOME.

So either install that patch to the ASM HOME, or if that has been done
check why it failed.

A common reason for failure is a permission problem. Check bug 9711074
and its duplicates for more background. And check documentation bug 8629483
for the solution.

How do I know the fix is not missing? This error message is the key:
ORA-15061: ASM operation not supported [41]
(especially with operation 41)

If the patch was applied to the grid home, then probably the relink failed
due to permission problems.

ORA-15061: ASM Operation Not Supported [41] After Apply PSU #1 [ID 1126113.1]

pplies to:
Oracle Server – Enterprise Edition – Version: 11.2.0.1 and later [Release: 11.2 and later ]
Information in this document applies to any platform.
Symptoms
1) After apply PSU 1 (11.2.0.1.1 ) on the Grid Infrastructure Oracle Home
customer is unable to resize an ASM datafile:

SQL> alter database datafile
‘+DATA/brg13ed/datafile/ts_brg_users.1279.711064345’ resize 560M;
alter database datafile ‘+DATA/brg13ed/datafile/ts_brg_users.1279.711064345’
resize 560M
*
ERROR at line 1:
ORA-1237: cannot extend datafile 4
ORA-1110: data file 4: ‘+DATA/brg13ed/datafile/ts_brg_users.1279.711064345’
ORA-17505: ksfdrsz:1 Failed to resize file to size 71680 blocks
ORA-15061: ASM operation not supported [41]

2) DB alertlog reports (alert_brg13ed1.log):

Mon Jun 07 11:27:57 2010
Stopping background process CJQ0
Mon Jun 07 12:30:28 2010
alter database datafile ‘+DATA/brg13ed/datafile/ts_brg_users.1279.711064345’
resize 536870915
ORA-1237 signalled during: alter database datafile
‘+DATA/brg13ed/datafile/ts_brg_users.1279.711064345’ resize 536870915…
Mon Jun 07 13:07:40 2010
alter database datafile ‘+DATA/brg13ed/datafile/ts_brg_users.1279.711064345’
resize 560m
ORA-1237 signalled during: alter database datafile
‘+DATA/brg13ed/datafile/ts_brg_users.1279.711064345’ resize 560m…
Mon Jun 07 13:49:50 2010
alter database datafile ‘+DATA/brg13ed/datafile/ts_brg_users.1279.711064345’
resize 800M
ORA-1237 signalled during: alter database datafile
‘+DATA/brg13ed/datafile/ts_brg_users.1279.711064345’ resize 800M…
Mon Jun 07 13:58:51 2010
alter database datafile ‘+DATA/brg13ed/datafile/ts_brg_users.1279.711064345’
resize 560M
ORA-1237 signalled during: alter database datafile
‘+DATA/brg13ed/datafile/ts_brg_users.1279.711064345’ resize 560M…
Mon Jun 07 14:25:43 2010

3) ASM alert.log does not report any problem.

Changes
PSU 1 (11.2.0.1.1 ) patchset was installed on the Grid Infrastructure Oracle Home
Cause

The PSU #1 (11.2.0.1.1 ) patch on the Grid Infrastructure Oracle Home was not correctly linked by opatch.

The PSU #1 (11.2.0.1.1 ) patch on the Grid Infrastructure Oracle Home needs to be relinked.
Solution
Relink the PSU #1 patch on the Grid Infrastructure Oracle Home as follow:

1) Shutdown the database instances.

2) Shutdown the ASM instance.

3) Relink the Grid Infrastructure Oracle Home as follow:

script /tmp/relink_GI.txt

env | sort

$ORACLE_HOME/bin/relink all

exit

4) Startup the ASM instance.

5) Startup the databases.

6) Resize the datafile:

SQL> alter database datafile ‘+DATA/brg13ed/datafile/ts_brg_users.1279.711064345’ resize 560M;

该bug一般是由于对grid infrastructure实施了PSU 1 (11.2.0.1.1 )后,Oracle binary没有正确link导致的,可行的解决方案是relink all,但这需要重启instance!

Fixed X$ Tables in ASM

From Vinod Haval‘s <Inside Overview of ASM Metadata>
These Views helps in understanding the following metrics

  • Physical Mapping
  • Provides Undocumented Information
  • 18 X$ Tables (May be more)
TABLE NAME DESCRIPTION
X$KFALS This table gives the details about aliases 

created in ASM

X$KFCBH This is similar to x$kfbh and have same number 

of rows as x$kfbh

X$KFCCE This table helps to locate the particular block
X$KFBH This table gives more physical block level info
X$KFDSK_STAT This table provides the usage metrics data which can be used for performance analysis
X$KFGRP This table provides the disk groups info in ASM
X$KFGRP_STAT This table gives the usage metrics data for all the disk groups within the ASM
X$KFGMG This table provides the details about ASM operations
TABLE NAME DESCRIPTION
X$KFKID This table provides the info about ASM disks
X$KFNCL This is similar to x$kfbh and have same number 

of rows as x$kfbh

X$KFTMTA This table provides the info about DB instance 

connected to ASM instance

X$KFFIL This table gives more physical block level info
X$KFFXP This table provides the physical extent allocation mapping info within ASM files
X$KFDAT
X$KFDPARTNER
X$KFCLLE

沪ICP备14014813号-2

沪公网安备 31010802001379号