[Repost] How to map an ASMLIB disk to a device name

When using ASMLIB to manage ASM disks, the device path info is not displayed in gv$asm_disk.path.

If you are using ASMLIB support tools 2.1 or later (package oracleasm-support-2.1*), you can get that info by running 'oracleasm querydisk -p' as root:

# ls -l /dev/oracleasm/disks
total 0
brw-rw---- 1 grid asmadmin 8,  5 May  2 12:00 DISK1
brw-rw---- 1 grid asmadmin 8,  6 May  2 12:00 DISK2
brw-rw---- 1 grid asmadmin 8,  7 May  2 12:00 DISK3

# oracleasm querydisk -p DISK1
Disk "DISK1" is a valid ASM disk
/dev/sda5: LABEL="DISK1" TYPE="oracleasm"
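
To map every ASMLIB disk in one go, a simple loop over oracleasm listdisks does the job (a minimal sketch; the output is illustrative for the three disks above):

# for disk in `oracleasm listdisks`; do oracleasm querydisk -p $disk | grep LABEL; done
/dev/sda5: LABEL="DISK1" TYPE="oracleasm"
/dev/sda6: LABEL="DISK2" TYPE="oracleasm"
/dev/sda7: LABEL="DISK3" TYPE="oracleasm"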

Otherwise, that info can be obtained with a shell script like this:

#!/bin/bash
# Map each ASMLIB disk to its ASM disk name and OS device path
for asmlibdisk in `ls /dev/oracleasm/disks/*`
do
  echo "ASMLIB disk name: $asmlibdisk"
  # The ASM disk name is in the disk header (kfed field kfdhdb.dskname)
  asmdisk=`kfed read $asmlibdisk | grep dskname | tr -s ' ' | cut -f2 -d' '`
  echo "ASM disk name: $asmdisk"
  # Match the device in /dev by its major,minor numbers
  majorminor=`ls -l $asmlibdisk | tr -s ' ' | cut -f5,6 -d' '`
  device=`ls -l /dev | tr -s ' ' | grep "$majorminor" | cut -f10 -d' '`
  echo "Device path: /dev/$device"
done

The script can be run as the OS user that owns the ASM or Grid Infrastructure home, i.e. it does not need to be run as a privileged user. The only requirement is that the kfed binary exists and that it is in the PATH.
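
For illustration, on the system above running the script (saved under a hypothetical name such as asmlib_to_device.sh) might produce something like this; the ASM disk names are made up:

$ ./asmlib_to_device.sh
ASMLIB disk name: /dev/oracleasm/disks/DISK1
ASM disk name: DATA_0000
Device path: /dev/sda5
ASMLIB disk name: /dev/oracleasm/disks/DISK2
ASM disk name: DATA_0001
Device path: /dev/sda6
...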

If an ASMLIB disk was already deleted, it will not show up in /dev/oracleasm/disks. We can check for devices that are (or were) associated with ASM with a script like this:

#!/bin/bash
# Check all /dev/sd* devices for the ASM disk header tag
for device in `ls /dev/sd*`
do
  # On an ASM disk, the header provision string field starts with ORCL
  asmdisk=`kfed read $device | grep ORCL | tr -s ' ' | cut -f2 -d' ' | cut -c1-4`
  if [ "$asmdisk" = "ORCL" ]
  then
    echo "Disk device $device may be an ASM disk"
  fi
done

This script takes a peek at the sd devices in /dev, so in addition to having kfed in the PATH, it needs to be run as a privileged user. Of course you can look at /dev/dm*, /dev/mapper, etc, or even all devices in /dev, although that may not be a good idea.
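
For a multipath setup, a variant of the same idea could scan the device-mapper devices instead (just a sketch; adjust the device location to your environment):

#!/bin/bash
# Same check as above, but against device-mapper/multipath devices
for device in /dev/mapper/*
do
  asmdisk=`kfed read $device | grep ORCL | tr -s ' ' | cut -f2 -d' ' | cut -c1-4`
  if [ "$asmdisk" = "ORCL" ]
  then
    echo "Disk device $device may be an ASM disk"
  fi
done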

There was recently a question on how to achieve the above without kfed. Here is one way to do it:

#!/bin/bash
# Same check without kfed - read the header bytes directly with od
for device in `ls /dev/sd*`
do
  # Octal offset 0000040 (byte 32) is where the ORCLDISK tag lives
  asmdisk=`od -c $device | head | grep 0000040 | tr -d ' ' | cut -c8-11`
  if [ "$asmdisk" = "ORCL" ]
  then
    echo "Disk device $device may be an ASM disk"
  fi
done
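
The trick here is that 0000040 is the octal offset for byte 32, which is where the ORCLDISK provisioning tag sits in the ASM disk header; once tr -d ' ' removes the spaces, characters 8 to 11 of that line are exactly ORCL. On an ASMLIB-labelled disk like DISK1 above, the relevant od line would look roughly like this (illustrative):

$ od -c /dev/sda5 | head | grep 0000040
0000040   O   R   C   L   D   I   S   K   D   I   S   K   1  \0  \0  \0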

 

[Repost] ASM metadata

An ASM instance manages metadata needed to make ASM files available to Oracle databases and ASM clients. ASM metadata is stored in disk groups – in metadata blocks.

Some ASM metadata is at the fixed position in every ASM disk, and is referred to as physically addressed metadata. Other ASM metadata is organised in files (directories) and is referred to as virtually addressed metadata. The virtually addressed metadata is managed like all other ASM files – they get mirrored as per the file type redundancy policy, are subject to rebalance and can grow as needed.

Each ASM disk has ASM metadata, with some of this metadata relevant to that disk only and some relevant to the whole disk group. For example, the ASM disk header is relevant to that disk only, while the Partnership and Status Table (PST) is relevant to the whole disk group.

Physically addressed metadata

The physical ASM metadata consists of the following structures:

  • Allocation unit 0 (AU0) on every ASM disk always holds the disk header (block 0), the Free Space Table (block 1) and the Allocation Table in the rest of the allocation unit 0 blocks.
  • Allocation unit 1 (AU1) on every ASM disk is always reserved for the Partnership and Status Table (PST). While AU1 is reserved on every disk, only some disks hold the actual PST data.
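
A quick way to confirm this layout on a disk is to check the block types with kfed (a minimal sketch, using a hypothetical ASM disk /dev/sda1 and the default 1 MB AU size; the type names shown are what kfed reports for these blocks):

$ kfed read /dev/sda1 aun=0 blkn=0 | grep kfbh.type   # disk header
kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD
$ kfed read /dev/sda1 aun=0 blkn=1 | grep kfbh.type   # Free Space Table
kfbh.type:                            2 ; 0x002: KFBTYP_FREESPC
$ kfed read /dev/sda1 aun=0 blkn=2 | grep kfbh.type   # Allocation Table
kfbh.type:                            3 ; 0x002: KFBTYP_ALLOCTBL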

Virtually addressed metadata

The virtually addressed metadata structures are the ASM metadata directories and registries (for example the File Directory and the Staleness Registry), each stored as an ASM metadata file.

ASM metadata lives in ASM disk groups

ASM metadata is stored in disk groups – in other words if there are no disk groups there is no ASM metadata.  This sounds obvious, but the point is that ASM does not store anything outside of its disk groups.

Each ASM disk has ASM metadata. Some of this metadata is relevant to that disk only and some is relevant to the whole disk group. For example, the ASM disk header is relevant to that disk only, but the partnership and status table (PST) is relevant to the whole disk group.

Some metadata will be on every disk – e.g. a disk header and an allocation table. Other metadata will be on a subset of disks – e.g. allocation unit 1 on every ASM disk will be reserved for the PST, but only a subset of disks will actually have the PST data.

Some metadata will not be present at all – e.g. in a 10.2 disk group there will be no staleness directory, as that feature is only relevant to versions 11.1 and later. And even in 11.1, an external redundancy disk group will not have the staleness directory, as that feature is relevant to normal and high redundancy disk groups only.

ASM metadata blocks

ASM metadata is organized in ASM metadata blocks. For a complete discussion on this topic please see the ASM metadata blocks post.

ASM metadata structures consist of one or more ASM metadata blocks – where the block type matches the ASM metadata type. For example, an ASM disk header consists of exactly one metadata block of type KFBTYP_DISKHEAD; an allocation table consists of a number of metadata blocks, all of type KFBTYP_ALLOCTBL, etc.

[Repost] kfed – ASM metadata editor

kfed is an undocumented ASM utility that can be used to read and modify ASM metadata blocks. It is a standalone utility, independent of the ASM instance, so it can be used with both mounted and dismounted disk groups. The most powerful kfed feature is its ability to fix corrupt ASM metadata.

The kfed binary is present in recent ASM versions, but if you don't see it in your $ORACLE_HOME/bin directory (e.g. it may not be present in version 10.1), it can be built as follows:

$ cd $ORACLE_HOME/rdbms/lib
$ make -f ins* ikfed

kfed read

With the kfed read command we can read a single ASM metadata block. The syntax is:

$ kfed read [aun=ii aus=jj blkn=kk dev=]asm_disk_name

Where the command line parameters are

  • aun – Allocation Unit (AU) number to read from. Default is AU0, or the very beginning of the ASM disk.
  • aus – AU size. Default is 1048576 (1MB). Specify the aus when reading from a disk group with non-default AU size.
  • blkn – block number to read. Default is block 0, or the very first block of the AU.
  • dev – ASM disk or device name. Note that the keyword dev can be omitted, but the ASM disk name is mandatory.
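
For example, to read block 3 of allocation unit 2 on a hypothetical disk /dev/sdb1 that belongs to a disk group created with a 4 MB AU size, all parameters would be spelled out like this:

$ kfed read aun=2 aus=4194304 blkn=3 dev=/dev/sdb1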

Use kfed to read ASM disk header block

The following is an example of using the kfed utility to read the ASM disk header from ASM disk /dev/sda1.

$ kfed read /dev/sda1 | more
kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD
kfbh.datfmt:                          1 ; 0x003: 0x01
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:              2147483648 ; 0x008: disk=0
kfbh.check:                  3102721733 ; 0x00c: 0xb8efc6c5
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000

kfdhdb.dsknum:                        0 ; 0x024: 0x0000
kfdhdb.grptyp:                        2 ; 0x026: KFDGTP_NORMAL
kfdhdb.hdrsts:                        3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname:               DATA_0000 ; 0x028: length=9
kfdhdb.grpname:                    DATA ; 0x048: length=4
kfdhdb.fgname:                DATA_0000 ; 0x068: length=9
kfdhdb.ausize:                  1048576 ; 0x0bc: 0x00100000
kfdhdb.dsksize:                   12284 ; 0x0c4: 0x00002ffc

Note that the above kfed command is equivalent to this one (with all parameters explicitly set to their default values):

$ kfed read aun=0 aus=1048576 blkn=0 dev=/dev/sda1

We see that the above kfed output is nicely formatted and human readable (sort of). The fields are grouped based on the actual content of the ASM metadata block.

In this example, the kfbh fields show the block header data, and the most important one is kfbh.type, which says KFBTYP_DISKHEAD, meaning the ASM disk header. This is the expected block type for an ASM disk header.

We then see the actual content of the ASM disk header metadata block – the kfdhdb fields. Some of those are the disk number (kfdhdb.dsknum), 0 in this case, the group redundancy type (kfdhdb.grptyp), normal redundancy in this case, the disk header status (kfdhdb.hdrsts), member in this case, the disk name (kfdhdb.dskname) – DATA_0000, etc.

Please see ASM disk header for the complete explanation of kfdhdb fields.

Use kfed to read any ASM metadata block

The next example shows how to read an ASM File Directory block. To do that we would use the following kfed command:

$ kfed read aun=10 blkn=1 dev=/dev/sda1 | more
kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            4 ; 0x002: KFBTYP_FILEDIR

Note that I had to specify AU10 and block 1 to read a File Directory block. Have a look at the ASM File Directory post to learn how to locate a File Directory block.

Is my ASM metadata block corrupt?

If you see kfbh.type=KFBTYP_INVALID in the disk header of a disk you believe belongs to an ASM disk group, that indicates the ASM disk header is corrupt. But don't jump to conclusions! Are you looking at the right disk? Is this the right disk partition? Can you access that disk via some other name – in a multipath setup? If you are not sure, or if the disk header is in fact damaged, contact Oracle Support for assistance.

Note that this applies to any ASM metadata block. If ASM expects to find a metadata block and instead finds a block that is zeroed out or contains rubbish, it will report the block as KFBTYP_INVALID, and an error (usually ORA-15196) will be reported in the ASM and/or database alert log (depending on which instance discovers the problem).
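
For illustration, this is roughly what kfed reports for a block that has been zeroed out (the device name is hypothetical and only the first few lines are shown):

$ kfed read /dev/sdz1 | head -3
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID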

kfed write

With the kfed write command we can write to a single ASM metadata block. The syntax is:

$ kfed write [aun=ii aus=jj blkn=kk dev=]asm_disk_name text=new_contents chksum=yes

Where the new command line parameters are

  • text – a text file with the new block contents
  • chksum=yes – calculate and write the correct checksum. Note that the checksum in the text file with the new contents does not have to be correct.

Use kfed to write the correct checksum to ASM metadata block

An ASM metadata block may look fine, but in fact be corrupt. For example, the block checksum (kfbh.check) could be wrong, in which case it would need to be corrected. Indeed, if the only problem is an incorrect checksum, it can easily be fixed by simply reading the block and then writing it back: kfed will calculate the new checksum and write the block back with the correct value.

Here are the complete steps to correct the bad checksum for block 2 in AU0 on disk /dev/sda1:

$ kfed read aun=0 blkn=2 dev=/dev/sda1 > /tmp/aun0_blkn2_sda1.kfed
$ kfed write aun=0 blkn=2 dev=/dev/sda1 text=/tmp/aun0_blkn2_sda1.kfed chksum=yes

NOTE: Please seek Oracle Support assistance with any suspected ASM metadata block corruption.

kfed find

The kfed find command will examine all blocks in an allocation unit and report back on the block types found. The syntax is:

$ kfed find [aun=ii aus=jj dev=]asm_disk_name

We see that the find command parameters are the same as for the read command, but the difference is that the find operates on all blocks in an allocation unit.

Use kfed find command to verify blocks in AU0

This is an example of using the kfed find to verify that all blocks in AU0 have the expected ASM metadata.

$ kfed find /dev/sda1

The expected result is type 1 for block 0, type 2 for block 1 and type 3 for all other blocks, i.e.:

$ kfed find /dev/sda1
Block 0 has type 1
Block 1 has type 2
Block 2 has type 3
Block 3 has type 3
Block 4 has type 3

Block 255 has type 3

If you see anything else in the output, that indicates a corrupted ASM metadata block. In that case please seek assistance from Oracle Support.

Note that my allocation unit size is 1 MB, so there are 256 metadata blocks in the AU (numbered 0 through 255). If your allocation unit size is 4 MB, the same command should return block type information for 1024 blocks.
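
The block count follows from the 4 KB ASM metadata block size: 1048576 / 4096 = 256 blocks per 1 MB AU, and 4194304 / 4096 = 1024 blocks per 4 MB AU. A trivial check:

$ echo $((1048576/4096)) $((4194304/4096))
256 1024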

I should also point out that the above find command only looked at the ASM metadata block types. It did not look at the actual metadata block contents. Some ASM metadata block corruptions are indeed in the block contents, i.e. the block type is correct, but the contents are wrong. Such corruptions are only detected when ASM reads the corrupt block, in which case an ORA-15196 error will be reported. Please seek assistance from Oracle Support if you are unfortunate enough to encounter that error.

Conclusion

kfed is an unassuming but very powerful utility. While I have shown only a few commands, kfed can also format an empty ASM file, perform a sanity check on an ASM metadata block, display data structure sizes and perform a few other more obscure operations.

[Repost] About ASM disk groups, disks and files

Oracle ASM uses disk groups to store data files. An ASM disk group is a collection of disks managed as a unit. Within a disk group, ASM exposes a file system interface for Oracle database files. The content of files that are stored in a disk group is evenly distributed to eliminate hot spots and to provide uniform performance across the disks. The performance is comparable to the performance of raw devices. [From Oracle® Automatic Storage Management Administrator’s Guide 11g Release 2].


ASM Disk Groups


An ASM disk group consists of one or more disks and is the fundamental object that ASM manages. Each disk group is self contained and has its own ASM metadata. It is that ASM metadata that an ASM instance manages.


The idea with ASM is to have a small number of disk groups. In ASM versions before 11.2, two disk groups should be sufficient – one for datafiles and one for backups/archive logs. In 11.2 you would want to create a separate disk group for the ASM spfile, Oracle Cluster Registry (OCR) and voting disks – provided you opt to place those objects in an ASM disk group.


ASM Disks


Disks to be used by ASM have to be set up and provisioned by OS/storage administrator before ASM installation/setup. Disks can be local physical devices (IDE, SATA, SCSI, etc), SAN based LUNs (iSCSI, FC, FCoE, etc) or NAS/NFS based disks. Disks to be used for ASM should be partitioned. Even if the whole disk is to be used by ASM, it should have a single partition.


The above is true for all environments except for Exadata – where ASM makes use of grid disks, created from cell disks and presented to ASM via LIBCELL interface.
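
Outside Exadata, provisioning a disk for ASM therefore typically means creating a single partition that spans the whole device. A minimal sketch with parted, assuming a hypothetical device /dev/sdb and a 1 MiB aligned partition start:

# parted -s /dev/sdb mklabel gpt
# parted -s /dev/sdb mkpart primary 1MiB 100%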


An ASM disk group can have up to 10,000 disks. The maximum size for an individual ASM disk is 2 TB. Due to bug 6453944 it was possible to add disks larger than 2 TB to an ASM disk group; the fix for that bug is in 10.2.0.4, 11.1.0.7 and 11.2. MOS Doc ID 736891.1 has more on that.

ASM looks for disks in the OS location specified by the ASM_DISKSTRING initialization parameter. There is a default value for every platform, so this parameter does not have to be specified. In a cluster, ASM disks can have different OS names on different nodes. In fact, ASM does not care about the OS disk names, as those are not kept in the ASM metadata.
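
One way to see which OS device each ASM disk was discovered on, whatever the local OS names happen to be, is to query V$ASM_DISK from the ASM instance (a minimal sketch):

$ sqlplus -s / as sysasm <<'EOF'
select name, path from v$asm_disk order by group_number, disk_number;
exit
EOF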


ASM Files


Any ASM file is allocated from and completely contained within a single disk group. However, a disk group might contain files belonging to several databases and a single database can have files in multiple disk groups.

ASM can store all Oracle database file types – datafiles, control files, redo logs, backup sets, data pump files, etc – but not binaries or text files. In addition to that, ASM also stores its metadata files within the disk group. ASM has its own file numbering scheme – independent of database file numbering. ASM file numbers under 256 are reserved for ASM metadata files.
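
To see the file numbers ASM has assigned in a disk group, V$ASM_FILE can be queried from the ASM instance (a sketch; note that V$ASM_FILE does not list the ASM metadata files themselves):

$ sqlplus -s / as sysasm <<'EOF'
select group_number, file_number, type from v$asm_file order by 1, 2;
exit
EOF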


ASM Cluster File System (ACFS), introduced in 11.2, extends ASM support to database and application binaries, trace and log files, and in fact any files that can be stored on a traditional file system. And most importantly, ACFS is a cluster file system.

[Repost] About ASM Allocation Units, Extents, Mirroring and Failgroups

ASM Allocation Units

An ASM allocation unit (AU) is the fundamental space unit within an ASM disk group. Every ASM disk is divided into allocation units.

When a disk group is created, the allocation unit size can be set with the disk group attribute AU_SIZE (in ASM versions 11.1 and later). The AU size can be 1, 2, 4, 8, 16, 32 or 64 MB. If not explicitly set, the AU size defaults to 1 MB (4 MB in Exadata).

AU size is a disk group attribute, so each disk group can have a different AU size.

ASM Extents

An ASM extent consists of one or more allocation units. An ASM file consists of one or more ASM extents.

We distinguish between physical and virtual extents. A virtual extent, or an extent set, consists of one physical extent in an external redundancy disk group, at least two physical extents in a normal redundancy disk group and at least three physical extents in a high redundancy disk group.

Before ASM version 11.1 we had a uniform extent size. ASM version 11.1 introduced variable sized extents, which enable support for larger data files, reduce (ASM and database) SGA memory requirements for very large databases, and improve performance for file create and open operations. The initial extent size equals the disk group AU_SIZE and it increases by a factor of 4 or 16 at predefined thresholds. This feature is automatic for newly created and resized data files when the disk group compatibility attributes COMPATIBLE.ASM and COMPATIBLE.RDBMS are set to 11.1 or higher.

The extent size of a file varies as follows:

  • Extent size always equals the disk group AU_SIZE for the first 20,000 extent sets
  • Extent size equals 4*AU_SIZE for the next 20,000 extent sets
  • Extent size equals 16*AU_SIZE for the next 20,000 and higher extent sets
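
As a worked example, with the default 1 MB AU size the first 20,000 extent sets are 1 MB each (covering roughly the first 19.5 GB of a file), the next 20,000 are 4 MB each (roughly the next 78 GB), and anything beyond that uses 16 MB extent sets.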

There is a nasty bug (8898852) related to this feature. See more on that in MOS Doc ID 965751.1.

ASM Mirroring

ASM mirroring protects data integrity by storing multiple copies of the same data on different disks. When a disk group is created, the ASM administrator can specify the disk group redundancy as follows:

  • External – no ASM mirroring
  • Normal – 2-way mirroring
  • High – 3-way mirroring

ASM mirrors extents – it does not mirror disks or blocks. ASM file mirroring is the result of mirroring the extents that constitute the file. In ASM we can specify the redundancy level per file. For example, one file in a normal redundancy disk group can have its extents mirrored once (the default behavior). Another file in the same disk group can be triple mirrored – provided there are at least three failgroups in the disk group. In fact, all ASM metadata files are triple mirrored in a normal redundancy disk group – provided there are at least three failgroups.

ASM Failgroups

ASM disks within a disk group are partitioned into failgroups (also referred to as failure groups or fail groups). The failgroups are defined at the time the disk group is created.  If we omit the failgroup specification, then ASM automatically places each disk into its own failgroup. The only exception is Exadata, where all disks from the same storage cell are automatically placed in the same failgroup.

Normal redundancy disk groups require at least two failgroups. High redundancy disk groups require at least three failgroups. Disk groups with external redundancy do not have failgroups.

When an extent is allocated for a mirrored file, ASM allocates a primary copy and a mirror copy. The primary copy is stored on one disk and the mirror copy on another disk in a different failgroup.

When adding disks to an ASM disk group for which failgroups are manually specified, it is imperative to add the disks to the correct failgroup.
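
For example, adding a disk to an existing, manually named failgroup might look like this (the disk group, failgroup and device names are hypothetical):

$ sqlplus -s / as sysasm <<'EOF'
alter diskgroup DATA add failgroup FG2 disk '/dev/sdh1';
exit
EOF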

[Repost] Set up ASM with a single ASMCA command


ASM Configuration Assistant (ASMCA) was introduced in ASM version 11.2. It is used to configure ASM instances, and to create and manage disk groups, volumes and ASM cluster file systems (ACFS). ASMCA can be used in GUI or command-line mode.

In this post I will show how to use ASMCA – in a non-cluster environment – to create and start an ASM instance, create a disk group and start the related/required services. I will use ASMCA in command-line mode with the silent option.

Perform the Grid Infrastructure software only installation

You may want to do this with the ASM job role separation option, in which case you should perform all steps as OS user grid. Otherwise, perform all steps as OS user oracle.

Set up the disks to be used by ASM

I used ASMLIB to create 4 disks for ASM, and I can see them as follows:

$ oracleasm listdisks
DISK1
DISK2
DISK3
DISK4

Configure ASM

Run the following command:

$ asmca -silent -configureASM -sysAsmPassword s3kr3t1 -asmsnmpPassword s3kr3t2 -diskString 'ORCL:*' -diskGroupName DATA -disk 'ORCL:*' -redundancy EXTERNAL

As I used ASMLIB disks, I specified 'ORCL:*' for the ASM discovery string. Make sure you specify the correct value for your environment.

On a successful run, the above command should have returned:

ASM created and started successfully.
DiskGroup DATA created successfully.

And it should have performed the following:

  • Start the cluster synchronisation services daemon – ocssd.bin
  • Start three agents – cssdagent, oraagent.bin and orarootagent.bin
  • Start the disk monitor – diskmon.bin
  • Create and start ASM instance +ASM
  • Create the external redundancy disk group DATA
  • Create ASM spfile in disk group DATA
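
A quick way to verify the result is to check the ASM resource and the new disk group (a sketch; the output will vary by environment):

$ srvctl status asm
$ asmcmd lsdg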

I have also published this in MOS Doc ID 1068788.1.

[Repost] In version 11.2 ASM runs from the Grid Infrastructure home

Starting with version 11.2, ASM runs from the Grid Infrastructure home in both single instance and cluster installs. This is very different to earlier versions so I would encourage you to invest some time in understanding the differences. Here is a quick overview of some of the more interesting features.

Integration of Oracle Clusterware (Cluster Ready Services) and ASM

Oracle Cluster Registry (OCR), voting disks and ASM spfile can now be stored in an ASM disk group. ASM instances, disk groups and other ASM related objects are now resources managed by Clusterware and stored in OCR.
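
For example, the ASM resource and the voting disk locations can be checked with crsctl from the Grid Infrastructure home (a minimal sketch):

$ crsctl stat res ora.asm -t
$ crsctl query css votedisk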

ASM job role and user separation option

I have a separate post on that – please see ASM job role and user separation option.

ORACLE_BASE

ORACLE_BASE directory cannot be shared in cluster installs, but it can be shared in single instance installations.

No localconfig in single instance installs

Although the localconfig utility is mentioned in the 11.2 documentation, in the context of Cluster Synchronization Services (CSS) and host name changes, there is no localconfig in 11.2. Instead we now have roothas.pl. For more info on this please see MOS Doc ID 986740.1.

ASM can be set up during the install or later with ASMCA

ASM Configuration Assistant (ASMCA), a new tool in 11.2, is used to create and configure ASM instances, disk groups, volumes and ASM Cluster File Systems (ACFS). ASMCA can be used in GUI and command line modes. In addition to that, a silent mode can be used to automate ASM setup.

[Repost] Oracle ASM Job Role Separation Option with SYSASM

The SYSASM privilege (introduced in 11.1) is fully separated from the SYSDBA privilege in 11.2. If you choose to use this optional feature, and designate different operating system groups as the OSASM and the OSDBA groups, then the SYSASM administrative privilege is available only to members of the OSASM group. [From Oracle® Grid Infrastructure Installation Guide 11g Release 2].

To set up ASM admin and DBA job role separation, you need at least two OS users – one for the database, typically called oracle, and another one for the Grid Infrastructure, typically called grid.

The database OS user has to be in the software install group (oinstall), OSDBA group (dba) and OSDBA for ASM group (asmdba). [OSDBA group is designated at the installation time, which makes it SS_DBA_GRP in $ORACLE_HOME/rdbms/lib/config.c].

In my case that OS user is called oracle and the OSDBA group is called dba:

 

$ id oracle
uid=502(oracle) gid=500(oinstall) groups=500(oinstall),502(dba),506(asmdba)

$ grep "define SS_DBA_GRP" $ORACLE_HOME/rdbms/lib/config.c
#define SS_DBA_GRP "dba"

 

The Grid Infrastructure OS user has to be in the software install group (oinstall), OSASM group (asmadmin) and OSDBA for ASM group (asmdba). [OSASM and OSDBA for ASM groups are designated at the Grid Infrastructure installation time, which makes them SS_ASM_GRP and SS_DBA_GRP in $GRID_HOME/rdbms/lib/config.c].

In my case the OS user is called grid, the OSASM group is called asmadmin and the OSDBA for ASM group is called asmdba:

$ id grid
uid=1100(grid) gid=500(oinstall) groups=500(oinstall),502(dba),506(asmdba),1000(asmadmin),1301(asmoper)

$ egrep "define SS_DBA_GRP|define SS_ASM_GRP" $ORACLE_HOME/rdbms/lib/config.c
#define SS_DBA_GRP "asmdba"
#define SS_ASM_GRP "asmadmin"

To administer ASM, the OS user grid should connect to the ASM instance as SYSASM, as follows:

$ sqlplus / as sysasm

Given my OS user names and groups, the ownership of ASM disks has to be grid:asmadmin. In my Linux environment, with ASMLIB, my disk ownership is as follows:

 

$ ls -l /dev/oracleasm/disks/
total 0
brw-rw---- 1 grid asmadmin 8, 5 Mar 1 15:05 DISK1
brw-rw---- 1 grid asmadmin 8, 6 Mar 1 15:05 DISK2
brw-rw---- 1 grid asmadmin 8, 7 Mar 1 15:05 DISK3

 

The ownership is correct, as I specified the correct user and group at the time ASMLIB was installed. That can be verified as follows (note that this is ASMLIB specific):

$ egrep "^ORACLEASM_UID|^ORACLEASM_GID" /etc/sysconfig/oracleasm
ORACLEASM_UID=grid
ORACLEASM_GID=asmadmin

Finally, and this is very important, the correct ownership of the oracle binary – in my database home – has to be oracle:asmadmin:

$ ls -l $ORACLE_HOME/bin/oracle
-r-xr-s--x 1 oracle asmadmin 173515991 Apr 8 12:10 /u01/app/oracle/product/11.2.0/dbhome_2/bin/oracle

With all this in place, we have the correct setup for the Oracle ASM job role separation feature.

[Repost] ASM files number 12 and 254

The staleness directory (ASM file number 12) contains metadata to map the slots in the staleness registry to particular disks and ASM clients. The staleness registry (ASM file number 254) tracks allocation units that become stale while the disks are offline. This applies to normal and high redundancy disk groups with the attribute COMPATIBLE.RDBMS set to 11.1 or higher. The staleness metadata is created when needed, and grows to accommodate additional offline disks.

When a disk goes offline, each RDBMS instance gets a slot in the staleness registry for that disk. This slot has a bit for each allocation unit in the offline disk. When an RDBMS instance I/O write is targeted for an offline disk, that instance sets the corresponding bit in the staleness registry.

When a disk is brought back online, ASM copies the allocation units that have the staleness registry bit set from the mirrored extents. Because only the allocation units that may have changed while the disk was offline are updated, bringing a disk online is more efficient than adding the disk back, which would be required if it had been dropped instead of just offlined.
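
The fast resync itself is triggered by bringing the disk back online, for example (the disk group and disk names are hypothetical; ONLINE ALL would bring back all offlined disks in the group):

$ sqlplus -s / as sysasm <<'EOF'
alter diskgroup DATA online disk DISK_0001;
exit
EOF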

No stale disks

The staleness metadata structures are created as needed, which means the staleness directory and registry do not exist when all disks are online.

SQL> SELECT g.name "Disk group",
g.group_number "Group#",
d.disk_number "Disk#",
d.name "Disk",
d.mode_status "Disk status"
FROM v$asm_disk d, v$asm_diskgroup g
WHERE g.group_number=d.group_number and g.group_number<>0
ORDER BY 1, 2, 3;

Disk group       Group#      Disk# Disk         Disk status
------------ ---------- ---------- ------------ ------------
DATA                  1          0 ASMDISK1     ONLINE
                                 1 ASMDISK2     ONLINE
                                 2 ASMDISK3     ONLINE
RECO                  2          0 ASMDISK4     ONLINE
                                 1 ASMDISK5     ONLINE
                                 2 ASMDISK6     ONLINE

SQL> SELECT x.number_kffxp "File#",
x.disk_kffxp "Disk#",
x.xnum_kffxp "Extent",
x.au_kffxp "AU",
d.name "Disk name"
FROM x$kffxp x, v$asm_disk_stat d
WHERE x.group_kffxp=d.group_number
and x.disk_kffxp=d.disk_number
and x.number_kffxp in (12, 254)
ORDER BY 1, 2;

no rows selected

Stale disks

Staleness information will be created when a disk goes offline, but only when there are I/O writes intended for offline disks.

In the following example, I will offline the disk manually, with the ALTER DISKGROUP OFFLINE DISK command. But as far as staleness metadata is concerned, it will be created irrespective of how and why a disk goes offline.

SQL> alter diskgroup RECO offline disk ASMDISK6;

Diskgroup altered.

SQL> SELECT g.name "Disk group",
g.group_number "Group#",
d.disk_number "Disk#",
d.name "Disk",
d.mode_status "Disk status"
FROM v$asm_disk d, v$asm_diskgroup g
WHERE g.group_number=d.group_number and g.group_number=2
ORDER BY 1, 2, 3;

Disk group       Group#      Disk# Disk         Disk status
------------ ---------- ---------- ------------ ------------
RECO                  2          0 ASMDISK4     ONLINE
                                 1 ASMDISK5     ONLINE
                                 2 ASMDISK6     OFFLINE

The database keeps writing to this disk group, and after a while we see the staleness directory and the staleness registry created for this disk group:

SQL> SELECT x.number_kffxp "File#",
x.disk_kffxp "Disk#",
x.xnum_kffxp "Extent",
x.au_kffxp "AU",
d.name "Disk name"
FROM x$kffxp x, v$asm_disk_stat d
WHERE x.group_kffxp=d.group_number
and x.disk_kffxp=d.disk_number
and d.group_number=2
and x.number_kffxp in (12, 254)
ORDER BY 1, 2;

     File#      Disk#     Extent         AU Disk name
---------- ---------- ---------- ---------- ------------------------------
        12          0          0         86 ASMDISK4
                    1          0        101 ASMDISK5
                    2          0 4294967294 ASMDISK6
       254          0          0         85 ASMDISK4
                    1          0        100 ASMDISK5
                    2          0 4294967294 ASMDISK6

Look inside

There is not much to see in the actual metadata. Even kfed struggles to recognise these types of metadata blocks 🙂

$ kfed read /dev/oracleasm/disks/ASMDISK4 aun=86 | more
kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                           21 ; 0x002: *** Unknown Enum ***

kffdnd.bnode.incarn:                  1 ; 0x000: A=1 NUMM=0x0
kffdnd.bnode.frlist.number:  4294967295 ; 0x004: 0xffffffff
kffdnd.bnode.frlist.incarn:           0 ; 0x008: A=0 NUMM=0x0
kffdnd.overfl.number:        4294967295 ; 0x00c: 0xffffffff
kffdnd.overfl.incarn:                 0 ; 0x010: A=0 NUMM=0x0
kffdnd.parent.number:                 0 ; 0x014: 0x00000000
kffdnd.parent.incarn:                 1 ; 0x018: A=1 NUMM=0x0
kffdnd.fstblk.number:                 0 ; 0x01c: 0x00000000
kffdnd.fstblk.incarn:                 1 ; 0x020: A=1 NUMM=0x0
kfdsde.entry.incarn:                  1 ; 0x024: A=1 NUMM=0x0
kfdsde.entry.hash:                    0 ; 0x028: 0x00000000
kfdsde.entry.refer.number:   4294967295 ; 0x02c: 0xffffffff
kfdsde.entry.refer.incarn:            0 ; 0x030: A=0 NUMM=0x0
kfdsde.cid:                       +ASMR ; 0x034: length=5
kfdsde.indlen:                        1 ; 0x074: 0x0001
kfdsde.flags:                         0 ; 0x076: 0x0000
kfdsde.spare1:                        0 ; 0x078: 0x00000000
kfdsde.spare2:                        0 ; 0x07c: 0x00000000
kfdsde.indices[0]:                    0 ; 0x080: 0x00000000
kfdsde.indices[1]:                    0 ; 0x084: 0x00000000
kfdsde.indices[2]:                    0 ; 0x088: 0x00000000

$ kfed read /dev/oracleasm/disks/ASMDISK4 aun=85 | more
kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                           20 ; 0x002: *** Unknown Enum ***

kfdsHdrB.clientId:           1297301881 ; 0x000: 0x4d534179
kfdsHdrB.incarn:                      0 ; 0x004: 0x00000000
kfdsHdrB.dskNum:                      2 ; 0x008: 0x0002
kfdsHdrB.ub2spare:                    0 ; 0x00a: 0x0000
ub1[0]:                               0 ; 0x00c: 0x00
ub1[1]:                               0 ; 0x00d: 0x00
ub1[2]:                               0 ; 0x00e: 0x00
ub1[3]:                               0 ; 0x00f: 0x00
ub1[4]:                               0 ; 0x010: 0x00
ub1[5]:                               0 ; 0x011: 0x00
ub1[6]:                               0 ; 0x012: 0x00
ub1[7]:                              16 ; 0x013: 0x10
ub1[8]:                               0 ; 0x014: 0x00

Not much to see, as these are just bitmaps.

Conclusion

The staleness directory and the staleness registry are supporting metadata structures for the disk offline and fast resync feature introduced in ASM version 11.1. The staleness directory contains metadata to map the slots in the staleness registry to particular disks and ASM clients. The staleness registry tracks allocation units that become stale while the disks are offline. This feature is relevant to normal and high redundancy disk groups only.

[Repost] ASM in Exadata


ASM is a critical component of the Exadata software stack. It is also a bit different – compared to non-Exadata environments. It still manages your disk groups, but builds those with grid disks. It still takes care of disk errors, but also handles predictive disk failures. It doesn’t like external redundancy, but it makes the disk group smart scan capable. Let’s have a closer look.

Grid disks

In Exadata the ASM disks live on storage cells and are presented to compute nodes (where the ASM instances run) via the Oracle proprietary iDB protocol. Each storage cell has 12 hard disks and 16 flash disks. During Exadata deployment, grid disks are created on those 12 hard disks. Flash disks are used for the flash cache and the redo log cache, so grid disks are normally not created on flash disks.

Grid disks are not exposed to the operating system, so only database instances, ASM and related utilities that speak iDB can see them. kfod, the ASM disk discovery tool, is one such utility. Here is an example of kfod discovering grid disks in one Exadata environment:

$ kfod disks=all
-----------------------------------------------------------------
Disk          Size Path                           User     Group
=================================================================

1:     433152 Mb o/192.168.10.9/DATA_CD_00_exacell01   
2:     433152 Mb o/192.168.10.9/DATA_CD_01_exacell01   
3:     433152 Mb o/192.168.10.9/DATA_CD_02_exacell01   

13:      29824 Mb o/192.168.10.9/DBFS_DG_CD_02_exacell01 
14:      29824 Mb o/192.168.10.9/DBFS_DG_CD_03_exacell01 
15:      29824 Mb o/192.168.10.9/DBFS_DG_CD_04_exacell01 

23:     108224 Mb o/192.168.10.9/RECO_CD_00_exacell01   
24:     108224 Mb o/192.168.10.9/RECO_CD_01_exacell01   
25:     108224 Mb o/192.168.10.9/RECO_CD_02_exacell01   

474:     108224 Mb o/192.168.10.22/RECO_CD_09_exacell14   
475:     108224 Mb o/192.168.10.22/RECO_CD_10_exacell14   
476:     108224 Mb o/192.168.10.22/RECO_CD_11_exacell14   

-----------------------------------------------------------------
ORACLE_SID ORACLE_HOME
=================================================================
+ASM1 /u01/app/11.2.0.3/grid
+ASM2 /u01/app/11.2.0.3/grid
+ASM3 /u01/app/11.2.0.3/grid

+ASM8 /u01/app/11.2.0.3/grid
$

Note that grid disks are prefixed with either DATA, RECO or DBFS_DG. Those are ASM disk group names in this environment. Each grid disk name ends with the storage cell name. It is also important to note that disks with the same prefix have the same size. The above example is from a full rack – hence 14 storage cells and 8 ASM instances.
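
On the storage cell itself, the same grid disks can be listed with CellCLI (a sketch, run on the cell rather than on a compute node):

# cellcli -e list griddisk attributes name,size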

ASM_DISKSTRING

In Exadata, ASM_DISKSTRING='o/*/*'. That tells ASM that it is running on an Exadata compute node and that it should expect grid disks.

$ sqlplus / as sysasm
SQL> show parameter asm_diskstring
NAME           TYPE   VALUE
-------------- ------ -----
asm_diskstring string o/*/*

Automatic failgroups

There are no external redundancy disk groups in Exadata – you have a choice of either normal or high redundancy. When creating disk groups, ASM automatically puts all grid disks from the same storage cell into the same failgroup. The failgroup is then named after the storage cell.

This would be an example of creating a disk group in an Exadata environment (note how that grid disk prefix comes in handy):

SQL> create diskgroup RECO
disk 'o/*/RECO*'
attribute
'COMPATIBLE.ASM'='11.2.0.0.0',
'COMPATIBLE.RDBMS'='11.2.0.0.0',
'CELL.SMART_SCAN_CAPABLE'='TRUE';

Once the disk group is created we can check the disk and failgroup names:

SQL> select name, failgroup, path from v$asm_disk_stat where name like 'RECO%';

NAME                 FAILGROUP PATH
-------------------- --------- -----------------------------------
RECO_CD_08_EXACELL01 EXACELL01 o/192.168.10.3/RECO_CD_08_exacell01
RECO_CD_07_EXACELL01 EXACELL01 o/192.168.10.3/RECO_CD_07_exacell01
RECO_CD_01_EXACELL01 EXACELL01 o/192.168.10.3/RECO_CD_01_exacell01

RECO_CD_00_EXACELL02 EXACELL02 o/192.168.10.4/RECO_CD_00_exacell02
RECO_CD_05_EXACELL02 EXACELL02 o/192.168.10.4/RECO_CD_05_exacell02
RECO_CD_04_EXACELL02 EXACELL02 o/192.168.10.4/RECO_CD_04_exacell02

SQL>

Note that we did not specify the failgroup names in the CREATE DISKGROUP statement. ASM has automatically put grid disks from the same storage cell in the same failgroup.

cellip.ora

The cellip.ora is the configuration file, on every database server, that tells ASM instances which cells are available to the cluster.

Here is the content of a typical cellip.ora file for a quarter rack system:

$ cat /etc/oracle/cell/network-config/cellip.ora
cell="192.168.10.3"
cell="192.168.10.4"
cell="192.168.10.5"

Now that we see what is in cellip.ora, the grid disk paths in the examples above should make more sense.

Disk group attributes

The following attributes and their values are recommended in Exadata environments:

  • COMPATIBLE.ASM – Should be set to the ASM software version in use.
  • COMPATIBLE.RDBMS – Should be set to the database software version in use.
  • CELL.SMART_SCAN_CAPABLE – Has to be set to TRUE. This attribute/value is actually mandatory in Exadata.
  • AU_SIZE – Should be set to 4M. This is the default value in recent ASM versions for Exadata environments.
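
Once a disk group exists, these attributes can be checked from the ASM instance (a minimal sketch; group_number 1 is just an example):

$ sqlplus -s / as sysasm <<'EOF'
select name, value from v$asm_attribute
where group_number = 1
and lower(name) in ('compatible.asm','compatible.rdbms','cell.smart_scan_capable','au_size');
exit
EOF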

Initialization parameters

The following recommendations are for ASM version 11.2.0.3.

  • CLUSTER_INTERCONNECTS – Bondib0 IP address for X2-2; colon delimited Bondib* IP addresses for X2-8.
  • ASM_POWER_LIMIT – 1 for a quarter rack, 2 for all other racks.
  • SGA_TARGET – 1250 MB.
  • PGA_AGGREGATE_TARGET – 400 MB.
  • MEMORY_TARGET – 0.
  • MEMORY_MAX_TARGET – 0.
  • PROCESSES – For fewer than 10 database instances per node: 50*(#db instances per node + 1). For 10 or more instances per node: [50*MIN(#db instances per node + 1, 11)] + [10*MAX(#db instances per node - 10, 0)].
  • USE_LARGE_PAGES – ONLY.

Voting disks and disk group redundancy

The default location for voting disks in Exadata is the ASM disk group DBFS_DG. That disk group can be either normal or high redundancy, except in a quarter rack, where it has to be normal redundancy.

This is because of the voting disk requirement for a minimum number of failgroups in a given ASM disk group. If we put voting disks in a normal redundancy disk group, that disk group has to have at least 3 failgroups. If we put voting disks in a high redundancy disk group, that disk group has to have at least 5 failgroups.

In a quarter rack, where we have only 3 storage cells, all disk groups can have at most 3 failgroups. While we can create a high redundancy disk group with 3 storage cells, voting disks cannot go into that disk group as it does not have 5 failgroups.

XDMG and XDWK background processes

These two processes run in the ASM instances on the compute nodes. XDMG monitors all configured Exadata cells for storage state changes and performs the required tasks for such events. Its primary role is to watch for inaccessible disks and to initiate the disk online operations when they become accessible again. Those operations are then handled by XDWK.

XDWK gets started when asynchronous actions such as disk ONLINE, DROP and ADD are requested by XDMG. After a 5 minute period of inactivity, this process will shut itself down.

The Exadata Server, which runs on the storage cells, monitors disk health and performance. If a disk's performance degrades, it can put the disk into proactive failure mode. It also monitors for predictive failures based on the disk's SMART (Self-Monitoring, Analysis and Reporting Technology) data. In both cases, the Exadata Server notifies XDMG to take those disks offline.

When a faulty disk is replaced on the storage cell, the Exadata Server will recreate all grid disks on the new disk. It will then notify XDMG to bring those grid disks online, or to add them back to the disk groups in case they were already dropped.

The diskmon

The master diskmon process (diskmon.bin) can be seen running in all Grid Infrastructure installs, but it's only in Exadata that it actually does any work. On every compute node there will be one master diskmon process and one DSKM (slave diskmon) process per Oracle instance (including ASM). Here is an example from one compute node:

# ps -ef | egrep "diskmon|dskm" | grep -v grep
oracle    3205     1  0 Mar16 ?        00:01:18 ora_dskm_ONE2
oracle   10755     1  0 Mar16 ?        00:32:19 /u01/app/11.2.0.3/grid/bin/diskmon.bin -d -f
oracle   17292     1  0 Mar16 ?        00:01:17 asm_dskm_+ASM2
oracle   24388     1  0 Mar28 ?        00:00:21 ora_dskm_TWO2
oracle   27962     1  0 Mar27 ?        00:00:24 ora_dskm_THREE2
#

In Exadata, the diskmon is responsible for:

  • Handling of storage cell failures and I/O fencing
  • Monitoring of Exadata Server state on all storage cells in the cluster (heartbeat)
  • Broadcasting intra database IORM (I/O Resource Manager) plans from databases to storage cells
  • Monitoring of the control messages from database and ASM instances to storage cells
  • Communicating with other diskmons in the cluster

ACFS

ACFS (ASM Cluster File System) is supported in Exadata environments starting with ASM version 12.1.0.2. Alternatives to ACFS are DBFS (Database based File System) and NFS (Network File System). Many Exadata customers have an Oracle ZFS Appliance that can provide high performance, InfiniBand connected NFS storage.

Conclusion

There are quite a few extra features and differences in ASM compared to non-Exadata environments. Most of them are about storage cells and grid disks, and some are about tuning ASM for extreme Exadata performance.
