Overview of the Exadata Storage Architecture

Original link: http://www.dbaleet.org/overview_of_exadata_storage_architecture/

This article gives a brief introduction to the architecture of Exadata storage. After reading it you should have a basic picture of Exadata Storage.

The Exadata disk hierarchy is clearly layered. From the bottom up: Physicaldisk => LUN => celldisk => griddisk => ASM disk

 

 

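Note 1: The upper diagram shows the layout of the disks that hold the operating system (normally the first two disks). The lower diagram shows the storage layout of the non-OS disks (the remaining ten disks, which are not partitioned).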

Note 2: At the RDBMS level a griddisk corresponds to an ASM disk. A griddisk and an ASM disk are in fact the same object, seen from the Exadata Storage side and from the RDBMS side respectively.

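Note 3: At present the celldisk-to-griddisk relationship is 1:m, that is, one to many. Kevin Closson has pointed out that, strictly speaking, it is many to many; for ease of understanding, however, it can be treated as one celldisk to many griddisks. In his words: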

we (Exadata development) considered supporting celldisk creation from HW RAID volumes 
but opted for a 1:1 relationship instead for many reasons. Griddisks are the virtualization of celldisks (the presentation form to ASM). 
To that end there is a M:M relationship between celldisks and griddisks.
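
To make the layering concrete, here is a minimal Python sketch of the chain for a single hard disk. It is only an illustration: the class names and the `build_disk` helper are hypothetical, and pairing physicaldisk `20:N` with LUN `0_N` and celldisk `CD_NN` is an assumption based on the naming pattern in the listings later in this article, not something CellCLI is shown to report here.

```python
from dataclasses import dataclass, field


@dataclass
class GridDisk:
    name: str  # presented to ASM as an ASM disk


@dataclass
class CellDisk:
    name: str
    grid_disks: list = field(default_factory=list)  # one celldisk, many griddisks


@dataclass
class Lun:
    name: str
    cell_disk: CellDisk


@dataclass
class PhysicalDisk:
    name: str
    lun: Lun


def build_disk(slot, cell, prefixes):
    """Build the physicaldisk => LUN => celldisk => griddisk chain for one slot.

    Assumption: slot N maps to physicaldisk 20:N, LUN 0_N and celldisk CD_NN.
    """
    cell_disk = CellDisk(f"CD_{slot:02d}_{cell}")
    cell_disk.grid_disks = [GridDisk(f"{p}_{cell_disk.name}") for p in prefixes]
    return PhysicalDisk(f"20:{slot}", Lun(f"0_{slot}", cell_disk))


disk = build_disk(2, "dm01cel01", ["DATA", "DBFS_DG", "RECO"])
grid_names = [g.name for g in disk.lun.cell_disk.grid_disks]
print(disk.name, "=>", disk.lun.name, "=>", disk.lun.cell_disk.name, "=>", grid_names)
```

The sketch models the simplified one-to-many view from Note 3; it does not attempt to represent the many-to-many relationship mentioned in the quote.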

The sections below walk through each layer from the bottom up.

First, the physicaldisk layer:

[root@dm01cel01 ~]#cellcli -e list physicaldisk

dm01cel01: 20:0 RETS0D normal
dm01cel01: 20:1 REXHAD normal
dm01cel01: 20:2 RE5VTD normal
dm01cel01: 20:3 RE5SYD normal
dm01cel01: 20:4 RDDTYD normal
dm01cel01: 20:5 RETB5D normal
dm01cel01: 20:6 RDDS0D normal
dm01cel01: 20:7 RDDULD normal
dm01cel01: 20:8 RDDPZD normal
dm01cel01: 20:9 REXS8D normal
dm01cel01: 20:10 RDDTBD normal
dm01cel01: 20:11 RDDT9D normal
dm01cel01: FLASH_1_0 1202M0CPA5 normal
dm01cel01: FLASH_1_1 1202M0CPA7 normal
dm01cel01: FLASH_1_2 1202M0CQKE normal
dm01cel01: FLASH_1_3 1202M0CPA6 normal
dm01cel01: FLASH_2_0 1202M0CQE6 normal
dm01cel01: FLASH_2_1 1202M0CQE0 normal
dm01cel01: FLASH_2_2 1202M0CQA3 normal
dm01cel01: FLASH_2_3 1202M0CQAL normal
dm01cel01: FLASH_4_0 1202M0CP0E normal
dm01cel01: FLASH_4_1 1202M0CP0D normal
dm01cel01: FLASH_4_2 1202M0CNXH normal
dm01cel01: FLASH_4_3 1202M0CP0A normal
dm01cel01: FLASH_5_0 1202M0CQAE normal
dm01cel01: FLASH_5_1 1202M0CQ9V normal
dm01cel01: FLASH_5_2 1202M0CQA0 normal
dm01cel01: FLASH_5_3 1202M0CQAD normal

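From the output above:

Entries of the form 20:* are the 12 physical hard disks in this cell node.

Entries of the form FLASH_*_* are flash devices. Each flash card carries 4 flash modules (FMOD, written FDOM later in this article), and there are 4 flash cards, so 16 flash disks are listed.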
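
The same count can be checked mechanically. The following is a small sketch that classifies lines in the format shown above; `SAMPLE` is an abbreviated copy of that listing, and the `classify` helper is a hypothetical name introduced only for illustration.

```python
from collections import Counter

# Abbreviated copy of the physicaldisk listing shown above.
SAMPLE = """\
dm01cel01: 20:0 RETS0D normal
dm01cel01: 20:11 RDDT9D normal
dm01cel01: FLASH_1_0 1202M0CPA5 normal
dm01cel01: FLASH_1_3 1202M0CPA6 normal
dm01cel01: FLASH_5_3 1202M0CQAD normal
"""


def classify(listing):
    """Count hard disks and flash modules, and collect the flash card numbers."""
    counts = Counter()
    cards = set()
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        name = fields[1]
        if name.startswith("FLASH_"):
            counts["flash"] += 1
            cards.add(int(name.split("_")[1]))  # FLASH_<card>_<module>
        else:
            counts["hard disk"] += 1
    return counts, sorted(cards)


counts, cards = classify(SAMPLE)
print(dict(counts), cards)
```

Run against the full listing, this should report 12 hard disks and 16 flash modules spread over 4 cards.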

Next, the OS tools df and fdisk show the cell's filesystems and its disk and flash devices at the operating system level.

[root@dm01cel01 ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/md6 9.9G 3.6G 5.9G 38% /
tmpfs 12G 0 12G 0% /dev/shm
/dev/md8 2.0G 647M 1.3G 34% /opt/oracle
/dev/md4 116M 60M 50M 55% /boot
/dev/md11 2.3G 130M 2.1G 6% /var/log/oracle

The operating system uses the md devices md6, md8, md4 and md11.

[root@dm01cel01 ~]# mdadm -Q -D /dev/md6
/dev/md6:
Version : 0.90
Creation Time : ......
Raid Level : raid1
Array Size : 10482304 (10.00 GiB 10.73 GB)
Used Dev Size : 10482304 (10.00 GiB 10.73 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 6
Persistence : Superblock is persistent

Update Time : ......
State : active
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
......

Number Major Minor RaidDevice State
0 8 6 0 active sync /dev/sda6
1 8 22 1 active sync /dev/sdb6

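As shown, /dev/md6 is a software RAID 1 built from /dev/sda6 and /dev/sdb6. Likewise, md8 and md11 are RAID devices built from /dev/sda8 with /dev/sdb8 and from /dev/sda11 with /dev/sdb11. md4 cannot be built from /dev/sda4, because the fdisk output below lists /dev/sda4 as an extended partition; its 123 MB size matches /dev/sda1, so it is most likely built from /dev/sda1 and /dev/sdb1. To save space, the mdadm output for the other devices is not listed here.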
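
Rather than reading each `mdadm -Q -D` output by eye, the member partitions can be pulled out with a short script. This is a sketch only: `MDADM_DETAIL` copies the device table from the output above, and `member_devices` is a hypothetical helper name.

```python
import re

# Device table copied from the `mdadm -Q -D /dev/md6` output above.
MDADM_DETAIL = """\
Number Major Minor RaidDevice State
0 8 6 0 active sync /dev/sda6
1 8 22 1 active sync /dev/sdb6
"""


def member_devices(detail):
    """Return the member partitions reported as 'active sync'."""
    return re.findall(r"active sync\s+(/dev/\S+)", detail)


members = member_devices(MDADM_DETAIL)
print(members)
```

Feeding it the detail output of each md device in turn would confirm which partition pair backs md4, md8 and md11.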

[root@dm01cel01 ~]# fdisk -l

Disk /dev/sda: 598.9 GB, 598999040000 bytes
255 heads, 63 sectors/track, 72824 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          15      120456   fd  Linux raid autodetect
/dev/sda2              16          16        8032+  83  Linux
/dev/sda3              17       69039   554427247+  83  Linux
/dev/sda4           69040       72824    30403012+   f  W95 Ext'd (LBA)
/dev/sda5           69040       70344    10482381   fd  Linux raid autodetect
/dev/sda6           70345       71649    10482381   fd  Linux raid autodetect
/dev/sda7           71650       71910     2096451   fd  Linux raid autodetect
/dev/sda8           71911       72171     2096451   fd  Linux raid autodetect
/dev/sda9           72172       72432     2096451   fd  Linux raid autodetect
/dev/sda10          72433       72521      714861   fd  Linux raid autodetect
/dev/sda11          72522       72824     2433816   fd  Linux raid autodetect

Disk /dev/sdb: 598.9 GB, 598999040000 bytes
255 heads, 63 sectors/track, 72824 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *           1          15      120456   fd  Linux raid autodetect
/dev/sdb2              16          16        8032+  83  Linux
/dev/sdb3              17       69039   554427247+  83  Linux
/dev/sdb4           69040       72824    30403012+   f  W95 Ext'd (LBA)
/dev/sdb5           69040       70344    10482381   fd  Linux raid autodetect
/dev/sdb6           70345       71649    10482381   fd  Linux raid autodetect
/dev/sdb7           71650       71910     2096451   fd  Linux raid autodetect
/dev/sdb8           71911       72171     2096451   fd  Linux raid autodetect
/dev/sdb9           72172       72432     2096451   fd  Linux raid autodetect
/dev/sdb10          72433       72521      714861   fd  Linux raid autodetect
/dev/sdb11          72522       72824     2433816   fd  Linux raid autodetect

Disk /dev/sdc: 598.9 GB, 598999040000 bytes
255 heads, 63 sectors/track, 72824 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdc doesn't contain a valid partition table

Disk /dev/sdd: 598.9 GB, 598999040000 bytes
255 heads, 63 sectors/track, 72824 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdd doesn't contain a valid partition table

Disk /dev/sde: 598.9 GB, 598999040000 bytes
255 heads, 63 sectors/track, 72824 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sde doesn't contain a valid partition table

Disk /dev/sdf: 598.9 GB, 598999040000 bytes
255 heads, 63 sectors/track, 72824 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdf doesn't contain a valid partition table

Disk /dev/sdg: 598.9 GB, 598999040000 bytes
255 heads, 63 sectors/track, 72824 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdg doesn't contain a valid partition table

Disk /dev/sdh: 598.9 GB, 598999040000 bytes
255 heads, 63 sectors/track, 72824 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdh doesn't contain a valid partition table

Disk /dev/sdi: 598.9 GB, 598999040000 bytes
255 heads, 63 sectors/track, 72824 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdi doesn't contain a valid partition table

Disk /dev/sdj: 598.9 GB, 598999040000 bytes
255 heads, 63 sectors/track, 72824 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdj doesn't contain a valid partition table

Disk /dev/sdk: 598.9 GB, 598999040000 bytes
255 heads, 63 sectors/track, 72824 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdk doesn't contain a valid partition table

Disk /dev/sdl: 598.9 GB, 598999040000 bytes
255 heads, 63 sectors/track, 72824 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdl doesn't contain a valid partition table

Disk /dev/sdm: 4009 MB, 4009754624 bytes
126 heads, 22 sectors/track, 2825 cylinders
Units = cylinders of 2772 * 512 = 1419264 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdm1               1        2824     3914053   83  Linux

Disk /dev/md1: 731 MB, 731906048 bytes
2 heads, 4 sectors/track, 178688 cylinders
Units = cylinders of 8 * 512 = 4096 bytes

Disk /dev/md1 doesn't contain a valid partition table

Disk /dev/md11: 2492 MB, 2492137472 bytes
2 heads, 4 sectors/track, 608432 cylinders
Units = cylinders of 8 * 512 = 4096 bytes

Disk /dev/md11 doesn't contain a valid partition table

Disk /dev/md2: 2146 MB, 2146697216 bytes
2 heads, 4 sectors/track, 524096 cylinders
Units = cylinders of 8 * 512 = 4096 bytes

Disk /dev/md2 doesn't contain a valid partition table

Disk /dev/md8: 2146 MB, 2146697216 bytes
2 heads, 4 sectors/track, 524096 cylinders
Units = cylinders of 8 * 512 = 4096 bytes

Disk /dev/md8 doesn't contain a valid partition table

Disk /dev/md7: 2146 MB, 2146697216 bytes
2 heads, 4 sectors/track, 524096 cylinders
Units = cylinders of 8 * 512 = 4096 bytes

Disk /dev/md7 doesn't contain a valid partition table

Disk /dev/md6: 10.7 GB, 10733879296 bytes
2 heads, 4 sectors/track, 2620576 cylinders
Units = cylinders of 8 * 512 = 4096 bytes

Disk /dev/md6 doesn't contain a valid partition table

Disk /dev/md5: 10.7 GB, 10733879296 bytes
2 heads, 4 sectors/track, 2620576 cylinders
Units = cylinders of 8 * 512 = 4096 bytes

Disk /dev/md5 doesn't contain a valid partition table

Disk /dev/md4: 123 MB, 123273216 bytes
2 heads, 4 sectors/track, 30096 cylinders
Units = cylinders of 8 * 512 = 4096 bytes

Disk /dev/md4 doesn't contain a valid partition table

Disk /dev/sdn: 24.5 GB, 24575868928 bytes
255 heads, 63 sectors/track, 2987 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdn doesn't contain a valid partition table

Disk /dev/sdo: 24.5 GB, 24575868928 bytes
255 heads, 63 sectors/track, 2987 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdo doesn't contain a valid partition table

Disk /dev/sdp: 24.5 GB, 24575868928 bytes
255 heads, 63 sectors/track, 2987 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdp doesn't contain a valid partition table

Disk /dev/sdq: 24.5 GB, 24575868928 bytes
255 heads, 63 sectors/track, 2987 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdq doesn't contain a valid partition table

Disk /dev/sdr: 24.5 GB, 24575868928 bytes
255 heads, 63 sectors/track, 2987 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdr doesn't contain a valid partition table

Disk /dev/sds: 24.5 GB, 24575868928 bytes
255 heads, 63 sectors/track, 2987 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sds doesn't contain a valid partition table

Disk /dev/sdt: 24.5 GB, 24575868928 bytes
255 heads, 63 sectors/track, 2987 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdt doesn't contain a valid partition table

Disk /dev/sdu: 24.5 GB, 24575868928 bytes
255 heads, 63 sectors/track, 2987 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdu doesn't contain a valid partition table

Disk /dev/sdv: 24.5 GB, 24575868928 bytes
255 heads, 63 sectors/track, 2987 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdv doesn't contain a valid partition table

Disk /dev/sdw: 24.5 GB, 24575868928 bytes
255 heads, 63 sectors/track, 2987 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdw doesn't contain a valid partition table

Disk /dev/sdx: 24.5 GB, 24575868928 bytes
255 heads, 63 sectors/track, 2987 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdx doesn't contain a valid partition table

Disk /dev/sdy: 24.5 GB, 24575868928 bytes
255 heads, 63 sectors/track, 2987 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdy doesn't contain a valid partition table

Disk /dev/sdz: 24.5 GB, 24575868928 bytes
255 heads, 63 sectors/track, 2987 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdz doesn't contain a valid partition table

Disk /dev/sdaa: 24.5 GB, 24575868928 bytes
255 heads, 63 sectors/track, 2987 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdaa doesn't contain a valid partition table

Disk /dev/sdab: 24.5 GB, 24575868928 bytes
255 heads, 63 sectors/track, 2987 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdab doesn't contain a valid partition table

Disk /dev/sdac: 24.5 GB, 24575868928 bytes
255 heads, 63 sectors/track, 2987 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdac doesn't contain a valid partition table

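The fdisk output tells us at least the following:

  • This cell uses HP (High Performance) disks; each disk is 600 GB.
  • sda and sdb are the disks that hold the operating system; each is divided into 11 partitions.
  • The 12 hard disks are /dev/sda through /dev/sdl.
  • The 16 FDOMs are /dev/sdn through /dev/sdac.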
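
These observations can be reproduced by grouping the `Disk /dev/sd*` lines by size. Below is a minimal sketch, assuming the fdisk output format shown above; `FDISK_SAMPLE` is an abbreviated copy of that output and `group_by_size` is a hypothetical helper.

```python
import re
from collections import defaultdict

# Abbreviated copy of the `fdisk -l` output above.
FDISK_SAMPLE = """\
Disk /dev/sda: 598.9 GB, 598999040000 bytes
Disk /dev/sdl: 598.9 GB, 598999040000 bytes
Disk /dev/sdl doesn't contain a valid partition table
Disk /dev/sdm: 4009 MB, 4009754624 bytes
Disk /dev/md6: 10.7 GB, 10733879296 bytes
Disk /dev/sdn: 24.5 GB, 24575868928 bytes
Disk /dev/sdac: 24.5 GB, 24575868928 bytes
"""

# Match only sd* devices; md devices and "doesn't contain" lines are skipped.
DISK_LINE = re.compile(r"^Disk (/dev/sd[a-z]+): .*?, (\d+) bytes", re.M)


def group_by_size(output):
    """Map size in bytes to the list of sd* devices reporting that size."""
    groups = defaultdict(list)
    for device, size in DISK_LINE.findall(output):
        groups[int(size)].append(device)
    return dict(groups)


groups = group_by_size(FDISK_SAMPLE)
for size, devices in groups.items():
    print(size, devices)
```

On the full output, the 598999040000-byte group should contain the 12 hard disks and the 24575868928-byte group the 16 flash modules.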

Next, the LUN information:

[root@dm01cel01 ~]#cellcli -e list LUN


dm01cel01: 0_0 0_0 normal
dm01cel01: 0_1 0_1 normal
dm01cel01: 0_2 0_2 normal
dm01cel01: 0_3 0_3 normal
dm01cel01: 0_4 0_4 normal
dm01cel01: 0_5 0_5 normal
dm01cel01: 0_6 0_6 normal
dm01cel01: 0_7 0_7 normal
dm01cel01: 0_8 0_8 normal
dm01cel01: 0_9 0_9 normal
dm01cel01: 0_10 0_10 normal
dm01cel01: 0_11 0_11 normal
dm01cel01: 1_0 1_0 normal
dm01cel01: 1_1 1_1 normal
dm01cel01: 1_2 1_2 normal
dm01cel01: 1_3 1_3 normal
dm01cel01: 2_0 2_0 normal
dm01cel01: 2_1 2_1 normal
dm01cel01: 2_2 2_2 normal
dm01cel01: 2_3 2_3 normal
dm01cel01: 4_0 4_0 normal
dm01cel01: 4_1 4_1 normal
dm01cel01: 4_2 4_2 normal
dm01cel01: 4_3 4_3 normal
dm01cel01: 5_0 5_0 normal
dm01cel01: 5_1 5_1 normal
dm01cel01: 5_2 5_2 normal
dm01cel01: 5_3 5_3 normal

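Here:

0_0 through 0_11 are the LUNs of the 12 hard disks.

1_0 through 1_3 are the LUNs of the first flash card, and so on. Flash LUN numbers are not always consecutive (here they run 1, 2, 4, 5). The first number matches the card number in the FLASH_*_* physicaldisk names above, which most likely reflects the PCI slot holding the card; a replaced flash card can also leave a gap.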
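
The grouping described here can be sketched in a few lines of Python. `LUNS` is an abbreviated list of the LUN names above, and treating group 0 as the hard disks follows this listing only; `group_luns` is a hypothetical helper.

```python
from collections import defaultdict

# Abbreviated list of LUN names from the listing above.
LUNS = ["0_0", "0_11", "1_0", "1_3", "2_0", "4_0", "5_3"]


def group_luns(names):
    """Group LUN names of the form <group>_<index> by their first number."""
    groups = defaultdict(list)
    for name in names:
        first, second = name.split("_")
        groups[int(first)].append(int(second))
    return dict(groups)


groups = group_luns(LUNS)
# In this listing group 0 holds the hard disks; the rest are flash cards.
flash_groups = sorted(g for g in groups if g != 0)
print(flash_groups)
```

The resulting flash group numbers make the gap at 3 easy to spot.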

Now the celldisk information:

[root@dm01cel01 ~]#cellcli -e list celldisk


dm01cel01: CD_00_dm01cel01 normal
dm01cel01: CD_01_dm01cel01 normal
dm01cel01: CD_02_dm01cel01 normal
dm01cel01: CD_03_dm01cel01 normal
dm01cel01: CD_04_dm01cel01 normal
dm01cel01: CD_05_dm01cel01 normal
dm01cel01: CD_06_dm01cel01 normal
dm01cel01: CD_07_dm01cel01 normal
dm01cel01: CD_08_dm01cel01 normal
dm01cel01: CD_09_dm01cel01 normal
dm01cel01: CD_10_dm01cel01 normal
dm01cel01: CD_11_dm01cel01 normal
dm01cel01: FD_00_dm01cel01 normal
dm01cel01: FD_01_dm01cel01 normal
dm01cel01: FD_02_dm01cel01 normal
dm01cel01: FD_03_dm01cel01 normal
dm01cel01: FD_04_dm01cel01 normal
dm01cel01: FD_05_dm01cel01 normal
dm01cel01: FD_06_dm01cel01 normal
dm01cel01: FD_07_dm01cel01 normal
dm01cel01: FD_08_dm01cel01 normal
dm01cel01: FD_09_dm01cel01 normal
dm01cel01: FD_10_dm01cel01 normal
dm01cel01: FD_11_dm01cel01 normal
dm01cel01: FD_12_dm01cel01 normal
dm01cel01: FD_13_dm01cel01 normal
dm01cel01: FD_14_dm01cel01 normal
dm01cel01: FD_15_dm01cel01 normal

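In the celldisk listing, hard disks and flash disks carry the prefixes CD and FD respectively. The listing also shows that each hard disk maps to one celldisk, and each FDOM on a flash card likewise maps to one celldisk.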

Finally, the griddisks:

[root@dm01cel01 ~]#cellcli -e list griddisk


dm01cel01: DATA_CD_00_dm01cel01 active
dm01cel01: DATA_CD_01_dm01cel01 active
dm01cel01: DATA_CD_02_dm01cel01 active
dm01cel01: DATA_CD_03_dm01cel01 active
dm01cel01: DATA_CD_04_dm01cel01 active
dm01cel01: DATA_CD_05_dm01cel01 active
dm01cel01: DATA_CD_06_dm01cel01 active
dm01cel01: DATA_CD_07_dm01cel01 active
dm01cel01: DATA_CD_08_dm01cel01 active
dm01cel01: DATA_CD_09_dm01cel01 active
dm01cel01: DATA_CD_10_dm01cel01 active
dm01cel01: DATA_CD_11_dm01cel01 active
dm01cel01: DBFS_DG_CD_02_dm01cel01 active
dm01cel01: DBFS_DG_CD_03_dm01cel01 active
dm01cel01: DBFS_DG_CD_04_dm01cel01 active
dm01cel01: DBFS_DG_CD_05_dm01cel01 active
dm01cel01: DBFS_DG_CD_06_dm01cel01 active
dm01cel01: DBFS_DG_CD_07_dm01cel01 active
dm01cel01: DBFS_DG_CD_08_dm01cel01 active
dm01cel01: DBFS_DG_CD_09_dm01cel01 active
dm01cel01: DBFS_DG_CD_10_dm01cel01 active
dm01cel01: DBFS_DG_CD_11_dm01cel01 active
dm01cel01: RECO_CD_00_dm01cel01 active
dm01cel01: RECO_CD_01_dm01cel01 active
dm01cel01: RECO_CD_02_dm01cel01 active
dm01cel01: RECO_CD_03_dm01cel01 active
dm01cel01: RECO_CD_04_dm01cel01 active
dm01cel01: RECO_CD_05_dm01cel01 active
dm01cel01: RECO_CD_06_dm01cel01 active
dm01cel01: RECO_CD_07_dm01cel01 active
dm01cel01: RECO_CD_08_dm01cel01 active
dm01cel01: RECO_CD_09_dm01cel01 active
dm01cel01: RECO_CD_10_dm01cel01 active
dm01cel01: RECO_CD_11_dm01cel01 active

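From this listing, each hard-disk celldisk is carved into up to 3 griddisks, one for each of the three ASM diskgroups DATA, DBFS_DG and RECO; CD_00 and CD_01 have only DATA and RECO griddisks here, with no DBFS_DG griddisk. In other words, the DATA diskgroup uses only griddisks with the DATA prefix, and likewise for the others. By default, without interleaving, the DATA griddisks occupy the outermost tracks of the disk (they are created first and have the lowest offset), so they are the fastest, while the RECO griddisks occupy inner tracks and are slower. (Update: where a DBFS_DG griddisk exists, it sits on the innermost tracks and is slower than RECO.)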
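
To see which diskgroup prefixes live on which celldisk, the griddisk names can be split into prefix and celldisk. This is a sketch under the assumption that names follow the `<prefix>_CD_<nn>_<cell>` pattern visible in the listing; `GRIDDISKS` is an abbreviated copy of it, and `by_celldisk` is a hypothetical helper.

```python
import re
from collections import defaultdict

# Abbreviated copy of the griddisk listing above.
GRIDDISKS = [
    "DATA_CD_00_dm01cel01",
    "DATA_CD_02_dm01cel01",
    "DBFS_DG_CD_02_dm01cel01",
    "RECO_CD_00_dm01cel01",
    "RECO_CD_02_dm01cel01",
]

# <prefix>_CD_<nn>_<cell>; the prefix itself may contain underscores (DBFS_DG).
NAME = re.compile(r"^(?P<prefix>.+)_(?P<celldisk>CD_\d+_.+)$")


def by_celldisk(names):
    """Map each celldisk to the diskgroup prefixes of the griddisks on it."""
    mapping = defaultdict(list)
    for name in names:
        match = NAME.match(name)
        if match:
            mapping[match.group("celldisk")].append(match.group("prefix"))
    return dict(mapping)


mapping = by_celldisk(GRIDDISKS)
for celldisk, prefixes in sorted(mapping.items()):
    print(celldisk, prefixes)
```

With the full listing, CD_00 and CD_01 should show only DATA and RECO, while CD_02 through CD_11 show all three prefixes.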
