How to Build an Exadata Virtual Machine Step by Step: The Cell Node

Original link: http://www.dbaleet.org/how_to_build_an_exadata_simulator_step_by_step_1_build_a_cell_node

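Back in early April, overmars asked me exactly how to build an Exadata virtual machine. My answer at the time was that I would write an article on the subject before May and that he should keep an eye on my blog. I owe overmars an apology here, because for personal reasons I missed that deadline. Anyway, I just hope it is not too late.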

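I know that many Oracle DBAs are interested in learning Exadata but have no environment at hand in which to study and test it. Bear in mind that Exadata is Oracle's engineered system combining hardware and software, so a personal computer alone can never reproduce a real Exadata environment. The Exadata virtual machine described here is therefore, frankly, only a tiger drawn with a cat as the model. An Exadata virtual machine has long existed inside Oracle, but it is restricted to Oracle University and Oracle internal staff for training and study; on Oracle's internal website it is labeled "Internal Use Only, Strict Confidential". I have no intention of violating Oracle's policy, so I have to build one myself from scratch.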

Enough preamble. Building an Exadata virtual environment takes at least two virtual machines: one for the cell node and one for the DB node.

First, your machine needs a fairly high specification:

  • CPU: Intel Core i3 or better (or AMD Athlon II X4 or better); Core i5 (AMD Phenom II X4) recommended;
  • Memory: at least 4 GB; 8 GB recommended;
  • Hard disk: at least 40 GB free; an SSD is even better, of course ;)
  • Virtualization software installed; Oracle VirtualBox (https://www.virtualbox.org/) is recommended;
  • Oracle Linux 5.7 installation media, available from https://edelivery.oracle.com/. You must register before downloading; registration is free. The Oracle Linux 5.7 media is named V27570-01.zip and unpacks to OracleLinux-R5-U7-Server-x86_64-dvd.iso
  • Exadata 11.2.3.2 cell installation media, available from https://edelivery.oracle.com/. You must register before downloading; registration is free. The Exadata 11.2.3.2 cell media is named V33693-01.zip and unpacks to cellImageMaker_11.2.3.2.0_LINUX.X64_120713-1.x86_64.tar
  • Installation media for Oracle Clusterware 11.2.0.3 and Oracle Database 11.2.0.3 for Linux x86_64. The files are: p10404530_112030_Linux-x86-64_1of7.zip p10404530_112030_Linux-x86-64_2of7.zip p10404530_112030_Linux-x86-64_3of7.zip
  • The latest OPatch patching tool. Patch number 6880880: OPatch patch of version 11.2.0.3.4 for Oracle software releases 11.2.0.x (APRIL 2013)
  • Exadata RDBMS Bundle Patch 17, patch number 16474946

With all of that in place, we can formally begin our Exadata journey.

 

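First, install Oracle Linux 5.7 in a virtual machine (Red Hat Enterprise Linux should work in theory, but I have not tested it). 1 GB of memory is usually enough. Installation is simple; the one thing to watch is that you select the software development packages (gcc, aio and so on). The graphical interface (GUI) is optional. A static IP address is recommended. My network configuration is as follows: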

 

[root@cell ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth0 
# Intel Corporation 82540EM Gigabit Ethernet Controller
DEVICE=eth0
BOOTPROTO=static
BROADCAST=192.168.56.255
HWADDR=08:00:27:B0:39:02
IPADDR=192.168.56.101
NETMASK=255.255.255.0
NETWORK=192.168.56.0
ONBOOT=yes
[root@cell ~]# cat /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1		cell localhost.localdomain localhost
::1		localhost6.localdomain6 localhost6

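Note: after installation, Oracle Linux boots the UEK (Unbreakable Enterprise Kernel) by default. With UEK, the cellsrv service cannot start properly in the later steps. Edit the grub configuration so that the Red Hat compatible kernel becomes the default boot kernel: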

[root@cell ~]# vi /etc/grub.conf

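Change default=0 to default=1 (entries are counted from 0, so this selects the second title entry, which in my installation is the Red Hat compatible kernel), then reboot.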
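
The grub change described above can also be scripted instead of edited in vi. The following is only a minimal sketch: the function names set_grub_default and list_grub_entries are made up for illustration, and it assumes a GRUB legacy config in which the Red Hat compatible kernel is the second title entry, as in my installation. List the entries first to confirm the index on your own system.

```shell
#!/bin/bash
# Hypothetical helpers for editing a GRUB legacy config (grub.conf).

# list_grub_entries FILE
# Prints each "title" line with its 0-based index, so the right kernel can be picked.
list_grub_entries() {
    grep '^title' "$1" | awk '{ printf "%d: %s\n", NR-1, $0 }'
}

# set_grub_default FILE INDEX
# Saves a backup as FILE.bak, then rewrites the "default=N" line to INDEX.
set_grub_default() {
    local file=$1 index=$2
    cp -p "$file" "$file.bak" || return 1
    sed -i "s/^default=[0-9][0-9]*/default=$index/" "$file"
}

# Usage (as root). /etc/grub.conf is normally a symlink to /boot/grub/grub.conf;
# pass the real file so that "sed -i" does not replace the symlink with a copy.
#   list_grub_entries /boot/grub/grub.conf
#   set_grub_default /boot/grub/grub.conf 1
```

After the reboot, uname -r shows which kernel is actually running.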

Oracle Linux starts many services by default that we do not need. To save resources, I suggest stopping and disabling the following services.

chkconfig --level 2345 auditd off && service auditd stop 
chkconfig --level 2345 autofs off && service autofs stop 
chkconfig --level 2345 avahi-daemon off && service avahi-daemon stop
chkconfig --level 2345 bluetooth off && service bluetooth stop
chkconfig --level 2345 cups off && service cups stop 
chkconfig --level 2345 ip6tables off && service ip6tables stop 
chkconfig --level 2345 iptables off && service iptables stop 
chkconfig --level 2345 isdn off && service isdn stop 
chkconfig --level 2345 kudzu off && service kudzu stop
chkconfig --level 2345 mcstrans off && service mcstrans stop
chkconfig --level 2345 netfs off && service netfs stop
chkconfig --level 2345 pcscd off && service pcscd stop 
chkconfig --level 2345 restorecond off && service restorecond stop
chkconfig --level 2345 rhnsd off && service rhnsd stop 
chkconfig --level 2345 sendmail off && service sendmail stop
chkconfig --level 2345 setroubleshoot off && service setroubleshoot stop
chkconfig --level 2345 smartd off && service smartd stop
chkconfig --level 2345 xinetd off && service xinetd stop
chkconfig --level 2345 yum-updatesd off && service yum-updatesd stop

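The same services can also be disabled with ntsysv --level 2345, which presents a text-mode menu in which you deselect the services you do not need; the change takes effect after a reboot.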
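
The long list of chkconfig/service commands above can be collapsed into a loop. This is a sketch rather than anything the cell software requires: the function name disable_services and its --dry-run flag are my own invention, and the service list simply repeats the one used in this article.

```shell
#!/bin/bash
# Services disabled in this article; edit the list to suit your installation.
SERVICES="auditd autofs avahi-daemon bluetooth cups ip6tables iptables isdn kudzu
mcstrans netfs pcscd restorecond rhnsd sendmail setroubleshoot smartd xinetd yum-updatesd"

# disable_services [--dry-run]
# With --dry-run the commands are only printed; otherwise each service is
# removed from runlevels 2-5 and then stopped.
disable_services() {
    local dry_run=0 svc
    if [ "${1:-}" = "--dry-run" ]; then
        dry_run=1
    fi
    for svc in $SERVICES; do
        if [ "$dry_run" -eq 1 ]; then
            echo "chkconfig --level 2345 $svc off && service $svc stop"
        else
            chkconfig --level 2345 "$svc" off && service "$svc" stop
        fi
    done
}

# Usage (as root):
#   disable_services --dry-run   # review what would be run
#   disable_services             # actually disable and stop the services
```

Driving the commands from one list avoids the copy-and-paste slips that creep into nineteen nearly identical lines.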

Next, upload the Exadata cell image V33693-01.zip into the virtual machine and unzip it to get cellImageMaker_11.2.3.2.0_LINUX.X64_120713-1.x86_64.tar; extracting that in turn yields a directory named dl180.

[root@cell ~]# unzip V33693-01.zip
Archive:  V33693-01.zip
  inflating: README.txt              
  inflating: cellImageMaker_11.2.3.2.0_LINUX.X64_120713-1.x86_64.tar  
[root@cell ~]# tar -pxvf cellImageMaker_11.2.3.2.0_LINUX.X64_120713-1.x86_64.tar 
dl180......

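Locate the file cell.bin under dl180/boot/cellbits. This .bin file is actually a zip archive, so we extract it with unzip (the warning about extra bytes at the beginning can be ignored):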

[root@cell ~]# unzip cell.bin 
Archive:  cell.bin
warning [cell.bin]:  6408 extra bytes at beginning or within zipfile
  (attempting to process anyway)
  inflating: cell-11.2.3.2.1_LINUX.X64_130109-1.x86_64.rpm  
  inflating: jdk-1_5_0_15-linux-amd64.rpm

This yields two RPM packages: cell-11.2.3.2.1_LINUX.X64_130109-1.x86_64.rpm and jdk-1_5_0_15-linux-amd64.rpm.

Install the JDK first:

[root@cell ~]# rpm -ivh jdk-1_5_0_15-linux-amd64.rpm

Then install cell:

[root@cell ~]# rpm -ivh cell-11.2.3.2.1_LINUX.X64_130109-1.x86_64.rpm

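The installation fails with an unmet dependency on LWP. This is because perl-libwww-perl is not installed by default; the package has quite a few dependencies of its own, so installing it with yum is recommended.

With a yum repository configured, install LWP directly with yum: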

[root@cell ~]#  yum install perl-libwww-perl

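Installing cell again fails once more, this time because a prerequisite is not met.

Which prerequisite is not stated, though; this is one of the inconveniences of RPM package management. The only option is to dump the package's scriptlets, which contain the checks, and then work out which condition is failing: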

[root@cell ~]# rpm --scripts -qp cell-11.2.3.2.1_LINUX.X64_130109-1.x86_64.rpm >>diag.log

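Open diag.log and it quickly becomes clear that the prerequisite check fails because the directory /var/log/oracle does not exist. Create the directory by hand and set its permissions to 775.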
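
Reading diag.log by eye works, but a rough filter can narrow the search. The sketch below is only a heuristic of my own, not part of the cell package: it assumes the prerequisite checks mention absolute paths literally, and the function name missing_paths is hypothetical. It prints every absolute path mentioned in the dump that does not exist on the machine, which will include some false positives (paths the installer itself creates).

```shell
#!/bin/bash
# missing_paths FILE
# Extracts everything in FILE that looks like an absolute path and prints
# the ones that do not exist on this machine.
missing_paths() {
    grep -o '/[A-Za-z0-9_.-][A-Za-z0-9_./-]*' "$1" | sort -u | while read -r p; do
        [ -e "$p" ] || echo "$p"
    done
}

# Usage:
#   rpm --scripts -qp cell-11.2.3.2.1_LINUX.X64_130109-1.x86_64.rpm > diag.log
#   missing_paths diag.log
```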

[root@cell ~]# mkdir -p /var/log/oracle
[root@cell ~]# chmod -R 775 /var/log/oracle

This time cell installs without errors.

The next step is to create the simulated hard disks and flash disks inside the cell virtual machine:

[root@cell ~]# mkdir -p /opt/oracle/cell/disks/raw
[root@cell ~]# cd /opt/oracle/cell/disks/raw
[root@cell raw]# vi dd.sh
[root@cell raw]# cat dd.sh
dd if=/dev/zero of=disk01 bs=1M count=1000
dd if=/dev/zero of=disk02 bs=1M count=1000
dd if=/dev/zero of=disk03 bs=1M count=1000
dd if=/dev/zero of=disk04 bs=1M count=1000
dd if=/dev/zero of=disk05 bs=1M count=1000
dd if=/dev/zero of=disk06 bs=1M count=1000
dd if=/dev/zero of=disk07 bs=1M count=1000
dd if=/dev/zero of=disk08 bs=1M count=1000
dd if=/dev/zero of=disk09 bs=1M count=1000
dd if=/dev/zero of=disk10 bs=1M count=1000
dd if=/dev/zero of=disk11 bs=1M count=1000
dd if=/dev/zero of=disk12 bs=1M count=1000
dd if=/dev/zero of=FLASH01 bs=1M count=1000
dd if=/dev/zero of=FLASH02 bs=1M count=1000
dd if=/dev/zero of=FLASH03 bs=1M count=1000
dd if=/dev/zero of=FLASH04 bs=1M count=1000

Run dd.sh to create the disks: 12 hard disks and 4 flash disks, each 1000 MiB (roughly 1 GB) in size.
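
The sixteen dd lines can be generated by a loop instead of typed out. This is a sketch equivalent to dd.sh under the same naming scheme (disk01 to disk12, FLASH01 to FLASH04); the function name make_sim_disks and its optional size argument are my own additions.

```shell
#!/bin/bash
# make_sim_disks DIR [SIZE_MB]
# Creates 12 zero-filled "hard disk" files and 4 "flash" files of SIZE_MB
# mebibytes each (default 1000) in DIR, then sets mode 660 on them.
make_sim_disks() {
    local dir=$1 size_mb=${2:-1000} i name
    mkdir -p "$dir" || return 1
    for i in $(seq 1 12); do
        name=$(printf 'disk%02d' "$i")
        dd if=/dev/zero of="$dir/$name" bs=1M count="$size_mb" 2>/dev/null || return 1
    done
    for i in $(seq 1 4); do
        name=$(printf 'FLASH%02d' "$i")
        dd if=/dev/zero of="$dir/$name" bs=1M count="$size_mb" 2>/dev/null || return 1
    done
    chmod 660 "$dir"/disk?? "$dir"/FLASH??
}

# Usage (as root):
#   make_sim_disks /opt/oracle/cell/disks/raw
```

Because no script file is left behind in the raw directory, the later step of deleting dd.sh is not needed with this variant.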

[root@cell raw]# chmod 660 *
[root@cell raw]# ls -ltr
total 16400068
-rw-rw---- 1 root root        692 May 16 16:24 dd.sh
-rw-rw---- 1 root root 1048576000 May 16 16:24 disk01
-rw-rw---- 1 root root 1048576000 May 16 16:24 disk02
-rw-rw---- 1 root root 1048576000 May 16 16:24 disk03
-rw-rw---- 1 root root 1048576000 May 16 16:24 disk04
-rw-rw---- 1 root root 1048576000 May 16 16:25 disk05
-rw-rw---- 1 root root 1048576000 May 16 16:25 disk06
-rw-rw---- 1 root root 1048576000 May 16 16:25 disk07
-rw-rw---- 1 root root 1048576000 May 16 16:26 disk08
-rw-rw---- 1 root root 1048576000 May 16 16:26 disk09
-rw-rw---- 1 root root 1048576000 May 16 16:27 disk10
-rw-rw---- 1 root root 1048576000 May 16 16:27 disk11
-rw-rw---- 1 root root 1048576000 May 16 16:27 disk12
-rw-rw---- 1 root root 1048576000 May 16 16:27 FLASH01
-rw-rw---- 1 root root 1048576000 May 16 16:27 FLASH02
-rw-rw---- 1 root root 1048576000 May 16 16:27 FLASH03
-rw-rw---- 1 root root 1048576000 May 16 16:28 FLASH04

Then delete the dd script, switch to the celladmin user, and restart the celld services.

[root@cell ~]# su - celladmin
[celladmin@cell ~]$ cellcli -e alter cell restart services all

The cellsrv service fails to start. Looking at /opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/log/diag/asm/cell/cell/trace/alert.log shows error messages similar to the following:

CELLSRV version=11.2.3.2.1,label=OSS_11.2.3.2.1_LINUX.X64_130109,Wed_Jan__9_06:09:48_PST_2013
Non critical error DIA-48913 caught while writing to trace file "/opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/log/diag/asm/cell/cell/trace/svtrc_2244_0.trc"
Error message: DIA-48913: Writing into trace file failed, file size limit [0] reached

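Judging from the error number, the maximum number of open files is too low, so the operating system's file limits need to be raised:

Append the line fs.file-max = 65536 to the end of /etc/sysctl.conf, then reload the settings: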

[root@cell ~]# sysctl -p
net.ipv4.ip_forward = 0
net.ipv4.conf.default.rp_filter = 2
net.ipv4.conf.default.accept_source_route = 0
kernel.sysrq = 0
kernel.core_uses_pid = 1
net.ipv4.tcp_syncookies = 1
kernel.msgmnb = 65536
kernel.msgmax = 65536
kernel.shmmax = 68719476736
kernel.shmall = 4294967296
fs.file-max = 65536

Append two lines to the end of /etc/security/limits.conf:

* soft nofile 65536
* hard nofile 65536
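
If you rerun these steps, appending by hand tends to leave duplicate lines in both files. A small sketch of an idempotent append follows; the helper name ensure_line is hypothetical, and it only checks for an exact, whole-line match, so an existing entry with different spacing or a different value is not detected.

```shell
#!/bin/bash
# ensure_line FILE LINE
# Appends LINE to FILE unless an identical line is already present.
ensure_line() {
    local file=$1 line=$2
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Usage (as root):
#   ensure_line /etc/sysctl.conf 'fs.file-max = 65536' && sysctl -p
#   ensure_line /etc/security/limits.conf '* soft nofile 65536'
#   ensure_line /etc/security/limits.conf '* hard nofile 65536'
```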

Then log out and log back in, switch to celladmin, and use ulimit -a to check that the change has taken effect:

[root@cell ~]# ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 11999
max locked memory       (kbytes, -l) 32
max memory size         (kbytes, -m) unlimited
open files                      (-n) 65536
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 11999
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

Start all of the cell services again:

[celladmin@cell ~]$ cellcli -e alter cell restart services all

This time the cellsrv, ms and rs services all start normally.

Next, the network interface information needs to be added to cellinit.ora:

[celladmin@cell ~]$ cellcli -e create cell cell1 interconnect1=eth0

Once the command succeeds, cellinit.ora contains a new line similar to ipaddress1=192.168.56.101/24.

[root@cell config]# cat /opt/oracle/cell/cellsrv/deploy/config/cellinit.ora
#CELL Initialization Parameters
version=0.0
DEPLOYED=TRUE
HTTP_PORT=8888
RMI_PORT=23791
SSL_PORT=23943
JMS_PORT=9127
BMC_SNMP_PORT=162
ipaddress1=192.168.56.101/24
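
To confirm that the address recorded in cellinit.ora matches the one configured on eth0, the two files can be compared with a couple of one-liners. This is a sketch with helper names of my own choosing (cell_ipaddresses, ifcfg_value); it only reads the files and relies on the key=value layout shown in the listings above.

```shell
#!/bin/bash
# cell_ipaddresses FILE
# Prints the value of every ipaddressN= line in a cellinit.ora file.
cell_ipaddresses() {
    awk -F= '/^ipaddress[0-9]+=/ { print $2 }' "$1"
}

# ifcfg_value FILE KEY
# Prints the value of KEY (for example IPADDR) from an ifcfg-ethX file.
ifcfg_value() {
    awk -F= -v key="$2" '$1 == key { print $2 }' "$1"
}

# Usage:
#   cell_ipaddresses /opt/oracle/cell/cellsrv/deploy/config/cellinit.ora
#   ifcfg_value /etc/sysconfig/network-scripts/ifcfg-eth0 IPADDR
# The first command prints address/prefix; the part before the slash should
# equal the IPADDR printed by the second.
```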

Next, create the celldisks, griddisks, flashcache and flashlog:

[celladmin@cell ~]$ cellcli
CellCLI: Release 11.2.3.2.1 - Production on Thu May 16 23:11:41 CST 2013

Copyright (c) 2007, 2012, Oracle.  All rights reserved.
Cell Efficiency Ratio: 1

CellCLI> alter cell restart services all

Stopping the RS, CELLSRV, and MS services...
The SHUTDOWN of services was successful.
Starting the RS, CELLSRV, and MS services...
Getting the state of RS services...  running
Starting CELLSRV services...
The STARTUP of CELLSRV services was successful.
Starting MS services...
The STARTUP of MS services was successful.

CellCLI> create celldisk all            
CellDisk FD_00_cell1 successfully created
CellDisk FD_01_cell1 successfully created
CellDisk FD_02_cell1 successfully created
CellDisk FD_03_cell1 successfully created
CellDisk CD_disk01_cell1 successfully created
CellDisk CD_disk02_cell1 successfully created
CellDisk CD_disk03_cell1 successfully created
CellDisk CD_disk04_cell1 successfully created
CellDisk CD_disk05_cell1 successfully created
CellDisk CD_disk06_cell1 successfully created
CellDisk CD_disk07_cell1 successfully created
CellDisk CD_disk08_cell1 successfully created
CellDisk CD_disk09_cell1 successfully created
CellDisk CD_disk10_cell1 successfully created
CellDisk CD_disk11_cell1 successfully created
CellDisk CD_disk12_cell1 successfully created

CellCLI> create flashcache all  size=2G
Flash cache cell1_FLASHCACHE successfully created

CellCLI> create flashlog all
Flash log cell1_FLASHLOG successfully created

CellCLI> list flashcache detail
	 name:              	 cell1_FLASHCACHE
	 cellDisk:          	 FD_00_cell1,FD_03_cell1,FD_02_cell1,FD_01_cell1
	 creationTime:      	 2013-05-16T17:11:57+08:00
	 degradedCelldisks: 	 
	 effectiveCacheSize:	 2G
	 id:                	 33020341-ba55-4b35-9b3a-4030b5085475
	 size:              	 2G
	 status:            	 normal

CellCLI> list flashlog detail
	 name:              	 cell1_FLASHLOG
	 cellDisk:          	 FD_01_cell1,FD_03_cell1,FD_02_cell1,FD_00_cell1
	 creationTime:      	 2013-05-16T17:12:10+08:00
	 degradedCelldisks: 	 
	 effectiveSize:     	 512M
	 efficiency:        	 100.0
	 id:                	 f10e1ac7-5e3f-4c1e-8f3b-8e9ab19fffeb
	 size:              	 512M
	 status:            	 normal

CellCLI> list cell
	 cell1	 online

CellCLI> list celldisk
	 CD_disk01_cell1	 normal
	 CD_disk02_cell1	 normal
	 CD_disk03_cell1	 normal
	 CD_disk04_cell1	 normal
	 CD_disk05_cell1	 normal
	 CD_disk06_cell1	 normal
	 CD_disk07_cell1	 normal
	 CD_disk08_cell1	 normal
	 CD_disk09_cell1	 normal
	 CD_disk10_cell1	 normal
	 CD_disk11_cell1	 normal
	 CD_disk12_cell1	 normal
	 FD_00_cell1    	 normal
	 FD_01_cell1    	 normal
	 FD_02_cell1    	 normal
	 FD_03_cell1    	 normal

CellCLI> list griddisk
	 data_CD_disk01_cell1	 active
	 data_CD_disk02_cell1	 active
	 data_CD_disk03_cell1	 active
	 data_CD_disk04_cell1	 active
	 data_CD_disk05_cell1	 active
	 data_CD_disk06_cell1	 active
	 data_CD_disk07_cell1	 active
	 data_CD_disk08_cell1	 active
	 data_CD_disk09_cell1	 active
	 data_CD_disk10_cell1	 active
	 data_CD_disk11_cell1	 active
	 data_CD_disk12_cell1	 active
CellCLI> list celldisk CD_disk01_cell1 detail
	 name:              	 CD_disk01_cell1
	 comment:           	 
	 creationTime:      	 2013-05-16T16:40:29+08:00
	 deviceName:        	 /opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/disks/raw/disk01
	 devicePartition:   	 /opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/disks/raw/disk01
	 diskType:          	 HardDisk
	 errorCount:        	 0
	 freeSpace:         	 0
	 id:                	 ecc913eb-5f74-4ad6-9d05-f811af986921
	 interleaving:      	 none
	 lun:               	 /opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/disks/raw/disk01
	 physicalDisk:      	 /opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/disks/raw/disk01
	 raidLevel:         	 "RAID 0"
	 size:              	 992M
	 status:            	 normal

CellCLI> list celldisk FD_00_cell1 detail
	 name:              	 FD_00_cell1
	 comment:           	 
	 creationTime:      	 2013-05-16T16:40:25+08:00
	 deviceName:        	 /opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/disks/raw/FLASH01
	 devicePartition:   	 /opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/disks/raw/FLASH01
	 diskType:          	 FlashDisk
	 errorCount:        	 0
	 freeSpace:         	 304M
	 freeSpaceMap:      	 offset=688M,size=304M
	 id:                	 c9488ae4-d3b9-4aa2-a4e5-d3539e44b417
	 interleaving:      	 none
	 lun:               	 /opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/disks/raw/FLASH01
	 physicalDisk:      	 /opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/disks/raw/FLASH01
	 raidLevel:         	 "RAID 0"
	 size:              	 992M
	 status:            	 normal

CellCLI> list griddisk data_CD_disk01_cell1 detail
	 name:              	 data_CD_disk01_cell1
	 availableTo:       	 
	 cachingPolicy:     	 default
	 cellDisk:          	 CD_disk01_cell1
	 comment:           	 
	 creationTime:      	 2013-05-16T16:49:51+08:00
	 diskType:          	 HardDisk
	 errorCount:        	 0
	 id:                	 7a36bc8a-1611-474d-85fc-fa730e73176d
	 offset:            	 48M
	 size:              	 944M
	 status:            	 active
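
The session above lists grid disks named data_CD_*, but the command that created them does not appear in the transcript. The sketch below replays the storage setup non-interactively. The first three commands are taken from the session; the fourth, create griddisk all harddisk prefix=data, is an assumption inferred from the grid disk names and is not copied from the original session. The function name cell_storage_commands is hypothetical.

```shell
#!/bin/bash
# cell_storage_commands
# Prints the CellCLI commands for the storage setup, one per line.
# The griddisk line is inferred from the data_CD_* names (an assumption).
cell_storage_commands() {
    cat <<'EOF'
create celldisk all
create flashcache all size=2G
create flashlog all
create griddisk all harddisk prefix=data
EOF
}

# Usage (as celladmin): run each command through cellcli, stopping at the
# first failure. stdin is redirected so cellcli cannot consume the list.
#   cell_storage_commands | while read -r cmd; do
#       cellcli -e "$cmd" </dev/null || break
#   done
```

Review the printed commands before running them, since creating cell disks and grid disks changes the cell's storage layout.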

 

At this point the cell node virtual machine is essentially complete.