关于RAC interconnect之bond

原文链接:http://www.dbaleet.org/about-rac-interconnect-os-bonding/

 

对于维护mission critical的高可用系统的DBA而言,SPOF(单点故障)永远是无法回避的话题,因为它会影响到SLA (服务等级协议), 对于任何一个设计良好的系统都应该尽量避免出现单点故障。

RAC的私网,我们有时称其为心跳线,可见其牵一发而动全身的作用。上一篇我们讲述了Oracle自带的一种防止私网单点故障的方式HAIP。鉴于其局限性,本篇主要讲用途更为广泛的操作系统级别的网卡绑定。

绑定,简单地说就是将多块物理网卡捆绑成逻辑上的一块网卡,它对于操作系统之上的应用程序完全透明,以提供故障转移或者负载均衡,消除单点故障的隐患。这看上去是一件非常简单的工作,但实际上远非如此,其中涉及到很多网络方面的概念和细节,当然这些细节超出了本文的范畴,在此不做详细论述。

绑定(bond)是Linux平台的叫法,它还有其它各种名字,例如port trunking(端口汇聚,常见于交换机领域)、Link aggregation(链路聚合,事实上这才是业界标准IEEE 802.3ad的称谓,对应LACP——link aggregation control protocol)、NIC teaming(网卡组合,常见于HP和Microsoft的叫法)、Ethernet Channel(以太网通道,Cisco/IBM的叫法)。当然这里面会存在一些细微的差异,例如绑定本身与交换机无关,整个过程不需要交换机主动参与,而链路聚合则需要交换机厂商本身的聚合协议支持,例如并不是所有交换机都支持IEEE 802.3ad(现在主流厂商基本都支持)。至于PAGP,则是Cisco的私有协议,国内厂商Huawei和ZTE也都有其对应的协议,这里就不一一赘述了。

各主流的操作系统平台都提供各自的绑定方法。IBM AIX为Ethernet Channel,HP-UX上为APA(顺便提一下,这个功能是收费的),Linux为bond,Solaris平台为IPMP。这些绑定技术通常都可以配置为Active-Active或者Active-Standby的方式。需要注意的是Active-Active并不意味着能提供双向负载均衡的能力。以IBM AIX上的Ethernet Channel(此Ethernet Channel并不依赖于Cisco交换机的Ethernet Channel协议)为例,负载均衡只是针对操作系统所管辖的网卡outgoing(发送)的流量而言,而网卡incoming(接收)的流量则取决于交换机端配置使用的算法。在绝大多数情况下,outgoing可以负载均衡,但incoming使用的总是同一个设备,也就是说是单向负载均衡。(HAIP可以提供双向负载均衡)

对于interconnect,除Linux以外,其它Unix厂商绑定的配置和选项相对比较简单,私网配置错误的可能性较小。Linux平台由于其模式过于复杂,需要在这里单独罗列出来。Linux的绑定模式一共分为7种:Mode 0: Round Robin,Mode 1: Active-Standby,Mode 2: XOR,Mode 3: Broadcast,Mode 4: LACP,Mode 5: Transmit Load Balancing,Mode 6: Adaptive Load Balancing。(Linux Foundation上提供了一篇关于这些模式的详细说明)其中私网心跳是不支持使用mode 0——round robin算法的。MOS文档中提到应该尽量避免使用Mode 3和Mode 6,具体的文档如下:

Potential Disadvantages Using Broadcast Mode 3 For Bonded Devices (Doc ID: 730796.1)

Linux: ARP cache issues with Red Hat “balance-alb (mode 6)” bonding driver (Doc ID: 756259.1)

事实上,真正可供选择的只有mode 1和mode 4,也就是Active-Standby和LACP。Active-Standby是Oracle推荐采用的方式,因为它稳定、可靠并且与交换机厂商无关,唯一的"不足"是只能提供failover却无法提供load balance。mode 4也是可以采用的,但是前提是你的交换机需要支持802.3ad。在我的印象中,Oracle ACE Director、Dell TechCenter的Kai Yu曾经在一个best practice的文档中推荐使用mode 4,当然目前我无法找到此链接,如果找到我会尽快提供。(更新: 请点击这里,不过需要翻墙)
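下面给出一个mode 1(active-backup)绑定的最小配置示意,以RHEL 5/6的网络脚本为例,其中bond0、eth1、eth2以及IP地址均为假设,实际配置请以操作系统厂商的文档为准:

# /etc/modprobe.d/bonding.conf (RHEL 5则为/etc/modprobe.conf)
alias bond0 bonding

# /etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE=bond0
IPADDR=192.168.10.1
NETMASK=255.255.255.0
ONBOOT=yes
BOOTPROTO=none
USERCTL=no
BONDING_OPTS="mode=1 miimon=100 primary=eth1"

# /etc/sysconfig/network-scripts/ifcfg-eth1 (ifcfg-eth2同理,只需修改DEVICE)
DEVICE=eth1
MASTER=bond0
SLAVE=yes
ONBOOT=yes
BOOTPROTO=none

配置完成后重启网络服务,可以通过cat /proc/net/bonding/bond0确认当前的active slave是哪块网卡。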

就我个人而言,建议不要使用mode 4, 理由如下:

1) 私网的load balance实际上意义不大。假定一种前提就是私网的global cache的流量过大,需要使用负载均衡,试想这个时候其中一个网络回路出现问题,另外一个回路进行接管,那么原本在两个回路的数据量需要全部转移到一个回路中,在切换的瞬间必然导致另外一个回路出现大量的拥塞,由于雪崩效应,虽然网络拓扑结构上并不存在单点,但是最终结果却与单点故障相差无几。
2) RHEL的早期版本4和5均存在一些bug:在负载均衡模式下,只要一个回路发生故障,系统会误以为所有回路都出现问题。虽然通常只是闪断,但是对于RAC的心跳而言,这也是不可接受的。
3) 如果担心私网的带宽不足造成性能瓶颈, 基本的思路应该是尽量使用较大带宽的心跳网络,例如10Gbps的以太网或者infiniband,而不是使用多路1Gbps的以太网做负载均衡。

在Oracle的工程系统Exadata上,私网使用的是Active-Standby模式的infiniband接口,一来提供了40Gbps的高带宽,二来实现了故障转移。连接到生产的公网则使用1Gbps/10Gbps的以太网接口(默认是1Gbps,如果要使用10Gbps,则需要单独购买光纤模块),也是使用Active-Standby的模式,当然公网绑定是允许用户修改为其它模式的。

 

附:如何配置对应的网卡绑定属于操作系统自身的范畴,但是你依然能在MOS上找到对应的配置文档,当然如果需要深入了解一些个性化的设置,则需要参考操作系统厂商提供的详细文档。以下列出MOS文档的标题和文档号供参考。

Configuring the HP-UX Operating System for the Oracle 10g and Oracle 11g VIP (Doc ID 296874.1)

Configuring the IBM AIX 5L Operating System for the Oracle 10g VIP (Doc ID 296856.1)

Linux Ethernet Bonding Driver (Doc ID 434375.1)

How to Analyze Problems When Trying to Configure Solaris IP Multipathing (Doc ID 1020085.1)

Infiniband可以运行哪些协议?

原文链接: http://www.dbaleet.org/about_protocols_of_infiniband/

 

这不是一篇介绍infiniband是什么的文章,而仅仅站在Oracle RAC和Exadata的角度上阐述infiniband。 如果您不知道infiniband是什么,请点击这里。

很多人可能不知道,绝大多数高性能计算机内部或者集群之间都是使用infiniband互联的。在国家超算中心,在亚马逊的云计算中心都有它的身影。为什么infiniband会如此受欢迎呢?原因无非有两个:一是目前infiniband本身能提供比传统以太网更高的带宽,二是通常infiniband的开销比以太网要小,对于节点间通信的大量数据传输比以太网效率要更高。当然Oracle也是这一技术的主导者,其中RDS本身就是Oracle的一个开源项目。常见的运行在infiniband之上的协议有哪些呢?下面就简单介绍一下Oracle DB可能会用到的几个:

IPoIB协议:
Internet Protocol over InfiniBand 简称IPoIB。传统的TCP/IP栈的影响实在太大了,几乎所有的网络应用都是基于此开发的,IPoIB实际是infiniband为了兼容以太网不得不做的一种折中,毕竟谁也不愿意使用不兼容大规模已有设备的产品。IPoIB基于TCP/IP协议,对于用户应用程序是透明的,并且可以提供更大的带宽,也就是原先使用TCP/IP协议栈的应用不需要任何修改就能使用IPoIB。例如如果使用infiniband做RAC的私网,默认使用的就是IPoIB。下图左侧是传统以太网tcp/ip协议栈的拓扑结构,右侧是infiniband使用IPoIB协议的拓扑结构。
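如果想确认RAC私网是否运行在IPoIB接口之上,可以参考下面的简单示意(接口名ib0与网段仅为假设):

# grid用户查看集群注册的私网接口
$ oifcfg getif
ib0  192.168.10.0  global  cluster_interconnect

# 数据库端确认实例实际使用的私网地址
SQL> select name, ip_address from v$cluster_interconnects;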

RDS协议:

Reliable Datagram Sockets (RDS)实际是由Oracle公司研发的运行在infiniband之上、直接基于IPC的协议。之所以出现这么一种协议,根本的原因在于传统的TCP/IP栈本身过于低效,对于高速互联开销太大,导致传输的效率太低。RDS相比IPoIB,CPU的消耗量减少了50%;相比传统的UDP协议,网络延迟减少了一半。默认情况下,RDS协议不会被使用,需要进行额外的relink。另外即使relink RDS库以后,RAC节点间的CSS通信也是无法使用RDS协议的,节点间心跳维持以及监控总是使用IPoIB。下图左侧是infiniband使用IPoIB协议的拓扑结构,右侧是infiniband使用RDS协议的拓扑结构。
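relink RDS的大致步骤如下(以11.2为例,仅为示意,需要先停止数据库,具体请以对应平台的MOS文档为准):

# oracle用户
$ cd $ORACLE_HOME/rdbms/lib
$ make -f ins_rdbms.mk ipc_rds ioracle

# 11.2自带的skgxpinfo小工具可以用来确认当前使用的IPC协议,输出为rds或udp
$ $ORACLE_HOME/bin/skgxpinfo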

SDP协议:

知道并且使用过RDS协议的人不少,但是可能不少人都没有听过SDP协议。这个协议实际早在10g时代就存在,只是没有专门的文档,这个白皮书算是比较少见的,其中只是简要地提到了SDP,所著笔墨不多,也没有提到如何实现,可能在这个版本属于试验性的功能。文中提到依靠一个Oracle Application Server端的驱动,SDP协议可以与TCP/IP协议栈进行透明的转换。在11g中正式将这个功能列为new feature。Database端如何配置SDP连接可以点击这里: 11.1 11.2,Exalogic端如何配置SDP的链接可以在这里找到,甚至还有如何在java程序中使用SDP协议的案例介绍。在实际应用中,多个Exadata机柜的相连可以通过配置SDP协议连接,Exalogic和Exadata的连接也是通过SDP协议的。但是需要注意的是Oracle的Net Service目前是无法走RDS协议的。下图左侧是传统以太网tcp/ip协议栈的拓扑结构,右侧是infiniband使用SDP协议的拓扑结构。
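下面是一个通过Oracle Net使用SDP协议的最小配置示意,其中监听名、主机名dm01-ib1与端口均为假设,具体配置请参考上面提到的官方文档:

# listener.ora: 增加一个SDP协议的监听地址
LISTENER_IB =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = SDP)(HOST = dm01-ib1)(PORT = 1522)))

# tnsnames.ora: 客户端(例如Exalogic端)通过SDP连接数据库
ORCL_SDP =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = SDP)(HOST = dm01-ib1)(PORT = 1522))
    (CONNECT_DATA = (SERVICE_NAME = orcl)))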

还有可能会听过的协议有ZDP和iDB协议,这两个是新名词,但如果有一点了解就知道是旧瓶装新酒。iDB协议用于Exadata数据库节点(DB node)和存储节点(cell node)之间的通信。i代表intelligence,言下之意就是智能数据库协议,您可不要小看它,整个Exadata的精髓offloading全靠它来完成,之所以其它第三方Oracle数据库一体机只有Exadata的形而没有Exadata的神,原因就在此。简单地说它是由Oracle数据库内核来实现的,可以智能地将表扫描的工作放到存储一端去完成,然后由存储进行过滤,最后只返回查询需要的数据。举个简单的例子:比如某个表有1亿行,但是满足过滤条件的就只有1万行,数据库节点会发出一个指令告诉存储节点,"我需要查询某某表,过滤条件是什么,你去处理一下,把结果告诉我就成,我还有别的事情要忙"。这个指令就是iDB。iDB的实现是Oracle公司的最高机密,除了Exadata的核心研发团队和技术高管没有人知道内部是如何实现的,只知道iDB协议运行在ZDP协议(Zero-loss Zero-copy Datagram Protocol)之上,而ZDP是基于RDS协议的V3版本(OFED version 1.3.1)的标准进行研发的。Oracle的官方数据显示使用ZDP协议进行数据传输能达到3GB/s,而仅仅消耗主机CPU资源的2%。

以上仅仅讲到Oracle相关的一些infiniband协议,最后上传一张囊括infiniband协议栈(Stack)的图片作为补充。

下一篇将介绍infiniband在各平台的兼容性和认证情况。

以上

Exadata FAQ——为什么ASM rebalance hang在EST_MINUTES=0

原文链接: http://www.dbaleet.org/why_asm_rebanlance_hang_at_est_minutes_eq_zero/

故事最早大概发生在一年前,当时某个客户的Exadata有几个盘坏了,需要更换。当时正好我在客户现场做一个变更,顺便帮忙换一下硬盘,因为Exadata换盘的步骤比较繁琐,客户也是第一次遇到这样的事情,所以也格外谨慎。变更是在凌晨,此时业务量非常小,所以索性将ASM_POWER_LIMIT开足马力调整到11,期望rebalance能快点结束。还好一切顺利,中间并没有遇到什么差错。最后一步将ASM磁盘加回到ASM diskgroup也很顺利,然后就不停地用/刷着select * from v$asm_operation;的结果。两小时后,眼看EST_MINUTES就马上接近于零了,换盘工作也即将结束,于是乎就去找客户闲聊,拉拉家常。半个小时过去了,我回到座位,熟练地敲了一下/,口里还念叨了一句:no rows selected。大大出乎我意料的是竟然还有记录。越想越不对,10g的ASM也算换过不少次盘了,从来没出现过像现在这样的情况。难道这个参数不准?下意识地去存储节点看了下iostat的结果,发现I/O量还是很大。这个时候已经是凌晨3点了,不应该有这么大的访问量才对呀。我又简单看了下db的负载,一切正常,这就奇怪了。干脆一不做二不休,等吧。查询v$asm_operation得到的结果基本是这样的:

SQL>select * from v$asm_operation;

GROUP_NUMBER OPERA STAT      POWER     ACTUAL      SOFAR   EST_WORK   EST_RATE EST_MINUTES ERROR_CODE

------------ ----- ---- ---------- ---------- ---------- ---------- ---------- ----------- --------------------------------------------

           1 REBAL RUN          11         11       5518       5917       3250           0

一个小时过去了, 没变。

两个小时过去了,不见动静。

四个小时过去了,马上就天亮了。如果还不完成的话,那么马上就到营业时间了,眼看就快过维护窗口了,如果不能完成可能就影响到业务了。我不停地刷着/,期望rebalance能尽快结束,终于在四个半小时以后出现了久违的no rows selected。当时我就想肯定是这个EST_MINUTES估算值不准导致的,因为10g时代已经习惯了v$session_longops不准。但是令人十分费解的是加几个盘也用不了这么久吧?正常情况下两个小时就结束了,Exadata号称性能最强的数据库一体机,难道连普通的PC server都不如?

一个月以后,同样的事情又一次碰到,但是这次我不在客户现场。这是国内某大型金融客户。客户告诉我,他们加盘的动作是设定在某个时间段进行,预估的时间是根据EST_MINUTES算出来,然后多加一个小时,在10g时代,客户一直是这么做的。结果竟然2-3小时还没完,影响到业务了。这个时候,这个问题我已经知道是为什么了,但是我并没有说明具体的原因,只是告诉他这个估算出来的值不准,并且加盘减盘最好不要设定死固定的窗口,也不要将ASM_POWER_LIMIT调整到最大值,设置为4就行了,这样不会影响业务。

时隔不久,竟然又有同事遇到了同样的问题,但是这次不是在Exadata上,只是普通的11.2.0.2的数据库。

实际上:EST_MINUTES 是按照以下公式计算的:

EST_MINUTES = (EST_WORK-SOFAR)/ EST_RATE

客户这个例子中EST_MINUTES=(5917-5518)/3250=0.12分钟,约等于0,说明rebalance已经"结束"。但为什么select * from v$asm_operation中还显示有记录,并且持续的时间都非常长呢?

那这两者会有什么不同?这个时候ASM正在做什么呢?我们猜测在EST_MINUTES=0、但select * from v$asm_operation仍有记录返回的时候,ASM一定在后台进行某种秘密的活动。因为最终的rebalance是由ARB0完成的,所以我们想对ARB0进程在这两个阶段分别进行debug,然后对比其异同:
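获取ARB0堆栈的方法很多,下面是一个使用pstack的简单示意(其中<arb0_pid>为假设的进程号,短时间内多采样几次结果更可靠):

# 找到ASM实例的ARB0进程号
$ ps -ef | grep asm_arb0 | grep -v grep

# 直接抓取该进程的调用栈
$ pstack <arb0_pid>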

首先在EST_MINUTES不为0的时候,ARB0的堆栈如下:

kfk_reap_oss_async_io<-kfk_reap_ios_from_subsys<-kfk_reap_ios<-kfk_io1<-kfkRequest<-kfk_transitIO<-kffRelocateWait<-kffRelocate<-kfdaExecute<-kfgbRebalExecute<-kfgbDriver<-ksbabs<-kfgbRun<-ksbrdp<-opirip<-opidrv<-sou2o<-opimai_real<-ssthrdmain<-main

从上面的堆栈函数,我们可以猜测此时ARB0进程是在做extent的迁移(relocate),并且等待相应I/O的完成。

当EST_MINUTES=0,但是v$asm_operation视图还有记录的时候,再对ARB0进行debug,得到的堆栈信息明显就不一样了:

kfk_reap_oss_async_io<-kfk_reap_ios_from_subsys<-kfk_reap_ios<-kfk_io1<-kfkRequest<-kfk_transitIO<-kffRelocateWait<-kffRelocate<-kfdaExecute<-kfdCompact<-kfdExecute<-kfgbRebalExecute<-kfgbDriver<-ksbabs<-kfgbRun<-ksbrdp<-opirip<-opidrv<-sou2o<-opimai_real<-ssthrdmain<-main

可以看到其中多了一个名为kfdCompact的函数,所以我们猜测这个神秘的阶段ARB0进程是在做compact这个动作。从这个compact来看,这个动作显然是11.2 ASM的一个未公开的新特性,一个对数据进行重组和优化的阶段。后来发现这个动作并不是每次rebalance的时候都会发生。这个动作所做的事情实际上是把数据尽量挪到磁盘的外圈以加快访问速度。这个过程并不是必须的,可以通过隐含参数_DISABLE_REBALANCE_COMPACT=TRUE禁用。值得注意的是这个参数在11.2.0.3以下版本最好不要禁用,原因在于:Bug 10022980 – DISK NOT EXPELLED WHEN COMPACT DISABLED,这个bug在11.2.0.3修复。当然还有一种方式就是将隐含参数_REBALANCE_COMPACT设置为false。

我的建议是:如果ASM磁盘是基于存储的LUN,数据本身已经被存储打散,ASM根本不知道磁盘的最外圈在什么地方,这种情况下应该将compact这个过程禁用,以免耽误很长的时间,而结果却适得其反;如果ASM盘是裸盘,则不要关闭这个特性;在Exadata上,同样不要禁用这个特性。同时,请不要轻易将ASM_POWER_LIMIT设置为最大值再进行rebalance。一种思路是将ASM_POWER_LIMIT调整到4左右,然后添加/删除/替换磁盘,让其在后台进行,再写一个脚本每隔几分钟查询一次v$asm_operation,如果返回空行,则表示rebalance已经成功,然后向dba team发送邮件或者短信通知(见下面的脚本示意)。
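下面是上述监控思路的一个最小shell脚本示意,其中ORACLE_SID、ORACLE_HOME、收件人地址均为假设,实际使用前请自行补充错误处理:

#!/bin/bash
# 每5分钟检查一次v$asm_operation, rebalance结束后发邮件通知 (仅为示意)
export ORACLE_SID=+ASM1
export ORACLE_HOME=/u01/app/11.2.0.3/grid
export PATH=$ORACLE_HOME/bin:$PATH

while true; do
  cnt=$(sqlplus -s / as sysasm <<EOF
set heading off feedback off pagesize 0
select count(*) from v\$asm_operation;
EOF
)
  if [ "$(echo $cnt | tr -d ' ')" = "0" ]; then
    echo "ASM rebalance finished at $(date)" | mail -s "ASM rebalance done" dba-team@example.com
    break
  fi
  sleep 300
done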

最后需要补充一句:这个问题已经被Oracle当作一个bug处理,Bug 9311185: EST_MINUTES IN V$ASM_OPERATION MAY SHOW ZERO FOR EXTENDED PERIODS,也就是说目前没有办法监控到compact的完成度。由于已有代码结构的问题,这个bug在11.2中几乎无法修复,12c中确认已经修复。

以上

 

How to troubleshoot ‘ASM does not discover disks’

原文链接:http://www.dbaleet.org/how_to_troubleshoot_asm_does_not_discover_disks/

 

This scenario is far from new in a conventional Oracle database environment; there is a short checklist to work through if you come across this issue.

 

1. Check the ownership and permission of the ASM candidate disks. They should be owned by the RDBMS software owner, e.g. oracle:dba, and the permission of the candidate disks should be 660. You can check this with the "ls -ltr" command in most cases.

 

2. If both ownership and permission are correct, you might have to read the disk manually with the OS command "dd" as user "oracle". For example, if the name of the LUN to be used by ASM is "/dev/asm/ocr1", you can read the disk by:

#su - oracle
$dd if=/dev/asm/ocr1 of=/dev/null bs=8192 count=10

If the output of the above command returns something like "xxx in, xxx out", then the problem is most likely not with the disk itself.

 

3. If you are using a multi-pathing technology, do not forget to check the certification information before making a plan. The certification is well documented in MOS note Oracle ASM and Multi-Pathing Technologies [ID 294869.1]. Be aware that IBM VPath is not supported with ASM; you should use the alternative MPIO solution instead.

 

4. If you are using ASMLib, first make sure that ASMLib has been properly configured. ASMLib relies on specific Linux kernel versions, and a mismatch between ASMLib and the Linux kernel will lead to an ASMLib installation failure. Secondly, please try to run:

/etc/init.d/oracleasm scandisks
/etc/init.d/oracleasm listdisks

If no disks are found, do not rush into building the ASM instance and ASM diskgroups; investigating the root cause first will save you time.

 

5. Please also make sure that the asm_diskstring parameter is properly set; ASM will only discover devices under the paths that asm_diskstring specifies.
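For example (the path below is only an illustration; multipath or udev device names vary by environment):

SQL> show parameter asm_diskstring
SQL> alter system set asm_diskstring='/dev/mapper/asm*' scope=both;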

 

6.  Last but not least, kfod is a friend you can count on.

 

$export LD_LIBRARY_PATH=/tmp/OraInstall2013-09-12_06-25-45PM/ext/lib
$cd /tmp/OraInstall2013-09-12_06-25-45PM/ext/bin
$./kfod op=disks disks=all

If the above command returns nothing, try to trace the process:

$strace -f ./kfod op=disks disks=all

and investigate further from the output; it should cover all of the details that are helpful for diagnosing this issue.

 

ON EXADATA:

Normally, you can skip the 5 steps above; the 6th step alone should be enough if your griddisks are all online.

On some rare occasions, a few tracing environment variables need to be set before tracing kfod:

$export CELLCLIENT_TRACE_LEVEL="all,4"
$export CELLCLIENT_AUTOFLUSH_LEVEL="all,4"
$export CELLCLIENT_TRACE_INFO="autoflush_sync,on"
$cd /tmp/OraInstall2013-09-12_06-25-45PM/ext/bin
$./kfod op=disks disks=all
$strace -f ./kfod op=disks disks=all

 

We recently found that the ASM instance and cellsrv should not be on the same node; otherwise, the ASM instance won't find any disks if cellsrv on that node is already up.

It seems the ASM instance searches for a library called "libcell11.so"; if the cell version of this library is present and cellsrv is already up, the ASM instance stops discovering the griddisks.

 

 

Juan Mosqueda contributed the Exadata part of this article.

Thank you, Juan.

infiniband协议在Oracle RAC的兼容性

原文链接: http://www.dbaleet.org/infiniband_protocol_oracle-rac_compatible/

上篇介绍了infiniband的各种协议,以及它们是如何与Oracle数据库产品集成的,但是还有一个非常重要的信息没有提到,那就是在各个平台和数据库版本上的兼容性。

在Oracle官方网站上有两个链接专门介绍RAC技术在Unix平台和Linux平台的兼容性。其中截取infiniband的情况如下:

Oracle RAC Technologies Certification Matrix for UNIX Platforms 

InfiniBand (IB)

  • RDS over IB is supported (see notes)
  • IP over IB is supported
  • IBM POWER system with AIX 5.3 TL8 Service Pack 4 or AIX 6.1 TL4 with Service Pack 1 and Oracle RAC 11.1. Customers planning deployment must review MetaLink 282036.1 for details on supported software versions.
  • HPUX 11iv3 (B.11.31.0909) and OS Patches: DLPI patch PHNE_38689, ARPA Transport patch PHNE_38680, APA/LM web release of Dec 2008 OR the 0903 Fusion release. Oracle patch for skgxp – bug # 8618175 to be installed on top of Oracle RAC 11.1.0.7. Download and install the HPUX Infiniband driver: IB4X-00 Driver for InfiniBand.
  • RDSv1 is supported on Solaris SPARC and x86-64 with Oracle version 10.2.0.4 and Solaris 10 update 5/09 or higher.
  • RDSv1 is supported on Solaris SPARC with Oracle version 11.1.0.7 on Solaris 10 10/08 and above. The patch for fixing Oracle Bug 9788507 is required.
  • RDSv3 is supported on Solaris 11 SPARC and x86-64 with Oracle 11gR2 (11.2.0.3) on Solaris 11 SRU5 and later.

RAC Technologies Matrix for Linux Platforms

InfiniBand (IB)

  • RDS over IB is supported
  • IP over IB is supported
  • Reliable Datagram Sockets (RDS) is supported with QLogic (SilverStorm) switches on x86 and x86-64 with Oracle 10.2.0.3
  • Open Fabrics Enterprise Distribution (OFED) 1.3.1 and higher (RDS v2 and higher) is supported with Oracle 11.1 and higher with Oracle/Sun, HP, QLogic and Voltaire switches
  • Oracle only supports InfiniBand HCA with Mellanox chip set

 

简单的总结如下:

AIX和HP-UX平台目前只有11.1.0.7这个版本能够使用infiniband的RDS协议(IPoIB不受此限制):

AIX平台:  综合IBM官方网站信息

  • AIX 5.3 TL8 SP6+AIX 5.3 TL11 SP1+  (Oracle database 11.1.0.7)
  • AIX 6.1 TL4 SP1+APAR IZ64144     (Oracle database 11.1.0.7)
  • AIX 7.1 暂不支持。
对于11.2, 目前还存在一个bug没有修复,详见MOS文档Minimum Software Versions and Patches Required to Support Oracle Products on IBM Power Systems (Doc ID 282036.1)的附件PDF,其中有一段话摘录如下:
In general IP over InfiniBand for the RAC cluster interconnect is supported as a Generic Certification (see section 1.3.0), 10gR2 and
11gR1 are supported. However, it is not recommended to use InfiniBand with Oracle RAC 11gR2 for the cluster interconnect at this time. On AIX, high availability for InfiniBand interfaces is provided using the Virtual IP Address (VIPA) feature. Oracle RAC 11gR2 introduced functionality which is incompatible with VIPA and in some cases also incompatible with the InfiniBand interface. IBM and Oracle are working to resolve these incompatibilities in the future. Customers requiring a higher bandwidth interconnect should consider 10GbEthernet, which is Generically Certified with 10gR2, 11gR1 and 11gR2. IT architects requiring more details on these incompatibilities, and the status of enhancements, should contact ibmoracl@us.ibm.com.

 

HP-UX平台:
  • HP-UX 11.31.0909+PHNE_38689+PHNE_38680 (Oracle Database 11.1.0.7+patch 8599853)
HP-UX暂无法在11.2上使用RDS协议。
Oracle Solaris平台:


  • Solaris 10  5/09 + (Oracle Database 10.2.0.4)
  • Solaris 10 10/08+ (Oracle Database 11.1.0.7+patch 9788507)
  • Solaris 11 SRU5+ (Oracle Database11.2.0.3)

Linux X86平台:

  • Linux X86 + (QLogic交换机+Oracle Database 10.2.0.3+)
  • Linux X86+ OFED 1.3.1 (Oracle/Sun, HP, QLogic and Voltaire交换机+Oracle Database 11.1+ )
另外值得一提的是,从上面的认证列表来看,Oracle认证的infiniband交换机厂商相当有限(例如QLogic、Oracle/Sun、HP、Voltaire),HCA则只支持使用Mellanox芯片组的产品。但是大部分infiniband厂商都提供了针对Oracle RAC的解决方案,以下是一些白皮书:
最后简单发表一下个人看法:
如果要使用infiniband,一定需要确保对应的软硬件平台能支持RDS协议,如果不支持,则还不如使用10Gbps的以太网。对infiniband各种协议支持最好的两个平台是Linux和Solaris,这也是Exadata的操作系统只有这两者的一个原因。
目前在AIX和HP-UX平台,能使用RDS协议的Oracle数据库只有11.1.0.7这个版本,更高版本目前并不能完美兼容,虽然可以使用一些workaround绕过去,但终究不是太好的选择。
Tips:
在打Exadata的BP(Bundle Patch)的时候,尽量不要使用apply或者napply的方式手工去做。很多高级DBA往往觉得auto模式不靠谱(事实也是如此),情愿选择手工的模式,但又觉得自己对此比较熟悉,往往不去看readme,结果漏掉了下面这个relink步骤,后果可想而知:
dcli -l oracle -g /home/oracle/dbs_group ORACLE_HOME=/u01/app/11.2.0.3/grid make -C /u01/app/11.2.0.3/grid/rdbms/lib -f ins_rdbms.mk ipc_rds ioracle
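relink之后可以顺手验证一下各节点当前使用的IPC协议(skgxpinfo是11.2自带的小工具,正常情况下输出应为rds,以下路径沿用上面的示例):

dcli -l oracle -g /home/oracle/dbs_group /u01/app/11.2.0.3/grid/bin/skgxpinfo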

 

如何应用Exadata Bundle Patch

原文链接:http://www.dbaleet.org/how_to_apply_bundle_patch_on_exadata/

 

在Exadata中,应用于数据库的补丁被称为Bundle Patch,简称BP(Windows上Oracle的补丁也是以BP形式发布的)。这一类型的补丁实际上是标准数据库的补丁,不包括任何Exadata特有的代码,也就是说它也可以应用在Linux x86_64平台的Oracle数据库之上。BP与标准Oracle数据库的PSU(Patch Set Update)类似,例如BP也是累积的,最新的BP包含了之前BP的全部内容,但是同时又有其特殊性。

首先BP发布的周期较短,通常是一个月发布一次。这是因为通常情况下,Exadata并不建议打单独的one-off patch,主要是考虑到十分复杂的补丁冲突分析,以及由此带给支持后台的大量的补丁合并请求(Patch Merge Request)。

 

同时由于其发布的周期较短,必然会带来另外一个问题,那就是BP补丁的测试可能不如PSU那么充分,甚至可能出现因严重问题而召回重新发布的情况。Exadata上决定哪些补丁放入BP补丁集的时间比较早,也就是说通常在正式发布大半个月之前补丁列表就已经冻结了,如果此后又发现新的问题,则会留到下一个版本去解决。

 

Oracle通常建议Exadata用户应用一种被称为QFSDP(Quarterly Full Stack Download Patch)的补丁。QFSDP每一个季度发布一次,属于大而全的补丁集,不仅仅包括BP,还包括了Exadata整个软件栈所需要的所有补丁。

 

以下以BP16为例,介绍对GI和RDBMS应用BP的流程:

下载对应的BP和Opatch工具并将其传到db01节点上
其中BP16的补丁号为16233552,opatch的补丁号为6880880。

1. 将opatch解压到$ORACLE_HOME下,将BP16解压到/tmp下:

unzip p6880880_112000_Linux-x86-64.zip -d /u01/app/11.2.0.3/grid

unzip p6880880_112000_Linux-x86-64.zip -d /u01/app/oracle/product/11.2.0.2/dbhome_1

mkdir /tmp/bp16

unzip p16233552_11203_Linux-x86-64.zip -d /tmp/bp16

 

2. 配置OCM

$ORACLE_HOME/OPatch/ocm/bin/emocmrsp

 

3. 分别使用grid和oracle用户检查一下OPatch 的版本是否正确:

$ORACLE_HOME/OPatch/opatch version

 

4. 分别使用grid和oracle用户检查一下当前patch 信息:

$ORACLE_HOME/OPatch/opatch lsinventory -detail -oh $ORACLE_HOME

 

5. 使用grid用户对检查GI补丁的兼容性:

$ORACLE_HOME/OPatch/opatch prereq CheckConflictAgainstOHWithDetail -phBaseDir /tmp/bp16/16233552/16233552

$ORACLE_HOME/OPatch/opatch prereq CheckConflictAgainstOHWithDetail -phBaseDir /tmp/bp16/16233552/16355082

$ORACLE_HOME/OPatch/opatch prereq CheckConflictAgainstOHWithDetail -phBaseDir /tmp/bp16/16233552/16401300

 

6. 使用oracle用户对于RDBMS检查补丁的兼容性:

$ORACLE_HOME/OPatch/opatch prereq CheckConflictAgainstOHWithDetail -phBaseDir /tmp/bp16/16233552/16233552

$ORACLE_HOME/OPatch/opatch prereq CheckConflictAgainstOHWithDetail -phBaseDir /tmp/bp16/16233552/16355082/custom/server/16355082

 

7. 使用grid用户检查GI补丁的空间需求:

$ORACLE_HOME/OPatch/opatch prereq CheckSystemSpace -phBaseDir /tmp/bp16/16233552/16233552

$ORACLE_HOME/OPatch/opatch prereq CheckSystemSpace -phBaseDir /tmp/bp16/16233552/16355082

$ORACLE_HOME/OPatch/opatch prereq CheckSystemSpace -phBaseDir /tmp/bp16/16233552/16401300

 

8. 使用oracle用户检查RDBMS补丁的空间需求:

$ORACLE_HOME/OPatch/opatch prereq CheckSystemSpace -phBaseDir /tmp/bp16/16233552/16233552

$ORACLE_HOME/OPatch/opatch prereq CheckSystemSpace -phBaseDir /tmp/bp16/16233552/16355082/custom/server/16355082

 

9. 使用root用户采用opatch auto的方式应用BP:

#$ORACLE_HOME/OPatch/opatch auto /tmp/bp16/16233552

 

10. 分别使用grid用户和oracle用户检查patch是否应用成功:

$ORACLE_HOME/OPatch/opatch lsinventory -detail -oh $ORACLE_HOME

 

11. 使用oracle用户在任意一个节点执行升级数据字典的脚本:

$sqlplus "/ as sysdba"
SQL> @?/rdbms/admin/catbundle.sql exa apply
SQL> quit

12. 查询组件信息确认数据字典已经升级:

SQL>select substr(comp_name,1,40) comp_name, status, substr (version,1,10) version from dba_registry order by comp_name;
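除了查询dba_registry,还可以通过dba_registry_history确认catbundle脚本已经执行(以下查询仅为示意):

SQL>select action_time, action, namespace, version, bundle_series, comments from dba_registry_history order by action_time;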

如何启用Exadata Cell端的SELinux Enforcing模式

原文链接: http://www.dbaleet.org/how_to_enable_selinux_on_exadata_cell/

SELinux(Security-Enhanced Linux)是美国国家安全局(NSA)为Linux设计的一个强制访问控制(MAC)安全子系统。大多数Linux发行版都在内核级别集成了SELinux,同时提供一个可定制的安全策略。SELinux的主要目的是控制进程可以访问的资源,能够减少或者防止0-day漏洞的探测和攻击。

在Exadata上,DB节点默认禁用SELinux,主要是考虑到ASM可能造成的问题,参见我之前的文章 iptables和SELinux是不是必须禁用? http://www.dbaleet.org/is_disable_iptables_and_selinux_to_be_mandatory/ 。

SELinux有三种状态:

  • Enforcing: 这个缺省模式会在系统上启用并实施 SELinux 的安全性政策,拒绝访问及记录行动
  • Permissive: 在 Permissive 模式下,SELinux 会被启用但不会实施安全性政策,而只会发出警告及记录行动。Permissive 模式在排除 SELinux 的问题时很有用。
  • Disabled: SELinux 已被禁用。

在Cell端,SELinux默认以Permissive的方式开启,也就意味着系统默认只将违背访问控制的内容写入到日志文件,并不真正实行安全策略。

[root@dm01cel01 audit]# imageinfo

Kernel version: 2.6.32-400.6.2.el5uek #1 SMP Sun Nov 18 17:02:09 PST 2012 x86_64
Cell version: OSS_11.2.3.2.1_LINUX.X64_121203
Cell rpm version: cell-11.2.3.2.1_LINUX.X64_121203-1

Active image version: 11.2.3.2.1.121203
Active image activated: 2012-12-05 18:22:16 -0700
Active image status: success
Active system partition on device: /dev/md5
Active software partition on device: /dev/md7

In partition rollback: Impossible

Cell boot usb partition: /dev/sda1
Cell boot usb version: 11.2.3.2.1.121203

Inactive image version: undefined
Rollback to the inactive partitions: Impossible
[root@dm01db01 audit]# sestatus
SELinux status: enabled
SELinuxfs mount: /selinux
Current mode: permissive
Mode from config file: permissive
Policy version: 24
Policy from config file: targeted

在Linux下,SELinux是通过auditd守护进程将其违反的AVC策略写入到/var/log/audit/audit.log中的,这个文件默认日志会在25MB大小的时候进行rotation ,生成后缀为.1或者.2之类的归档文件。

Exadata较新的版本中,在/opt/oracle.SupportTools目录下会有一个名为SELinuxPermit.log的文件,这个文件实际上是取自/var/log/audit/audit.log文件中的avc denial message,例如:

type=AVC msg=audit(1330990260.465:60): avc: denied { execute } for pid=8232 comm="ntpdate" path="/lib64/libcap.so.1.10" dev=md5 ino=688359 scontext=system_u:system_r:ntpd_t:s0 tcontext=system_u:object_r:file_t:s0 tclass=file

这些就是违反SELinux策略的记录。如果把SELinuxPermit.log中记录的这些行为添加为例外规则,那么就可以将SELinux设置为enforcing模式了。

1. 根据SELinuxPermit.log的违反规则,生成对应的te(type enforcement)文件:

[root@dm01cel01 audit]# sed -e '/^#/d' /opt/oracle.SupportTools/SELinuxPermit.log |audit2allow -m Exadata > Exadata.te

2. 编译成对应的模块:

[root@dm01cel01 audit]# checkmodule -M -m Exadata.te  -o Exadata.mod

3. 将模块生成对应的package:

[root@dm01cel01 audit]# semodule_package -m Exadata.mod -o Exadata.pp

4. 在内核中移除以前的Exadata模块:

[root@dm01cel01 audit]# semodule -r Exadata

5. 将新生成的package载入内核:

[root@dm01cel01 audit]# semodule -i Exadata.pp

6. 将/etc/selinux/config文件中的SELINUX=permissive修改为SELINUX=enforcing。
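可以直接编辑该文件,也可以用sed完成(仅为示意):

[root@dm01cel01 audit]# sed -i 's/^SELINUX=permissive/SELINUX=enforcing/' /etc/selinux/config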

7. 重启主机生效。

 

注意,当前SELinux enforcing模式没有经过严格的测试,并不被官方支持。如果对Linux的SELinux不是太熟悉,请不要进行设置。

参见MOS文档:How to enable ‘enforcing’ mode for SELinux on Exadata (Doc ID 1481829.1)

Please note that SELinux enforcing mode is not tested by Exadata Development and
that creating/changing an SELinux policy is beyond the scope of Oracle Support.
Setting SELinux policy must be done carefully, to avoid ‘breaking’ applications,
so be sure to check /var/log/audit/audit.log if something should ‘go wrong’.

附我当前测试环境下Exadata其对应的TE文件的内容:

module Exadata 1.0;

require {
type audisp_t;
type mount_t;
type file_t;
type restorecon_t;
type load_policy_t;
type procmail_t;
type mdadm_t;
type wtmp_t;
type snmpd_t;
type tmp_t;
type root_t;
type auditctl_t;
type fsdaemon_t;
type auditd_t;
type faillog_t;
type fsadm_t;
type iptables_t;
type hwclock_t;
type mqueue_spool_t;
type pam_console_t;
type system_mail_t;
type semanage_t;
type usr_t;
type ping_t;
type syslogd_t;
type sysfs_t;
type var_spool_t;
type irqbalance_t;
type var_log_t;
type sendmail_log_t;
type setfiles_t;
type lastlog_t;
type etc_mail_t;
type shadow_t;
type ifconfig_t;
type ntpd_t;
type locale_t;
type etc_runtime_t;
type klogd_t;
type device_t;
type initrc_var_run_t;
type var_t;
type netutils_t;
class process { setsched getsched };
class capability sys_resource;
class file { rename execute setattr read getattr write ioctl unlink append };
class netlink_route_socket { write bind create read nlmsg_read };
class lnk_file read;
class dir { rename search read write getattr rmdir remove_name };
}

#============= audisp_t ==============
allow audisp_t file_t:file { execute getattr };
allow audisp_t self:capability sys_resource;

#============= auditctl_t ==============
allow auditctl_t etc_runtime_t:file getattr;
allow auditctl_t faillog_t:file getattr;
allow auditctl_t file_t:file read;
allow auditctl_t initrc_var_run_t:file getattr;
allow auditctl_t lastlog_t:file getattr;
allow auditctl_t locale_t:file getattr;
allow auditctl_t shadow_t:file getattr;
allow auditctl_t tmp_t:file read;
allow auditctl_t wtmp_t:file getattr;

#============= auditd_t ==============
allow auditd_t file_t:file { rename getattr setattr read unlink append };

#============= fsadm_t ==============
allow fsadm_t root_t:file unlink;
allow fsadm_t var_log_t:file append;

#============= fsdaemon_t ==============
allow fsdaemon_t file_t:file { read getattr };
allow fsdaemon_t self:capability sys_resource;
allow fsdaemon_t usr_t:file { read getattr };

#============= hwclock_t ==============
allow hwclock_t self:capability sys_resource;

#============= ifconfig_t ==============
allow ifconfig_t file_t:dir { search getattr };
allow ifconfig_t file_t:file append;
allow ifconfig_t file_t:lnk_file read;
allow ifconfig_t usr_t:lnk_file read;
allow ifconfig_t var_log_t:file write;

#============= iptables_t ==============
allow iptables_t file_t:dir { search getattr };
allow iptables_t file_t:file append;
allow iptables_t file_t:lnk_file read;

#============= irqbalance_t ==============
allow irqbalance_t file_t:file { read getattr execute };

#============= klogd_t ==============
allow klogd_t file_t:file { read getattr execute };

#============= load_policy_t ==============
allow load_policy_t file_t:file { read getattr execute };

#============= mdadm_t ==============
allow mdadm_t var_log_t:file append;

#============= mount_t ==============
allow mount_t var_log_t:file append;

#============= netutils_t ==============
allow netutils_t sysfs_t:dir search;
allow netutils_t sysfs_t:file read;

#============= ntpd_t ==============
allow ntpd_t file_t:file { read getattr unlink execute };

#============= pam_console_t ==============
allow pam_console_t file_t:file { read ioctl getattr };

#============= ping_t ==============
allow ping_t file_t:file { read getattr execute };

#============= procmail_t ==============
allow procmail_t file_t:file { read getattr };
allow procmail_t self:capability sys_resource;

#============= restorecon_t ==============
allow restorecon_t file_t:file execute;

#============= semanage_t ==============
allow semanage_t file_t:dir { rename write getattr rmdir read remove_name };
allow semanage_t file_t:file { execute unlink };
allow semanage_t file_t:lnk_file read;

#============= setfiles_t ==============
allow setfiles_t device_t:file append;
allow setfiles_t file_t:file execute;

#============= snmpd_t ==============
allow snmpd_t etc_mail_t:dir search;
allow snmpd_t etc_mail_t:file { read getattr };
allow snmpd_t file_t:file { read rename getattr unlink execute };
allow snmpd_t mqueue_spool_t:dir search;
allow snmpd_t self:capability sys_resource;
allow snmpd_t self:netlink_route_socket { write bind create read nlmsg_read };
allow snmpd_t self:process { setsched getsched };
allow snmpd_t sendmail_log_t:dir search;
allow snmpd_t sendmail_log_t:file read;
allow snmpd_t tmp_t:dir { read getattr };
allow snmpd_t usr_t:file append;
allow snmpd_t var_spool_t:dir search;
allow snmpd_t var_t:lnk_file read;

#============= syslogd_t ==============
allow syslogd_t file_t:file { read getattr execute };

#============= system_mail_t ==============
allow system_mail_t file_t:file { read getattr execute };

Exadata性能监控的瑞士军刀——cellsrvstat

http://www.dbaleet.org/exadata_performance_monitoring_swiss_army_knife_cellsrvstat/

如果需要查找Exadata cell(存储节点)的offloading/smart scan/storage index的信息,通常我们可以在数据库端通过查询v$sql、v$sysstat之类的动态性能视图得到,有没有更简单的方法呢?
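数据库端的查法大致如下,统计项名称以11.2为例,仅为示意:

SQL> select name, value from v$sysstat
  2   where name in ('cell physical IO bytes eligible for predicate offload',
  3                  'cell physical IO interconnect bytes returned by smart scan',
  4                  'cell physical IO bytes saved by storage index');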

从某一个版本开始,每个Exadata存储节点中都加入了一个叫做cellsrvstat的小工具,这个工具针对当前cell节点进行信息收集,并且收集的信息非常全面,堪称Exadata上的"上古神器"。

[root@slca04cel01 ~]# cellsrvstat
===Current Time=== Fri Aug 23 08:12:19 2013

== Input/Output related stats ==
Number of hard disk block IO read requests 0 1823855
Number of hard disk block IO write requests 0 849658
Hard disk block IO reads (KB) 0 1390317
Hard disk block IO writes (KB) 0 424990
Number of flash disk block IO read requests 0 0
Number of flash disk block IO write requests 0 0
Flash disk block IO reads (KB) 0 0
Flash disk block IO writes (KB) 0 0
Number of disk IO errors 0 0
Number of reads from flash cache 0 0
Number of writes to flash cache 0 0
Flash cache reads (KB) 0 0
Flash cache writes (KB) 0 0
Number of flash cache IO errors 0 0
Size of eviction from flash cache (KB) 0 0
Number of outstanding large flash IOs 0 0
Number of latency threshold warnings during job 0 33
Number of latency threshold warnings by checker 0 0
Number of latency threshold warnings for smart IO 0 0
Number of latency threshold warnings for redo log writes 0 0
Current read block IO to be issued (KB) 0 0
Total read block IO to be issued (KB) 0 1446974
Current write block IO to be issued (KB) 0 0
Total write block IO to be issued (KB) 0 424990
Current read blocks in IO (KB) 0 0
Total read block IO issued (KB) 0 1446974
Current write blocks in IO (KB) 0 0
Total write block IO issued (KB) 0 424990
Current read block IO in network send (KB) 0 0
Total read block IO in network send (KB) 0 1446974
Current write block IO in network send (KB) 0 0
Total write block IO in network send (KB) 0 424990
Current block IO being populated in flash (KB) 0 0
Total block IO KB populated in flash (KB) 0 0

== Memory related stats ==
SGA heap used - kgh statistics (KB) 0 438098
SGA heap free - cellsrv statistics (KB) 0 20655
OS memory allocated to SGA (KB) 0 458754
SGA heap used - cellsrv statistics - KB 0 438099
OS memory allocated to PGA (KB) 0 898
PGA heap used - cellsrv statistics (KB) 0 376
OS memory allocated to cellsrv (KB) 0 5754818
Top 5 SGA consumers (KB)
storidx::arraySeqRIDX 0 88719
SUBHEAP Networ 0 81937
storidx:arrayRIDX 0 73816
Thread IO Lat Stats 0 35158
RemoteSendPort Fixed Size 0 33935
Top 5 SGA subheap consumers (KB)
Network mem 0 81925
Network heap chunk 0 2462
Number of allocation failures in 512 bytes pool 0 0
Number of allocation failures in 2KB pool 0 0
Number of allocation failures in 4KB pool 0 0
Number of allocation failures in 8KB pool 0 0
Number of allocation failures in 16KB pool 0 0
Number of allocation failures in 32KB pool 0 0
Number of allocation failures in 64KB pool 0 0
Number of allocation failures in 1MB pool 0 0
Allocation hwm in 512 bytes pool 0 620
Allocation hwm in 2KB pool 0 602
Allocation hwm in 4KB pool 0 620
Allocation hwm in 8KB pool 0 1002
Allocation hwm in 16KB pool 0 602
Allocation hwm in 32KB pool 0 601
Allocation hwm in 64KB pool 0 601
Allocation hwm in 1MB pool 0 55
Number of low memory threshold failures 0 0
Number of no memory threshold failures 0 0
Dynamic buffer allocation requests 0 0
Dynamic buffer allocation failures 0 0
Dynamic buffer allocation failures due to low mem 0 0
Dynamic buffer allocated size (KB) 0 0
Dynamic buffer allocation hwm (KB) 0 0

== Execution related stats ==
Incarnation number 0 5
Number of module version failures 0 0
Number of threads working 0 1
Number of threads waiting for network 0 19
Number of threads waiting for resource 0 0
Number of threads waiting for a mutex 0 0
Number of Jobs executed for each job type
CacheGet 0 1838056
CachePut 0 849658
CloseDisk 0 711757
OpenDisk 0 712141
ProcessIoctl 0 14062328
PredicateDiskRead 0 0
PredicateDiskWrite 0 0
PredicateFilter 0 0
PredicateCacheGet 0 0
PredicateCachePut 0 0
FlashCacheMetadataWrite 0 0
RemoteListenerJob 0 0
FlashCacheResilveringTableUpdate 0 0
CellDiskMetadataPrepare 0 0

SQL ids consuming the most CPU
other 0000000000000 2
END SQL ids consuming the most CPU

== Network related stats ==
Total bytes received from the network 0 804684378
Total bytes transmitted to the network 0 7721296
Total bytes retransmitted to the network 0 0
Number of active sendports 0 7
Hwm of active sendports 0 15
Number of active remote open infos 0 6
HWM of remote open infos 0 65

== SmartIO related stats ==
Number of active smart IO sessions 0 0
High water mark of smart IO sessions 0 0
Number of completed smart IO sessions 0 0
Smart IO offload efficiency (percentage) 0 0
Size of IO avoided due to storage index (KB) 0 0
Current smart IO to be issued (KB) 0 0
Total smart IO to be issued (KB) 0 0
Current smart IO in IO (KB) 0 0
Total smart IO in IO (KB) 0 0
Current smart IO being cached in flash (KB) 0 0
Total smart IO being cached in flash (KB) 0 0
Current smart IO with IO completed (KB) 0 0
Total smart IO with IO completed (KB) 0 0
Current smart IO being filtered (KB) 0 0
Total smart IO being filtered (KB) 0 0
Current smart IO filtering completed (KB) 0 0
Total smart IO filtering completed (KB) 0 0
Current smart IO filtered size (KB) 0 0
Total smart IO filtered (KB) 0 0
Total cpu passthru output IO size (KB) 0 0
Total passthru output IO size (KB) 0 0
Current smart IO with results in send (KB) 0 0
Total smart IO with results in send (KB) 0 0
Current smart IO filtered in send (KB) 0 0
Total smart IO filtered in send (KB) 0 0
Total smart IO read from flash (KB) 0 0
Total smart IO initiated flash population (KB) 0 0
Total smart IO read from hard disk (KB) 0 0
Total smart IO writes (fcre) to hard disk (KB) 0 0
Number of smart IO requests < 512KB 0 0
Number of smart IO requests >= 512KB and < 1MB 0 0
Number of smart IO requests >= 1MB and < 2MB 0 0
Number of smart IO requests >= 2MB and < 4MB 0 0
Number of smart IO requests >= 4MB and < 8MB 0 0
Number of smart IO requests >= 8MB 0 0
Number of times smart IO buffer reserve failures 0 0
Number of times smart IO request misses 0 0
Number of times IO for smart IO not allowed to be issued 0 0
Number of times smart IO prefetch limit was reached 0 0
Number of times smart scan used unoptimized mode 0 0
Number of times smart fcre used unoptimized mode 0 0
Number of times smart backup used unoptimized mode 0 0

可以看到cellsrvstat收集这么几类信息:

  • I/O相关的统计信息;
  • 内存相关的统计信息;
  • 执行相关的统计信息;
  • 网络相关的统计信息;
  • smart I/O相关的统计信息。

单纯运行cellsrvstat显示的是当前值。我们可以通过加上-list参数来查询共有哪些metrics:

[root@dm01cel01 ~]# cellsrvstat -list
Statistic Groups:
io Input/Output related stats
mem Memory related stats
exec Execution related stats
net Network related stats
smartio SmartIO related stats

Statistics:
[ * - Absolute values. Indicates no delta computation in tabular format]

io_nbiorr_hdd Number of hard disk block IO read requests
io_nbiowr_hdd Number of hard disk block IO write requests
io_nbiorb_hdd Hard disk block IO reads (KB)
io_nbiowb_hdd Hard disk block IO writes (KB)
io_nbiorr_flash Number of flash disk block IO read requests
io_nbiowr_flash Number of flash disk block IO write requests
io_nbiorb_flash Flash disk block IO reads (KB)
io_nbiowb_flash Flash disk block IO writes (KB)
io_ndioerr Number of disk IO errors
io_nrfc Number of reads from flash cache
io_nwfc Number of writes to flash cache
io_fcrb Flash cache reads (KB)
io_fcwb Flash cache writes (KB)
io_nfioerr Number of flash cache IO errors
io_nbpfce Size of eviction from flash cache (KB)
io_nolfio Number of outstanding large flash IOs
io_ltow Number of latency threshold warnings during job
io_ltcw Number of latency threshold warnings by checker
io_ltsiow Number of latency threshold warnings for smart IO
io_ltrlw Number of latency threshold warnings for redo log writes
io_bcrti Current read block IO to be issued (KB) *
io_btrti Total read block IO to be issued (KB)
io_bcwti Current write block IO to be issued (KB) *
io_btwti Total write block IO to be issued (KB)
io_bcrii Current read blocks in IO (KB) *
io_btrii Total read block IO issued (KB)
io_bcwii Current write blocks in IO (KB) *
io_btwii Total write block IO issued (KB)
io_bcrsi Current read block IO in network send (KB) *
io_btrsi Total read block IO in network send (KB)
io_bcwsi Current write block IO in network send (KB) *
io_btwsi Total write block IO in network send (KB)
io_bcfp Current block IO being populated in flash (KB) *
io_btfp Total block IO KB populated in flash (KB)
mem_sgahu SGA heap used - kgh statistics (KB)
mem_sgahf SGA heap free - cellsrv statistics (KB)
mem_sgaos OS memory allocated to SGA (KB)
mem_sgahuc SGA heap used - cellsrv statistics - KB
mem_pgaos OS memory allocated to PGA (KB)
mem_pgahuc PGA heap used - cellsrv statistics (KB)
mem_allos OS memory allocated to cellsrv (KB)
mem_sgatop Top 5 SGA consumers (KB) *
mem_sgasubtop Top 5 SGA subheap consumers (KB) *
mem_halfkaf Number of allocation failures in 512 bytes pool
mem_2kaf Number of allocation failures in 2KB pool
mem_4kaf Number of allocation failures in 4KB pool
mem_8kaf Number of allocation failures in 8KB pool
mem_16kaf Number of allocation failures in 16KB pool
mem_32kaf Number of allocation failures in 32KB pool
mem_64kaf Number of allocation failures in 64KB pool
mem_1maf Number of allocation failures in 1MB pool
mem_halfkhwm Allocation hwm in 512 bytes pool
mem_2khwm Allocation hwm in 2KB pool
mem_4khwm Allocation hwm in 4KB pool
mem_8khwm Allocation hwm in 8KB pool
mem_16khwm Allocation hwm in 16KB pool
mem_32khwm Allocation hwm in 32KB pool
mem_64khwm Allocation hwm in 64KB pool
mem_1mhwm Allocation hwm in 1MB pool
mem_lmtf Number of low memory threshold failures
mem_nmtf Number of no memory threshold failures
mem_dynar Dynamic buffer allocation requests
mem_dynaf Dynamic buffer allocation failures
mem_dynafl Dynamic buffer allocation failures due to low mem
mem_dynam Dynamic buffer allocated size (KB)
mem_dynamh Dynamic buffer allocation hwm (KB)
exec_incno Incarnation number *
exec_versf Number of module version failures *
exec_ntwork Number of threads working *
exec_ntnetwait Number of threads waiting for network *
exec_ntreswait Number of threads waiting for resource *
exec_ntmutexwait Number of threads waiting for a mutex *
exec_njx Number of Jobs executed for each job type
exec_topcpusqlid SQL ids consuming the most CPU
net_rxb Total bytes received from the network
net_txb Total bytes transmitted to the network
net_rtxb Total bytes retransmitted to the network
net_sps Number of active sendports
net_sph Hwm of active sendports
net_rois Number of active remote open infos
net_roih HWM of remote open infos
sio_ns Number of active smart IO sessions *
sio_hs High water mark of smart IO sessions *
sio_ncs Number of completed smart IO sessions
sio_oe Smart IO offload efficiency (percentage) *
sio_sis Size of IO avoided due to storage index (KB)
sio_ctb Current smart IO to be issued (KB) *
sio_ttb Total smart IO to be issued (KB)
sio_cii Current smart IO in IO (KB) *
sio_tii Total smart IO in IO (KB)
sio_cfp Current smart IO being cached in flash (KB) *
sio_tfp Total smart IO being cached in flash (KB)
sio_cic Current smart IO with IO completed (KB) *
sio_tic Total smart IO with IO completed (KB)
sio_cif Current smart IO being filtered (KB) *
sio_tif Total smart IO being filtered (KB)
sio_cfc Current smart IO filtering completed (KB) *
sio_tfc Total smart IO filtering completed (KB)
sio_cfo Current smart IO filtered size (KB) *
sio_tfo Total smart IO filtered (KB)
sio_tcpo Total cpu passthru output IO size (KB)
sio_tpo Total passthru output IO size (KB)
sio_cis Current smart IO with results in send (KB) *
sio_tis Total smart IO with results in send (KB)
sio_ciso Current smart IO filtered in send (KB) *
sio_tiso Total smart IO filtered in send (KB)
sio_fcr Total smart IO read from flash (KB)
sio_fcw Total smart IO initiated flash population (KB)
sio_hdr Total smart IO read from hard disk (KB)
sio_hdw Total smart IO writes (fcre) to hard disk (KB)
sio_n512kb Number of smart IO requests < 512KB
sio_n1mb Number of smart IO requests >= 512KB and < 1MB
sio_n2mb Number of smart IO requests >= 1MB and < 2MB
sio_n4mb Number of smart IO requests >= 2MB and < 4MB
sio_n8mb Number of smart IO requests >= 4MB and < 8MB
sio_ngt8mb Number of smart IO requests >= 8MB
sio_nbrf Number of times smart IO buffer reserve failures
sio_nrm Number of times smart IO request misses
sio_ncio Number of times IO for smart IO not allowed to be issued
sio_nplr Number of times smart IO prefetch limit was reached
sio_nssuo Number of times smart scan used unoptimized mode
sio_nfcuo Number of times smart fcre used unoptimized mode
sio_nsbuo Number of times smart backup used unoptimized mode

我们可以通过加上-h来查看其帮助选项:

[root@dm01cel01 ~]# cellsrvstat -h
Usage:
cellsrvstat [-stat_group=<group name>,<group name>,]
[-stat=<stat name>,<stat name>,] [-interval=<interval>]
[-count=<count>] [-table] [-short] [-list]

stat A comma separated list of short strings representing
the stats.
Default is all. (unless - stat_group is specified.
The -list option displays all stats.
Example: -stat=io_nbiorr_hdd,io_nbiowr_hdd
stat_group A comma separated list of short strings representing
groups of stats.
Default: all (unless -stat is specified).
Currently valid options are: io, mem, exec, net.
Example: -stat_group=io,mem
interval At what interval the stats should be obtained and
printed (in seconds). Default is 1 second.
count How many times the stats should be printed.
Default is once.
list List all metric abbreviations and their descriptions.
All other options are ignored.
table Use a tabular format for output. This option will be
ignored if all metrics specified are not integer
based metrics.
short Use abbreviated metric name instead of
descriptive ones.
error_out An output file to print error messages to, mostly for
debugging.

In non-tabular mode, The output has three columns. The first column
is the name of the metric, the second one is the difference between the
last and the current value(delta), and the third column is the absolute value.
In Tabular mode absolute values are printed as is without delta.
cellsrvstat -list command points out the statistics that are absolute values

-stat_group=后面接统计信息的组名,例如上面提到的io, mem, exec, net。

-stat=后面接根据-list参数查找出来的统计信息的名称,例如io_nbiorr_hdd,io_nbiowr_hdd。

-interval=后面接统计信息采样的间隔

-count=后面接统计信息采样的次数

-table 表示以表格形式输出(只对整数类型的统计项有效)

-short 表示使用统计项的简写名称代替完整描述。

举一个例子:我们需要收集sio_ttb和sio_tii两项信息,采样的频率为一秒一次,一共采样十次:

[root@dm01cel01 ~]# cellsrvstat -table -interval=1 -count=10 -stat=sio_ttb,sio_tii
===Current Time=== sio_ttb sio_tii
Fri Aug 23 08:29:46 2013 0 0
Fri Aug 23 08:29:47 2013 0 0
Fri Aug 23 08:29:48 2013 0 0
Fri Aug 23 08:29:49 2013 0 0
Fri Aug 23 08:29:50 2013 0 0
Fri Aug 23 08:29:51 2013 0 0
Fri Aug 23 08:29:52 2013 0 0
Fri Aug 23 08:29:53 2013 0 0
Fri Aug 23 08:29:54 2013 0 0
Fri Aug 23 08:29:55 2013 0 0

去掉-table选项则输出完整的信息:

[root@dm01cel01 ~]# cellsrvstat -interval=1 -count=10 -stat=sio_ttb,sio_tii
===Current Time=== Fri Aug 23 08:30:25 2013

== SmartIO related stats ==
Total smart IO to be issued (KB) 0 0
Total smart IO in IO (KB) 0 0

===Current Time=== Fri Aug 23 08:30:26 2013

== SmartIO related stats ==
Total smart IO to be issued (KB) 0 0
Total smart IO in IO (KB) 0 0

===Current Time=== Fri Aug 23 08:30:27 2013

== SmartIO related stats ==
Total smart IO to be issued (KB) 0 0
Total smart IO in IO (KB) 0 0

===Current Time=== Fri Aug 23 08:30:28 2013

== SmartIO related stats ==
Total smart IO to be issued (KB) 0 0
Total smart IO in IO (KB) 0 0

===Current Time=== Fri Aug 23 08:30:29 2013

== SmartIO related stats ==
Total smart IO to be issued (KB) 0 0
Total smart IO in IO (KB) 0 0

===Current Time=== Fri Aug 23 08:30:30 2013

== SmartIO related stats ==
Total smart IO to be issued (KB) 0 0
Total smart IO in IO (KB) 0 0

===Current Time=== Fri Aug 23 08:30:31 2013

== SmartIO related stats ==
Total smart IO to be issued (KB) 0 0
Total smart IO in IO (KB) 0 0

===Current Time=== Fri Aug 23 08:30:32 2013

== SmartIO related stats ==
Total smart IO to be issued (KB) 0 0
Total smart IO in IO (KB) 0 0

===Current Time=== Fri Aug 23 08:30:33 2013

== SmartIO related stats ==
Total smart IO to be issued (KB) 0 0
Total smart IO in IO (KB) 0 0

===Current Time=== Fri Aug 23 08:30:34 2013

== SmartIO related stats ==
Total smart IO to be issued (KB) 0 0
Total smart IO in IO (KB) 0 0

实际上oswatcher默认就会调用cellsrvstat(通过下面的Exadata_cellsrvstat.sh脚本):

[root@dm01cel01 ~]# ps -ef | grep osw
root 5219 17360 0 08:38 pts/0 00:00:00 grep osw
root 12914 23131 0 08:00 ? 00:00:00 /bin/ksh ./oswsub.sh HighFreq ./Exadata_cellsrvstat.sh
root 31625 23131 0 04:02 ? 00:00:00 /bin/ksh ./oswsub.sh HighFreq ./Exadata_vmstat.sh
root 31626 23131 0 04:02 ? 00:00:00 /bin/ksh ./oswsub.sh HighFreq ./Exadata_mpstat.sh
root 31627 23131 0 04:02 ? 00:00:00 /bin/ksh ./oswsub.sh HighFreq ./Exadata_netstat.sh
root 31628 23131 0 04:02 ? 00:00:00 /bin/ksh ./oswsub.sh HighFreq ./Exadata_iostat.sh
root 31629 23131 0 04:02 ? 00:00:00 /bin/ksh ./oswsub.sh HighFreq ./Exadata_diskstats.sh
root 31633 23131 0 04:02 ? 00:00:00 /bin/ksh ./oswsub.sh HighFreq ./Exadata_top.sh
root 31643 23131 0 04:02 ? 00:00:00 /bin/ksh ./oswsub.sh HighFreq /opt/oracle.oswatcher/osw/ExadataRdsInfo.sh
root 31656 31643 0 04:02 ? 00:00:03 /bin/bash /opt/oracle.oswatcher/osw/ExadataRdsInfo.sh HighFreq

 

 

[root@slca04cel01 osw]# cat /opt/oracle.oswatcher/osw/Exadata_cellsrvstat.sh
#!/bin/bash
# Copyright (c) 2009, 2011, Oracle and/or its affiliates. All rights reserved.

out_file=
zip_prog=
declare -i self_count=1
declare -i sample_interval=1
declare -i sample_duration=3
declare -i sample_count=1

/bin/touch /opt/oracle.oswatcher/osw/Exadata_cellsrvstat.lock
echo $$ > /opt/oracle.oswatcher/osw/Exadata_cellsrvstat.lock
while [ -e /opt/oracle.oswatcher/osw/Exadata_cellsrvstat.lock ];
do
if [ -f "archive/oswcellsrvstat/$1" ]; then
if [ ! -z "$out_file" ] && [ ! -z "$zip_prog" ]; then
$zip_prog $out_file &
fi
out_file=`/bin/cat archive/oswcellsrvstat/$1 | /bin/cut -d ' ' -f 1`
if [ $? -ne 0 ]; then
/bin/echo "[ERROR] archive/oswcellsrvstat/$1 not found or it is empty"
exit 1
fi
zip_prog=`/bin/cat archive/oswcellsrvstat/$1 | /bin/cut -d ' ' -f 2`
if [ $? -ne 0 ]; then
/bin/echo "[ERROR] archive/oswcellsrvstat/$1 not found or it is empty"
exit 1
fi
sample_interval=`/bin/cat archive/oswcellsrvstat/$1 | /bin/cut -d ' ' -f 3`
if [ $? -ne 0 ]; then
/bin/echo "[ERROR] archive/oswcellsrvstat/$1 not found or it is empty"
exit 1
fi
sample_duration=`/bin/cat archive/oswcellsrvstat/$1 | /bin/cut -d ' ' -f 4`
if [ $? -ne 0 ]; then
/bin/echo "[ERROR] archive/oswcellsrvstat/$1 not found or it is empty"
exit 1
fi
/bin/rm -f "archive/oswcellsrvstat/$1"
else
break
fi
if [ ! -z "$out_file" ]; then
if [ $sample_interval -gt 0 ] && [ $sample_duration -gt 0 ] && [ $sample_duration -gt $sample_interval ]; then
sample_count=$((sample_duration / sample_interval))
/bin/echo "zzz ***"`date`" Sample interval: $sample_interval secconds" >> ${out_file}

$OSS_BIN/cellsrvstat -interval=$sample_interval -count=$sample_count >> ${out_file}

bzip2 ${out_file}
/bin/rm -f ${out_file}
else
/bin/echo "[ERROR] Invalid arguments for sample_duration and sample_interval"
break
fi
fi
done

/bin/rm -f /opt/oracle.oswatcher/osw/Exadata_cellsrvstat.lock
exit 0

 

Exadata使用的默认端口

 

http://www.dbaleet.org/default_ports_which_exadata_used/

 

Exadata使用的默认端口出自官方手册,这里简单记录一下,方便设置防火墙规则:

Source Target Protocol Port Application
NA Database management SSH over TCP 22 SSH
NA Database servers, Exadata Storage Servers, and InfiniBand ILOMs SSH over TCP 22 SSH
NA KVM SSH over TCP 22 SSH for serial sessions to MPUIQ-SRL module
NA Storage management SSH over TCP 22 SSH
NA KVM Telnet over TCP 23 Telnet, when enabled
Exadata Storage Servers E-mail server SMTP 25 (465 if using SSL) SMTP (Simple Mail Transfer Protocol)
Database servers, Exadata Storage Servers, and InfiniBand ILOMs NA TFTP over UDP 69 Outgoing TFTP (Trivial File Transfer Protocol)
NA Database servers, Exadata Storage Servers, and InfiniBand ILOMs HTTP over TCP 80 Web (user configurable)
NA KVM HTTP over TCP 80 Avocent video viewer download for Java applet
NA PDU HTTP over TCP 80 Browser interface
Database management NA NTP over UDP 123 Outgoing Network Time Protocol (NTP)
Database servers, Exadata Storage Servers, and InfiniBand ILOMs NA NTP over UDP 123 Outgoing NTP
Storage management NA NTP over UDP 123 Outgoing NTP
ASR Manager ASR asset SNMP (get) 161 FMA enrichment for additional diagnostic information
NA Database servers, Exadata Storage Servers, and InfiniBand ILOMs SNMP over UDP 161 SNMP (Simple Network Management Protocol) (user configurable)
NA KVM SNMP over UDP 161 SNMP (user configurable)
NA PDU SNMP over UDP 161 SNMP (user configurable)
Exadata Storage Servers SNMP subscriber such as Oracle Enterprise Manager Grid Control or an SNMP manager SNMP 162 SNMP version 1 (SNMPv1) outgoing traps (user-configurable)
Database servers, and Exadata Storage Servers ILOMs ASR Manager SNMP 162 Telemetry messages sent to ASR Manager
Database servers, Exadata Storage Servers, and InfiniBand ILOMs NA IPMI over UDP 162 Outgoing IPMI (Intelligent Platform Management Interface) Platform Event Trap (PET)
KVM NA SNMP over UDP 162 Outgoing SNMPv2 traps
PDU NA SNMP over UDP 162 Outgoing SNMPv2 traps
NA Database servers, Exadata Storage Servers, and InfiniBand ILOMs LDAP over UDP/TCP 389 Outgoing LDAP (Lightweight Directory Access Protocol) (user configurable)
ASR Manager ASR backend HTTPS 443 Telemetry messages sent to ASR backend
NA Database servers, Exadata Storage Servers, and InfiniBand ILOMs HTTPS over TCP 443 Web (user configurable)
NA KVM HTTPS over TCP 443 Browser interface for MergePoint Utility switch and KVM sessions
NA PDU HTTPS over TCP 443 Browser interface
Database servers, Exadata Storage Servers, and InfiniBand ILOMs NA Syslog over UDP 514 Outgoing Syslog
KVM NA Syslog over UDP 514 Outgoing Syslog
PDU NA Syslog over UDP 514 Outgoing Syslog
Database servers, Exadata Storage Servers, and InfiniBand ILOMs NA DHCP over UDP 546 client DHCP (Dynamic Host Configuration Protocol)
KVM NA DHCP over UDP 546 DHCP client
PDU NA DHCP over UDP 546 DHCP (Dynamic Host Configuration Protocol) client
NA Database servers, Exadata Storage Servers, and InfiniBand ILOMs IPMI over UDP 623 IPMI (Intelligent Platform Management Interface)
Oracle Enterprise Manager Grid Control NA TCP 1159 Oracle Enterprise Manager Grid Control HTTPS upload port
Oracle Enterprise Manager Grid Control NA TCP 1159 Oracle Enterprise Manager Grid Control HTTPS upload port
NA Database data TCP 1521 Database listener
Database servers, Exadata Storage Servers, and InfiniBand ILOMs NA RADIUS over UDP 1812 Outgoing RADIUS (Remote Authentication Dial In User Service) (user configurable)
NA KVM TCP 2068 KVM session data for keyboard and mouse transmission, or for video transmission on for MergePoint Unity switch
Oracle Enterprise Manager Grid Control NA TCP 4889 Oracle Enterprise Manager Grid Control HTTP upload port
Oracle Enterprise Manager Grid Control NA TCP 4889 Oracle Enterprise Manager Grid Control HTTP upload port
NA Database servers, and Exadata Storage Servers ILOMs TCP 5120 ILOM remote console: CD
NA Database servers, and Exadata Storage Servers ILOMs TCP 5121 ILOM remote console: keyboard and mouse
NA Database servers, and Exadata Storage Servers ILOMs TCP 5123 ILOM remote console: diskette
NA Database servers, and Exadata Storage Servers ILOMs TCP 5555 ILOM remote console: encryption
NA Database servers, and Exadata Storage Servers ILOMs TCP 5556 ILOM remote console: authentication
ASR Manager Database servers, and Exadata Storage Servers ILOMs HTTP 6481 Service tags listener for asset activation
NA Database servers, and Exadata Storage Servers ILOMs TCP 6481 ILOM remote console: Servicetag daemon
NA Database servers, and Exadata Storage Servers ILOMs TCP 7578 ILOM remote console: video
NA Database servers, and Exadata Storage Servers ILOMs TCP 7579 ILOM remote console: serial
NA Database servers TCP 7777 Oracle Enterprise Manager Grid Control HTTP console port
NA Exadata Storage Servers TCP 7777 Oracle Enterprise Manager Grid Control HTTP console port
NA Database servers TCP 7799 Oracle Enterprise Manager Grid Control HTTPS console port
NA Exadata Storage Servers TCP 7799 Oracle Enterprise Manager Grid Control HTTPS console port

 

如何升级Exadata 存储节点cell image

原文链接: http://www.dbaleet.org/how_to_upgrade_cell_image_of_exadata/

Exadata存储节点,即我们常说的cell节点,在Exadata中承担着双重作用:

一是提供存储的介质,所有的非二进制文件都存放在此;
二是提供大量的offloading的任务,计算节点(db 节点)通过smart scan等,把一部分任务“下沉”分布到cell节点。

而升级cell的image主要是升级以下内容:

操作系统类信息:包括一些基本的rpm包以及操作系统内核;
固件类信息:例如磁盘控制器的固件,ILOM的固件等;
驱动类信息:依赖于内核版本的infiniband驱动ofa。

升级Exadata cell的image可以使用在线的方式进行,也可以使用离线的方式进行。在线升级的好处是无需停止数据库服务,但是通常单个cell节点image升级的时间接近三个小时,如果是一台满配的Exadata,升级完所有cell的image所需要花费的时间为14×3=42个小时,这还不包括检查以及出现意外情况时troubleshooting的时间。实际上在线升级完一台满配Exadata的cell image一般需要花费60个小时左右。另外就是在线升级的过程中,如果其它节点发生坏盘,就有可能造成数据的丢失。为什么呢?因为在升级某一台cell的image的时候,并不做rebalance的动作,升级过程中,这台cell的所有盘都相当于是offline状态的,这台cell所有盘中保存的信息,在其它cell节点上有且仅有一份镜像(这里说的是正常冗余的情况,如果是高冗余,则为两份)。如果这个时候其它cell中有一块盘发生了不测,就有可能丢失数据,因为等这台cell的image升级完成以后,会自动同步Exadata的元数据和其它对应镜像修改后的信息,如果坏的盘恰好是"某一块",悲剧就诞生了。当然,你也可以使用离线的方式进行升级。离线升级需要停止db节点上的集群以及所有cell节点上的cell服务,但是它的好处在于可以并行地进行cell image的升级,例如可以一次性升级完所有cell节点的image,时间也是接近三个小时:不管是四分之一配、半配还是满配,通通只要三个小时。但是同样也存在风险,例如如果多台cell被刷坏了,操作系统起不来,这样也是比较危险的,不过这种情况相比坏盘概率小很多,可以说几乎和中彩票头奖的概率差不多,如果你不幸遇到这样的情况,请记得下次帮我去买张彩票。

在线升级cell的image往往需要较长的时间进行详细的规划,防止各种突发故障,这个并非三五百字可以讲完,所以我这里只写出离线升级cell image的方法:以下是为某客户Exadata cell image从11.2.2.4.2升级到11.2.3.2.1的全部过程:

升级前的准备工作

1.准备cell image的patch:

下载cell image的patch,patch号为14522699。使用root用户上传到eccel01节点的/opt/oracle.SupportTools/目录下。如果是使用ftp上传,需要注意使用二进制bin模式。

使用以下命令进行解压:

#unzip p14522699_112321_Linux-x86-64.zip

使用md5sum对解压后的文件进行md5码校验,以下五个文件的md5码应该为:

3a8f090e9410c80b0b3a27026472cd0 patch_11.2.3.2.1.130109/11.2.3.2.1.130109.iso
69d3bf2dfc6f650bd9f4f2413b084ae2 patch_11.2.3.2.1.130109/11.2.3.2.1.130109.patch.tar
f2d7a739d9b813f3ed1c38f25678b603 patch_11.2.3.2.1.130109/dcli
0a327e437d81be782e4765263cb61b22 patch_11.2.3.2.1.130109/dostep.sh
8ea5f9270dbaa1f6c8a94630ad150a58 patch_11.2.3.2.1.130109/patchmgr

如果不正确,则需要重新上传解压。

2.准备cell_group文件

检查/opt/oracle.SupportTools/onecommand/cell_group文件中的内容是否为:

dm01cel01
dm01cel02
dm01cel03

以上以实际的cell主机名代替。

3. 检查所有节点的cell.conf文件是否一致:

#/opt/oracle.cellos/ipconf -verify

4. 检查ssh是否支持patchmgr:

打开ssh的debug模式

#ssh -v -v ecdb02>ssh_client_debuglog.txt

按照提示输入密码

5. 配置SSH加密算法:

运行以下命令列举出当前SSH加密的算法

#ssh -v -v ecdb02>ssh_client_debuglog.txt
#sed -e '/SSH2_MSG_KEXINIT received/,/first_kex_follows/!d' \
ssh_client_debuglog.txt | grep \
'aes128-ctr\|aes192-ctr\|aes256-ctr\|arcfour'

返回结果不能为空,如果为空,表示当前ssh不支持必需的加密算法。那么在/etc/ssh/ssh_config加入这么一行

Ciphers aes128-ctr,aes192-ctr,aes256-ctr,arcfour

6. 建立SSH连通性:

使用如下命令验证,节点之间root的ssh连通性已经建立:

#dcli -g cell_group -l root 'hostname -i'

如果提示需要输入密码,则可以使用如下方式建立ssh的等效性:

先生成本机的密钥:

#ssh-keygen -t rsa

输入回车保持默认,这样会创建root用户的rsa密钥

使用如下命令将这个密钥推送到cell节点:

#dcli -g cell_group -l root -k

这个过程需要输入其它cell节点的密码。

7. 修改disk_repair_time:

修改 disk_repair_time到一个更长的时间,防止在升级的期间离线的节点的griddisk被强制drop。

SQL> select dg.name, a.value from v$asm_diskgroup dg, v$asm_attribute a
     where dg.group_number=a.group_number and a.name='disk_repair_time';

将其修改到一个较大的时间:

SQL> alter diskgroup diskgroup_name set attribute 'disk_repair_time'='36h';

这里diskgroup_name用实际的磁盘组的名称代替,同时需要对所有的磁盘组的disk_repair_time的属性进行修改

 

8. 检查所有griddisk的状态

确认所有的griddisk的状态为online。

#dcli -g /opt/oracle.SupportTools/onecommand/cell_group -l root cellcli -e 'list griddisk attributes name,asmmodestatus'

升级过程

1. 停止所有DB节点的crs:

dcli -g dbs_group -l root "/u01/app/11.2.0/grid/bin/crsctl stop crs -f"

完成以后使用如下方式进行验证:

dcli -g dbs_group -l root "ps -ef | grep grid"

 

2. 关闭所有的cell服务器上的cellsrv:

dcli -g cell_group -l root "cellcli -e alter cell shutdown services all"

3. 进入目录patch目录:

cd /opt/oracle.SupportTools/patch_11.2.3.2.1.130109

4. 对之前使用patchmgr升级的残留信息进行清理:

#./patchmgr -cells /opt/oracle.SupportTools/onecommand/cell_group -cleanup

执行下面的检查命令,检查存储节点是否满足升级需求:

# ./patchmgr -cells /opt/oracle.SupportTools/onecommand/cell_group -patch_check_prereq

5. 检查没有问题,运行下面的命令升级存储节点的image版本:

# ./patchmgr -cells /opt/oracle.SupportTools/onecommand/cell_group -patch

6. 在db01节点使用ilom对升级的进度进行监控整个过程:

使用cell的ilom地址登录,然后启动串口:

start /SP/console

如果需要停止,则先按住esc,然后输入:

stop /SP/console

注意升级的过程中会有多次ilom的中断,属于正常的情况。

 

升级后验证工作

1. 确认所有的cell都已经升级到11.2.3.2.1:

#dcli -g cell_group -l root 'imagehistory'

2. 确认kernel已经升级:

# dcli -g cell_group -l root "rpm -qa | grep kernel"

3. 确认ofa的版本已经升级:

#dcli -g cell_group -l root "rpm -qa | grep ofa"

4. 升级完成以后再一次进行清理:

#./patchmgr -cells /opt/oracle.SupportTools/onecommand/cell_group -cleanup

5. 取消ssh的信任关系(可选):

# dcli -g cell_group -l root --unkey

6. 启动CRS和数据库服务器上的其它所有agent:

# crsctl start cluster -all

7. 修改disk_repair_time回默认值:

SQL> alter diskgroup diskgroup_name set attribute 'disk_repair_time'='3.6h';

以上
