How to Verify the Time Zone Setting of 11.2 RAC Grid Infrastructure

This note describes how to verify and change the GI time zone setting after an 11.2 Grid Infrastructure installation has been completed.

 

Once the OS default time zone has been changed, make sure of the following two points:

1. For 11.2.0.1, make sure the TZ shell environment variable is set correctly for the root, grid, and oracle users.
2. For 11.2.0.2 and above, verify that the TZ parameter in $GRID_HOME/crs/install/s_crsconfig_<nodename>_env.txt is set to the correct time zone.

 

For example:

 

echo $TZ
US/Pacific

grep TZ s_crsconfig_<nodename>_env.txt
TZ=US/Pacific

 

 

If the time zone setting is incorrect or contains extraneous characters, RAC Grid Infrastructure may fail to start normally.

Keeping the two points above in order ensures that GI can start normally. Startup failures caused by an incorrect time zone typically occur when the OS and GI have already been installed and the time zone is changed afterwards. If the OS time does not match the newest timestamps recorded in logs such as ohasd.log and ocssd.log, the mismatch is usually caused by this time zone problem.
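A quick way to spot such a mismatch is to compare the OS clock with the newest entries in the GI logs. A minimal sketch, assuming $GRID_HOME is set and the standard 11.2 log layout under $GRID_HOME/log/<nodename>:

date                                                   # current OS time
tail -1 $GRID_HOME/log/$(hostname -s)/ohasd/ohasd.log  # newest ohasd.log entry
tail -1 $GRID_HOME/log/$(hostname -s)/cssd/ocssd.log   # newest ocssd.log entry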

 

Before 11.2, the CRS time zone setting could be verified with init.cssd diag.

 

 

Below is the default time zone configuration information for each OS.

 

 

Linux

To change: /usr/sbin/timeconfig

To display current setting:

cat /etc/sysconfig/clock
ZONE="America/Los_Angeles"
UTC=true
ARC=false
To find out all valid settings: ls -l /usr/share/zoneinfo

Anything that appears in this directory is valid to use, for example, CST6CDT and America/Chicago.

Note: in OL6.3/RHEL6.3 the "ZONE" field in /etc/sysconfig/clock can differ from the names under /usr/share/zoneinfo; the name from /usr/share/zoneinfo is the one that should be used in $GRID_HOME/crs/install/s_crsconfig_<nodename>_env.txt.
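A quick way to cross-check the three places on Linux, shown as a hedged sketch (replace <nodename> with the actual node name):

grep ZONE /etc/sysconfig/clock                                       # OS-level setting
ls /usr/share/zoneinfo/America/Los_Angeles                           # confirm the zone name exists under zoneinfo
grep '^TZ=' $GRID_HOME/crs/install/s_crsconfig_<nodename>_env.txt    # GI-level setting (11.2.0.2+)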

HP-UX

To display current setting:

cat /etc/default/tz
PST8PDT
To change: set_parms timezone

To find out all valid settings: cat /usr/lib/tztab

Solaris

To display current setting:

grep TZ /etc/TIMEZONE
TZ=US/Pacific
To change: modify /etc/TIMEZONE, then run "rtc -z US/Pacific; rtc -c"

To find out all valid settings: ls -l /usr/share/lib/zoneinfo

AIX

To display current setting:

grep TZ /etc/environment
TZ=GMT

11g OCM Upgrade Exam Tips

A reader previously asked me for some test-taking tips for the one-day 11g OCM Upgrade Exam, and I offered a few suggestions without violating the non-disclosure agreement.

 

Overall, the 11g OCM exam is considerably harder than the 10g exam, and because it covers a large number of 11g new features, and Chinese-language material on 11g is still scarce, the amount of preparation required is substantial.
I started preparing in January 2011. Over two months I worked through and tested many 11g new features and wrote a number of articles about them (you can find them on my personal blog, askmac.cn). I took and passed the exam on March 11, 2011. I had slept poorly the night before, and the earthquake struck Japan that very day, which is why that date is burned into my memory!

 

Here are the reader's questions and my suggestions:
1) Please briefly describe the exam environment. For example, is it also two machines? Is the role assignment the same as in 10g? There are four sections in total; how long does each one take?

Answer: It is two PCs, which is clearly stated in the exam syllabus; one of them serves as the OMS. Each section runs between 105 and 120 minutes.

 

2) In the exam, is Grid Control 10.2.0.5 already installed, or do you install it yourself? Or is 10.2.0.1 installed and the candidate is asked to upgrade it to 10.2.0.5? And then install the agent on the other machine? In 10g the agent could be used right after installation, so what aspect of configuring the agent is being tested?

Answer: You do not install Grid Control yourself, but you do install the agent yourself, so study the agent installation in detail; this part is important!

 

3) Does the afternoon exam use the cold backup taken in the morning? I have always wondered why a cold backup is tested at all.

Answer: As in 10g, scenarios that require recovery will appear.

 

4) Essentially all of the Data Guard new features are covered. On which machine is the observer started? Is it better to do this part with Grid Control, manually with the SQL*Plus command line, or with the DGMGRL command line? In the 10g exam I chose to do this part with SQL*Plus because I did not trust Grid Control's stability, and time was extremely tight; I only finished in the last minute. After the exam I suspected this part was designed to be done with Grid Control, though I am not sure that reading is correct. What should I watch out for in the 11g exam?

Answer: I recommend SQL*Plus plus DGMGRL. You can use the 11g duplicate standby from active database feature, which is very convenient. See my article "Duplicate standby database from active database": https://www.askmac.cn/archives/duplicate-standby-database-from-active-database.html
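For reference, a minimal sketch of that feature (connection strings, passwords, and db_unique_name are illustrative only; the auxiliary instance must already be started NOMOUNT with a static listener registration):

rman target sys/oracle@primary auxiliary sys/oracle@sbdb <<EOF
DUPLICATE TARGET DATABASE FOR STANDBY
  FROM ACTIVE DATABASE
  SPFILE
    SET db_unique_name='sbdb'
  NOFILENAMECHECK
  DORECOVER;
EOF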

 

5) Did this exam cover Streams? It was in the 10g syllabus but was not actually tested.

Answer: Streams is on the exam, and it is not simple; either review it thoroughly or give it up.

 

6) Do you have to configure NFS between the two OEL machines and then get tested on "Create and Manage a tablespace that uses NFS mounted file system file", or is NFS already configured?

Answer: NFS is already configured for you; you only need to worry about making Oracle use it.

 

7) Are there any questions beyond the syllabus?

Answer: Nothing beyond the syllabus, but there are many traps.

 

8) For materialized views, is query rewrite tested? Are the problems that need troubleshooting particularly thorny?

Answer: Yes, it was tested.

 

9) "Create partitioned tables (includes reference and interval partitioning)": what kind of partitioned tables does the exam ask you to create?
Answer: This part is not difficult; it is one of the rare giveaway questions.

 

10) What form do the SQL Tuning Advisor and Access Advisor questions take? Is it better to do them through Grid Control or with the DBMS packages?
Answer: Obviously, use the OMS.

 

11) What other issues do you suggest paying attention to during the exam?
Answer: Know the topics thoroughly, because there are many traps; read every question carefully and completely, and never waste too much time on a question you cannot do.

 

Question:

I ran experiments following the approach you pointed out in your last email, and Duplicate standby database from active database really is the best and most controllable way to build a Data Guard environment. Combined with DGMGRL, it makes everything so simple! I did not expect to find your article on exactly this topic; it felt very familiar.
Regarding the Data Guard part of the exam, my assumption is that Data Guard will be on a single machine as in 10g, and that the observer should be deployed on a machine other than the Data Guard hosts, namely the OMS machine. Configuring the observer is just a matter of configuring tnsnames.ora.
Snapshot standby is a single command, nothing much to say about it, provided Flashback Database has been configured on both the primary and the standby.
One more question: in the Data Guard part, for "Configure the data guard environment to reduce overheads of fast incremental backups on the primary database" I am not entirely sure of the answer. I checked the Oracle Data Guard Concepts and Administration guide, the Backup and Recovery User's Guide, MetaLink, and various documents online, but did not find an optimal approach. Perhaps 11g Active Data Guard simply has not seen wide adoption yet.
The approach I came up with is to enable block change tracking on the real-time-apply standby, then take a level 0 backup followed by incremental backups. But because the standby's control file and spfile cannot be used to recover a failed primary, I would back up the control file and spfile on the primary at the same time as the backups on the standby. I am not sure whether this would cause problems when restoring the primary. Do you have a good best practice for this?
alter database enable block change tracking using file '/u01/app/oracle/oradata/orcl/bct.log';
configure db_unique_name sbdb connect identifier 'sbdb';
Every Sunday:
connect target sys/oracle@sbdb catalog rman/rman@rman
BACKUP AS BACKUPSET INCREMENTAL LEVEL 0 DATABASE PLUS ARCHIVELOG;
connect target sys/oracle@primary catalog rman/rman@rman
backup current controlfile;
backup spfile;
Every day:
connect target sys/oracle@sbdb catalog rman/rman@rman
BACKUP AS BACKUPSET INCREMENTAL LEVEL 1 DATABASE PLUS ARCHIVELOG;
connect target sys/oracle@primary catalog rman/rman@rman
backup current controlfile;
backup spfile;
Every Wednesday:
connect target sys/oracle@sbdb catalog rman/rman@rman
BACKUP AS BACKUPSET INCREMENTAL LEVEL 1 CUMULATIVE DATABASE PLUS ARCHIVELOG;
connect target sys/oracle@primary catalog rman/rman@rman
backup current controlfile;
backup spfile;

Also, for "Configure archivelog deletion policy for the dataguard configuration", I plan to simply use RMAN's CONFIGURE command.
On the primary:
configure archivelog deletion policy to shipped to all standby;

On the standby:
configure archivelog deletion policy to applied on all standby;
or
CONFIGURE ARCHIVELOG DELETION POLICY TO BACKED UP 2 TIMES TO SBT; (or DISK)  (the Oracle Data Guard documentation recommends this setting, p. 170)

What is your view on the archivelog deletion policy question?

Last question: since Streams is rarely used in my work, I am rusty on it. Following the document in the attachment, I did the first 11 practices. The exercises in the document grow progressively harder and I would like to do them all, but Streams involves too many issues, and the exercises do not even include stream split and merge. I will spend time studying this seriously later; the priority right now is passing the exam. Without telling me the exam questions, could you just confirm which range the exam difficulty falls in? For example 1-3, 7-9, or 16-17. Thanks.

 

Answer:

On the reduce-overhead question, I think what you have written is already quite sufficient. However, I am not sure whether you are aware that fast incremental backup has a bug on 11.1.0.7; for details see my article: https://www.askmac.cn/archives/fast-incremental-backup-failed-on-standby-database.html
If you have worked through all of the practices in the attachment, then your Streams preparation can be considered fairly thorough.

I think the 11g OCM is no great hurdle for an experienced DBA; given your current progress and ability, passing the certification should be quite easy for you.

Good luck with the exam!

 

In addition, Gavin Soorma was one of the first people in the world to earn the 11g Master certification by passing the 11g OCM Upgrade exam. He also offered some advice on this exam, quoted below:

 

  • The 11g OCM upgrade exam is a one day exam while the 11g OCM is a two day exam like the 10g exam. It can be a long day between 8-10 hours, so be prepared for it.
  • I have listed the exam topics here which can be also obtained from the OTN site related to Oracle Certification. While preparing, try and cover each and every topic as the exam will basically cover pretty much every exam objective listed.
  • It’s not about what you know – it’s about how fast you can do stuff, as time is of the essence in this exam. Working under the pressure of time is the biggest obstacle I found that I had to cross.
  • Don’t be overwhelmed by the volume of what is asked in the exam. Every skill set will have an “End State” where your database is expected to be in a particular state. Ensure you keep this in mind and while it may not be possible to complete every task, whatever you do, do it well rather than start each task and leave it incomplete because of time constraints.
  • Practice, practice and more practice. Get as much hands-on experience with all the exam topics. Unlike the 11g OCM exam, the 11g OCM Upgrade exam does not cover topics like RAC and ASM. So it should be quite easy to create a database on even a personal laptop and get as much hands on time as possible as your work environment may not offer opportunities to be faced with all the exam objectives in your practical day to day work as a DBA.
  • While doing the practice, time yourself and see how long it takes to perform any of the exam objectives. Try and figure ways to reduce time through your own shortcuts.
  • Also be familiar with the command line option as well using the different API’s and do not just work on performing everything via GUI Enterprise Manager.
  • The documentation is your best friend in the exam. Be fully familiar with it and know very clearly which manual in the documentation set covers which exam topic. Documentation on both Enterprise Manager as well as the Database Server will be provided, but do remember that it is not searchable. So you do not have the luxury of time in the exam to browse through the entire documentation set trying to find what you are looking for.
  • Very Important – Be very familiar with different backup and recovery scenarios. You will be expected to perform recovery and at the end of every skill set your database needs to be in a required End State – so no marks for crashed databases!
  • Remember the Enterprise Manager version is not 11g even though the exam is an 11g exam. You will be provided a 10.2.0.5 Grid Control environment.
  • While the official OCM certification site does state that the database version will be 11g Release 2 on Red Hat Linux 5, the database version that I got at the exam was 11.1.0.7. So not too sure if that was a one-off case or the norm.

 

The OTN Certification Path is the most authoritative source of information on the OCM exam; for more details see:

 Oracle Database 11g Certified Master Upgrade Exam

 

The Exam Topics are as follows:

 

 

 

By the way, here are links to my OCM profiles:

11g Oracle Certified Master profiles
10g Oracle Certified Master profiles

Real Application Testing: Pricing and Overview of Database Replay and SPA

Real Application Testing = SPA + Database Replay

The difference between SPA and Database Replay:

Database Replay is like a sledgehammer for crushing workloads:

 

SPA: SQL focused and unit testing SQL component
Database Replay: Workload focused, comprehensive testing of database stack

Database Replay
Load testing solution for performance and scalability testing
“Capture” entire workload (queries, DML, DDL, PL/SQL, etc.) in production, including concurrency
“Replay” entire workload in test with exact production characteristics including concurrency
Ideal for system upgrades, configuration changes (SI to RAC), storage changes, etc.

 

SPA, by contrast, is more like a handy claw hammer:

SQL Performance Analyzer
Solution for identifying performance regressions/improvements resulting from execution plan changes
“Capture” SQL queries from production, including executions plans & stats
“Test-Execute” SQL queries serially “before” and “after” changes
Compare “before” and “after” SQL execution plans and performance stats
Ideal for changes impacting query plans such as db upgrades, optimizer statistics refresh, new index creation, etc.

 

Real Application Testing (RAT) is sold as an option to the Enterprise Edition database; using either SPA or Database Replay alone still requires purchasing a full RAT license.

RAT is priced at $11,500 per CPU ($230 per Named User Plus), for both production and test databases.

Note that when upgrading from 9iR2 or 10gR1 to 10gR2, SPA requires the SQL Trace feature to be used on the 9iR2 or 10gR1 production database in order to capture the SQL statements. Only in this scenario does the RAT license apply to the test system only, rather than to the production system.

 

SPA Applicable for Pre-11g Database Releases

 

SQL Performance Analyzer
Capture on 9i, 10.1, 10.2 database releases
Test changes in 10.2 & above
Database Replay
Capture on 9i, 10.2 database releases
Test changes in 11.1 & above

 

Dependency between VIP and network resources in 11gR2 RAC and frequent failover

11gR2 introduced a CRS resource of type Oracle Network resource to monitor the public network:

 

[root@vrh1 ~]# crsctl status resource ora.net1.network -f
NAME=ora.net1.network
TYPE=ora.network.type
STATE=ONLINE
TARGET=ONLINE
ACL=owner:root:rwx,pgrp:root:r-x,other::r--,group:oinstall:r-x,user:grid:r-x
ACTION_FAILURE_TEMPLATE=
ACTION_SCRIPT=
AGENT_FILENAME=%CRS_HOME%/bin/orarootagent%CRS_EXE_SUFFIX%
ALIAS_NAME=
AUTO_START=restore
CHECK_INTERVAL=1
CREATION_SEED=1
DEFAULT_TEMPLATE=
DEGREE=1
DESCRIPTION=Oracle Network resource
ENABLED=1
ID=ora.net1.network
LOAD=1
LOGGING_LEVEL=1
NLS_LANG=
NOT_RESTARTING_TEMPLATE=
OFFLINE_CHECK_INTERVAL=60
PROFILE_CHANGE_TEMPLATE=
RESTART_ATTEMPTS=5
SCRIPT_TIMEOUT=60
START_DEPENDENCIES=
START_TIMEOUT=0
STATE_CHANGE_TEMPLATE=
STOP_DEPENDENCIES=
STOP_TIMEOUT=0
TYPE_VERSION=2.2
UPTIME_THRESHOLD=1d
USR_ORA_AUTO=static
USR_ORA_ENV=
USR_ORA_IF=eth0
USR_ORA_NETMASK=255.255.255.0
USR_ORA_SUBNET=192.168.1.0
VERSION=11.2.0.3.0

[root@vrh1 ~]# crsctl status resource ora.vrh1.vip -f        
NAME=ora.vrh1.vip
TYPE=ora.cluster_vip_net1.type
STATE=ONLINE
TARGET=ONLINE
ACL=owner:root:rwx,pgrp:root:r-x,other::r--,group:oinstall:r-x,user:grid:r-x
ACTION_FAILURE_TEMPLATE=
ACTION_SCRIPT=
ACTIVE_PLACEMENT=1
AGENT_FILENAME=%CRS_HOME%/bin/orarootagent%CRS_EXE_SUFFIX%
AUTO_START=restore
CARDINALITY=1
CARDINALITY_ID=0
CHECK_INTERVAL=1
CHECK_TIMEOUT=30
CREATION_SEED=34700
DEFAULT_TEMPLATE=PROPERTY(RESOURCE_CLASS=vip)
DEGREE=1
DESCRIPTION=Oracle VIP resource
ENABLED=1
FAILOVER_DELAY=0
FAILURE_INTERVAL=0
FAILURE_THRESHOLD=0
GEN_USR_ORA_STATIC_VIP=
GEN_USR_ORA_VIP=
HOSTING_MEMBERS=vrh1
ID=ora.vrh1.vip
LOAD=1
LOGGING_LEVEL=2
NLS_LANG=
NOT_RESTARTING_TEMPLATE=
OFFLINE_CHECK_INTERVAL=0
PLACEMENT=favored
PROFILE_CHANGE_TEMPLATE=
RESTART_ATTEMPTS=0
SCRIPT_TIMEOUT=60
SERVER_POOLS=*
START_DEPENDENCIES=hard(ora.net1.network) pullup(ora.net1.network)
START_TIMEOUT=0
STATE_CHANGE_TEMPLATE=
STOP_DEPENDENCIES=hard(ora.net1.network)
STOP_TIMEOUT=0
TYPE_VERSION=2.1
UPTIME_THRESHOLD=1h
USR_ORA_ENV=
USR_ORA_VIP=vrh1-vip
VERSION=11.2.0.3.0

[root@vrh1 ~]# crsctl status res ora.LISTENER.lsnr -f
NAME=ora.LISTENER.lsnr
TYPE=ora.listener.type
STATE=ONLINE
TARGET=ONLINE
ACL=owner:grid:rwx,pgrp:oinstall:rwx,other::r--
ACTION_FAILURE_TEMPLATE=
ACTION_SCRIPT=%CRS_HOME%/bin/racgwrap%CRS_SCRIPT_SUFFIX%
AGENT_FILENAME=%CRS_HOME%/bin/oraagent%CRS_EXE_SUFFIX%
ALIAS_NAME=ora.%CRS_CSS_NODENAME_LOWER_CASE%.LISTENER_%CRS_CSS_NODENAME_UPPER_CASE%.lsnr
AUTO_START=restore
CHECK_INTERVAL=60
CHECK_TIMEOUT=30
CREATION_SEED=35089
DEFAULT_TEMPLATE=PROPERTY(RESOURCE_CLASS=listener) PROPERTY(LISTENER_NAME=PARSE(%NAME%, ., 2))
DEGREE=1
DESCRIPTION=Oracle Listener resource
ENABLED=1
ENDPOINTS=TCP:1521
ID=ora.LISTENER.lsnr
LOAD=1
LOGGING_LEVEL=5
NLS_LANG=
NOT_RESTARTING_TEMPLATE=
OFFLINE_CHECK_INTERVAL=0
ORACLE_HOME=%CRS_HOME%
PORT=1521
PROFILE_CHANGE_TEMPLATE=
RESTART_ATTEMPTS=5
SCRIPT_TIMEOUT=60
START_DEPENDENCIES=hard(type:ora.cluster_vip_net1.type) pullup(type:ora.cluster_vip_net1.type)
START_TIMEOUT=180
STATE_CHANGE_TEMPLATE=
STOP_DEPENDENCIES=hard(intermediate:type:ora.cluster_vip_net1.type)
STOP_TIMEOUT=0
TYPE_VERSION=1.2
UPTIME_THRESHOLD=1d
USR_ORA_ENV=ORACLE_BASE=/g01/orabase
USR_ORA_OPI=false
VERSION=11.2.0.3.0

 

 

As can be seen, in 11.2 RAC the listener resource depends on the VIP, and the VIP resource depends on ora.net1.network. Consequently, when the public network becomes briefly unavailable (a "network hiccup"), the ora.net1.network resource goes OFFLINE, which in turn causes the VIP resource on that node to fail over and the listener to go OFFLINE.

Moreover, because the default CHECK_INTERVAL of the ora.net1.network resource is 1 in 11.2, i.e. the network resource is checked every second, the resource is extremely sensitive: any short public network outage, whether caused by network hardware or by high host load, can take the network resource OFFLINE and, through the dependency, affect the VIP and the listener.
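Both the check frequency and the dependency chain can be confirmed directly with crsctl; a minimal sketch using the resource names shown above:

crsctl status resource ora.net1.network -f | grep CHECK_INTERVAL   # 1 second by default in 11.2
crsctl status resource ora.vrh1.vip -f | grep DEPENDENCIES         # VIP depends on ora.net1.network
crsctl status res ora.LISTENER.lsnr -f | grep DEPENDENCIES         # listener depends on the VIP type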

 

Let's see this in practice:

 

 

 

[root@vrh1 orarootagent_root]# pwd
/g01/11.2.0/grid/log/vrh1/agent/crsd/orarootagent_root

2012-12-21 02:04:13.600: [ora.vrh1.vip][2281658688][ENTER] {1:62611:2} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:04:13.600: [ora.vrh1.vip][2281658688][ENTER] {1:62611:2} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:04:14.596: [ora.vrh1.vip][2281658688][ENTER] {1:62611:2} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:04:14.596: [ora.vrh1.vip][2281658688][ENTER] {1:62611:2} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:04:14.596: [ora.vrh1.vip][2281658688][ENTER] {1:62611:2} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:04:14.597: [ora.vrh1.vip][2281658688][ENTER] {1:62611:2} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:04:15.602: [ora.vrh1.vip][1266022720][ENTER] {1:62611:2} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:04:15.603: [ora.vrh1.vip][1266022720][ENTER] {1:62611:2} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:04:15.603: [ora.vrh1.vip][1266022720][ENTER] {1:62611:2} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:04:15.603: [ora.vrh1.vip][1266022720][ENTER] {1:62611:2} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:04:16.598: [ora.vrh1.vip][2281658688][ENTER] {1:62611:2} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:04:16.598: [ora.vrh1.vip][2281658688][ENTER] {1:62611:2} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:04:16.598: [ora.vrh1.vip][2281658688][ENTER] {1:62611:2} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:04:16.598: [ora.vrh1.vip][2281658688][ENTER] {1:62611:2} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:04:17.594: [ora.vrh1.vip][2281658688][ENTER] {1:62611:2} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network

[root@vrh1 orarootagent_root]# lsof |grep orarootagent_root.log
orarootag  3835      root    4w      REG              253,0   9109896              4717997 /g01/11.2.0/grid/log/vrh1/agent/ohasd/orarootagent_root/orarootagent_root.log
orarootag 23559      root    4w      REG              253,0   3166410              4723518 /g01/11.2.0/grid/log/vrh1/agent/crsd/orarootagent_root/orarootagent_root.log

[root@vrh1 orarootagent_root]# ps -ef|egrep "3835|23559"|grep -v grep
root      3835     1  0 Oct23 ?        06:57:18 /g01/11.2.0/grid/bin/orarootagent.bin
root     23559     1  0 Nov20 ?        04:06:24 /g01/11.2.0/grid/bin/orarootagent.bin

 

 

 

 

As shown above, the orarootagent.bin process checks ora.net1.network, the resource the VIP depends on, every second. We now manually take the public network down and bring it back up within about 3 seconds (roughly as in the sketch below) and observe the result:
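A hedged sketch of how such a brief outage can be produced on this test cluster (eth0 is the public interface here; never do this on a production system):

# as root on node vrh1
ifdown eth0
sleep 3
ifup eth0

The orarootagent log then records the following: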

 

 

 

 

2012-12-21 02:09:05.781: [    AGFW][1400256832] {1:62611:2} ora.net1.network vrh1 1 state changed from: ONLINE to: OFFLINE
2012-12-21 02:09:05.781: [    AGFW][1400256832] {1:62611:2} Switching online monitor to offline one
2012-12-21 02:09:05.781: [    AGFW][1400256832] {1:62611:2} Started implicit monitor for [ora.net1.network vrh1 1] interval=60000 delay=60000
2012-12-21 02:09:05.781: [    AGFW][1400256832] {0:1:53734} Generating new Tint for unplanned state change. Original Tint: {1:62611:2}
2012-12-21 02:09:05.781: [    AGFW][1400256832] {0:1:53734} Agent sending message to PE: RESOURCE_STATUS[Proxy] ID 20481:24047944
2012-12-21 02:09:05.781: [ora.vrh1.vip][1333139776][ENTER] {1:62611:2} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:09:05.782: [ora.vrh1.vip][1333139776][ENTER] {1:62611:2} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:09:05.782: [ora.vrh1.vip][1333139776] {1:62611:2} [check] VipAgent::checkIp returned false
2012-12-21 02:09:05.782: [ora.vrh1.vip][1333139776][ENTER] {1:62611:2} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:09:05.782: [ora.vrh1.vip][1333139776][ENTER] {1:62611:2} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:09:05.790: [    AGFW][1400256832] {0:1:53734} Agent received the message: RESOURCE_PROBE[ora.vrh1.vip 1 1] ID 4097:417234
2012-12-21 02:09:05.791: [    AGFW][1400256832] {0:1:53734} Preparing CHECK command for: ora.vrh1.vip 1 1
2012-12-21 02:09:05.876: [    AGFW][1400256832] {0:1:53734} Agent received the message: RESOURCE_START[ora.net1.network vrh1 1] ID 4098:417237
2012-12-21 02:09:05.876: [    AGFW][1400256832] {0:1:53734} Preparing START command for: ora.net1.network vrh1 1
2012-12-21 02:09:05.876: [    AGFW][1400256832] {0:1:53734} ora.net1.network vrh1 1 state changed from: OFFLINE to: STARTING
2012-12-21 02:09:05.878: [    AGFW][1400256832] {0:1:53734} Agent received the message: RESOURCE_PROBE[ora.scan1.vip 1 1] ID 4097:417241
2012-12-21 02:09:05.878: [    AGFW][1400256832] {0:1:53734} Preparing CHECK command for: ora.scan1.vip 1 1
2012-12-21 02:09:05.878: [ora.net1.network][1266022720] {0:1:53734} [start] (:CLSN00107:) clsn_agent::start {
2012-12-21 02:09:05.878: [ora.net1.network][1266022720] {0:1:53734} [start] NetworkAgent::init enter {
2012-12-21 02:09:05.878: [ora.scan1.vip][2281658688] {0:1:53734} [check] VipAgent::checkIp returned false
2012-12-21 02:09:05.878: [    AGFW][1400256832] {0:1:53734} ora.scan1.vip 1 1 state changed from: ONLINE to: OFFLINE
2012-12-21 02:09:05.878: [    AGFW][1400256832] {0:1:53735} Generating new Tint for unplanned state change. Original Tint: {0:1:53734}
2012-12-21 02:09:05.878: [    AGFW][1400256832] {0:1:53735} Agent sending message to PE: RESOURCE_STATUS[Proxy] ID 20481:24047956
2012-12-21 02:09:05.878: [    AGFW][1400256832] {0:1:53735} Agent sending last reply for: RESOURCE_PROBE[ora.scan1.vip 1 1] ID 4097:417241
2012-12-21 02:09:06.056: [ora.net1.network][1266022720] {0:1:53734} [start] Checking if eth0 Interface is fine
2012-12-21 02:09:06.176: [ora.net1.network][1266022720] {0:1:53734} [start] synchStatic {
2012-12-21 02:09:06.176: [ora.net1.network][1266022720] {0:1:53734} [start] synchStatic }
2012-12-21 02:09:06.177: [ora.net1.network][1266022720] {0:1:53734} [start] unsynchStatic {
2012-12-21 02:09:06.177: [ora.net1.network][1266022720] {0:1:53734} [start] unsynchStatic }
2012-12-21 02:09:06.347: [ USRTHRD][1333139776] {1:62611:2} Thread:[SendFail2SrvThread:]start {
2012-12-21 02:09:06.348: [ USRTHRD][1333139776] {1:62611:2} Thread:[SendFail2SrvThread:]start }
2012-12-21 02:09:06.348: [    AGFW][1400256832] {1:62611:2} ora.vrh1.vip 1 1 state changed from: ONLINE to: OFFLINE
2012-12-21 02:09:06.348: [ora.vrh1.vip][1333139776][ENTER] {0:1:53734} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:09:06.348: [    AGFW][1400256832] {0:1:53736} Generating new Tint for unplanned state change. Original Tint: {1:62611:2}
2012-12-21 02:09:06.348: [    AGFW][1400256832] {0:1:53736} Agent sending message to PE: RESOURCE_STATUS[Proxy] ID 20481:24047960
2012-12-21 02:09:06.349: [ora.vrh1.vip][1333139776][ENTER] {0:1:53734} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:09:06.349: [ora.vrh1.vip][1333139776] {0:1:53734} [check] VipAgent::checkIp returned false
2012-12-21 02:09:06.349: [ora.vrh1.vip][1333139776][ENTER] {0:1:53734} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:09:06.349: [ora.vrh1.vip][1333139776][ENTER] {0:1:53734} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:09:06.350: [    AGFW][1400256832] {0:1:53734} Agent sending last reply for: RESOURCE_PROBE[ora.vrh1.vip 1 1] ID 4097:417234
2012-12-21 02:09:06.395: [   AGENT][1266022720] {0:1:53734} UserErrorException: Locale is 
2012-12-21 02:09:06.445: [ USRTHRD][1467373888] {1:62611:2} VipAgent::sendFail2Srv {
2012-12-21 02:09:06.535: [ USRTHRD][1467373888] {1:62611:2} VipAgent::sendFail2Srv }
2012-12-21 02:09:06.535: [ USRTHRD][1467373888] {1:62611:2} Thread:[SendFail2SrvThread:]isRunning is reset to false here
2012-12-21 02:09:06.687: [ora.net1.network][1266022720] {0:1:53734} [start] CRS-5008: Invalid attribute value: eth0 for the network interface

2012-12-21 02:09:06.687: [ora.net1.network][1266022720] {0:1:53734} [start] NetworkAgent::init exit }
2012-12-21 02:09:06.687: [ USRTHRD][1266022720] {0:1:53734} Thread:[VipRelocate:]stop {
2012-12-21 02:09:06.687: [ USRTHRD][1266022720] {0:1:53734} Thread:[VipRelocate:]stop }
2012-12-21 02:09:06.725: [ora.net1.network][1266022720] {0:1:53734} [start] Ocr Context init default level 45001104
2012-12-21 02:09:06.726: [ora.net1.network][1266022720] {0:1:53734} [start] Ocr Context init default level 45001104
2012-12-21 02:09:06.836: [ora.net1.network][1266022720] {0:1:53734} [start] (:CLSN00107:) clsn_agent::start }
2012-12-21 02:09:06.836: [    AGFW][1266022720] {0:1:53734} Command: start for resource: ora.net1.network vrh1 1 completed with status: SUCCESS
2012-12-21 02:09:06.836: [ora.net1.network][1266022720] {0:1:53734} [check] NetworkAgent::init enter {
2012-12-21 02:09:06.836: [    AGFW][1400256832] {0:1:53734} Agent sending reply for: RESOURCE_START[ora.net1.network vrh1 1] ID 4098:417237
2012-12-21 02:09:06.836: [ora.net1.network][1266022720] {0:1:53734} [check] Checking if eth0 Interface is fine
2012-12-21 02:09:06.837: [   AGENT][1266022720] {0:1:53734} UserErrorException: Locale is 
2012-12-21 02:09:06.837: [ora.net1.network][1266022720] {0:1:53734} [check] CRS-5008: Invalid attribute value: eth0 for the network interface

2012-12-21 02:09:06.837: [ora.net1.network][1266022720] {0:1:53734} [check] NetworkAgent::init exit }
2012-12-21 02:09:07.184: [ora.net1.network][1266022720] {0:1:53734} [check] NetInterface::scheckNetInterface returned 0 
2012-12-21 02:09:07.184: [ora.net1.network][1266022720] {0:1:53734} [check] NetworkAgent::checkInterface returned false
2012-12-21 02:09:07.185: [    AGFW][1400256832] {0:1:53734} ora.net1.network vrh1 1 state changed from: STARTING to: OFFLINE
2012-12-21 02:09:07.185: [    AGFW][1400256832] {0:1:53734} Switching online monitor to offline one
2012-12-21 02:09:07.185: [    AGFW][1400256832] {0:1:53734} Started implicit monitor for [ora.net1.network vrh1 1] interval=60000 delay=60000
2012-12-21 02:09:07.185: [    AGFW][1400256832] {0:1:53734} Agent sending last reply for: RESOURCE_START[ora.net1.network vrh1 1] ID 4098:417237

 

 

As shown above, within one second of the public network going down, the orarootagent process detects it and attempts to restart ora.net1.network. After the restart fails, the VIP is set to OFFLINE, and the listener then goes OFFLINE as well. Later, once the public network recovers, the network, VIP, and listener come back online automatically one after another:

 

 

2012-12-21 02:10:07.196: [ora.net1.network][1333139776] {0:1:53734} [check] NetworkAgent::init enter {
2012-12-21 02:10:07.196: [ora.net1.network][1333139776] {0:1:53734} [check] Checking if eth0 Interface is fine
2012-12-21 02:10:07.217: [ora.net1.network][1333139776] {0:1:53734} [check] ifname=eth0
2012-12-21 02:10:07.217: [ora.net1.network][1333139776] {0:1:53734} [check] subnetmask=255.255.255.0
2012-12-21 02:10:07.217: [ora.net1.network][1333139776] {0:1:53734} [check] subnetnumber=192.168.1.0
2012-12-21 02:10:07.217: [ora.net1.network][1333139776] {0:1:53734} [check] Interface Name = eth0
2012-12-21 02:10:07.217: [ora.net1.network][1333139776] {0:1:53734} [check] Primary IP = 192.168.1.161
2012-12-21 02:10:07.217: [ora.net1.network][1333139776] {0:1:53734} [check] NetworkAgent::init exit }
2012-12-21 02:10:07.218: [ USRTHRD][1333139776] {0:1:53734} Thread:[VipRelocate:]start {
2012-12-21 02:10:07.218: [ USRTHRD][1333139776] {0:1:53734} Thread:[VipRelocate:]start }
2012-12-21 02:10:07.220: [    AGFW][1400256832] {0:1:53734} ora.net1.network vrh1 1 state changed from: OFFLINE to: ONLINE
2012-12-21 02:10:07.220: [    AGFW][1400256832] {0:1:53734} Switching offline monitor to online one
2012-12-21 02:10:07.220: [    AGFW][1400256832] {0:1:53734} Started implicit monitor for [ora.net1.network vrh1 1] interval=1000 delay=1000
2012-12-21 02:10:07.220: [    AGFW][1400256832] {0:1:53739} Generating new Tint for unplanned state change. Original Tint: {0:1:53734}
2012-12-21 02:10:07.220: [    AGFW][1400256832] {0:1:53739} Agent sending message to PE: RESOURCE_STATUS[Proxy] ID 20481:24047975
2012-12-21 02:10:07.244: [ USRTHRD][1266022720] {0:1:53734} Relocating Node Vip Resource
2012-12-21 02:10:07.245: [ USRTHRD][1266022720] {0:1:53734} synchStatic {
2012-12-21 02:10:07.245: [ USRTHRD][1266022720] {0:1:53734} synchStatic }
2012-12-21 02:10:07.270: [    GIPC][1266022720] gipcCheckInitialization: possible incompatible non-threaded init from [clsss.c : 5326], original from [clsnOraRootAgent.
cpp : 208]
2012-12-21 02:10:07.278: [ USRTHRD][1266022720] {0:1:53734} unsynchStatic {
2012-12-21 02:10:07.279: [ USRTHRD][1266022720] {0:1:53734} unsynchStatic }
2012-12-21 02:10:07.310: [ USRTHRD][1266022720] {0:1:53734} Thread:[VipRelocate:]isRunning is reset to false here
2012-12-21 02:10:07.343: [    AGFW][1400256832] {1:62611:962} Agent received the message: RESOURCE_START[ora.vrh1.vip 1 1] ID 4098:417329
2012-12-21 02:10:07.343: [    AGFW][1400256832] {1:62611:962} Preparing START command for: ora.vrh1.vip 1 1
2012-12-21 02:10:07.343: [    AGFW][1400256832] {1:62611:962} ora.vrh1.vip 1 1 state changed from: OFFLINE to: STARTING
2012-12-21 02:10:07.344: [ora.vrh1.vip][1333139776] {1:62611:962} [start] (:CLSN00107:) clsn_agent::start {
2012-12-21 02:10:07.344: [ USRTHRD][1333139776] {1:62611:962} Thread:[SendFail2SrvThread:]stop {
2012-12-21 02:10:07.344: [ USRTHRD][1333139776] {1:62611:962} Thread:[SendFail2SrvThread:]stop }
2012-12-21 02:10:07.344: [ USRTHRD][1333139776] {1:62611:962} Thread:[StartorphanSrvThread:]stop {
2012-12-21 02:10:07.344: [ USRTHRD][1333139776] {1:62611:962} Thread:[StartorphanSrvThread:]stop }
2012-12-21 02:10:07.344: [ora.vrh1.vip][1333139776][ENTER] {1:62611:962} [start] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:10:07.344: [ora.vrh1.vip][1333139776] {1:62611:962} [start] InterfaceName = eth0
2012-12-21 02:10:07.344: [ora.vrh1.vip][1333139776] {1:62611:962} [start] NetworkMask = 255.255.255.0
2012-12-21 02:10:07.344: [ora.vrh1.vip][1333139776] {1:62611:962} [start] NetworkAuto = static
2012-12-21 02:10:07.344: [ora.vrh1.vip][1333139776] {1:62611:962} [start] Checking if eth0 Interface is fine
2012-12-21 02:10:07.349: [ora.vrh1.vip][1333139776] {1:62611:962} [start] ifname=eth0

2012-12-21 02:10:07.349: [ora.vrh1.vip][1333139776] {1:62611:962} [start] subnetmask=255.255.255.0
2012-12-21 02:10:07.349: [ora.vrh1.vip][1333139776] {1:62611:962} [start] subnetnumber=192.168.1.0
2012-12-21 02:10:07.349: [ora.vrh1.vip][1333139776] {1:62611:962} [start] InterfaceName = eth0
2012-12-21 02:10:07.350: [ora.vrh1.vip][1333139776] {1:62611:962} [start] HostName vrh1-vip translated to 192.168.1.162
2012-12-21 02:10:07.350: [ora.vrh1.vip][1333139776][ENTER] {1:62611:962} [start] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:10:07.350: [ora.vrh1.vip][1333139776] {1:62611:962} [start] Interface Name = eth0
2012-12-21 02:10:07.350: [ora.vrh1.vip][1333139776] {1:62611:962} [start] Ip Address = 192.168.1.162
2012-12-21 02:10:07.350: [ora.vrh1.vip][1333139776] {1:62611:962} [start] Utils::getCrsHome crsHome /g01/11.2.0/grid
2012-12-21 02:10:07.351: [ora.vrh1.vip][1333139776] {1:62611:962} [start] Ocr Context init default level -1677304432
2012-12-21 02:10:07.526: [ default][1333139776]ICMP Ping from 192.168.1.161 to 192.168.1.162
2012-12-21 02:10:08.529: [ora.vrh1.vip][1333139776] {1:62611:962} [start] Trying to plumb  192.168.1.162 on eth0
2012-12-21 02:10:08.529: [ora.vrh1.vip][1333139776] {1:62611:962} [start] VipActions::startIp {
2012-12-21 02:10:08.529: [ora.vrh1.vip][1333139776] {1:62611:962} [start] Adding 192.168.1.162 on eth0:1
2012-12-21 02:10:08.529: [ora.vrh1.vip][1333139776] {1:62611:962} [start] VipActions::startIp }
2012-12-21 02:10:08.529: [ora.vrh1.vip][1333139776] {1:62611:962} [start] Flushing neighbours ARP Cache
2012-12-21 02:10:08.529: [ora.vrh1.vip][1333139776] {1:62611:962} [start]  Arp::sCreateSocket {
2012-12-21 02:10:08.532: [ora.vrh1.vip][1333139776] {1:62611:962} [start]  Arp::sCreateSocket }
2012-12-21 02:10:08.532: [ora.vrh1.vip][1333139776] {1:62611:962} [start]  Arp::sFlushArpCache {
2012-12-21 02:10:08.532: [ora.vrh1.vip][1333139776] {1:62611:962} [start] Arp::sSend:  sending type 1
2012-12-21 02:10:08.532: [ora.vrh1.vip][1333139776] {1:62611:962} [start]  Arp::sFlushArpCache }
2012-12-21 02:10:08.532: [ora.vrh1.vip][1333139776] {1:62611:962} [start] Succeed to bind udp://192.168.1.162:0
2012-12-21 02:10:08.532: [ora.vrh1.vip][1333139776] {1:62611:962} [start] Succeed to bind udp://192.168.1.162:0
2012-12-21 02:10:08.533: [ora.vrh1.vip][1333139776] {1:62611:962} [start] (:CLSN00107:) clsn_agent::start }
2012-12-21 02:10:08.533: [    AGFW][1333139776] {1:62611:962} Command: start for resource: ora.vrh1.vip 1 1 completed with status: SUCCESS
2012-12-21 02:10:08.534: [ora.vrh1.vip][1333139776][ENTER] {1:62611:962} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:10:08.534: [ora.vrh1.vip][1333139776][ENTER] {1:62611:962} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:10:08.534: [    AGFW][1400256832] {1:62611:962} Agent sending reply for: RESOURCE_START[ora.vrh1.vip 1 1] ID 4098:417329
2012-12-21 02:10:08.534: [ora.vrh1.vip][1333139776][ENTER] {1:62611:962} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:10:08.534: [ora.vrh1.vip][1333139776][ENTER] {1:62611:962} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:10:08.535: [    AGFW][1400256832] {1:62611:962} ora.vrh1.vip 1 1 state changed from: STARTING to: ONLINE
2012-12-21 02:10:08.535: [    AGFW][1400256832] {1:62611:962} Started implicit monitor for [ora.vrh1.vip 1 1] interval=1000 delay=1000
2012-12-21 02:10:08.535: [    AGFW][1400256832] {1:62611:962} Agent sending last reply for: RESOURCE_START[ora.vrh1.vip 1 1] ID 4098:417329
2012-12-21 02:10:09.529: [ora.vrh1.vip][1333139776][ENTER] {1:62611:962} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:10:09.529: [ora.vrh1.vip][1333139776][ENTER] {1:62611:962} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:10:09.529: [ora.vrh1.vip][1333139776][ENTER] {1:62611:962} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:10:09.529: [ora.vrh1.vip][1333139776][ENTER] {1:62611:962} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:10:09.529: [ USRTHRD][1333139776] {1:62611:962} Thread:[StartorphanSrvThread:]start {
2012-12-21 02:10:09.529: [ USRTHRD][1333139776] {1:62611:962} Thread:[StartorphanSrvThread:]start }
2012-12-21 02:10:09.530: [ USRTHRD][1500932416] {1:62611:962} VipAgent::startOrphanSrv {
2012-12-21 02:10:09.555: [ USRTHRD][1500932416] {1:62611:962} VipAgent::startOrphanSrv }
2012-12-21 02:10:09.555: [ USRTHRD][1500932416] {1:62611:962} Thread:[StartorphanSrvThread:]isRunning is reset to false here
2012-12-21 02:10:10.532: [ora.vrh1.vip][1333139776][ENTER] {1:62611:962} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network

 

 

How to avoid the unnecessary VIP and listener failover/offline caused by the over-sensitive network resource:

Apply patch 12378938, which is included in "11.2.0.2 GI PSU4, 11.2.0.3 GI PSU3, 11.2.0.3 Windows Patch 7, 11.2.0.4 and above", and modify the dependency attribute of the VIP resources, for example:

 

 

# $GRID_HOME/bin/crsctl modify res ora.scan<n>.vip -attr "STOP_DEPENDENCIES=hard(intermediate:ora.net<n>.network)"
For example:

# /ocw/grid/bin/crsctl modify res ora.scan1.vip -attr "STOP_DEPENDENCIES=hard(intermediate:ora.net1.network)"
# /ocw/grid/bin/crsctl modify res ora.scan2.vip -attr "STOP_DEPENDENCIES=hard(intermediate:ora.net1.network)"
# /ocw/grid/bin/crsctl modify res ora.scan3.vip -attr "STOP_DEPENDENCIES=hard(intermediate:ora.net1.network)"
# /ocw/grid/bin/srvctl stop scan -f
$ /ocw/grid/bin/srvctl start scan_listener

 

 

 

MORE INFO:

 

 

After a check times out, the 11gR2 Grid Infrastructure network resource (usually ora.net1.network) goes into the INTERMEDIATE state, then returns to ONLINE very shortly afterwards. This note does not discuss the cause of the check timeout, but the most common cause is a public network hiccup.

Once the network resource goes into the INTERMEDIATE state, it may trigger the VIP/service to fail over and the listener to go offline because of the resource dependency, which can cause unnecessary connectivity problems for that period of time. After the network resource is back online, the affected resources (listener, service or instance) may not come back online.

$GRID_HOME/log/<node>/crsd/crsd.log
2011-06-12 07:12:31.261: [ AGFW][10796] {0:1:2881} Received state change for ora.net1.network racnode1 1 [old state = ONLINE, new state = UNKNOWN]
2011-06-12 07:12:31.261: [ AGFW][10796] {0:1:2881} Received state LABEL change for ora.net1.network racnode1 1 [old label = , new label = CHECK TIMED OUT]
..
2011-06-12 07:12:31.297: [ CRSPE][12081] {0:1:2881} RI [ora.net1.network racnode1 1] new external state [INTERMEDIATE] old value: [ONLINE] on racnode1 label = [CHECK TIMED OUT]
..
2011-06-12 07:12:31.981: [ AGFW][10796] {0:1:2882} Received state change for ora.net1.network racnode1 1 [old state = UNKNOWN, new state = ONLINE]
..
2011-06-12 07:12:32.307: [ CRSPE][12081] {0:1:2881} RI [ora.LISTENER.lsnr racnode1 1] new internal state: [STOPPING] old value: [STABLE]
2011-06-12 07:12:32.308: [ CRSPE][12081] {0:1:2881} CRS-2673: Attempting to stop 'ora.LISTENER.lsnr' on 'racnode1'
$GRID_HOME/log/<node>/agent/crsd/orarootagent_root/orarootagent_root.log
2011-06-12 07:12:08.965: [ AGFW][2070] {1:27767:2} Created alert : (:CRSAGF00113:) : Aborting the command: check for resource: ora.net1.network racnode1 1
2011-06-12 07:12:08.966: [ora.net1.network][2070] {1:27767:2} [check] clsn_agent::abort {
..
2011-06-12 07:12:31.257: [ AGFW][2070] {1:27767:2} Command: check for resource: ora.net1.network racnode1 1 completed with status: TIMEDOUT
2011-06-12 07:12:31.258: [ AGFW][2314] {1:27767:2} ora.net1.network racnode1 1 state changed from: ONLINE to: UNKNOWN
2011-06-12 07:12:31.258: [ AGFW][2314] {1:27767:2} ora.net1.network racnode1 1 would be continued to monitored!
2011-06-12 07:12:31.258: [ AGFW][2314] {1:27767:2} ora.net1.network racnode1 1 state details has changed from: to: CHECK TIMED OUT
..
2011-06-12 07:12:31.923: [ora.net1.network][2314][F-ALGO] {1:27767:2} CHECK initiated by timer for: ora.net1.network racnode1 1
..
2011-06-12 07:12:31.973: [ora.net1.network][8502][F-ALGO] {1:27767:2} [check] Command check for resource: ora.net1.network racnode1 1 completed with status ONLINE
2011-06-12 07:12:31.978: [ AGFW][2314] {1:27767:2} ora.net1.network racnode1 1 state changed from: UNKNOWN to: ONLINE
$GRID_HOME/log/<node>/agent/crsd/oraagent_<grid user>/oraagent_<grid user>.log
2011-06-12 07:12:32.335: [ AGFW][2314] {0:1:2881} Agent received the message: RESOURCE_STOP[ora.LISTENER.lsnr racnode1 1] ID 4099:14792
2011-06-12 07:12:32.335: [ AGFW][2314] {0:1:2881} Preparing STOP command for: ora.LISTENER.lsnr racnode1 1
2011-06-12 07:12:32.335: [ AGFW][2314] {0:1:2881} ora.LISTENER.lsnr racnode1 1 state changed from: ONLINE to: STOPPING

Cause

This is caused by bug 12680491

Related issues: bug 12378938, bug 11857122, bug 12428737

Solution

The fix for bug 12680491 adds an intermediate modifier to the stop dependency between the network resource and the VIP to avoid unnecessary resource state changes. It is included in 11.2.0.2 GI PSU4, 11.2.0.3 GI PSU3, 11.2.0.3 Windows Patch 7, and 11.2.0.4 and above. This fix is recommended over the fix for bug 12378938, as it avoids the issue in the first place.

Once the patch for this bug is applied, the following needs to be executed to change the dependency for all VIPs:

# $GRID_HOME/bin/crsctl modify res ora.<vip>.vip -attr "STOP_DEPENDENCIES=hard(intermediate:ora.<net>.network)"

For example:

# /ocw/grid/bin/crsctl modify res ora.racnode1.vip -attr "STOP_DEPENDENCIES=hard(intermediate:ora.net1.network)"
Once the attribute is changed, nodeapps/VIP must be restarted for it to take effect.
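A minimal sketch of that restart (node name is illustrative; -f forces the stop despite dependent resources):

# as the Grid Infrastructure owner or root
srvctl stop vip -n racnode1 -f
srvctl start vip -n racnode1

# or restart all node applications on that node
srvctl stop nodeapps -n racnode1 -f
srvctl start nodeapps -n racnode1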

 

11g Real Application Testing: How to Use Database Replay

1. Using Database Replay

 

1.1 Capturing the Workload

 

1. For the period whose workload you want to capture, run the following PL/SQL on the source database:

 

execute dbms_workload_capture.start_capture('&CAPTURE_NAME','&DIRECTORY_NAME',default_action=>'INCLUDE');

CAPTURE_NAME => the name of this capture.

The capture can be monitored through the DBA_WORKLOAD_CAPTURES view.

DIRECTORY_NAME => the name of an Oracle directory object used to store the capture files. Existing tests show that capturing 10 minutes of workload on a busy OLTP database consumes about 1 GB of disk space, so allocate enough disk space for this directory.
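Putting these two parameters together, a minimal sketch of this step (directory path, directory name, and capture name are illustrative only):

sqlplus / as sysdba <<EOF
-- directory object that will hold the capture files
CREATE OR REPLACE DIRECTORY capture_dir AS '/u01/app/oracle/capture';
-- start the capture; everything is included unless capture filters are added first
EXECUTE dbms_workload_capture.start_capture('peak_load_cap', 'CAPTURE_DIR', default_action => 'INCLUDE');
EOF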

 

 

 

 

 

 

2. The progress of the capture can be monitored with the following SQL:

 

Query 1: select id,name,status,start_time,end_time,connects,user_calls,dir_path from dba_workload_captures where id = (select max(id) from dba_workload_captures);

set pagesize 0 long 30000000 longchunksize 1000

select dbms_workload_capture.report(&ID,'TEXT') from dual;

 

where &ID is the ID value obtained from Query 1.

 

 

 

3. When no more workload needs to be captured, end the capture with the following:

 

execute dbms_workload_capture.finish_capture; 

 

 

 

 

 

1.2 Preprocessing the Capture

1. Transfer the captured files to the target database host and create the necessary Oracle directory object.
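A minimal sketch of this step (host names and paths are illustrative only):

# copy the capture files from the source host to the target database host
scp -r srcdb-host:/u01/app/oracle/capture /u01/app/oracle/

# create a directory object on the target database pointing to the copied files
sqlplus / as sysdba <<EOF
CREATE OR REPLACE DIRECTORY replay_dir AS '/u01/app/oracle/capture';
EOF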

 

 

 

2. Preprocess the capture files on the target database:

 

execute DBMS_WORKLOAD_REPLAY.PROCESS_CAPTURE('&DIRECTORY_NAME');

Note: Capture Preprocess By Process_capture Can Not Complete [ID 1265686.1]

Bug 9742032  Database replay: dbms_workload_replay.process_capture takes a lot of time

 

 

— ***********************************************************

—  PROCESS_CAPTURE

—    Processes the workload capture found in capture_dir in place.

—    Analyzes the workload capture found in the capture_dir and

—    creates new workload replay specific metadata files that are

—    required to replay the given workload capture.

—    This procedure can be run multiple times on the same

—    capture directory – useful when this procedure encounters

—    unexpected errors or is cancelled by the user.

—    Once this procedure runs successfully, the capture_dir can be used

—    as input to INITIALIZE_REPLAY() in order to replay the captured

—    workload present in capture_dir.

—    Before a workload capture can be replayed in a particular database

—    version, the capture needs to be “processed” using this

—    PROCESS_CAPTURE procedure in that same database version.

—    Once created, a processed workload capture can be used to replay

—    the captured workload multiple times in the same database version.

—    For example:

—      Say workload “foo” was captured in “rec_dir” in Oracle

—      database version 10.2.0.4

—      In order to replay the workload “foo” in version 11.1.0.1

—      the workload needs to be processed in version 11.1.0.1

—      The following procedure needs to be executed in a 11.1.0.1 database

—      in order to process the capture directory “rec_dir”

—        DBMS_WORKLOAD_REPLAY.PROCESS_CAPTURE(‘rec_dir’);

—      Now, rec_dir contains a valid 11.1.0.1 processed workload capture

—      that can be used to replay the workload “foo” in 11.1.0.1 databases

—      as many number of times as required.

—    The procedure will take as input the following parameters:

—      capture_dir – name of the workload capture directory object.

—                    (case sensitive)

—                    The directory object must point to a valid OS

—                    directory that has appropriate permissions.

—                    New files will be added to this directory.

—                    (MANDATORY)

—      parallel_level – number of oracle processes used to process the

—                       capture in a parallel fashion.

—                       The NULL default value will auto-compute the

—                       parallelism level, whereas a value of 1 will enforce

—                       serial execution.

 

 

 

3. The preprocessing above may fail because of bug 8919603.

 

 

 

 

1.3 Starting the Replay

 

Use the calibrate mode of the wrc tool to estimate how many replay client machines are needed:

 

 

wrc mode=calibrate replaydir=$REPLAY_DIR

($REPLAY_DIR is the preprocessed capture directory.)

 

Recommendation:

Consider using at least 75 clients divided among 19 CPU(s)

You will need at least 142 MB of memory per client process.

If your machine(s) cannot match that number, consider using more clients.

 

Workload Characteristics:

– max concurrency: 2830 sessions

– total number of sessions: 70362

 

Assumptions:

– 1 client process per 50 concurrent sessions

– 4 client process per CPU

– 256 KB of memory cache per concurrent session

– think time scale = 100

– connect time scale = 100

– synchronization = TRUE

 

 

Prepare the database environment: flash the database back to the capture start point and create a user for the replay clients:

 

shutdown abort;                                   -- shut down both instances
startup mount;                                    -- start one instance in mount mode
flashback database to restore point prereplay;
alter database open resetlogs;
startup;                                          -- start the other instance
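The FLASHBACK DATABASE step above assumes that a guaranteed restore point named prereplay was created on the test database beforehand (before the first replay), for example:

sqlplus / as sysdba <<EOF
-- a guaranteed restore point lets the test database be flashed back before every replay run
CREATE RESTORE POINT prereplay GUARANTEE FLASHBACK DATABASE;
EOF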

 

create user orasupport identified by oracle;

grant dba to orasupport;

 

 

 

 

 

Run the following scripts to initialize and prepare the replay:

 

 

exec DBMS_WORKLOAD_REPLAY.INITIALIZE_REPLAY(replay_name =>'&REPLAY_NAME',replay_dir => '&REPLAY_DIR');

 

exec DBMS_WORKLOAD_REPLAY.PREPARE_REPLAY(synchronization=>false,

connect_time_scale=>80,think_time_scale=>25, SCALE_UP_MULTIPLIER=>1);

 

 

synchronization => specifies whether the original commit order must be preserved; for a stress test it can be set to false.

 

connect_time_scale => the connect-time scale. It is set to 80% here, meaning a session that originally logged in 10 minutes after the capture started will now log in after 8 minutes. Note that setting this value too low causes a large number of sessions to log in at once and may lead to ORA-18/ORA-20 errors; 80 is used here to increase the load.

 

think_time_scale => the scale applied to the interval between two successive SQL calls. For example, if two calls were originally 100 seconds apart, setting think_time_scale to 25 reduces the interval to 25 seconds.

 

 

SCALE_UP_MULTIPLIER => the factor by which the query workload is scaled up; it is recommended to start with 1 (i.e. 1:1).

 

 

— ***********************************************************

—  PREPARE_REPLAY

—    Puts the DB state in REPLAY mode. The database

—    should have been initialized for replay using

—    DBMS_WORKLOAD_REPLAY.INITIALIZE_REPLAY(), and optionally any

—    capture time connection strings that require remapping have been

—    already done using DBMS_WORKLOAD_REPLAY.REMAP_CONNECTION().

—    One or more external replay clients (WRC) can be started

—    once the PREPARE_REPLAY procedure has been executed.

—    The procedure will take as input the following parameters:

—      synchronization – Turns synchronization to the given scheme during

—                        workload replay.

—                        When synchronization is SCN, the COMMIT order

—                        observed during the original workload capture

—                        will be preserved during replay.

—                        Every action that is replayed will be executed

—                        ONLY AFTER all of it’s dependent COMMITs (all

—                        COMMITs that were issued before the given action

—                        in the original workload capture) have finished

—                        execution.

—                        When synchronization is OBJECT_ID, a more advanced

—                        synchronization scheme is used.

—                        Every action that is replayed will be executed

—                        ONLY AFTER the RELEVANT COMMITs have finished

—                        executing. The relevant commits are the ones that

—                        were issued before the given action  in the

—                        orginal workload capture and that had modified

—                        at least one of the database objects the given

—                        action is referencing (either implicitely or

—                        explicitely).

—                        This OBJECT_ID scheme has the same logical

—                        property of making sure that any action will see

—                        the same data it saw during capture, but will

—                        allow more concurrency during replays for the

—                        actions that do not touch the same objects/tables.

—                        DEFAULT VALUE: SCN, preserve commit order.

—                        For legacy reason, there is a boolean version of

—                        this procedure:

—                          TRUE  means ‘SCN’

—                          FALSE means ‘OFF’

—      connect_time_scale       – Scales the time elapsed between the

—                                 instant the workload capture was started

—                                 and session connects with the given value.

—                                 The input is interpreted as a % value.

—                                 Can potentially be used to increase or

—                                 decrease the number of concurrent

—                                 users during the workload replay.

—                                 DEFAULT VALUE: 100

—                                 For example, if the following was observed

—                                 during the original workload capture:

—                                 12:00 : Capture was started

—                                 12:10 : First session connect  (10m after)

—                                 12:30 : Second session connect (30m after)

—                                 12:42 : Third session connect  (42m after)

—                                 If the connect_time_scale is 50, then the

—                                 session connects will happen as follows:

—                                 12:00 : Replay was started

—                                         with 50% connect time scale

—                                 12:05 : First session connect  ( 5m after)

—                                 12:15 : Second session connect (15m after)

—                                 12:21 : Third session connect  (21m after)

—                                 If the connect_time_scale is 200, then the

—                                 session connects will happen as follows:

—                                 12:00 : Replay was started

—                                         with 200% connect time scale

—                                 12:20 : First session connect  (20m after)

—                                 13:00 : Second session connect (60m after)

—                                 13:24 : Third session connect  (84m after)

—      think_time_scale         – Scales the time elapsed between two

—                                 successive user calls from the same

—                                 session.

—                                 The input is interpreted as a % value.

—                                 Can potentially be used to increase or

—                                 decrease the number of concurrent

—                                 users during the workload replay.

—                                 DEFAULT VALUE: 100

—                                 For example, if the following was observed

—                                 during the original workload capture:

—                                 12:00 : User SCOTT connects

—                                 12:10 : First user call issued

—                                         (10m after completion of prevcall)

—                                 12:14 : First user call completes in 4mins

—                                 12:30 : Second user call issued

—                                         (16m after completion of prevcall)

—                                 12:40 : Second user call completes in 10m

—                                 12:42 : Third user call issued

—                                         ( 2m after completion of prevcall)

—                                 12:50 : Third user call completes in 8m

—                                 If the think_time_scale is 50 during the

—                                 workload replay, then the user calls

—                                 will look something like below:

—                                 12:00 : User SCOTT connects

—                                 12:05 : First user call issued 5 mins

—                                         (50% of 10m) after the completion

—                                         of prev call

—                                 12:10 : First user call completes in 5m

—                                         (takes a minute longer)

—                                 12:18 : Second user call issued 8 mins

—                                         (50% of 16m) after the completion

—                                         of prev call

—                                 12:25 : Second user call completes in 7m

—                                         (takes 3 minutes less)

—                                 12:26 : Third user call issued 1 min

—                                         (50% of 2m) after the completion

—                                         of prev call

—                                 12:35 : Third user call completes in 9m

—                                         (takes a minute longer)

—      think_time_auto_correct  – Auto corrects the think time between calls

—                                 appropriately when user calls takes longer

—                                 time to complete during replay than

—                                 how long the same user call took to

—                                 complete during the original capture.

—                                 DEFAULT VALUE: TRUE, reduce

—                                 think time if replay goes slower

—                                 than capture.

—                                 For example, if the following was observed

—                                 during the original workload capture:

—                                 12:00 : User SCOTT connects

—                                 12:10 : First user call issued

—                                         (10m after completion of prevcall)

—                                 12:14 : First user call completes in 4m

—                                 12:30 : Second user call issued

—                                         (16m after completion of prevcall)

—                                 12:40 : Second user call completes in 10m

—                                 12:42 : Third user call issued

—                                         ( 2m after completion of prevcall)

—                                 12:50 : Third user call completes in 8m

—                                 If the think_time_scale is 100 and

—                                 the think_time_auto_correct is TRUE

—                                 during the workload replay, then

—                                 the user calls will look something

—                                 like below:

—                                 12:00 : User SCOTT connects

—                                 12:10 : First user call issued 10 mins

—                                         after the completion of prev call

—                                 12:15 : First user call completes in 5m

—                                         (takes 1 minute longer)

—                                 12:30 : Second user call issued 15 mins

—                                         (16m minus the extra time of 1m

—                                          the prev call took) after the

—                                         completion of prev call

—                                 12:44 : Second user call completes in 14m

—                                         (takes 4 minutes longer)

—                                 12:44 : Third user call issued immediately

—                                         (2m minus the extra time of 4m

—                                          the prev call took) after the

—                                         completion of prev call

—                                 12:52 : Third user call completes in 8m

—      scale_up_multiplier      – Defines the number of times the query workload

—                                 is scaled up during replay. Each captured session

—                                 is replayed concurrently as many times as the

—                                 value of the scale_up_multiplier. However, only

—                                 one of the sessions in each set of identical

—                                 replay sessions executes both queries and updates.

—                                 The remaining sessions only execute queries.

—                                 More specifically note that:

—                                   1. One replay session (base session) of each set

—                                      of identical sessions will replay every call

—                                      from the capture as usual

—                                   2. The remaining sessions (scale-up sessions) will

—                                      only replay calls that are read-only.

—                                      Thus, DDL, DML, and PLSQL calls that

—                                      modified the database will be

—                                      skipped. SELECT FOR UPDATE statements are also skipped.

—                                   3. Read-only calls from the scale-up are

—                                      synchronized appropriately and obey the

—                                      timings defined by think_time_scale, connect_time_scale,

—                                      and think_time_auto_correct. Also the queries

—                                      are made to wait for the appropriate commits.

—                                   4. No replay data or error divergence

—                                      records will be generated for the

—                                      scale-up sessions.

—                                   5. All base or scale-up sessions that

—                                      replay the same capture file will connect

—                                      from the same workload replay client.

—          capture_sts – If this parameter is TRUE, a SQL tuning set

—                        capture is also started in parallel with workload

—                        capture. The resulting SQL tuning set can be

—                        exported using DBMS_WORKLOAD_REPLAY.EXPORT_AWR

—                        along with the AWR data.

—                        Currently, parallel STS capture

—                        is not supported in RAC. So, this parameter has

—                        no effect in RAC.

—                        Furthermore capture filters defined using the

—                        dbms_workload_replay APIs do not apply to the

—                        sql tuning set capture.

—                        The calling user must have the approriate

—                        privileges (‘administer sql tuning set’).

—                        DEFAULT VALUE: FALSE

—     sts_cap_interval – This parameter specifies the capture interval

—                        of the SQL set capture from the cursor cache in

—                        seconds. The default value is 300.

—    Prerequisites:

—      -> The database has been initialized for replay using

—         DBMS_WORKLOAD_REPLAY.INITIALIZE_REPLAY().

—      -> Any capture time connections strings that require remapping

—         during replay have already been remapped using

—         DBMS_WORKLOAD_REPLAY.REMAP_CONNECTION().

 

 

 

 

 

 

以上完成后启动WRC 客户端:

 

nohup  wrc  orasupport/oracle replaydir=$REPLAY_DIR  DSCN_OFF=TRUE &

 

建议每个实例(INST)开启75-100个WRC客户端,使用nohup后台启动,可参考下面的示例脚本。
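
以下是一个批量启动WRC客户端的示例脚本(仅为示意,客户端数量、用户名/密码以及REPLAY_DIR请按实际环境调整):

#!/bin/ksh
# 示例:在一个实例所在节点上批量启动75个WRC客户端
export REPLAY_DIR=/oracle/replay        # 假设的replay目录,请替换为实际路径
i=1
while [ $i -le 75 ]
do
  nohup wrc orasupport/oracle replaydir=$REPLAY_DIR DSCN_OFF=TRUE > wrc_${i}.log 2>&1 &
  i=`expr $i + 1`
done
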

 

 

MODE=REPLAY (default)

———————

 

Keyword     Description

—————————————————————-

USERID      username

PASSWORD    password

SERVER      server connection identifier (Default: empty string)

REPLAYDIR   replay directory (Default:.)

WORKDIR     directory for trace files

DEBUG       ON, OFF (Default: OFF)

CONNECTION_OVERRIDE  TRUE, FALSE (Default: FALSE)

TRUE   All replay threads connect using SERVER,

settings in DBA_WORKLOAD_CONNECTION_MAP will be ignored

FALSE  Use settings from DBA_WORKLOAD_CONNECTION_MAP

SERIALIZE_CONNECTS  TRUE, FALSE (Default: FALSE)

TRUE   All the replay threads will connect to

the database in a serial fashion one after

another. This setting is recommended when

the replay clients use the bequeath protocol

to communicate to the database server.

FALSE  Replay threads will connect to the database

in a concurrent fashion mimicking the original

capture behavior.

DSCN_OFF    TRUE, FALSE (Default: FALSE)

TRUE   Ignore all dependencies due to block

contention during capture when synchronizing

the replay.

FALSE  Honor all captured dependencies.

 

 

MODE=CALIBRATE

————–

Provides an estimate of the number of replay clients needed

 

Keyword     Description

—————————————————————-

REPLAYDIR   replay directory (Default:.)

 

Advanced parameters:

PROCESS_PER_CPU       Maximum number of client process than can be run

per CPU (Default: 4)

THREADS_PER_PROCESS   Maximum number of threads than can be run within

a client process (Default: 50)

 

 

MODE=LIST_HOSTS

—————

Displays all host names involved in the capture

 

Keyword     Description

—————————————————————-

REPLAYDIR   the workload directory (Default:.)

 

 

MODE=GET_TABLES

—————

Lists all objects referenced by captured SQL statements

 

Keyword     Description

—————————————————————-

REPLAYDIR   the workload directory (Default:.)

 

 

 

 

执行下列脚本正式开始REPLAY

 

 

exec DBMS_WORKLOAD_REPLAY.start_replay();

 

 

建议通过EM或者下面的脚本查询进度:

 

select id, name, status from dba_workload_replays;

若replay完成可以在EM或者使用如下脚本获得replay报告:

 

 

set pagesize 0 long 30000000 longchunksize 1000

select dbms_workload_replay.report(&ID,'TEXT') from dual;

 

&ID可以从上面的查询获得
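
按前文PREPARE_REPLAY参数说明,replay完成后还可以用DBMS_WORKLOAD_REPLAY.EXPORT_AWR把replay期间的AWR数据(以及并行捕获的SQL tuning set)导出以便离线分析。以下仅为示意,replay的ID同样取自dba_workload_replays:

# 示意:导出ID为1的replay的AWR数据
sqlplus -s / as sysdba <<'EOF'
exec DBMS_WORKLOAD_REPLAY.EXPORT_AWR(1);
exit
EOF
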

 

 

在AIX 6.1上安装11gR2 RAC

以下在AIX 6.1+Veritas CFS环境上安装11gR2 RAC :

1.  11gR2 GI/RAC安装预备工作

1.1       安装环境:

硬件环境为2节点IBM p系列小型机,每个节点物理CPU数目为32个,内存509440 MB,交换空间32768 MB;操作系统版本为64bit AIX 6.1(6100-06-09-1228)。
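
上述OS级别、CPU和内存等信息可以用如下AIX命令核对(示意):

# 查看AIX Technology Level / Service Pack
oslevel -s
# 查看逻辑CPU数目
lsdev -Cc processor | wc -l
# 查看物理内存(单位KB)
lsattr -El sys0 -a realmem
# 查看交换空间
lsps -a
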

 

 

Setting Value
Machine and version AIX macleandb-007 1 6 00F71C964C00
Fully qualified host name macleandb-007
Platform 64-bit AIX
O/S Version 6.1.0.0
O/S Maintenance Level 6100-06
O/S Service Pack Level 6100-06-09-1228
Logged in as oracle
Last run as uid=1101(oracle) gid=1000(oinstall) groups=1000(oinstall),1200(dba),1300(asmdba)
Executed as Oracle home owner? Yes
RDA home directory /oracle/rda
RDA work directory /oracle/rda
Output file prefix RDA
Output file directory /oracle/rda/output
Setup profile DB11g
Perform network pings? No

 

Item Value
Processor(s) 32 Processor(s) Installed
[01]: PowerPC_POWER7 3864 MHz
[02]: PowerPC_POWER7 3864 MHz
[03]: PowerPC_POWER7 3864 MHz
[04]: PowerPC_POWER7 3864 MHz
[05]: PowerPC_POWER7 3864 MHz
[06]: PowerPC_POWER7 3864 MHz
[07]: PowerPC_POWER7 3864 MHz
[08]: PowerPC_POWER7 3864 MHz
[09]: PowerPC_POWER7 3864 MHz
[10]: PowerPC_POWER7 3864 MHz
[11]: PowerPC_POWER7 3864 MHz
[12]: PowerPC_POWER7 3864 MHz
[13]: PowerPC_POWER7 3864 MHz
[14]: PowerPC_POWER7 3864 MHz
[15]: PowerPC_POWER7 3864 MHz
[16]: PowerPC_POWER7 3864 MHz
[17]: PowerPC_POWER7 3864 MHz
[18]: PowerPC_POWER7 3864 MHz
[19]: PowerPC_POWER7 3864 MHz
[20]: PowerPC_POWER7 3864 MHz
[21]: PowerPC_POWER7 3864 MHz
[22]: PowerPC_POWER7 3864 MHz
[23]: PowerPC_POWER7 3864 MHz
[24]: PowerPC_POWER7 3864 MHz
[25]: PowerPC_POWER7 3864 MHz
[26]: PowerPC_POWER7 3864 MHz
[27]: PowerPC_POWER7 3864 MHz
[28]: PowerPC_POWER7 3864 MHz
[29]: PowerPC_POWER7 3864 MHz
[30]: PowerPC_POWER7 3864 MHz
[31]: PowerPC_POWER7 3864 MHz
[32]: PowerPC_POWER7 3864 MHz
Total Physical Memory 509440 MB
Swap: Max Size 32768 MiB
Swap: Percent Used 1%

 

 

1.2       网络环境:

 

对于每台机器,需要配置:

1个机器固有IP地址(Public IP)和对应的机器名,在DNS中注册,或者写到/etc/hosts文件中。

1个虚拟IP地址(Virtual IP)和对应的服务名,在DNS中注册,或者写到/etc/hosts文件中。(与Public IP不同,但必须处于同一网段)

1个互联IP地址(Private IP)和对应的私有名,写到/etc/hosts文件中。(这类IP地址建议采用如172.168.*.*或192.168.*.*网段的地址)

3个SCAN IP地址,在DNS中注册;不使用DNS的情况下配置1个SCAN IP到/etc/hosts中。

 

Node Interface Name Type Ip Address Registered in
macleandb-007 macleandb-007 Public 192.168.18.227 /etc/hosts
macleandb-008 macleandb-008 Public 192.168.18.231 /etc/hosts
macleandb-007 macleandb-007-vip VIP 192.168.18.226 /etc/hosts
macleandb-008 macleandb-008-vip VIP 192.168.18.230 /etc/hosts
macleandb-008 macleandb-scan1 SCAN IP 192.168.18.224 /etc/hosts
macleandb-007 macleandb-007-priva Private 172.168.1.101 /etc/hosts
macleandb-008 macleandb-008-priva Private 172.168.1.102 /etc/hosts
macleandb-007 macleandb-007-privb Private 172.168.2.101 /etc/hosts
macleandb-008 macleandb-008-privb Private 172.168.2.102 /etc/hosts
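
按上表规划,各节点/etc/hosts中的条目大致如下(示意):

# Public IP
192.168.18.227   macleandb-007
192.168.18.231   macleandb-008
# Virtual IP
192.168.18.226   macleandb-007-vip
192.168.18.230   macleandb-008-vip
# SCAN IP(不使用DNS时仅配置1个)
192.168.18.224   macleandb-scan1
# Private IP
172.168.1.101    macleandb-007-priva
172.168.1.102    macleandb-008-priva
172.168.2.101    macleandb-007-privb
172.168.2.102    macleandb-008-privb
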

 

 

使用scp测试节点间网络传输速率,2节点间传输文件平均速度约为50MB/s:

 

 

macleandb-007 # scp p10404530_112030_AIX64-5L_1of7.zip  macleandb-008-priva:/oracle
The authenticity of host 'macleandb-008-priva (172.168.1.102)' can't be established.
RSA key fingerprint is b1:2e:4f:40:7f:46:39:2d:ae:21:86:6f:75:bb:92:32.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'macleandb-008-priva,172.168.1.102' (RSA) to the list of known hosts.
p10404530_112030_AIX64-5L_1of7.zip                     20%  343MB  47.3MB/s   00:28 ETA
Killed by signal 2.

 

macleandb-007 # scp p10404530_112030_AIX64-5L_1of7.zip  macleandb-008-privb:/oracle

The authenticity of host ‘macleandb-008-privb (172.168.2.102)’ can’t be established.

RSA key fingerprint is b1:2e:4f:40:7f:46:39:2d:ae:21:86:6f:75:bb:92:32.

Are you sure you want to continue connecting (yes/no)? yes

Warning: Permanently added ‘macleandb-008-privb,172.168.2.102’ (RSA) to the list of known hosts.

p10404530_112030_AIX64-5L_1of7.zip                     11%  202MB  51.2MB/s   00:28 ETA
Killed by signal 2.

macleandb-007 #

 

macleandb-007 # scp p10404530_112030_AIX64-5L_1of7.zip  macleandb-008:/oracle

p10404530_112030_AIX64-5L_1of7.zip                     17%  299MB  50.6MB/s   00:27 ETA
Killed by signal 2.

 

 

 

1.3       缺省网关:

在每台机器上都要为公网接口定义缺省网关。

定义网关是为了让跨网段的客户端和应用服务器可以访问数据库服务器;如果操作系统已经配置了静态或动态路由,此处无需再配置网关。

 

 

1.4       时钟同步:

11.2中使用GI自带的CTSSD节点间时间同步服务,不开启NTPD时间同步守护进程。
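
GI安装完成后,可以用下面的命令确认CTSS当前的工作模式(示意;若节点上仍保留NTP配置文件,CTSS会以observer模式运行):

$ crsctl check ctss
$ cluvfy comp clocksync -n all
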

 

NTP is not required for 11gR2 since we have the Cluster Time Synchronization Service (CTSSD), but if you are using NTP you need to use it with the "-x" option.

Checking: ps -ef | grep ntp

If it has no -x option, do the below steps:

a. Open the /etc/rc.tcpip file, and locate the following line: start /usr/sbin/xntpd "$src_running"
b. Change the line to the following: start /usr/sbin/xntpd "$src_running" "-x"
c. Save the file.
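
在AIX上完成上述/etc/rc.tcpip修改后,可以参考下面的方式让xntpd带-x选项重新运行(示意):

# 检查xntpd当前是否带-x选项运行
ps -ef | grep xntpd | grep -v grep
# 通过SRC重启xntpd并附带-x选项
stopsrc -s xntpd
startsrc -s xntpd -a "-x"
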

 

 

 

 

 

1.5       用户和组:

创建必要的grid和oracle用户,以及oinstall、dba等组,保持在2个节点间的UID、GID一致

 

macleandb-007 # mkgroup -'A' id='1000' adms='root' oinstall
macleandb-007 # mkgroup -'A' id='1100' adms='root' asmadmin
macleandb-007 # mkgroup -'A' id='1200' adms='root' dba
macleandb-007 # mkgroup -'A' id='1300' adms='root' asmdba
macleandb-007 # mkgroup -'A' id='1301' adms='root' asmoper
macleandb-007 # mkuser id='1100' pgrp='oinstall' groups='asmadmin,asmdba,asmoper' home='/home/grid' grid
macleandb-007 # mkuser id='1101' pgrp='oinstall' groups='dba,asmdba' home='/home/oracle' oracle

macleandb-007 # mkdir -p /oracle/app/product/11.2.0/db_1

macleandb-007 # mkdir -p /oracle/app/product/grid

macleandb-007 # chown oracle:oinstall /oracle/app/product/11.2.0/db_1

macleandb-007 # chown grid:oinstall /oracle/app/product/grid

macleandb-007 # chmod 755 /oracle/app/product/11.2.0/db_1

macleandb-007 # chmod 755 /oracle/app/product/grid

 

 

macleandb-008 # mkgroup -'A' id='1000' adms='root' oinstall

macleandb-008 # mkgroup -'A' id='1100' adms='root' asmadmin

macleandb-008 # mkgroup -'A' id='1200' adms='root' dba

macleandb-008 # mkgroup -'A' id='1300' adms='root' asmdba

macleandb-008 # mkgroup -'A' id='1301' adms='root' asmoper

macleandb-008 # mkuser id='1100' pgrp='oinstall' groups='asmadmin,asmdba,asmoper' home='/home/grid' grid

macleandb-008 # mkuser id='1101' pgrp='oinstall' groups='dba,asmdba' home='/home/oracle' oracle

macleandb-008 #  mkdir -p /oracle/app/product/11.2.0/db_1

macleandb-008 # mkdir -p /oracle/app/product/grid

macleandb-008 # chown oracle:oinstall /oracle/app/product/11.2.0/db_1

macleandb-008 # chown grid:oinstall /oracle/app/product/grid

macleandb-008 # chmod 755 /oracle/app/product/11.2.0/db_1

macleandb-008 # chmod 755 /oracle/app/product/grid

 

 

 

 

 

 

 

macleandb-007 # id oracle

uid=1101(oracle) gid=1000(oinstall) groups=1200(dba),1300(asmdba)

 

macleandb-007 # id grid

uid=1100(grid) gid=1000(oinstall) groups=1100(asmadmin),1300(asmdba),1301(asmoper)

 

macleandb-008 # id oracle

uid=1101(oracle) gid=1000(oinstall) groups=1200(dba),1300(asmdba)

 

macleandb-008 # id grid

uid=1100(grid) gid=1000(oinstall) groups=1100(asmadmin),1300(asmdba),1301(asmoper)

 

 

 

在两个节点上赋予grid用户CAP_NUMA_ATTACH、CAP_BYPASS_RAC_VMM、CAP_PROPAGATE等权限:

 

 

macleandb-008 # lsuser -a capabilities grid
grid
macleandb-008 # chuser capabilities=CAP_NUMA_ATTACH,CAP_BYPASS_RAC_VMM,CAP_PROPAGATE grid
macleandb-008 # lsuser -a capabilities grid
grid capabilities=CAP_NUMA_ATTACH,CAP_BYPASS_RAC_VMM,CAP_PROPAGATE

 

 

 

1.6       用户profile:

 

在profile文件中为Oracle和Grid用户配置必要的环境变量参数:

 

Grid profile

umask 022

export AIXTHREAD_SCOPE=S

export ORACLE_HOME=/oracle/app/product/grid

export ORACLE_BASE=/oracle/app

export PATH=$ORACLE_HOME/bin:$ORACLE_HOME/OPatch:$PATH

export CRS_HOME=$ORACLE_HOME

export LIBPATH=$ORACLE_HOME/lib

 

 

 

oracle profile

 

umask 022

 

export AIXTHREAD_SCOPE=S

export ORACLE_HOME=/oracle/app/product/11.2.0/db_1

export ORACLE_BASE=/oracle/app

export PATH=$ORACLE_HOME/bin:$ORACLE_HOME/OPatch:$PATH

export LIBPATH=$ORACLE_HOME/lib

export ORA_NLS10=$ORACLE_HOME/nls/data

export NLS_LANG="Simplified Chinese"_China.AL32UTF8

 

 

 

 

1.7       配置oracle和grid用户的shell limit参数:

 

可以通过smitty chuser或者 修改/etc/security/limits配置文件配置用户的shell limit参数:
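
若采用chuser方式,可参考以下命令(仅为示意,需以root用户在两个节点上分别执行):

chuser fsize=-1 data=-1 stack=-1 core=-1 rss=-1 nofiles=-1 oracle
chuser fsize=-1 data=-1 stack=-1 core=-1 rss=-1 nofiles=-1 grid
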

macleandb-007 # cat /etc/security/limits

oracle:
fsize = -1
data = -1
stack = -1

core = -1

rss = -1

nofiles = -1

 

grid:

fsize = -1

data = -1

stack = -1

core = -1

rss = -1

nofiles = -1

 

$ ulimit -a

time(seconds)        unlimited

file(blocks)         unlimited

data(kbytes)         unlimited

stack(kbytes)        4194304

memory(kbytes)       unlimited

coredump(blocks)     unlimited

nofiles(descriptors) unlimited

threads(per process) unlimited

processes(per user)  unlimited

 

macleandb-008 # cat /etc/security/limits

 

oracle:

fsize = -1

data = -1

stack = -1

core = -1

rss = -1

nofiles = -1

 

grid:

fsize = -1

data = -1

stack = -1

core = -1

rss = -1

nofiles = -1

 

 

$ ulimit -a

time(seconds)        unlimited

file(blocks)         unlimited

data(kbytes)         unlimited

stack(kbytes)        4194304

memory(kbytes)       unlimited

coredump(blocks)     unlimited

nofiles(descriptors) unlimited

threads(per process) unlimited

processes(per user)  unlimited

 

 

 

1.8       配置oracle和grid用户的等价性

 

在macleandb-007、macleandb-008 2节点之间配置oracle和grid用户的等价性。

 

$ rm -rf $HOME/.ssh

$ ./sshUserSetup.sh -user grid -hosts "macleandb-007 macleandb-008" -advanced PromptPassphrase

The output of this script is also logged into /tmp/sshUserSetup_2012-12-05-01-27-09.log

Hosts are macleandb-007 macleandb-008

user is grid

Platform:- AIX

Checking if the remote hosts are reachable

PING macleandb-007: (192.168.18.227): 56 data bytes

64 bytes from 192.168.18.227: icmp_seq=0 ttl=255 time=0 ms

64 bytes from 192.168.18.227: icmp_seq=1 ttl=255 time=0 ms

64 bytes from 192.168.18.227: icmp_seq=2 ttl=255 time=0 ms

64 bytes from 192.168.18.227: icmp_seq=3 ttl=255 time=0 ms

64 bytes from 192.168.18.227: icmp_seq=4 ttl=255 time=0 ms

 

—-macleandb-007 PING Statistics—-

5 packets transmitted, 5 packets received, 0% packet loss

round-trip min/avg/max = 0/0/0 ms

PING macleandb-008: (192.168.18.231): 56 data bytes

64 bytes from 192.168.18.231: icmp_seq=0 ttl=255 time=0 ms

64 bytes from 192.168.18.231: icmp_seq=1 ttl=255 time=0 ms

64 bytes from 192.168.18.231: icmp_seq=2 ttl=255 time=0 ms

64 bytes from 192.168.18.231: icmp_seq=3 ttl=255 time=0 ms

64 bytes from 192.168.18.231: icmp_seq=4 ttl=255 time=0 ms

 

—-macleandb-008 PING Statistics—-

5 packets transmitted, 5 packets received, 0% packet loss

round-trip min/avg/max = 0/0/0 ms

Remote host reachability check succeeded.

The following hosts are reachable: macleandb-007 macleandb-008.

The following hosts are not reachable: .

All hosts are reachable. Proceeding further…

firsthost macleandb-007

numhosts 2

The script will setup SSH connectivity from the host macleandb-007 to all

the remote hosts. After the script is executed, the user can use SSH to run

commands on the remote hosts or copy files between this host macleandb-007

and the remote hosts without being prompted for passwords or confirmations.

 

NOTE 1:

As part of the setup procedure, this script will use ssh and scp to copy

files between the local host and the remote hosts. Since the script does not

store passwords, you may be prompted for the passwords during the execution of

the script whenever ssh or scp is invoked.

 

NOTE 2:

AS PER SSH REQUIREMENTS, THIS SCRIPT WILL SECURE THE USER HOME DIRECTORY

AND THE .ssh DIRECTORY BY REVOKING GROUP AND WORLD WRITE PRIVILEDGES TO THESE

directories.

 

Do you want to continue and let the script make the above mentioned changes (yes/no)?

yes

 

The user chose yes

Please specify if you want to specify a passphrase for the private key this script will create for the local host. Passphrase is used to encrypt the private key and makes SSH much more secure. Type ‘yes’ or ‘no’ and then press enter. In case you press ‘yes’, you would need to enter the passphrase whenever the script executes ssh or scp.

The estimated number of times the user would be prompted for a passphrase is 4. In addition, if the private-public files are also newly created, the user would have to specify the passphrase on one additional occasion.

Enter ‘yes’ or ‘no’.

yes

 

The user chose yes

Creating .ssh directory on local host, if not present already

Creating authorized_keys file on local host

Changing permissions on authorized_keys to 644 on local host

Creating known_hosts file on local host

Changing permissions on known_hosts to 644 on local host

Creating config file on local host

If a config file exists already at /home/grid/.ssh/config, it would be backed up to /home/grid/.ssh/config.backup.

Removing old private/public keys on local host

Running SSH keygen on local host

Enter passphrase (empty for no passphrase):

Enter same passphrase again:

Generating public/private rsa key pair.

Your identification has been saved in /home/grid/.ssh/id_rsa.

Your public key has been saved in /home/grid/.ssh/id_rsa.pub.

The key fingerprint is:

a6:87:38:26:11:7a:1e:d6:4a:eb:2b:6d:54:68:bc:29 grid@macleandb-007

The key’s randomart image is:

+–[ RSA 1024]—-+

|                 |

|                 |

| …             |

| .+o.            |

|..*+.   S        |

|E=+= . +         |

| += + o .        |

|..oo . .         |

| oo.             |

+—————–+

Creating .ssh directory and setting permissions on remote host macleandb-007

THE SCRIPT WOULD ALSO BE REVOKING WRITE PERMISSIONS FOR group AND others ON THE HOME DIRECTORY FOR grid. THIS IS AN SSH REQUIREMENT.

The script would create /home/grid/.ssh/config file on remote host macleandb-007. If a config file exists already at /home/grid/.ssh/config, it would be backed up to /home/grid/.ssh/config.backup.

The user may be prompted for a password here since the script would be running SSH on host macleandb-007.

Warning: Permanently added ‘macleandb-007,192.168.18.227’ (RSA) to the list of known hosts.

grid@macleandb-007’s password:

Done with creating .ssh directory and setting permissions on remote host macleandb-007.

Creating .ssh directory and setting permissions on remote host macleandb-008

THE SCRIPT WOULD ALSO BE REVOKING WRITE PERMISSIONS FOR group AND others ON THE HOME DIRECTORY FOR grid. THIS IS AN SSH REQUIREMENT.

The script would create /home/grid/.ssh/config file on remote host macleandb-008. If a config file exists already at /home/grid/.ssh/config, it would be backed up to /home/grid/.ssh/config.backup.

The user may be prompted for a password here since the script would be running SSH on host macleandb-008.

Warning: Permanently added ‘macleandb-008,192.168.18.231’ (RSA) to the list of known hosts.

grid@macleandb-008’s password:

Done with creating .ssh directory and setting permissions on remote host macleandb-008.

Copying local host public key to the remote host macleandb-007

The user may be prompted for a password or passphrase here since the script would be using SCP for host macleandb-007.

grid@macleandb-007’s password:

Done copying local host public key to the remote host macleandb-007

Copying local host public key to the remote host macleandb-008

The user may be prompted for a password or passphrase here since the script would be using SCP for host macleandb-008.

grid@macleandb-008’s password:

Done copying local host public key to the remote host macleandb-008

Creating keys on remote host macleandb-007 if they do not exist already. This is required to setup SSH on host macleandb-007.

 

Creating keys on remote host macleandb-008 if they do not exist already. This is required to setup SSH on host macleandb-008.

Generating public/private rsa key pair.

Your identification has been saved in .ssh/id_rsa.

Your public key has been saved in .ssh/id_rsa.pub.

The key fingerprint is:

57:10:a3:c8:c6:ec:55:3f:1c:ea:c3:c5:ef:bf:23:b6 grid@macleandb-008

The key’s randomart image is:

+–[ RSA 1024]—-+

|          =..    |

|     + . o B .   |

|      * o . B    |

|     o . o o o   |

|      . S =   .  |

|         . . .   |

|              .  |

|             o o |

|            .Eo.+|

+—————–+

Updating authorized_keys file on remote host macleandb-007

Updating known_hosts file on remote host macleandb-007

The script will run SSH on the remote machine macleandb-007. The user may be prompted for a passphrase here in case the private key has been encrypted with a passphrase.

Updating authorized_keys file on remote host macleandb-008

Updating known_hosts file on remote host macleandb-008

The script will run SSH on the remote machine macleandb-008. The user may be prompted for a passphrase here in case the private key has been encrypted with a passphrase.

cat: 0652-050 Cannot open /home/grid/.ssh/known_hosts.tmp.

cat: 0652-050 Cannot open /home/grid/.ssh/authorized_keys.tmp.

SSH setup is complete.

 

————————————————————————

Verifying SSH setup

===================

The script will now run the date command on the remote nodes using ssh

to verify if ssh is setup correctly. IF THE SETUP IS CORRECTLY SETUP,

THERE SHOULD BE NO OUTPUT OTHER THAN THE DATE AND SSH SHOULD NOT ASK FOR

PASSWORDS. If you see any output other than date or are prompted for the

password, ssh is not setup correctly and you will need to resolve the

issue and set up ssh again.

The possible causes for failure could be:

1. The server settings in /etc/ssh/sshd_config file do not allow ssh

for user grid.

2. The server may have disabled public key based authentication.

3. The client public key on the server may be outdated.

4. /home/grid or /home/grid/.ssh on the remote host may not be owned by grid.

5. User may not have passed -shared option for shared remote users or

may be passing the -shared option for non-shared remote users.

6. If there is output in addition to the date, but no password is asked,

it may be a security alert shown as part of company policy. Append the

additional text to the <OMS HOME>/sysman/prov/resources/ignoreMessages.txt file.

————————————————————————

–macleandb-007:–

Running /usr/bin/ssh -x -l grid macleandb-007 date to verify SSH connectivity has been setup from local host to macleandb-007.

IF YOU SEE ANY OTHER OUTPUT BESIDES THE OUTPUT OF THE DATE COMMAND OR IF YOU ARE PROMPTED FOR A PASSWORD HERE, IT MEANS SSH SETUP HAS NOT BEEN SUCCESSFUL. Please note that being prompted for a passphrase may be OK but being prompted for a password is ERROR.

The script will run SSH on the remote machine macleandb-007. The user may be prompted for a passphrase here in case the private key has been encrypted with a passphrase.

Wed Dec  5 01:28:15 GMT+08:00 2012

————————————————————————

–macleandb-008:–

Running /usr/bin/ssh -x -l grid macleandb-008 date to verify SSH connectivity has been setup from local host to macleandb-008.

IF YOU SEE ANY OTHER OUTPUT BESIDES THE OUTPUT OF THE DATE COMMAND OR IF YOU ARE PROMPTED FOR A PASSWORD HERE, IT MEANS SSH SETUP HAS NOT BEEN SUCCESSFUL. Please note that being prompted for a passphrase may be OK but being prompted for a password is ERROR.

The script will run SSH on the remote machine macleandb-008. The user may be prompted for a passphrase here in case the private key has been encrypted with a passphrase.

Wed Dec  5 01:28:20 GMT+08:00 2012

————————————————————————

————————————————————————

Verifying SSH connectivity has been setup from macleandb-007 to macleandb-007

IF YOU SEE ANY OTHER OUTPUT BESIDES THE OUTPUT OF THE DATE COMMAND OR IF YOU ARE PROMPTED FOR A PASSWORD HERE, IT MEANS SSH SETUP HAS NOT BEEN SUCCESSFUL.

Wed Dec  5 01:28:16 GMT+08:00 2012

————————————————————————

————————————————————————

Verifying SSH connectivity has been setup from macleandb-007 to macleandb-008

IF YOU SEE ANY OTHER OUTPUT BESIDES THE OUTPUT OF THE DATE COMMAND OR IF YOU ARE PROMPTED FOR A PASSWORD HERE, IT MEANS SSH SETUP HAS NOT BEEN SUCCESSFUL.

Wed Dec  5 01:28:21 GMT+08:00 2012

————————————————————————

-Verification from complete-

SSH verification complete.

 

 

 

 

 

 

$ ./sshUserSetup.sh -user oracle -hosts "macleandb-007 macleandb-008" -advanced PromptPassphrase

The output of this script is also logged into /tmp/sshUserSetup_2012-12-05-01-31-18.log

Hosts are macleandb-007 macleandb-008

user is oracle

Platform:- AIX

Checking if the remote hosts are reachable

PING macleandb-007: (192.168.18.227): 56 data bytes

64 bytes from 192.168.18.227: icmp_seq=0 ttl=255 time=0 ms

64 bytes from 192.168.18.227: icmp_seq=1 ttl=255 time=0 ms

64 bytes from 192.168.18.227: icmp_seq=2 ttl=255 time=0 ms

64 bytes from 192.168.18.227: icmp_seq=3 ttl=255 time=0 ms

64 bytes from 192.168.18.227: icmp_seq=4 ttl=255 time=0 ms

 

—-macleandb-007 PING Statistics—-

5 packets transmitted, 5 packets received, 0% packet loss

round-trip min/avg/max = 0/0/0 ms

PING macleandb-008: (192.168.18.231): 56 data bytes

64 bytes from 192.168.18.231: icmp_seq=0 ttl=255 time=0 ms

64 bytes from 192.168.18.231: icmp_seq=1 ttl=255 time=0 ms

64 bytes from 192.168.18.231: icmp_seq=2 ttl=255 time=0 ms

64 bytes from 192.168.18.231: icmp_seq=3 ttl=255 time=0 ms

64 bytes from 192.168.18.231: icmp_seq=4 ttl=255 time=0 ms

 

—-macleandb-008 PING Statistics—-

5 packets transmitted, 5 packets received, 0% packet loss

round-trip min/avg/max = 0/0/0 ms

Remote host reachability check succeeded.

The following hosts are reachable: macleandb-007 macleandb-008.

The following hosts are not reachable: .

All hosts are reachable. Proceeding further…

firsthost macleandb-007

numhosts 2

The script will setup SSH connectivity from the host macleandb-007 to all

the remote hosts. After the script is executed, the user can use SSH to run

commands on the remote hosts or copy files between this host macleandb-007

and the remote hosts without being prompted for passwords or confirmations.

 

NOTE 1:

As part of the setup procedure, this script will use ssh and scp to copy

files between the local host and the remote hosts. Since the script does not

store passwords, you may be prompted for the passwords during the execution of

the script whenever ssh or scp is invoked.

 

NOTE 2:

AS PER SSH REQUIREMENTS, THIS SCRIPT WILL SECURE THE USER HOME DIRECTORY

AND THE .ssh DIRECTORY BY REVOKING GROUP AND WORLD WRITE PRIVILEDGES TO THESE

directories.

 

Do you want to continue and let the script make the above mentioned changes (yes/no)?

yes

 

The user chose yes

Please specify if you want to specify a passphrase for the private key this script will create for the local host. Passphrase is used to encrypt the private key and makes SSH much more secure. Type ‘yes’ or ‘no’ and then press enter. In case you press ‘yes’, you would need to enter the passphrase whenever the script executes ssh or scp.

The estimated number of times the user would be prompted for a passphrase is 4. In addition, if the private-public files are also newly created, the user would have to specify the passphrase on one additional occasion.

Enter ‘yes’ or ‘no’.

yes

 

The user chose yes

Creating .ssh directory on local host, if not present already

Creating authorized_keys file on local host

Changing permissions on authorized_keys to 644 on local host

Creating known_hosts file on local host

Changing permissions on known_hosts to 644 on local host

Creating config file on local host

If a config file exists already at /home/oracle/.ssh/config, it would be backed up to /home/oracle/.ssh/config.backup.

Removing old private/public keys on local host

Running SSH keygen on local host

Enter passphrase (empty for no passphrase):

Enter same passphrase again:

Generating public/private rsa key pair.

Your identification has been saved in /home/oracle/.ssh/id_rsa.

Your public key has been saved in /home/oracle/.ssh/id_rsa.pub.

The key fingerprint is:

de:e1:bd:ac:7d:f0:25:68:3c:80:33:24:53:ae:31:fb oracle@macleandb-007

The key’s randomart image is:

+–[ RSA 1024]—-+

|       ..        |

|      o..        |

|      o+..       |

|       =+ .      |

|      o So.o .   |

|       o o o* . .|

|        E o..+ o |

|           o .o  |

|          ..+.   |

+—————–+

Creating .ssh directory and setting permissions on remote host macleandb-007

THE SCRIPT WOULD ALSO BE REVOKING WRITE PERMISSIONS FOR group AND others ON THE HOME DIRECTORY FOR oracle. THIS IS AN SSH REQUIREMENT.

The script would create /home/oracle/.ssh/config file on remote host macleandb-007. If a config file exists already at /home/oracle/.ssh/config, it would be backed up to /home/oracle/.ssh/config.backup.

The user may be prompted for a password here since the script would be running SSH on host macleandb-007.

Warning: Permanently added ‘macleandb-007,192.168.18.227’ (RSA) to the list of known hosts.

oracle@macleandb-007’s password:

Done with creating .ssh directory and setting permissions on remote host macleandb-007.

Creating .ssh directory and setting permissions on remote host macleandb-008

THE SCRIPT WOULD ALSO BE REVOKING WRITE PERMISSIONS FOR group AND others ON THE HOME DIRECTORY FOR oracle. THIS IS AN SSH REQUIREMENT.

The script would create /home/oracle/.ssh/config file on remote host macleandb-008. If a config file exists already at /home/oracle/.ssh/config, it would be backed up to /home/oracle/.ssh/config.backup.

The user may be prompted for a password here since the script would be running SSH on host macleandb-008.

Warning: Permanently added ‘macleandb-008,192.168.18.231’ (RSA) to the list of known hosts.

oracle@macleandb-008’s password:

Done with creating .ssh directory and setting permissions on remote host macleandb-008.

Copying local host public key to the remote host macleandb-007

The user may be prompted for a password or passphrase here since the script would be using SCP for host macleandb-007.

oracle@macleandb-007’s password:

Done copying local host public key to the remote host macleandb-007

Copying local host public key to the remote host macleandb-008

The user may be prompted for a password or passphrase here since the script would be using SCP for host macleandb-008.

oracle@macleandb-008’s password:

Done copying local host public key to the remote host macleandb-008

Creating keys on remote host macleandb-007 if they do not exist already. This is required to setup SSH on host macleandb-007.

 

Creating keys on remote host macleandb-008 if they do not exist already. This is required to setup SSH on host macleandb-008.

Generating public/private rsa key pair.

Your identification has been saved in .ssh/id_rsa.

Your public key has been saved in .ssh/id_rsa.pub.

The key fingerprint is:

b2:b1:d7:2d:2a:bf:d9:76:5c:2a:6e:97:d9:7d:7c:3e oracle@macleandb-008

The key’s randomart image is:

+–[ RSA 1024]—-+

|                 |

|                 |

|                 |

|                 |

|      o S        |

|       = . .  .  |

|      o . o..* o |

|      .. ++.B .E=|

|       o==o+   .=|

+—————–+

Updating authorized_keys file on remote host macleandb-007

Updating known_hosts file on remote host macleandb-007

The script will run SSH on the remote machine macleandb-007. The user may be prompted for a passphrase here in case the private key has been encrypted with a passphrase.

Updating authorized_keys file on remote host macleandb-008

Updating known_hosts file on remote host macleandb-008

The script will run SSH on the remote machine macleandb-008. The user may be prompted for a passphrase here in case the private key has been encrypted with a passphrase.

cat: 0652-050 Cannot open /home/oracle/.ssh/known_hosts.tmp.

cat: 0652-050 Cannot open /home/oracle/.ssh/authorized_keys.tmp.

SSH setup is complete.

 

————————————————————————

Verifying SSH setup

===================

The script will now run the date command on the remote nodes using ssh

to verify if ssh is setup correctly. IF THE SETUP IS CORRECTLY SETUP,

THERE SHOULD BE NO OUTPUT OTHER THAN THE DATE AND SSH SHOULD NOT ASK FOR

PASSWORDS. If you see any output other than date or are prompted for the

password, ssh is not setup correctly and you will need to resolve the

issue and set up ssh again.

The possible causes for failure could be:

1. The server settings in /etc/ssh/sshd_config file do not allow ssh

for user oracle.

2. The server may have disabled public key based authentication.

3. The client public key on the server may be outdated.

4. /home/oracle or /home/oracle/.ssh on the remote host may not be owned by oracle.

5. User may not have passed -shared option for shared remote users or

may be passing the -shared option for non-shared remote users.

6. If there is output in addition to the date, but no password is asked,

it may be a security alert shown as part of company policy. Append the

additional text to the <OMS HOME>/sysman/prov/resources/ignoreMessages.txt file.

————————————————————————

–macleandb-007:–

Running /usr/bin/ssh -x -l oracle macleandb-007 date to verify SSH connectivity has been setup from local host to macleandb-007.

IF YOU SEE ANY OTHER OUTPUT BESIDES THE OUTPUT OF THE DATE COMMAND OR IF YOU ARE PROMPTED FOR A PASSWORD HERE, IT MEANS SSH SETUP HAS NOT BEEN SUCCESSFUL. Please note that being prompted for a passphrase may be OK but being prompted for a password is ERROR.

The script will run SSH on the remote machine macleandb-007. The user may be prompted for a passphrase here in case the private key has been encrypted with a passphrase.

Wed Dec  5 01:31:50 GMT+08:00 2012

————————————————————————

–macleandb-008:–

Running /usr/bin/ssh -x -l oracle macleandb-008 date to verify SSH connectivity has been setup from local host to macleandb-008.

IF YOU SEE ANY OTHER OUTPUT BESIDES THE OUTPUT OF THE DATE COMMAND OR IF YOU ARE PROMPTED FOR A PASSWORD HERE, IT MEANS SSH SETUP HAS NOT BEEN SUCCESSFUL. Please note that being prompted for a passphrase may be OK but being prompted for a password is ERROR.

The script will run SSH on the remote machine macleandb-008. The user may be prompted for a passphrase here in case the private key has been encrypted with a passphrase.

Wed Dec  5 01:31:55 GMT+08:00 2012

————————————————————————

————————————————————————

Verifying SSH connectivity has been setup from macleandb-007 to macleandb-007

IF YOU SEE ANY OTHER OUTPUT BESIDES THE OUTPUT OF THE DATE COMMAND OR IF YOU ARE PROMPTED FOR A PASSWORD HERE, IT MEANS SSH SETUP HAS NOT BEEN SUCCESSFUL.

Wed Dec  5 01:31:51 GMT+08:00 2012

————————————————————————

————————————————————————

Verifying SSH connectivity has been setup from macleandb-007 to macleandb-008

IF YOU SEE ANY OTHER OUTPUT BESIDES THE OUTPUT OF THE DATE COMMAND OR IF YOU ARE PROMPTED FOR A PASSWORD HERE, IT MEANS SSH SETUP HAS NOT BEEN SUCCESSFUL.

Wed Dec  5 01:31:57 GMT+08:00 2012

————————————————————————

-Verification from complete-

SSH verification complete.

 

 

1.9       OS参数优化:

 

Oracle推荐在RAC环境中设置tcp_ephemeral_low、tcp_ephemeral_high、udp_ephemeral_low、udp_ephemeral_high等网络参数,以便优化网络传输性能,加强心跳网络可用性:

 

macleandb-007 # chdev -l sys0 -a maxuproc=16384
sys0 changed

macleandb-007 # lsattr -E -l sys0 -a maxuproc

maxuproc 16384 Maximum number of PROCESSES allowed per user True

 

macleandb-007 # lsattr -El sys0 | grep maxuproc | awk '{print $2}'

16384

macleandb-007 # lsattr -El sys0 | grep ncargs | awk '{print $2}'

256

 

 

 

 

macleandb-008 #  chdev -l sys0 -a maxuproc=16384

sys0 changed

 

macleandb-008 #  lsattr -E -l sys0 -a maxuproc

maxuproc 16384 Maximum number of PROCESSES allowed per user True

 

macleandb-008 # lsattr -El sys0 | grep maxuproc | awk '{print $2}'

16384

macleandb-008 # lsattr -El sys0 | grep ncargs | awk '{print $2}'

256

 

 

macleandb-007 # /usr/sbin/no -p -o tcp_ephemeral_low=9000 -o tcp_ephemeral_high=65500

Setting tcp_ephemeral_low to 9000

Setting tcp_ephemeral_low to 9000 in nextboot file

Setting tcp_ephemeral_high to 65500

Setting tcp_ephemeral_high to 65500 in nextboot file

macleandb-007 # /usr/sbin/no -p -o udp_ephemeral_low=9000 -o udp_ephemeral_high=65500

Setting udp_ephemeral_low to 9000

Setting udp_ephemeral_low to 9000 in nextboot file

Setting udp_ephemeral_high to 65500

Setting udp_ephemeral_high to 65500 in nextboot file

 

 

macleandb-008 # /usr/sbin/no -p -o tcp_ephemeral_low=9000 -o tcp_ephemeral_high=65500

Setting tcp_ephemeral_low to 9000

Setting tcp_ephemeral_low to 9000 in nextboot file

Setting tcp_ephemeral_high to 65500

Setting tcp_ephemeral_high to 65500 in nextboot file

macleandb-008 #  /usr/sbin/no -p -o udp_ephemeral_low=9000 -o udp_ephemeral_high=65500

Setting udp_ephemeral_low to 9000

Setting udp_ephemeral_low to 9000 in nextboot file

Setting udp_ephemeral_high to 65500

Setting udp_ephemeral_high to 65500 in nextboot file

 

 

no -po tcp_sendspace=65536

no -po tcp_recvspace=65536

no -po sb_max=41943040

no -po udp_sendspace=2097152

no -po udp_recvspace=20971520

no -r -o ipqmaxlen=512
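
上述参数设置完成后,可以用下面的命令在每个节点上统一核对(示意):

no -a | egrep "ephemeral|tcp_sendspace|tcp_recvspace|udp_sendspace|udp_recvspace|sb_max|rfc1323|ipqmaxlen"
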

 

macleandb-007 # no -po rfc1323=1

Setting rfc1323 to 1

Setting rfc1323 to 1 in nextboot file

Change to tunable rfc1323, will only be effective for future connections

 

 

macleandb-008 # no -a |grep “rfc1323”

rfc1323 = 1

 

 

macleandb-008 # ioo -o aio_maxreqs

aio_maxreqs = 65536
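
若aio_maxreqs低于Oracle建议的65536,可以用ioo调整(示意,-p选项会同时写入nextboot,使重启后依然生效):

ioo -p -o aio_maxreqs=65536
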

 

 

 

 

 

 

 

1.10     存储环境:

由于使用Veritas Cluster Filesystem,故无需配置ASM。为GI的ocr和votedisk创建专用的共享文件系统上的文件目录:

 

macleandb-007 # chown grid:oinstall /oracle
macleandb-008 # chown grid:oinstall /oracle
macleandb-007 # chown oracle:oinstall /oradata/*
macleandb-007 # cd /oradata

 

macleandb-007 # su - grid

 

$ cd oradata01

$ mkdir ocr

$ cd ..

$ mkdir oradata02/ocr

$ mkdir oradata03/ocr

 

 

$ mkdir oradata01/vote

$ mkdir oradata02/vote

$ mkdir oradata03/vote

 

 

 

 

 

1.11     使用cluvfy、rda工具验证安装环境:

 

Cluster Verification Utility(CVU)是Oracle所推荐的一种集群检验工具。该检验工具帮助用户在Cluster部署的各个阶段验证集群的重要组件,这些阶段包括硬件搭建、Clusterware的安装、RDBMS的安装、存储等等。我们既可以在Cluster安装之前使用CVU来帮我们检验所配置的环境正确可用,也可以在软件安装完成后使用CVU来对集群做验收。

 

CVU提供了一种可扩展的框架,其所实施的常规检验活动是独立于具体的平台,并且向存储和网络的检验提供了厂商接口(Vendor Interface)。
CVU工具不依赖于其他Oracle软件,仅使用命令cluvfy,如cluvfy stage -pre crsinst -n vrh1,vrh2。

 

cluvfy的部署十分简单,在本地节点安装后,该工具在运行过程中会自动部署到远程主机上。具体的自动部署流程如下:

 

  1. 用户在本地节点安装CVU
  2. 用户针对多个节点实施Verify检验命令
  3. CVU工具将拷贝自身必要的文件到远程节点
  4. CVU会在所有节点执行检验任务并生成报告

 

 

 

CVU工具可以为我们提供以下功能:

 

  1. 验证Cluster集群是否规范配置以便后续的RAC安装、配置和操作顺利
  2. 全类型的验证
  3. 非破坏性的验证
  4. 提供了易于使用的接口
  5. 支持各种平台和配置的RAC,并保持明确、统一的行为方式

 

注意不要误解cluvfy的作用,它仅仅是一个检验者,而不负责实际的配置或修复工作:

cluvfy不支持任何类型的cluster或RAC操作

在检验到问题或失败后,cluvfy不会采取任何修正行为

cluvfy不是性能调优或监控工具

cluvfy不会尝试帮助你验证RAC数据库的内部结构

RAC的实际部署可以被逻辑地划分为多个操作阶段,这些阶段被称作"stage",在实际的部署过程中每一个stage由一系列的操作组成。每一个stage都有自身的预检查(pre-check)和验收检查(post-check)。

 

 

 

$ cluvfy stage -post hwos -n macleandb-007,macleandb-008

Performing post-checks for hardware and operating system setup

Checking node reachability…

Node reachability check passed from node “macleandb-007”

 

 

Checking user equivalence…

User equivalence check passed for user “oracle”

 

Checking node connectivity…

 

Checking hosts config file…

 

Verification of the hosts config file successful

 

Node connectivity passed for subnet “172.168.1.0” with node(s) macleandb-008,macleandb-007

TCP connectivity check passed for subnet “172.168.1.0”

 

Node connectivity passed for subnet “172.168.2.0” with node(s) macleandb-008,macleandb-007

TCP connectivity check passed for subnet “172.168.2.0”

 

Node connectivity passed for subnet “192.168.18.128” with node(s) macleandb-008,macleandb-007

TCP connectivity check passed for subnet “192.168.18.128”

 

 

Interfaces found on subnet “172.168.1.0” that are likely candidates for VIP are:

macleandb-008 en0:172.168.1.102 en0:172.168.1.102

macleandb-007 en0:172.168.1.101 en0:172.168.1.101

 

Interfaces found on subnet “172.168.2.0” that are likely candidates for VIP are:

macleandb-008 en1:172.168.2.102 en1:172.168.2.102

macleandb-007 en1:172.168.2.101 en1:172.168.2.101

 

Interfaces found on subnet “192.168.18.128” that are likely candidates for VIP are:

macleandb-008 en5:192.168.18.231 en5:192.168.18.231

macleandb-007 en5:192.168.18.227 en5:192.168.18.227 en5:192.168.18.227

 

WARNING:

Could not find a suitable set of interfaces for the private interconnect

Checking subnet mask consistency…

Subnet mask consistency check passed for subnet “172.168.1.0”.

Subnet mask consistency check passed for subnet “172.168.2.0”.

Subnet mask consistency check passed for subnet “192.168.18.128”.

Subnet mask consistency check passed.

 

Node connectivity check passed

 

Checking multicast communication…

 

Checking subnet “172.168.1.0” for multicast communication with multicast group “230.0.1.0”…

Check of subnet “172.168.1.0” for multicast communication with multicast group “230.0.1.0” passed.

 

Checking subnet “172.168.2.0” for multicast communication with multicast group “230.0.1.0”…

Check of subnet “172.168.2.0” for multicast communication with multicast group “230.0.1.0” passed.

 

Checking subnet “192.168.18.128” for multicast communication with multicast group “230.0.1.0”…

Check of subnet “192.168.18.128” for multicast communication with multicast group “230.0.1.0” passed.

 

Check of multicast communication passed.

Check for multiple users with UID value 0 passed

Time zone consistency check passed

 

Checking shared storage accessibility…

 

Disk                                  Sharing Nodes (2 in count)

————————————  ————————

/dev/rhdisk2                          macleandb-008 macleandb-007

/dev/rhdisk3                          macleandb-008 macleandb-007

/dev/rhdisk4                          macleandb-008 macleandb-007

/dev/rhdisk5                          macleandb-008 macleandb-007

/dev/rhdisk6                          macleandb-008 macleandb-007

 

 

 

$ cluvfy stage -pre crsinst -n macleandb-007,macleandb-008

 

Performing pre-checks for cluster services setup

 

Checking node reachability…

Node reachability check passed from node “macleandb-007”

 

 

Checking user equivalence…

User equivalence check passed for user “oracle”

 

Checking node connectivity…

 

Checking hosts config file…

 

Verification of the hosts config file successful

 

Node connectivity passed for subnet “172.168.1.0” with node(s) macleandb-008,macleandb-007

TCP connectivity check passed for subnet “172.168.1.0”

 

Node connectivity passed for subnet “172.168.2.0” with node(s) macleandb-008,macleandb-007

TCP connectivity check passed for subnet “172.168.2.0”

 

Node connectivity passed for subnet “192.168.18.128” with node(s) macleandb-008,macleandb-007

TCP connectivity check passed for subnet “192.168.18.128”

 

 

Interfaces found on subnet “172.168.1.0” that are likely candidates for VIP are:

macleandb-008 en0:172.168.1.102 en0:172.168.1.102

macleandb-007 en0:172.168.1.101 en0:172.168.1.101

 

Interfaces found on subnet “172.168.2.0” that are likely candidates for VIP are:

macleandb-008 en1:172.168.2.102 en1:172.168.2.102

macleandb-007 en1:172.168.2.101 en1:172.168.2.101

 

Interfaces found on subnet “192.168.18.128” that are likely candidates for VIP are:

macleandb-008 en5:192.168.18.231 en5:192.168.18.231

macleandb-007 en5:192.168.18.227 en5:192.168.18.227 en5:192.168.18.227

 

WARNING:

Could not find a suitable set of interfaces for the private interconnect

Checking subnet mask consistency…

Subnet mask consistency check passed for subnet “172.168.1.0”.

Subnet mask consistency check passed for subnet “172.168.2.0”.

Subnet mask consistency check passed for subnet “192.168.18.128”.

Subnet mask consistency check passed.

 

Node connectivity check passed

 

Checking multicast communication…

 

Checking subnet “172.168.1.0” for multicast communication with multicast group “230.0.1.0”…

Check of subnet “172.168.1.0” for multicast communication with multicast group “230.0.1.0” passed.

 

Checking subnet “172.168.2.0” for multicast communication with multicast group “230.0.1.0”…

Check of subnet “172.168.2.0” for multicast communication with multicast group “230.0.1.0” passed.

 

Checking subnet “192.168.18.128” for multicast communication with multicast group “230.0.1.0”…

Check of subnet “192.168.18.128” for multicast communication with multicast group “230.0.1.0” passed.

 

Check of multicast communication passed.

Total memory check passed

Available memory check passed

Swap space check passed

Free disk space check passed for “macleandb-008:/oracle/app/product/grid”

Free disk space check passed for “macleandb-007:/oracle/app/product/grid”

Free disk space check failed for “macleandb-008:/tmp/”

Check failed on nodes:

macleandb-008

Free disk space check passed for “macleandb-007:/tmp/”

Check for multiple users with UID value 1101 passed

User existence check passed for “oracle”

Group existence check passed for “oinstall”

Group existence check passed for “dba”

Membership check for user “oracle” in group “oinstall” [as Primary] passed

Membership check for user “oracle” in group “dba” passed

Run level check passed

Hard limits check passed for “maximum open file descriptors”

Soft limits check passed for “maximum open file descriptors”

Hard limits check passed for “maximum user processes”

Soft limits check passed for “maximum user processes”

System architecture check passed

Kernel version check passed

Kernel parameter check passed for “ncargs”

Kernel parameter check passed for “maxuproc”

Kernel parameter check passed for “tcp_ephemeral_low”

Kernel parameter check passed for “tcp_ephemeral_high”

Kernel parameter check passed for “udp_ephemeral_low”

Kernel parameter check passed for “udp_ephemeral_high”

Package existence check passed for “bos.adt.base”

Package existence check passed for “bos.adt.lib”

Package existence check passed for “bos.adt.libm”

Package existence check passed for “bos.perf.libperfstat”

Package existence check passed for “bos.perf.perfstat”

Package existence check passed for “bos.perf.proctools”

Package existence check passed for “xlC.aix61.rte”

Package existence check passed for “xlC.rte”

Operating system patch check failed for “Patch IZ97457”

Check failed on nodes:

macleandb-008,macleandb-007

Operating system patch check failed for “Patch IZ89165”

Check failed on nodes:

macleandb-008,macleandb-007

Check for multiple users with UID value 0 passed

Current group ID check passed

 

Starting check for consistency of primary group of root user

 

Check for consistency of root user’s primary group passed

 

Starting Clock synchronization checks using Network Time Protocol(NTP)…

 

NTP Configuration file check started…

NTP Configuration file check passed

No NTP Daemons or Services were found to be running

PRVF-5507 : NTP daemon or service is not running on any node but NTP configuration file exists on the following node(s):

macleandb-008,macleandb-007

Clock synchronization check using Network Time Protocol(NTP) failed

 

Core file name pattern consistency check passed.

 

User “oracle” is not part of “system” group. Check passed

Default user file creation mask check passed

Checking consistency of file “/etc/resolv.conf” across nodes

 

File “/etc/resolv.conf” does not exist on any node of the cluster. Skipping further checks

 

File “/etc/resolv.conf” is consistent across nodes

 

Time zone consistency check passed

User ID < 65535 check passed

 

Kernel 64-bit mode check passed

 

 

Pre-check for cluster services setup was unsuccessful on all the nodes.

 

 

 

 

Test “Oracle Database 11g R2 (11.2.0) Preinstall (AIX)” executed at 05-Dec-2012 01:40:28

 

Test Results

~~~~~~~~~~~~

 

 

 

ID     NAME                 RESULT  VALUE

====== ==================== ======= ==========================================

A00010 OS Certified?        PASSED  Certified

A00015 HARDWARE_BITMODE 64? PASSED  is 64-bit

A00016 System mode 64-bit?  FAILED  bootinfo error

A00020 User in /etc/passwd? PASSED  userOK

A00040 Group in /etc/group? PASSED  GroupOK

A00050 Enter ORACLE_HOME    RECORD

A00060 ORACLE_HOME Valid?   FAILED  OHnotvalid

A00070 O_H Permissions OK?  FAILED  ORACLE_HOME must exist for this rule

A00080 oraInventory Permiss PASSED  oraInventoryNotFound

A00090 Got Software Tools?  PASSED  ld_nm_ar_make_found

A00100 Umask Set to 022?    PASSED  UmaskOK

A00110 ulimits OK?          FAILED  StackTooSmall

A00140 LDLIBRARYPATH Unset? PASSED  UnSet

A00160 LIBPATH Unset?       PASSED  UnSet

A00190 Enter JDK Home       RECORD

A00200 JDK Version          FAILED  JDK home is missing

A00210 Other O_Hs in PATH?  PASSED  NoneFound

A00220 Other OUI Up?        PASSED  NoOtherOUI

A00230 /tmp Adequate?       PASSED  TempSpaceOK

A00240 Disk Space OK?       SKIPPED This rule needs an accessible ORACLE_..>

A00250 Swap (in MB)         RECORD  512

A00260 RAM (in MB)          PASSED  509440

A00270 SwapToRAM OK?        FAILED  SwapLessThanRAM

A00290 IP Address           RECORD  192.168.18.227

A00300 Domain Name          RECORD  NotFound

A00310 DNS Lookup           FAILED  Host not correctly registered in DNS

A00320 /etc/hosts Format    FAILED  no entry found

A00330 Kernel Parameters OK FAILED  NoAccess

A00335 aio_maxreqs OK?      FAILED  ioo error

A00340 AIXTHREAD_SCOPE=S?   PASSED  AIXTHREAD_SCOPEOK

A00350 LINK_CNTRL is Unset? PASSED  LINK_CNTRLunset

A00410 Got OS Patches?      PASSED  PatchesNotRequired

A00430 Got OS Packages?     FAILED  [gpfs.base] not installed

A00445 TCP/UDP Ephemeral OK PASSED  Ephemeral OK

 

以上检测中出现FAILED的环节(如/tmp空间、NTP配置、个别OS补丁等)经确认均可忽略,安装环境已满足要求。

 

 

2.  11gR2 GI/RAC正式安装

 

1. 解压安装介质并运行rootpre.sh脚本:

 

拷贝一个unzip过来:

macleandb-007 # /tmp/unzip p10404530_112030_AIX64-5L_1of7.zip
macleandb-007 # /tmp/unzip p10404530_112030_AIX64-5L_2of7.zip

macleandb-007 # /tmp/unzip p10404530_112030_AIX64-5L_3of7.zip

 

 

 

 

macleandb-007 # ./rootpre.sh

./rootpre.sh output will be logged in /tmp/rootpre.out_12-12-05.01:14:00

 

Checking if group services should be configured….

Nothing to configure.

 

 

 

2. 执行./runInstaller 启动GI安装

 

step1 选择skip software updates

 

 

step 2选择 安装为cluster

 

step3 选择 Advanced Installation

 

Step4 加入简体中文语言

 

 

step5 填入合适的cluster name和scan name,使用默认端口1521

 

step 6 验证SSH等价性

 

 

step7 指定en0、en1为private network, en5为public network

 

 

step8 指定shared file system为存储类型

 

step9 指定ocr位置

 

step10 指定votedisk位置,后续步骤点击Next即可

 

 

 

在2个节点上执行必要的orainstRoot.sh和root.sh脚本:

 

macleandb-007 # /home/grid/app/oraInventory/orainstRoot.sh
Changing permissions of /home/grid/app/oraInventory.
Adding read,write permissions for group.
Removing read,write,execute permissions for world.

 

Changing groupname of /home/grid/app/oraInventory to oinstall.

The execution of the script is complete.

macleandb-007 # /oracle/app/product/grid/root.sh

Performing root user operation for Oracle 11g

 

The following environment variables are set as:

ORACLE_OWNER= grid

ORACLE_HOME=  /oracle/app/product/grid

 

Enter the full pathname of the local bin directory: [/usr/local/bin]:

Creating /usr/local/bin directory…

Copying dbhome to /usr/local/bin …

Copying oraenv to /usr/local/bin …

Copying coraenv to /usr/local/bin …

 

 

Creating /etc/oratab file…

Entries will be added to the /etc/oratab file as needed by

Database Configuration Assistant when a database is created

Finished running generic part of root script.

Now product-specific root actions will be performed.

Using configuration parameter file: /oracle/app/product/grid/crs/install/crsconfig_params

Creating trace directory

User ignored Prerequisites during installation

User grid has the required capabilities to run CSSD in realtime mode

OLR initialization – successful

root wallet

root wallet cert

root cert export

peer wallet

profile reader wallet

pa wallet

peer wallet keys

pa wallet keys

peer cert request

pa cert request

 

peer cert

pa cert

peer root cert TP

profile reader root cert TP

pa root cert TP

peer pa cert TP

pa peer cert TP

profile reader pa cert TP

profile reader peer cert TP

peer user cert

pa user cert

Adding Clusterware entries to inittab

 

 

CRS-2672: Attempting to start ‘ora.mdnsd’ on ‘macleandb-007’

CRS-2676: Start of ‘ora.mdnsd’ on ‘macleandb-007’ succeeded

CRS-2672: Attempting to start ‘ora.gpnpd’ on ‘macleandb-007’

CRS-2676: Start of ‘ora.gpnpd’ on ‘macleandb-007’ succeeded

CRS-2672: Attempting to start ‘ora.cssdmonitor’ on ‘macleandb-007’

CRS-2672: Attempting to start ‘ora.gipcd’ on ‘macleandb-007’

CRS-2676: Start of ‘ora.cssdmonitor’ on ‘macleandb-007’ succeeded

CRS-2676: Start of ‘ora.gipcd’ on ‘macleandb-007’ succeeded

CRS-2672: Attempting to start ‘ora.cssd’ on ‘macleandb-007’

CRS-2672: Attempting to start ‘ora.diskmon’ on ‘macleandb-007’

CRS-2676: Start of ‘ora.diskmon’ on ‘macleandb-007’ succeeded

CRS-2676: Start of ‘ora.cssd’ on ‘macleandb-007’ succeeded

clscfg: -install mode specified

Successfully accumulated necessary OCR keys.

Creating OCR keys for user ‘root’, privgrp ‘system’..

Operation successful.

Now formatting voting disk: /oradata/oradata01/vote/vote1.

Now formatting voting disk: /oradata/oradata02/vote/vote2.

Now formatting voting disk: /oradata/oradata03/vote/vote3.

CRS-4603: Successful addition of voting disk /oradata/oradata01/vote/vote1.

CRS-4603: Successful addition of voting disk /oradata/oradata02/vote/vote2.

CRS-4603: Successful addition of voting disk /oradata/oradata03/vote/vote3.

##  STATE    File Universal Id                File Name Disk group

—  —–    —————–                ——— ———

1. ONLINE   08a3980556c04f05bf2286c0ad1a0924 (/oradata/oradata01/vote/vote1) []

2. ONLINE   21d4c2b231be4fdcbf4d44a3f19455f2 (/oradata/oradata02/vote/vote2) []

3. ONLINE   03619a54a99f4f9fbfc6b22cef4c31da (/oradata/oradata03/vote/vote3) []

Located 3 voting disk(s).

 

 

 

macleandb-008 # /home/grid/app/oraInventory/orainstRoot.sh

Changing permissions of /home/grid/app/oraInventory.

Adding read,write permissions for group.

Removing read,write,execute permissions for world.

 

Changing groupname of /home/grid/app/oraInventory to oinstall.

The execution of the script is complete.

 

 

 

 

 

macleandb-008 # /oracle/app/product/grid/root.sh

Performing root user operation for Oracle 11g

 

The following environment variables are set as:

ORACLE_OWNER= grid

ORACLE_HOME=  /oracle/app/product/grid

 

Enter the full pathname of the local bin directory: [/usr/local/bin]:

Creating /usr/local/bin directory…

Copying dbhome to /usr/local/bin …

Copying oraenv to /usr/local/bin …

Copying coraenv to /usr/local/bin …

 

 

Creating /etc/oratab file…

Entries will be added to the /etc/oratab file as needed by

Database Configuration Assistant when a database is created

Finished running generic part of root script.

Now product-specific root actions will be performed.

Using configuration parameter file: /oracle/app/product/grid/crs/install/crsconfig_params

Creating trace directory

User ignored Prerequisites during installation

 

User grid has the required capabilities to run CSSD in realtime mode

OLR initialization – successful

Adding Clusterware entries to inittab

CRS-4402: The CSS daemon was started in exclusive mode but found an active CSS daemon on node unknown, number unknown, and is terminating

An active cluster was found during exclusive startup, restarting to join the cluster

Configure Oracle Grid Infrastructure for a Cluster … succeeded

macleandb-008 #

 

 

 

执行完成上述脚本后在OUI图形界面上点击Finish完成安装,并验证安装结果:

 

$ crsctl query crs softwareversion

Oracle Clusterware version on node [macleandb-007] is [11.2.0.3.0]

 

 

$ crsctl query crs activeversion

Oracle Clusterware active version on the cluster is [11.2.0.3.0]

 

 

$ crsctl stat res -t

——————————————————————————–

NAME           TARGET  STATE        SERVER                   STATE_DETAILS

——————————————————————————–

Local Resources

——————————————————————————–

ora.LISTENER.lsnr

ONLINE  ONLINE       macleandb-007

ONLINE  ONLINE       macleandb-008

ora.asm

OFFLINE OFFLINE      macleandb-007               Instance Shutdown

OFFLINE OFFLINE      macleandb-008

ora.gsd

OFFLINE OFFLINE      macleandb-007

OFFLINE OFFLINE      macleandb-008

ora.net1.network

ONLINE  ONLINE       macleandb-007

ONLINE  ONLINE       macleandb-008

ora.ons

ONLINE  ONLINE       macleandb-007

ONLINE  ONLINE       macleandb-008

ora.registry.acfs

OFFLINE OFFLINE      macleandb-007

OFFLINE OFFLINE      macleandb-008

——————————————————————————–

Cluster Resources

——————————————————————————–

ora.LISTENER_SCAN1.lsnr

1        ONLINE  ONLINE       macleandb-007

ora.macleandb-007.vip

1        ONLINE  ONLINE       macleandb-007

ora.macleandb-008.vip

1        ONLINE  ONLINE       macleandb-008

ora.cvu

1        ONLINE  ONLINE       macleandb-007

ora.oc4j

1        ONLINE  ONLINE       macleandb-007

ora.scan1.vip

1        ONLINE  ONLINE       macleandb-007

 

 

 

 

$ crsctl stat res -t -init

——————————————————————————–

NAME           TARGET  STATE        SERVER                   STATE_DETAILS

——————————————————————————–

Cluster Resources

——————————————————————————–

ora.asm

1        OFFLINE OFFLINE                               Instance Shutdown

ora.cluster_interconnect.haip

1        ONLINE  ONLINE       macleandb-007

ora.crf

1        ONLINE  ONLINE       macleandb-007

ora.crsd

1        ONLINE  ONLINE       macleandb-007

ora.cssd

1        ONLINE  ONLINE       macleandb-007

ora.cssdmonitor

1        ONLINE  ONLINE       macleandb-007

ora.ctssd

1        ONLINE  ONLINE       macleandb-007               OBSERVER

ora.diskmon

1        OFFLINE OFFLINE

ora.drivers.acfs

1        ONLINE  ONLINE       macleandb-007

ora.evmd

1        ONLINE  ONLINE       macleandb-007

ora.gipcd

1        ONLINE  ONLINE       macleandb-007

ora.gpnpd

1        ONLINE  ONLINE       macleandb-007

ora.mdnsd

1        ONLINE  ONLINE       macleandb-007

 

 

通过上述命令验证确认Grid Infrastructure Cluster版本、状态良好。

 

3. 进一步安装RDBMS DB Software,运行./runInstaller启动OUI界面

 

 

step1 点击NEXT

 

Step2 选择skip software updates

Step3 选择Install database software only

 

Step4 选择RAC安装macleandb-007、macleandb-008 2个节点

 

Step5 选择添加简体中文语言包

Step6 选择安装企业版

 

Step7 选择安装RDBMS的ORACLE_HOME

Step8 选择dba 为OSDBA组

Step9 复查安装先决条件,本例中出现的Warning可以忽略,点击Ignore All并NEXT

Step10 Summary页面点击Install

 

Step 11 安装过程完成后,用root用户在2个节点分别运行$ORACLE_HOME/root.sh,脚本执行完成后RDBMS DB软件安装完毕。
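
以下为在2个节点上执行该脚本的示意(其中RDBMS ORACLE_HOME路径为假设值,请以实际安装路径为准):

[root@macleandb-007]# /oracle/app/oracle/product/11.2.0/dbhome_1/root.sh
[root@macleandb-008]# /oracle/app/oracle/product/11.2.0/dbhome_1/root.sh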

 

 

 

4. 使用DBCA创建测试用数据库

 

切换至oracle用户并执行dbca工具:

Step0 点选RAC database
Step1 点选Create Database
Step2 点选General Purpose
Step3 填入DB NAME并加入macleandb-007、macleandb-008 2个节点。
Step 4 不配置Enterprise Manager,并启用Automatic maintenance Tasks
Step5 输入sysdba密码
Step6 选择使用OMF,并使用共享文件系统目录存放数据文件
Step7 指定Fast Recovery Area快速恢复区域目录,及大小
Step8 勾选Sample Schemas,装载示例模式用以测试性能和可用性
Step9 选择合适的SGA、PGA内存配置、processes参数和数据库字符集。
Step10 指定redo log文件的大小,过小的redo log不利于性能,这里创建单个512MB的online redo log。
Step11 勾选Create Database点击FINISH正式创建数据库
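
上述图形化步骤也可以用dbca的silent模式完成,以下仅为一个示意(密码、存储路径等均为假设值,具体参数名请以 dbca -help 输出为准):

dbca -silent -createDatabase \
  -templateName General_Purpose.dbc \
  -gdbName testdb -sid testdb \
  -sysPassword oracle_4U -systemPassword oracle_4U \
  -emConfiguration NONE \
  -nodelist macleandb-007,macleandb-008 \
  -storageType FS -datafileDestination /shared/oradata \
  -recoveryAreaDestination /shared/fra \
  -sampleSchema true -characterSet AL32UTF8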

DBCA完成create database工作后,检查创建的测试数据库状态:

 

macleandb-007 # su - grid

$ crsctl stat res -t

--------------------------------------------------------------------------------

NAME           TARGET  STATE        SERVER                   STATE_DETAILS

--------------------------------------------------------------------------------

Local Resources

--------------------------------------------------------------------------------

ora.MAC_LSN.lsnr

ONLINE  ONLINE       macleandb-007

ONLINE  ONLINE       macleandb-008

ora.LISTENER.lsnr

ONLINE  ONLINE       macleandb-007

ONLINE  ONLINE       macleandb-008

ora.asm

OFFLINE OFFLINE      macleandb-007               Instance Shutdown

OFFLINE OFFLINE      macleandb-008

ora.gsd

OFFLINE OFFLINE      macleandb-007

OFFLINE OFFLINE      macleandb-008

ora.net1.network

ONLINE  ONLINE       macleandb-007

ONLINE  ONLINE       macleandb-008

ora.ons

ONLINE  ONLINE       macleandb-007

ONLINE  ONLINE       macleandb-008

ora.registry.acfs

OFFLINE OFFLINE      macleandb-007

OFFLINE OFFLINE      macleandb-008

--------------------------------------------------------------------------------

Cluster Resources

--------------------------------------------------------------------------------

ora.LISTENER_SCAN1.lsnr

1        ONLINE  ONLINE       macleandb-007

ora.macleandb-007.vip

1        ONLINE  ONLINE       macleandb-007

ora.macleandb-008.vip

1        ONLINE  ONLINE       macleandb-008

ora.cvu

1        ONLINE  ONLINE       macleandb-007

ora.oc4j

1        ONLINE  ONLINE       macleandb-007

ora.scan1.vip

1        ONLINE  ONLINE       macleandb-007

ora.testdb.db

1        ONLINE  ONLINE       macleandb-007               Open

2        ONLINE  ONLINE       macleandb-008               Open

 

 

macleandb-007 # su - oracle

 

 

$ srvctl stop database -d testdb

$ srvctl start database -d testdb

 

 

上述命令结果显示测试用TESTDB状态良好、可用。
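
也可以用srvctl进一步确认数据库资源的配置与状态,例如:

$ srvctl status database -d testdb
$ srvctl config database -d testdb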

CRS-5008: Invalid attribute value

在用oifcfg修改一个11.2.0.2  RAC的public interface的时候遇到了该错误,该错误常见的原因是:

1. BUG 12559708 BUG 11077013 BUG 12888041 BUG 12870719 bug 12876147 bug 13425057 bug 13738734 bug 13846043
2. 用oifcfg修改public network后,未使用 srvctl modify nodeapps -n <nodename> -A 90.224.207.150/255.255.255.224/nxge0\|nxge2 之类的命令同步修改network服务资源属性
3. 子网掩码配置错误,例如mask为 255.255.255.128 时,所配置子网(subnet)的最后一段也必须与掩码匹配(如以128结尾)

常见的诊断步骤是观察:

1.crsctl status resource  ora.net1.network -p
2.  $GRID_HOME/log/<hostname>/agent/crsd/orarootagent_root/orarootagent_root.log
3. 考虑reboot节点可能可以解决问题,但最根本的仍是保证子网和掩码配置正确,修正方法可参考下面的示例
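
修正public网络配置的大致步骤示意如下(IP、子网、网卡名均为示例值,请替换为实际环境的配置,其中 srvctl modify nodeapps 需以root执行):

$ oifcfg getif
$ oifcfg delif -global nxge0
$ oifcfg setif -global nxge0/90.224.207.128:public
# srvctl modify nodeapps -n <nodename> -A 90.224.207.150/255.255.255.224/nxge0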

 

常见错误信息为 orarootagent_root.log中出现:

2011-08-15 18:43:44.105: [ora.net1.network][21] {1:1729:2} [check] subnetmask=0.0.0.0
2011-08-15 18:43:44.105: [ora.net1.network][21] {1:1729:2} [check] subnetnumber=0.0.0.0
2011-08-15 18:43:44.105: [ AGENT][21] {1:1729:2} UserErrorException: Locale is
2011-08-15 18:43:44.106: [ora.net1.network][21] {1:1729:2} [check] CRS-5008: Invalid attribute value: lan900 for the network interface

 

11gR2新特性:LMHB Lock Manager Heart Beat后台进程

LMHB是11gR2中新引入的后台进程,官方文档对它的介绍是Global Cache/Enqueue Service Heartbeat Monitor:"LMHB monitors LMON, LMD, and LMSn processes to ensure they are running normally without blocking or spinning",该进程存在于Database和ASM实例中(Oracle RAC环境)。

该进程负责监控LMON、LMD、LMSn等RAC关键的后台进程,保证这些background process不被阻塞或spin。 LMHB可能是Lock Manager Heartbeat的缩写。
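
可以用如下查询确认实例中是否存在该后台进程(仅为示意):

SQL> select name, description from v$bgprocess where name='LMHB';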

 

我们来看一下该进程的trace跟踪文件以便了解其功能:

按照 100s -> 80s -> 100s -> 80s的间隔监控并输出一次LMSn、LCKn、LMON、LMD等进程的状态及wait chain,由kjfmGCR_HBCheckAll函数控制

 

*** 2012-02-03 00:03:10.066
==============================
LMS0 (ospid: 17247) has not moved for 77 sec (1328245389.1328245312)
kjfmGCR_HBCheckAll: LMS0 (ospid: 17247) has status 2
  : waiting for event 'gcs remote message' for 0 secs with wait_id 15327.
  ===[ Wait Chain ]===
  Wait chain is empty.
kjgcr_Main: KJGCR_ACTION - id 5

*** 2012-02-03 00:04:50.091
==============================
LMS0 (ospid: 17247) has not moved for 88 sec (1328245489.1328245401)
kjfmGCR_HBCheckAll: LMS0 (ospid: 17247) has status 2
  : waiting for event 'gcs remote message' for 0 secs with wait_id 24546.
  ===[ Wait Chain ]===
  Wait chain is empty.
kjgcr_Main: KJGCR_ACTION - id 5

 LCK0 (ospid: 2662) has not moved for 95 sec (1309746735.1309746640)
  kjfmGCR_HBCheckAll: LCK0 (ospid: 2662) has status 6
  ==================================================
  === LCK0 (ospid: 2662) Heartbeat Report
  ==================================================
  LCK0 (ospid: 2662) has no heartbeats for 95 sec. (threshold 70 sec)
   : Not in wait; last wait ended 80 secs ago.
   : last wait_id 2317342 at 'libcache interrupt action by LCK'.
  ..
  .
   Session Wait History:
       elapsed time of 1 min 20 sec since last wait
    0: waited for 'libcache interrupt action by LCK'
  ..

 

大约每3分钟输出一次TOP CPU User,CPU使用率高的session信息:

 

*** 2012-02-03 00:05:30.102
kjgcr_SlaveReqBegin: message queued to slave
kjgcr_Main: KJGCR_ACTION - id 3
CPU is high.  Top oracle users listed below:
     Session           Serial         CPU
      29                 7             0
     156                23             0
       3                 1             0
       4                 1             0
       5                 1             0

*** 2012-02-03 00:08:30.147
kjgcr_SlaveReqBegin: message queued to slave
kjgcr_Main: KJGCR_ACTION - id 3
CPU is high.  Top oracle users listed below:
     Session           Serial         CPU
      29                 7             0
     156                23             0
       3                 1             0
       4                 1             0
       5                 1             0

 

如果发现有session的CPU使用率极高,根据内部算法可能会激活 资源计划(resource management plan) ,甚至于kill 进程:

 

*** 2012-02-03 00:08:35.149
kjgcr_Main: Reset called for action high cpu, identify users, count 0

*** 2012-02-03 00:08:35.149
kjgcr_Main: Reset called for action high cpu, kill users, count 0

*** 2012-02-03 00:08:35.149
kjgcr_Main: Reset called for action high cpu, activate RM plan, count 0

*** 2012-02-03 00:08:35.149
kjgcr_Main: Reset called for action high cpu, set BG into RT, count 0

 

从11.2.0.2 开始LMHB开始使用slave 进程GCRn来完成实际的任务(Global Conflict Resolution Slave Process Performs synchronous tasks on behalf of LMHB GCRn processes are transient slaves that are started and stopped as required by LMHB to perform synchronous or resource intensive tasks.) LMHB会控制GCRn进程的启停,以便使用多个GCRn完成同步和缓解资源紧张的任务(例如kill进程)。

可以看到实际LMHB调用的多为kjgcr或kjfmGCR开头的内部函数,GCR意为Global Conflict Resolution。

 

kjgcr_Main: KJGCR_ACTION - id 5

GCR 进程的trace :

*** 2011-11-28 02:42:44.466
kjgcr_SlaveActionCbk: Callback failed, check trace
Dumping GCR slave work message at 0x96b81fc0
GCR layer information: type = 1, index = 0
Unformatted dump of ksv layer header:

 

LMHB进程的出现是为了提高RAC的可用性,特别是在资源紧张的环境中它会主动地去尝试kill掉最耗费资源的服务进程,以保证LMS等关键的RAC后台进程能正常工作;因为该进程定期监控LMS、LMON等后台进程的等待事件及session的CPU使用率等信息,所以LMHB进程的跟踪日志也可能成为诊断RAC故障的重要线索之一,这是11.2.0.1以来RAC的一个潜在的新特性和增强。

相关隐式参数

_lm_hb_callstack_collect_time hb diagnostic call stack collection time in seconds — 5s
_lm_hb_disable_check_list list of process names to be disabled in heartbeat check — none
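
可以用类似下面的查询查看这些与LMHB心跳相关的隐藏参数(需以SYS查询X$视图,仅作示意):

SQL> SELECT x.ksppinm NAME, y.ksppstvl VALUE, x.ksppdesc DESCRIB
  2    FROM SYS.x$ksppi x, SYS.x$ksppcv y
  3   WHERE x.indx = y.indx
  4     AND x.ksppinm IN ('_lm_hb_callstack_collect_time','_lm_hb_disable_check_list');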

 

11.2是第一个引入LMHB进程的版本,所以并不是太成熟,在实际过程中对于资源使用率很高的RAC系统而言LMHB可能会帮一些倒忙,若你确实遇到了相关的问题或者是在11.2 RAC上碰到了一些诡异的现象,那么可以关注一下以下这些MOS Note:

 

ORA-29770 LMHB Terminates Instance as LMON Waited for Control File IO or LIBRARY CACHE or ROW CACHE Event for too Long [ID 1197674.1]
Bug 8888434 - LMHB crashes the instance with LMON waiting on controlfile read [ID 8888434.8]
Bug 11890804 - LMHB crashes instance with ORA-29770 after long "control file sequential read" waits [ID 11890804.8]
Bug 11890804: LMHB TERMINATE INSTANCE WHEN LMON WAIT CHANGE FROM CF READ AFTER 60 SEC
Bug 13467673: CSS MISSCOUNT AND ALL ASM DOWN WITH ORA-29770 BY LMHB
Bug 13390052: KJFMGCR_HBCHECKALL MESSAGES ARE CONTINUOUSLY LOGGED IN LMHB TRACE FILE.
Bug 13322797: LMHB TERMINATES THE INSTANCE DUE TO ERROR 29770
Bug 11827088 - Latch 'gc element' contention, LMHB terminates the instance [ID 11827088.8]

Bug 13061883: LMHB IS TERMINATING THE INSTANCE DURING SHUTDOWN IMMEDIATE
Bug 12564133 - ORA-600[1433] in LMHB process during RAC reconfiguration [ID 12564133.8]
Bug 12886605: ESSC: LMHB TERMINATE INSTANCE DUE TO 29770 – LMON WAIT ENQ: AM – DISK OFFLINE
Bug 12757321: LMHB TERMINATING THE INSTANCE DUE TO ERROR 29770
Bug 10296263: LMHB (OSPID: 15872): TERMINATING THE INSTANCE DUE TO ERROR 29770
Bug 11899415: ORA-29771 AND LMHB (OSPID: XXXX) KILLS USER (OSPID: XXX
Bug 10431752: SINGLE NODE RAC: LMHB TERMINATES INSTANCE DUE TO 29770
Bug 11656856: LMHB (OSPID: 27701): TERMINATING THE INSTANCE DUE TO ERROR 29770
Bug 10411143: INSTANCE CRASHES WITH IPC SEND TIMEOUT AND LMHB TERMINATES WITH ORA-29770
Bug 11704041: DATABASE INSTANCE CRASH BY LMHB PROCESS
Bug 10412545: ORA-29770 LMHB TERMINATE INSTANCE DUE TO VARIOUS LONG CSS WAIT
Bug 10147827: INSTANCE TERMINATED BY LMHB WITH ERROR ORA-29770
Bug 10016974: ORA-29770 LMD IS HUNG FOR MORE THAN 70 SECONDS AND LMHB TERMINATE INSTANCE
Bug 9376100: LMHB TERMINATING INSTANCE DUE ERROR 29770

 

 

给11gR2 RAC添加LISTENER监听器并静态注册

之前有同学想要给11gR2的RAC添加LISTENER监听器,查看了listener.ora并发现问题:

 

[oracle@vrh2 ~]$ lsnrctl status
LSNRCTL for Linux: Version 11.2.0.3.0 - Production on 04-DEC-2011 02:51:40

Copyright (c) 1991, 2011, Oracle. All rights reserved.

Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
STATUS of the LISTENER
------------------------
Alias LISTENER
Version TNSLSNR for Linux: Version 11.2.0.3.0 - Production
Start Date 02-DEC-2011 05:40:09
Uptime 1 days 21 hr. 11 min. 31 sec
Trace Level off
Security ON: Local OS Authentication
SNMP OFF
Listener Parameter File /g01/11.2.0/grid/network/admin/listener.ora
Listener Log File /g01/orabase/diag/tnslsnr/vrh2/listener/alert/log.xml
Listening Endpoints Summary...
(DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.163)(PORT=1521)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.164)(PORT=1521)))
Services Summary...
Service "+ASM" has 1 instance(s).
Instance "+ASM2", status READY, has 1 handler(s) for this service...
Service "VPROD" has 1 instance(s).
Instance "VPROD2", status READY, has 1 handler(s) for this service...
Service "VPRODXDB" has 1 instance(s).
Instance "VPROD2", status READY, has 1 handler(s) for this service...
The command completed successfully

[oracle@vrh2 ~]$ cat /g01/11.2.0/grid/network/admin/listener.ora

LISTENER_SCAN1=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN1)))) # line added by Agent
LISTENER=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER)))) # line added by Agent
ENABLE_GLOBAL_DYNAMIC_ENDPOINT_LISTENER=ON # line added by Agent
ENABLE_GLOBAL_DYNAMIC_ENDPOINT_LISTENER_SCAN1=ON # line added by Agent
ENABLE_GLOBAL_DYNAMIC_ENDPOINT_LSN_MACLEAN=ON # line added by Agent

 

以上listener.ora配置文件中的信息是Grid Infrastructure安装过程中Agent自行添加的(During the Grid Infrastructure installation, the (default) node VIP listener is always created referencing the public network),比较难以理解的可能是LISTENER仅指定了PROTOCOL=IPC的信息, 而没有指定监听的地址、端口等信息。

 

实际上11.2 GI的LISTENER 监听器配置默认受到11.2新引入的endpoints_listener.ora配置文件的管理:

 

Listener.ora

[grid@netrac1 admin]$ more listener.ora
LISTENER_SCAN2=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN2)))) # line added by Agent
LISTENER=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER))))# line added by Agent
LISTENER_SCAN1=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN1)))) # line added by Agent

ENABLE_GLOBAL_DYNAMIC_ENDPOINT_LISTENER_SCAN1=ON # line added by Agent
ENABLE_GLOBAL_DYNAMIC_ENDPOINT_LISTENER=ON # line added by Agent
ENABLE_GLOBAL_DYNAMIC_ENDPOINT_LISTENER_SCAN2=ON # line added by Agent

The ENABLE_GLOBAL_DYNAMIC_ENDPOINT_ parameter is set to allow the listener to accept connections
for pre-11.2 databases which did not register the dynamic endpoint.

Listener status "listener" showing 1 instance registered, ie instance running on the node

[grid@netrac1 admin]$ lsnrctl status listener
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER)))
STATUS of the LISTENER
------------------------
Alias LISTENER
Version TNSLSNR for Linux: Version 11.2.0.1.0 - Production
Start Date 15-FEB-2011 10:57:09
Uptime 0 days 0 hr. 0 min. 46 sec
Trace Level off
Security ON: Local OS Authentication
SNMP OFF
Listener Parameter File /u01/app/11.2.0/grid/network/admin/listener.ora
Listener Log File /u01/app/grid/diag/tnslsnr/netrac1/listener/alert/log.xml
Listening Endpoints Summary...
(DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=12.345.678.111)(PORT=1521))) ** Node IP Address **
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=12.345.678.888)(PORT=1521))) ** Node VIP Address **
Services Summary...
Service "v11gr2" has 1 instance(s).
Instance "v11gr21", status READY, has 2 handler(s) for this service...
The command completed successfully

New file for 11.2 called endpoints_listener.ora, showing the Node IP address and Node VIP address.

[grid@netrac1 admin]$ more endpoints_listener.ora
LISTENER_NETRAC1=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=netrac1-vip)(PORT=1521))
(ADDRESS=(PROTOCOL=TCP)(HOST=12.345.678.888)(PORT=1521)(IP=FIRST)))) # line added by Agent

Endpoints_listener.ora file is there for backward compatibility with pre-11.2 databases.
DBCA needs to know the endpoints location to configure database parameters and tnsnames.ora file.
It used to use the listener.ora file, 11.2 RAC listener.ora by default only has IPC entries.

"Line added by Agent" is the Oraagent is the process updating the listener.ora and endpoints_listener.ora files.
Endpoints_listener.ora showing the Node IP address and Node VIP address

[grid@netrac2 admin]$ more endpoints_listener.ora
LISTENER_NETRAC2=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=netrac2-vip)(PORT=1521))
(ADDRESS=(PROTOCOL=TCP)(HOST=12.345.678.999) (PORT=1521)(IP=FIRST)))) # line added by Agent

 

我一开始以为LISTENER默认监听的地址和端口被写到了OCR中,后来用ocrdump转储注册信息发现没有相关记录。 后来才发现原来11.2 GI中监听器的地址和端口信息被移到了 endpoints_listener.ora中, “Line added by Agent”说明是由Oraagent 进程更新的记录。

 

注意:使用 endpoints_listener.ora的情况下不应使用lsnrctl管理LISTENER,而需使用srvctl或crsctl工具管理,否则lsnrctl将不会识别endpoints_listener.ora中的信息,造成监听没有在必要地址、端口上工作。如:

 

[grid@vrh1 admin]$ lsnrctl status LSN_MACLEAN

LSNRCTL for Linux: Version 11.2.0.3.0 - Production on 27-NOV-2011 10:45:26

Copyright (c) 1991, 2011, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LSN_MACLEAN)))
STATUS of the LISTENER
------------------------
Alias                     LSN_MACLEAN
Version                   TNSLSNR for Linux: Version 11.2.0.3.0 - Production
Start Date                26-NOV-2011 08:33:14
Uptime                    1 days 2 hr. 12 min. 11 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /g01/11.2.0/grid/network/admin/listener.ora
Listener Log File         /g01/orabase/diag/tnslsnr/vrh1/lsn_maclean/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LSN_MACLEAN)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.161)(PORT=1588)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.162)(PORT=1588)))
Services Summary...
Service "VPROD" has 1 instance(s).
  Instance "VPROD1", status READY, has 1 handler(s) for this service...
Service "VPRODXDB" has 1 instance(s).
  Instance "VPROD1", status READY, has 1 handler(s) for this service...
The command completed successfully

[grid@vrh1 admin]$ lsnrctl reload LSN_MACLEAN

LSNRCTL for Linux: Version 11.2.0.3.0 - Production on 27-NOV-2011 10:45:39

Copyright (c) 1991, 2011, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LSN_MACLEAN)))
The command completed successfully

[grid@vrh1 admin]$ lsnrctl status LSN_MACLEAN

LSNRCTL for Linux: Version 11.2.0.3.0 - Production on 27-NOV-2011 10:45:44

Copyright (c) 1991, 2011, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LSN_MACLEAN)))
STATUS of the LISTENER
------------------------
Alias                     LSN_MACLEAN
Version                   TNSLSNR for Linux: Version 11.2.0.3.0 - Production
Start Date                26-NOV-2011 08:33:14
Uptime                    1 days 2 hr. 12 min. 30 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /g01/11.2.0/grid/network/admin/listener.ora
Listener Log File         /g01/orabase/diag/tnslsnr/vrh1/lsn_maclean/alert/log.xml
Listening Endpoints Summary...
 (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LSN_MACLEAN)))
The listener supports no services
The command completed successfully

[grid@vrh1 admin]$ srvctl stop listener -l LSN_MACLEAN

[grid@vrh1 admin]$ srvctl start listener -l LSN_MACLEAN  

[grid@vrh1 admin]$ lsnrctl status LSN_MACLEAN

LSNRCTL for Linux: Version 11.2.0.3.0 - Production on 27-NOV-2011 10:46:26

Copyright (c) 1991, 2011, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LSN_MACLEAN)))
STATUS of the LISTENER
------------------------
Alias                     LSN_MACLEAN
Version                   TNSLSNR for Linux: Version 11.2.0.3.0 - Production
Start Date                27-NOV-2011 10:46:22
Uptime                    0 days 0 hr. 0 min. 4 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /g01/11.2.0/grid/network/admin/listener.ora
Listener Log File         /g01/orabase/diag/tnslsnr/vrh1/lsn_maclean/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LSN_MACLEAN)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.161)(PORT=1588)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.162)(PORT=1588)))
The listener supports no services
The command completed successfully

 

而在11.2 RAC中listener.ora仅记录LISTENER的IPC条目。这样做的目的是方便dbca配置数据库参数及tnsnames.ora配置文件。

了解到以上信息后可能你对当前11.2 RAC中的listener.ora文件中的监听配置信息不再感到奇怪。
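
顺带一提,可以用类似下面的命令查看当前被CRS管理的全部监听资源(仅为示意):

$ crsctl stat res -t -w "TYPE = ora.listener.type"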

 

 

我们可以使用netca图形化工具或者srvctl 命令行工具添加监听配置; 如果仅仅是手动在listener.ora中添加记录的话是无法被注册为Cluster Ready Service的服务的,将不会被CRS管理。

 

方法1:

使用netca和netmgr图形化工具,完成添加监听和静态注册的工作。

1) 以Grid Infrastructure GI用户登录任意节点,并运行netca启动图形界面:

su - grid
(grid)$ export DISPLAY=:0
(grid)$ netca

选择LISTENER Configuration

 

 

选择ADD

填入监听名字

 

 

选择subnet和available protocol,一般默认即可,除非你有多个public network网段

 

填入端口号

选择NO

 

 

选择要启动的监听名,即方才你创建的监听名

之后选择FINISH退出netca 界面,启动netmgr界面,为监听加入静态注册的信息:

su - grid
(grid)$ export DISPLAY=:0
(grid)$ netmgr

 

点选方才创建的监听器,选择Database Services菜单

 

 

填入Global Database Name和本地实例的SID信息,并确认ORACLE HOME Directory(应是Grid Infrastructure的Home目录)正确后点选Save Network Configuration。

 

之后使用srvctl 或 crsctl 重启该监听即可生效:

 

[grid@vrh1 admin]$ crsctl status  res ora.MACLEAN_LISTENER.lsnr
NAME=ora.MACLEAN_LISTENER.lsnr
TYPE=ora.listener.type
TARGET=ONLINE        , ONLINE
STATE=ONLINE on vrh1, ONLINE on vrh2

[grid@vrh1 admin]$ crsctl stop  res ora.MACLEAN_LISTENER.lsnr
CRS-2673: Attempting to stop 'ora.MACLEAN_LISTENER.lsnr' on 'vrh1'
CRS-2673: Attempting to stop 'ora.MACLEAN_LISTENER.lsnr' on 'vrh2'
CRS-2677: Stop of 'ora.MACLEAN_LISTENER.lsnr' on 'vrh1' succeeded
CRS-2677: Stop of 'ora.MACLEAN_LISTENER.lsnr' on 'vrh2' succeeded

[grid@vrh1 admin]$ crsctl start  res ora.MACLEAN_LISTENER.lsnr
CRS-2672: Attempting to start 'ora.MACLEAN_LISTENER.lsnr' on 'vrh2'
CRS-2672: Attempting to start 'ora.MACLEAN_LISTENER.lsnr' on 'vrh1'
CRS-2676: Start of 'ora.MACLEAN_LISTENER.lsnr' on 'vrh1' succeeded
CRS-2676: Start of 'ora.MACLEAN_LISTENER.lsnr' on 'vrh2' succeeded

[grid@vrh1 admin]$ lsnrctl status MACLEAN_LISTENER

LSNRCTL for Linux: Version 11.2.0.3.0 - Production on 27-NOV-2011 11:00:42

Copyright (c) 1991, 2011, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=MACLEAN_LISTENER)))
STATUS of the LISTENER
------------------------
Alias                     MACLEAN_LISTENER
Version                   TNSLSNR for Linux: Version 11.2.0.3.0 - Production
Start Date                27-NOV-2011 11:00:11
Uptime                    0 days 0 hr. 0 min. 31 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /g01/11.2.0/grid/network/admin/listener.ora
Listener Log File         /g01/orabase/diag/tnslsnr/vrh1/maclean_listener/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=MACLEAN_LISTENER)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.161)(PORT=1598)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.162)(PORT=1598)))
Services Summary...
Service "VPROD" has 1 instance(s).
  Instance "VPROD1", status UNKNOWN, has 1 handler(s) for this service...
The command completed successfully

[grid@vrh1 admin]$ srvctl stop listener -l MACLEAN_LISTENER

[grid@vrh1 admin]$ srvctl start listener -l MACLEAN_LISTENER  

[grid@vrh1 admin]$ srvctl config listener -l MACLEAN_LISTENER
Name: MACLEAN_LISTENER
Network: 1, Owner: grid
Home:
End points: TCP:1598

[grid@vrh1 admin]$ lsnrctl status MACLEAN_LISTENER

LSNRCTL for Linux: Version 11.2.0.3.0 - Production on 27-NOV-2011 11:01:42

Copyright (c) 1991, 2011, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=MACLEAN_LISTENER)))
STATUS of the LISTENER
------------------------
Alias                     MACLEAN_LISTENER
Version                   TNSLSNR for Linux: Version 11.2.0.3.0 - Production
Start Date                27-NOV-2011 11:01:10
Uptime                    0 days 0 hr. 0 min. 31 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /g01/11.2.0/grid/network/admin/listener.ora
Listener Log File         /g01/orabase/diag/tnslsnr/vrh1/maclean_listener/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=MACLEAN_LISTENER)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.161)(PORT=1598)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.162)(PORT=1598)))
Services Summary...
Service "VPROD" has 1 instance(s).
  Instance "VPROD1", status UNKNOWN, has 1 handler(s) for this service...
The command completed successfully

 

以上使用netca和netmgr图形界面工具完成了新监听的添加和静态注册工作。

 

方法2:使用srvctl工具添加监听并手动加入静态注册信息

 

检查默认network的network number(即下面 srvctl config network 输出中 Network exists: 后的编号,本例为 1)

[grid@vrh1 admin]$ srvctl config network
Network exists: 1/192.168.1.0/255.255.255.0/eth0, type static

srvctl 添加监听的语法如下

[grid@vrh1 admin]$  srvctl add listener -h

Adds a listener configuration to the Oracle Clusterware.

Usage: srvctl add listener [-l <lsnr_name>] [-s] [-p "[TCP:]<port>[, ...][/IPC:<key>]
[/NMP:<pipe_name>][/TCPS:<s_port>] [/SDP:<port>]"] [-o <oracle_home>] [-k <net_num>]
    -l <lsnr_name>           Listener name (default name is LISTENER)
    -o <oracle_home>         ORACLE_HOME path (default value is CRS_HOME)
    -k <net_num>             network number (default number is 1)
    -s                       Skip the checking of ports
    -p "[TCP:]<port>[, ...][/IPC:<key>][/NMP:<pipe_name>][/TCPS:<s_port>] [/SDP:<port>]"      
Comma separated tcp ports or listener endpoints
    -h                       Print usage

[grid@vrh1 admin]$  srvctl add listener -l NEW_MACLEAN_LISTENER -o $CRS_HOME -p 1601 -k 1

-k 填入方才获得的network number,-p填入端口号,-l填入监听名,-o 填入GI HOME路径

[grid@vrh1 admin]$ srvctl start listener -l NEW_MACLEAN_LISTENER

 

srvctl start listener启动新添加的监听后listener.ora和endpoints_listener.ora会出现新的记录:

 

[grid@vrh1 admin]$ head -1 listener.ora
NEW_MACLEAN_LISTENER=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=IPC)(KEY=NEW_MACLEAN_LISTENER))))           
# line added by Agent

[grid@vrh1 admin]$ head -1 endpoints_listener.ora
NEW_MACLEAN_LISTENER_VRH1=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=vrh1-vip)(PORT=1601))
(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.1.161)(PORT=1601)(IP=FIRST))))            
# line added by Agent

 

以上已经完成了监听的添加,足见使用srvctl管理更为简便。

 

之后仅需要加入静态注册信息即可,如:

 

SID_LIST_NEW_MACLEAN_LISTENER =
  (SID_LIST =
    (SID_DESC =
      (GLOBAL_DBNAME = VPROD)
      (ORACLE_HOME = /g01/11.2.0/grid)
      (SID_NAME = VPROD1)
    )
  )

 

将如上信息(SID_LIST_<监听器名>)加入listener.ora配置文件中,并重启监听即完成静态注册:

 

[grid@vrh1 admin]$ srvctl stop listener -l NEW_MACLEAN_LISTENER

[grid@vrh1 admin]$ srvctl start listener -l NEW_MACLEAN_LISTENER  

[grid@vrh1 admin]$ lsnrctl status NEW_MACLEAN_LISTENER

LSNRCTL for Linux: Version 11.2.0.3.0 - Production on 27-NOV-2011 11:21:37

Copyright (c) 1991, 2011, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=NEW_MACLEAN_LISTENER)))
STATUS of the LISTENER
------------------------
Alias                     NEW_MACLEAN_LISTENER
Version                   TNSLSNR for Linux: Version 11.2.0.3.0 - Production
Start Date                27-NOV-2011 11:21:25
Uptime                    0 days 0 hr. 0 min. 11 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /g01/11.2.0/grid/network/admin/listener.ora
Listener Log File         /g01/11.2.0/grid/log/diag/tnslsnr/vrh1/new_maclean_listener/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=NEW_MACLEAN_LISTENER)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.161)(PORT=1601)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.162)(PORT=1601)))
Services Summary...
Service "VPROD" has 1 instance(s).
  Instance "VPROD1", status UNKNOWN, has 1 handler(s) for this service...
The command completed successfully

 

以上利用srvctl管理工具完成了添加新监听和静态注册的任务。

11g direct path read介绍:10949 event、_small_table_threshold与_serial_direct_read

在11g之前串行的扫描大表默认总是先将数据读取到Oracle高速缓冲中,其等待事件常为db file scattered read。

 

从11g开始Oracle通过内部算法来决定串行扫描大表是通过直接路径读direct path read,还是先读入到buffer cache中,此算法依据表的大小评估。

 

_small_table_threshold 隐藏参数指定了 ORACLE中大表的阈值,其单位为block,即大于_small_table_threshold 所指定块数的表被视作大表,

反之视为"small table"。对于大表"large table",SQL执行层认为存在直接路径读取(direct path read)的意义;对于小表,将其缓存在buffer cache中的收益更大,所以直接路径读取没有意义。_small_table_threshold 隐藏参数的值在实例启动时动态决定,一般为 2% * DB_CACHE_SIZE。
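
可以用类似下面的查询粗略估算该阈值,即约为buffer cache块数的2%(仅作示意,实际值以隐藏参数 _small_table_threshold 为准):

SQL> select round(current_size * 0.02 /
  2         (select value from v$parameter where name = 'db_block_size')) est_blocks
  3    from v$sga_dynamic_components
  4   where component = 'DEFAULT buffer cache';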

 

direct path read的优势:

 

 

1. 减少了对闩(latch)的使用,避免可能的闩争用

 

2.物理IO的大小不再取决于buffer cache中所存在的块;试想某个8个块的extent中1,3,5,7号块在高速缓存中,而2,4,6,8块没有被缓存,传统的方式在读取该extent时将会是对2,4,6,8块进行4次db file sequential read,其效率往往要比单次读取这个区间的所有8个块还要低得多,Oracle为了避免这种情况总是尽可能的不缓存大表的块(读入后总是放在队列最冷的一端);而direct path read则可以完全避免这类问题,尽可能地单次读入更多的物理块。

 

 

当然直接路径读取也会引入一些缺点:

 

1.即便在buffer cache足够大到可以放下整个大表的情况下,direct path read无法从高速缓冲受益,每次扫描大表均需重复等量的直接路径物理读取IO

2.在直接路径读取某段前需要对该对象进行一次段级的检查点(A segment checkpoint),对应的等待事件可参考下面的示例.

3.可能导致重复的延迟块清除操作
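
上述第2点的段级检查点通常表现为 enq: KO - fast object checkpoint 等待事件,可以用类似下面的查询在系统级观察(仅作示意):

SQL> select event, total_waits, time_waited
  2    from v$system_event
  3   where event in ('direct path read', 'enq: KO - fast object checkpoint');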

 

 

 

 

 

 

该11g自动判断direct path read的特性适用场景:

 

1. 对大表不频繁地串行全表扫描的场景

2. Db Cache Size高速缓冲大小远小于表的大小的场景

 

不推荐在以下场景中开启该11g自动判断direct path read特性:

 

1. 从运行稳定的老版本(9i、10g)升级到11g的数据库

2. 对大表频繁地串行全表扫描的场景

 

 

SQL> select * from v$version;

 

BANNER

--------------------------------------------------------------------------------

Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production

PL/SQL Release 11.2.0.3.0 - Production

CORE    11.2.0.3.0      Production

TNS for Linux: Version 11.2.0.3.0 - Production

NLSRTL Version 11.2.0.3.0 - Production

 

col name for a30

col value for a20

col DESCRIB for a60

set linesize 140 pagesize 1400

 

SELECT x.ksppinm NAME, y.ksppstvl VALUE, x.ksppdesc describ

FROM SYS.x$ksppi x, SYS.x$ksppcv y

WHERE x.inst_id = USERENV ('Instance')

AND y.inst_id = USERENV ('Instance')

AND x.indx = y.indx

AND (x.ksppinm ='_small_table_threshold'  or x.ksppinm='_serial_direct_read');

 

 

NAME                           VALUE                DESCRIB

------------------------------ -------------------- ------------------------------------------------------------

_small_table_threshold         1143                 lower threshold level of table size for direct reads

_serial_direct_read            auto                 enable direct read in serial

 

 

 

其中_small_table_threshold 隐藏参数指定了 ORACLE中大表的阈值,其单位为block,即大于_small_table_threshold 所指定块数的表被视作大表,

反之视为"small table"。对于大表"large table",SQL执行层认为存在直接路径读取(direct path read)的意义;对于小表,将其缓存在buffer cache中的收益更大,所以直接路径读取没有意义。

 

 

_small_table_threshold 隐藏参数的值在实例启动时动态决定,一般为 2% * DB_CACHE_SIZE。

 

SQL> alter system set db_cache_size=1024M scope=spfile;

 

System altered.

 

 

RESTART INSTANCE:

 

 

SQL> col name for a30

SQL> col value for a20

SQL> col DESCRIB for a60

SQL> set linesize 140 pagesize 1400

SQL>

SQL> SELECT x.ksppinm NAME, y.ksppstvl VALUE, x.ksppdesc describ

2   FROM SYS.x$ksppi x, SYS.x$ksppcv y

3   WHERE x.inst_id = USERENV ('Instance')

4   AND y.inst_id = USERENV ('Instance')

5   AND x.indx = y.indx

6  AND (x.ksppinm ='_small_table_threshold'  or x.ksppinm='_serial_direct_read');

 

NAME                           VALUE                DESCRIB

------------------------------ -------------------- ------------------------------------------------------------

_small_table_threshold         2522                 lower threshold level of table size for direct reads

_serial_direct_read            auto                 enable direct read in serial

 

 

 

2522 block * 8KB ≈ 19.7MB,约等于 1024MB * 2%(约20.5MB)

 

 

 

 

SQL> create table tmac (t1 char(2000)) pctfree 99 pctused 1 tablespace users;

 

Table created.

 

SQL> insert into tmac select 'MACLEAN' from dual connect by level <=2530;

 

2530 rows created.

 

SQL> commit;

 

Commit complete.

 

 

 

 

SQL> exec dbms_stats.gather_table_stats('SYS','TMAC');

 

PL/SQL procedure successfully completed.

 

 

 

 

SQL> select blocks from dba_tables where table_name='TMAC';

 

BLOCKS

----------

2638

SQL> alter system flush buffer_cache;

 

 

SQL> select count(*) from tmac;

 

COUNT(*)

----------

2530

 

SQL> select vm.sid, vs.name, vm.value

2      from v$mystat vm, v$sysstat vs

3     where vm.statistic# = vs.statistic#

4       and vs.name in ('cleanouts only - consistent read gets',

5                       'session logical reads',

6                       'physical reads',

7                       'physical reads direct');

 

SID NAME                                                                  VALUE

---------- ---------------------------------------------------------------- ----------

135 session logical reads                                                  2859

135 physical reads                                                         2763

135 physical reads direct                                                  2576

135 cleanouts only - consistent read gets                                     0

 

 

 

physical reads direct 统计值增加,说明上面的查询使用了direct path read

 

 

SQL> alter session set "_serial_direct_read"=never;

 

Session altered.

 

SQL> select count(*) from tmac;

 

COUNT(*)

----------

2530

 

SQL> select vm.sid, vs.name, vm.value

2      from v$mystat vm, v$sysstat vs

3     where vm.statistic# = vs.statistic#

4       and vs.name in ('cleanouts only - consistent read gets',

5                       'session logical reads',

6                       'physical reads',

7                       'physical reads direct');

 

SID NAME                                                                  VALUE

---------- ---------------------------------------------------------------- ----------

135 session logical reads                                                  5497

135 physical reads                                                         5339

135 physical reads direct                                                  2576

135 cleanouts only - consistent read gets                                     0

SQL> select count(*) from tmac;

 

COUNT(*)

----------

2530

 

SQL> select vm.sid, vs.name, vm.value

2      from v$mystat vm, v$sysstat vs

3     where vm.statistic# = vs.statistic#

4       and vs.name in ('cleanouts only - consistent read gets',

5                       'session logical reads',

6                       'physical reads',

7                       'physical reads direct');

 

SID NAME                                                                  VALUE

---------- ---------------------------------------------------------------- ----------

135 session logical reads                                                  8135

135 physical reads                                                         5339

135 physical reads direct                                                  2576

135 cleanouts only - consistent read gets                                     0

 

physical reads direct 不再增加,说明以上2次查询未使用direct path read

隐藏参数"_serial_direct_read" 指定了是否启用串行全表扫描下的直接路径读取(direct path read),其默认值为AUTO,设置为NEVER时禁用11g自动direct path read的特性

 

 

 

SQL> alter session set "_serial_direct_read"=auto;

 

Session altered.

 

 

还原session级别的_serial_direct_read 参数

 

SQL> delete tmac where rownum<2000;

 

1999 rows deleted.

 

SQL> commit;

 

Commit complete.

 

 

SQL> alter table tmac move tablespace users pctfree 10 pctused 90;

 

Table altered.

 

 

 

SQL>  exec dbms_stats.gather_table_stats('SYS','TMAC');

 

PL/SQL procedure successfully completed.

 

 

SQL> select blocks from dba_tables where table_name='TMAC';

 

BLOCKS

----------

189

 

 

将TMAC表缩小到 _small_table_threshold以下

 

 

SQL> alter system flush buffer_cache;

 

System altered.

 

 

 

SQL> select count(*) from tmac;

 

COUNT(*)

----------

531

 

SQL> select vm.sid, vs.name, vm.value

2      from v$mystat vm, v$sysstat vs

3     where vm.statistic# = vs.statistic#

4       and vs.name in ('cleanouts only - consistent read gets',

5                       'session logical reads',

6                       'physical reads',

7                       'physical reads direct');

 

SID NAME                                                                  VALUE

---------- ---------------------------------------------------------------- ----------

135 session logical reads                                                   524

135 physical reads                                                          349

135 physical reads direct                                                     0

135 cleanouts only - consistent read gets                                     1

 

 

以上演示证明对于small table(块数小于_small_table_threshold),SQL执行层并不会自动选择direct path read,而是将其读入buffer cache并通过逻辑读访问。

 

结论:

 

其中_small_table_threshold 隐藏参数指定了 ORACLE中大表的阈值,其单位为block,即大于_small_table_threshold 所指定块数的表被视作大表,

反之视为"small table"。对于大表"large table",SQL执行层认为存在直接路径读取(direct path read)的意义;对于小表,将其缓存在buffer cache中的收益更大,所以直接路径读取没有意义。

 

 

_small_table_threshold 隐藏参数的值在实例启动时动态决定,一般为 2% * DB_CACHE_SIZE。

 

 

隐藏参数"_serial_direct_read" 指定了是否启用串行全表扫描下的直接路径读取(direct path read),其默认值为AUTO,设置为NEVER时禁用11g自动direct path read的特性

 

“As of 11.2.0.2 the legal settings are

true, false, always, auto, and never

true is the same effect as always

false is the same effect as auto

Default value is “auto”

Setting event 10949 or event 10354 may also have the side effect of making oracle behave as if _serial_direct_read = never”

 

该参数可以动态在实例或会话级别修改,而无需重启实例。
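
例如(隐藏参数的修改建议在Oracle Support指导下进行,以下仅为示意):

SQL> alter system set "_serial_direct_read"=never scope=memory;
SQL> alter session set "_serial_direct_read"=never;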

 

 

类似的10949 EVENT事件也可以起到类似的作用。

 

 

 

设置event 10949可以避免采用直接路径读取方式,该事件可以在线设置,但对现有session可能不会生效:

 

在实例级别设置:

 

ALTER SYSTEM SET EVENTS '10949 TRACE NAME CONTEXT FOREVER';

 

设置到SPFILE中:

 

alter system set event='10949 TRACE NAME CONTEXT FOREVER' scope=spfile;

 

在session级别设置:

 

ALTER SESSION SET EVENTS '10949 TRACE NAME CONTEXT FOREVER';
