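Oracle DBA Operations Handbook

Contents

I. System Planning
    System planning guide

II. System Deployment
    1. System checks
    Database installation
    Post-installation verification

III. Database Management
    Database monitoring and management
    Management tools
    Daily DBA checks

IV. General Troubleshooting Guide
    Performance problems
    General ORA error handling
    Cluster failures
    Error diagnosis cases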

My Oracle Support (MOS) documents referenced in this handbook:

Information On Installed Database Components and Schemas [ID 472937.1]

Reference List of Critical Patch Update Availability (CPU) and Patch Set Update (PSU) Documents For Oracle Database and Fusion Middleware Product [ID 783141.1]

Patch Set Update and Critical Patch Update October 2012 Availability Document [ID 1477727.1]

CSS Timeout Computation in Oracle Clusterware [ID 294430.1]

Remote Diagnostic Agent (RDA) 4 - Getting Started [ID 314422.1]

OSWatcher Black Box User Guide (Includes: [Video]) [ID 301137.1]

How to Monitor the Progress of a Materialized View Refresh (MVIEW) [ID 258021.1]

Master Note for Streams Recommended Configuration [ID 418755.1] (the monitoring section)

To choose patches, look up the software version of the target system in the patch availability documents listed above.

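Starting with 11g, Oracle uses an internal algorithm to decide whether a serial full scan of a large table uses direct path read or first reads the blocks into the buffer cache. The algorithm bases the decision on the size of the table.

Once the cause of the performance problem has been identified:

Setting event 10949 online disables the automatic direct path read decision for serial full table scans. With the feature disabled, serial scans in 11g behave essentially as they did in 10g.

After event 10949 was set, direct reads dropped sharply: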

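Section headings

System Planning
    Estimating host performance requirements by comparison
    Estimating host performance requirements by the standard calculation
    Comparison of the two estimation methods
    Server memory calculation

System Deployment
    AIX pre-installation checks (based on 11g Release 2)
    HP-UX pre-installation checks (based on 11g Release 2)
    Linux pre-installation checks (based on 11g Release 2)
    Solaris pre-installation checks (based on 11g Release 2)
    Shell limits
    Component installation principles
    Database patch installation
        Oracle Database 11.2.0.3
        Oracle Database 11.2.0.2
        Oracle Database 11.1.0.7
        Oracle Database 10.2.0.5
    High availability testing
    Application testing
    ASM verification testing

Database Management
    Database log management
    Wait event monitoring
        buffer busy waits / read by other session
        write complete waits
        free buffer waits
        library cache lock / library cache pin
        row cache lock
        db file scattered read
        db file sequential read
        direct path read
        direct path write
        db file parallel write
        control file parallel write
        log file sync
        log buffer space
        log file switch completion, log file switch (checkpoint incomplete), log file switch (archiving needed), log file switch (private strand flush incomplete)
    Host monitoring
    Invalid object monitoring
    RDA
    OS Watcher
    Daily DBA checklist

General Troubleshooting Guide
    Using oradebug to trace SQL
    Using oradebug to debug spinning processes
    Using oradebug to view library cache state
    General slow performance diagnosis
        Slow Database > Identify the Issue > Data Collection > Analyze
        Gather Database Performance Data
    Performance troubleshooting case
    ORA error handling principles
    Protecting the scene and collecting information
    Node reboot and eviction failures
    General cluster failures
    High availability test error repair case
    An ORA-600 case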


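Database capacity depends on the number of concurrent users and the transactions per second, and also on the transaction mix (inserts, deletes, updates, queries and so on). For example, the repository currently holds about 5 TB of data; at 30% annual growth, the volume to plan for is between 5 TB and 10 TB.

In our experience, a rough CPU count and memory requirement can be derived from an estimate of the system's TPMC.

To estimate in a reasonably rigorous way the resources needed to run the BOSS 1.5 system centrally for a province, we used two different methods and compared their results. From that comparison we derived the required resources and a resource allocation plan.

The first is the comparison method: take the resource usage of a same-type customer in another province running BOSS 1.5, compare the business data of the two provinces, and extrapolate linearly. The second is the standard TPMC method: measure the TPMC needed per business transaction for this customer and derive the resources needed to run BOSS 1.5 centrally.

BOSS is the customer's principal and most important application system and the source of base data for all other applications. Among all application systems it carries the heaviest load and has the highest equipment requirements, as well as the strictest requirements for timeliness, security and data integrity. It therefore needs dedicated hosts in a two-node hot-standby configuration, and to protect data security and integrity the storage should be configured as RAID 0+1.

Estimating host performance requirements by comparison

1. Calculating the database host configuration:

(1) A comparable customer is taken as the reference. Its existing BOSS 1.5 sales and billing systems are separate. The billing database runs on a 2-node RAC on HP Superdome (32 CPUs and 72 GB of memory per node); the sales database runs on a 2-node RAC on HP Superdome (32 CPUs and 112 GB of memory per node). Together the two systems use 128 CPUs and 368 GB of memory. In production, CPU utilization is 60% and memory utilization is 90%. A CPU-to-memory ratio of 1:4 (4 GB per CPU) is normally reasonable; at 1:4, 128 × 4 = 512 GB of memory is needed.

(2) The current customer has about 60 million users and the reference customer about 20 million, so the current customer generates about 3 times the data volume. The CPU count is therefore estimated at 3 times that of the reference customer: 128 × 3 = 384. At 1:4, memory is 384 × 4 = 1536 GB.

(3) The reference customer uses HP Superdome while the current customer uses the IBM P5 series. According to third-party tests (see the performance test table), the ratio between HP Superdome and IBM P5 is about 2:1 in CPUs needed for the same processing capacity. For equal capacity, the IBM P5 configuration is therefore 384 / 2 = 192 CPUs, with 192 × 4 = 768 GB of memory.

(4) To keep the system stable, CPU utilization should stay at about 40%. Lowering utilization from 60% to 40% is a factor of 1.5, so the CPU count must grow by a factor of 1.5: 192 × 1.5 = 288 CPUs. At 1:4, memory is 288 × 4 = 1152 GB.

(5) In recent years the customer's business has grown about 20% a year, and the configuration must leave 2 years of headroom. Assuming the 20% growth rate continues for the next 2 years and the CPU count grows at the same rate, the CPU count after 2 years is 288 × 1.2 × 1.2 = 414.72, rounded up to 415 and then to the next multiple of 4, which gives 416 CPUs. Memory is 416 × 4 = 1664 GB.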
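The five steps of the comparison method can be sketched as a short calculation. This is only an illustration under the handbook's own figures; the function and parameter names (`comparison_method`, `round_up_to_multiple`, `platform_ratio` and so on) are made up for this example, and exact fractions are used so that rounding up to a multiple of 4 is not disturbed by floating-point error.

```python
from fractions import Fraction
import math


def round_up_to_multiple(value, multiple):
    """Round value up to the next multiple (CPUs are counted in fours here)."""
    return math.ceil(Fraction(value) / multiple) * multiple


def comparison_method(ref_cpus=128, user_ratio=3, platform_ratio=2,
                      current_util_pct=60, target_util_pct=40,
                      growth_pct=20, years=2, gb_per_cpu=4):
    """Scale a reference customer's CPU count; return (cpus, memory_gb)."""
    cpus = Fraction(ref_cpus) * user_ratio                 # step 2: 128 x 3 = 384
    cpus = cpus / platform_ratio                           # step 3: 384 / 2 = 192
    cpus *= Fraction(current_util_pct, target_util_pct)    # step 4: 192 x 1.5 = 288
    cpus *= Fraction(100 + growth_pct, 100) ** years       # step 5: 288 x 1.44 = 414.72
    cpus = round_up_to_multiple(cpus, 4)                   # next multiple of 4
    return cpus, cpus * gb_per_cpu                         # memory at 1:4


cpus, memory_gb = comparison_method()
print(cpus, memory_gb)  # 416 1664
```

Changing a single input, for example `years=0`, shows how much of the final figure comes from the growth allowance rather than from the current load.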

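Estimating host performance requirements by the standard calculation

Using the standard TPMC method, the database host requirement for the customer's BOSS 1.5 system is estimated as follows:

1. Based on statistics of the customer's current BOSS workload, BOSS 1.5 is expected to handle nearly 10 million business transactions a day, concentrated in two 2-hour windows (morning and afternoon). The number of transactions to process per minute is:

10,000,000 / 4 / 60 = 41,667 per minute

2. With business growing 20% a year and 2 years of headroom in the configuration, the number of transactions to process per minute after 2 years is:

41,667 × 1.2 × 1.2 = 60,000 per minute

3. Assume each business transaction equals 5 database transactions on average and the peak is 3 times the average. The number of database transactions per minute is: 60,000 × 5 × 3 = 900,000 per minute.

4. Each transaction needs 9 TPMC (an empirical value from other provinces and cities), so the host TPMC is:

900,000 × 9 = 8,100,000.

5. Allowing for CPU headroom at 40% CPU utilization, the required performance (TPMC) is:

8,100,000 / 40% = 20,250,000.

6. A third-party TPMC test of a fully configured IBM P5 595 gave the following result (the formula in step 7 uses 64 CPUs and 3,203,568 TPMC):

7. Assuming TPMC scales linearly with CPU count, the number of CPUs needed for 20,250,000 TPMC is:

(64 × 20,250,000) / 3,203,568 ≈ 405

Rounded up to a multiple of 4: 408 CPUs.

Memory at 1:4: 408 × 4 = 1632 GB.

8. Ignoring future business growth and accepting a higher CPU utilization of about 60%, the estimate is 189 CPUs, rounded up to a multiple of 4 as 192. Memory at 1:4: 192 × 4 = 768 GB.

In summary, the standard calculation gives the following host configuration for running BOSS 1.5 on IBM P5 595:

408 CPUs, 1632 GB of memory.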
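Steps 1 to 7 of the standard calculation can be reproduced with a few lines of arithmetic. The sketch below uses the handbook's figures as defaults; the function and parameter names are illustrative only, and the intermediate rounding (41,667 and 60,000) follows the rounding used in the steps above. Step 8 is not reproduced because its inputs are not fully stated.

```python
from fractions import Fraction
import math


def tpmc_method(daily_txns=10_000_000, peak_hours=4, growth_pct=20, years=2,
                db_txns_per_business_txn=5, peak_factor=3, tpmc_per_txn=9,
                target_util_pct=40, bench_cpus=64, bench_tpmc=3_203_568,
                gb_per_cpu=4):
    """Return (required_tpmc, cpus, memory_gb) for the standard TPMC method."""
    # Step 1: business transactions per minute in the peak windows.
    per_min = round(Fraction(daily_txns, peak_hours * 60))           # 41667
    # Step 2: apply compound growth for the planning horizon.
    per_min = round(per_min * Fraction(100 + growth_pct, 100) ** years)  # 60000
    # Step 3: database transactions per minute at peak.
    db_per_min = per_min * db_txns_per_business_txn * peak_factor    # 900000
    # Steps 4-5: TPMC needed at the target CPU utilization.
    required = db_per_min * tpmc_per_txn * Fraction(100, target_util_pct)
    # Step 7: scale linearly from the benchmark, then round up to a multiple of 4.
    cpus = math.ceil(bench_cpus * required / bench_tpmc)             # 405
    cpus = math.ceil(Fraction(cpus, 4)) * 4                          # 408
    return math.ceil(required), cpus, cpus * gb_per_cpu


print(tpmc_method())  # (20250000, 408, 1632)
```

The linear scaling in step 7 is the handbook's assumption; real systems rarely scale perfectly with CPU count, so the result is best read as a planning estimate.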

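Comparison of the two estimation methods

The two methods give essentially consistent results: 416 CPUs by comparison and 408 CPUs by the standard calculation.

Server memory calculation

Application logic involves a large number of database reads. Read performance depends mainly on the size of the SGA; for larger OLTP applications we recommend an SGA of at least 4 GB.

Beyond the two factors above, at least 30% of memory should be reserved for the operating system itself. With 150 concurrent users, the server memory requirements are as follows:

Taken together, the memory requirement is about 16 GB.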

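This part describes the points to watch during system deployment: the system checks before deployment, the key installation points, the recommended patches, and the system tests after installation.

The official online documentation describes the installation requirements in detail and can be used as a reference.

AIX pre-installation checks (based on 11g Release 2)

The system should meet the requirements of 11g Release 2 so that later upgrades and maintenance are easier and the system does not have to be reinstalled.

Operating system version:

Required software versions:

AIX 5L

AIX 6.1

AIX 7.1

C/C++ requirements:

Operating system patch requirements: the patches below are the minimum; if the operating system already includes newer patches, they need not be installed.

AIX 5L

AIX 6.1

AIX 7.1

Operating system kernel parameter settings:

The following parameters are Oracle's recommended values; some, such as udp_recvspace, can be adjusted as appropriate.

HP-UX pre-installation checks (based on 11g Release 2)

The system should meet the requirements of 11g Release 2 so that later upgrades and maintenance are easier and the system does not have to be reinstalled.

Operating system version: the following versions are the minimum requirement.

Operating system patch version: the following patches are the minimum; if the installed patches already cover them, they need not be installed.

Kernel parameter settings: use the following commands to set the parameters to Oracle's recommended values.

Linux pre-installation checks (based on 11g Release 2)

The system should meet the requirements of 11g Release 2 so that later upgrades and maintenance are easier and the system does not have to be reinstalled.

Operating system version: the following are the minimum operating system versions:

Operating system software requirements:

Kernel parameter settings:

Solaris pre-installation checks (based on 11g Release 2)

The system should meet the requirements of 11g Release 2 so that later upgrades and maintenance are easier and the system does not have to be reinstalled.

Operating system version:

Package requirements:

Operating system patches:

Kernel parameter configuration:

Component installation principles

Oracle Database contains a large number of components, and DBCA installs them by default. We recommend a custom installation with only the components the application needs, because unnecessary components increase the probability of failures.

The components are described below; install them according to application requirements.

For details, see Information On Installed Database Components and Schemas [ID 472937.1].

Database patch installation

For database patch installation, refer to the following MOS document. It is updated every quarter, and links to the CPU and PSU patches are published in it.

(Because 10g is close to the end of extended support, we recommend installing 11gR2.)

The document lists the latest CPU and PSU patches. When installing a database, install the latest CPU or PSU patch as the document directs (a PSU generally includes the CPU fixes). For example: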

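High availability testing

High availability testing is a very effective way to see how the whole system behaves when something fails, so that administrators know in which situations manual intervention is needed and in which the HA components recover automatically.

It also shows administrators the weak points of the system and whether its availability meets the requirements.

For RAC environments in particular, a full high availability test must be completed after installation.

The test items below are for an 11gR2 RAC cluster; for 10g RAC, adjust them as appropriate.

The test items are as follows:

After testing, investigate every problem that appeared, one by one, to remove hidden risks. Pay particular attention to:

Application testing

Application testing can be run together with the simulated RAC node failures in the high availability tests.

It shows administrators how application connections to the database behave during a failure (for example, whether connections drop and whether users have to log in again).

It also shows developers how to configure connections so that they remain highly available, to meet software development requirements.

ASM verification testing

ASM verification tests check that each ASM functional component works correctly.

ASM functional tests

ASMCMD functional tests: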

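Database log management

Database logs that generally need to be managed:

Database alert log: alert_<SID>.log
background_dump directory
core_dump directory
user_dump directory
CRS_HOME log directory

Based on the experience of Oracle engineers, the following practice can be used:

Use a script or monitoring tool to watch the alert log for the strings "ORA-" and "ERROR". Back up (move) the alert log at regular intervals, and also back it up (move it) automatically each time an error appears.

This avoids losing the alert log or letting it grow too large to open.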
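A monitoring script of this kind can be very small. The sketch below is one possible shape, not the handbook's own script: it scans lines of text for the two strings named above and returns the matches with their line numbers. The function name and the sample lines are placeholders invented for the example, not real alert log output.

```python
import re

# The two strings the handbook suggests watching for.
ALERT_PATTERN = re.compile(r"ORA-|ERROR")


def find_alert_errors(lines):
    """Return (line_number, text) for every line containing ORA- or ERROR."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if ALERT_PATTERN.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits


# Placeholder input; in practice pass an open file object for alert_<SID>.log.
sample = [
    "placeholder line without problems\n",
    "ORA-XXXXX: placeholder error text\n",
    "placeholder line mentioning an ERROR\n",
]
for number, text in find_alert_errors(sample):
    print(number, text)
```

Because the function accepts any iterable of lines, it can be run against the live alert log from a scheduler and combined with the periodic move of the file described above.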

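Wait event monitoring

Wait event monitoring must cover the enq (enqueue) and latch classes of wait events, and must also cover the following major wait events that may occur.

Host monitoring

Host monitoring should cover memory utilization and CPU load of the whole database server and, in a RAC environment, the state of the public and private (interconnect) network connections. Interruptions or failures of the private network can usually be detected from the crs and ocssd logs. CSS timeouts are described in CSS Timeout Computation in Oracle Clusterware [ID 294430.1].

Invalid object monitoring

An unexplained increase in invalid objects should be treated as a sign of a potential database problem and handled carefully, especially when objects owned by SYS or SYSTEM become invalid.

Invalid objects can be found with a SQL query such as the following; objects to ignore can be defined to make filtering easier.

Developers often leave obsolete objects in the database. We recommend adding a small table that records these obsolete objects so that the monitoring script can filter them out.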
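The filtering idea can be illustrated as follows. This is a sketch under assumptions: the handbook's own query is not shown, so the SQL text below is a guess based on the standard `dba_objects` view and its `status` column, and the helper names, the ignore set and the sample rows are hypothetical. Fetching the rows from the database is left out.

```python
# Assumed query; the handbook's own statement is not reproduced in the text.
INVALID_OBJECTS_SQL = """
SELECT owner, object_name, object_type
  FROM dba_objects
 WHERE status = 'INVALID'
"""


def filter_invalid_objects(rows, ignored):
    """Drop rows whose (owner, object_name) pair is in the ignore set.

    rows    -- iterable of (owner, object_name, object_type) tuples
    ignored -- set of (owner, object_name) pairs for known obsolete objects
    """
    return [row for row in rows if (row[0], row[1]) not in ignored]


def needs_urgent_attention(row):
    """Invalid objects owned by SYS or SYSTEM deserve the closest look."""
    return row[0] in ("SYS", "SYSTEM")


# Hypothetical rows, as they might come back from the query above.
rows = [
    ("SYS", "SOME_VIEW", "VIEW"),
    ("APP", "OLD_PROC", "PROCEDURE"),
    ("APP", "NEW_PKG", "PACKAGE BODY"),
]
ignored = {("APP", "OLD_PROC")}  # contents of the small "obsolete objects" table

for row in filter_invalid_objects(rows, ignored):
    print(row, "URGENT" if needs_urgent_attention(row) else "")
```

Keeping the ignore list in a table, as the text recommends, means the script itself does not need to change when developers retire more objects.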

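RDA

RDA (Remote Diagnostic Agent) is a command-line diagnostic collection tool that lets Oracle engineers gather the overall system information they need directly.

Usage:

1. Copy the RDA package to the target server.

2. unzip rda.zip

3. ./<rda> -cv

The -cv option verifies the integrity of the whole package.

4. ./<rda> -S

The -S option configures RDA. Setup is interactive and goes through the settings one by one.

5. ./rda.sh

This collects the diagnostic information for the whole system.

When collection finishes, open <RDA DIRECTORY>/output/RDA__start.htm to view the RDA report.

The <RDA DIRECTORY>/output/ directory also contains the packaged report, which can be uploaded to an SR. Some RDA screenshots follow.

OS Watcher

OS Watcher is a set of scripts developed by Oracle that collect OS-level data for the database server, such as vmstat, iostat and network statistics. The collection parameters can be customized. Install it according to OSWatcher Black Box User Guide [ID 301137.1], which includes a video tutorial.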

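Daily DBA checklist

Daily DBA work should include the following checks.

1. Deal promptly with errors in the Oracle alert log and the CRS logs.

2. Check tablespace growth and its trend, and add space in time according to an agreed standard.

Tablespace files should follow these principles:

Stripe files across storage.
Reduce reliance on autoextend.
Keep each application data file to about 20 GB or less.
If a volume manager is used, do not fill any volume completely.

3. Back up logs at regular intervals. Logs to back up:

Database alert log: alert_<SID>.log
background_dump directory
core_dump directory
user_dump directory
CRS_HOME log directory

4. Check that backups completed and are intact.

5. Check the health of other applications.

For example, if Mview, AQ or other mechanisms are used to synchronize data, verify data integrity.

The following documents can be used as references: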

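This guide is intended to help DBAs collect data and locate problems quickly during common failures, as a basis for later root-cause analysis. It is certainly incomplete and should be supplemented from the customer's own experience, and the steps here are not the only possible approach.

Generally, even when the database hangs, the SYS user can still log in. In that case oradebug can be used as follows to capture information about the hung database for offline analysis.

Procedure:

When finished, a trace file is generated in the dump directory; analyzing it reveals why the database hung.

Using oradebug to trace SQL

oradebug allows more flexible control of the trace level and is convenient to use. Example:

1. Find the sid and serial# of the session in v$session.

2. Use ADDR to find the corresponding SPID.

3. Generate a 10046 level 4 trace with the following command.

4. Turn tracing off when finished.

The meanings of the 10046 event levels are as follows:

Using oradebug to debug spinning processes

Some processes in a SPINNING state consume large amounts of CPU and memory. Capturing the errorstack of the process at that moment makes it much easier for Oracle Support to locate the problem. The steps are as follows:

The SPID can be found in v$process.

Using oradebug to view library cache state

oradebug can show detailed library cache state, which makes it possible to tell immediately whether the library cache is causing the problem.

The trace file output looks like this:

General slow performance diagnosis

If only some sessions are slow, also generate traces for those sessions.

For how to generate a trace, see the earlier section on tracing a session with oradebug, or use alter session set events ….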

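Performance troubleshooting case

A customer application was blocked. After collecting the relevant data, we first reviewed the AWR report for the period. The system showed a large number of direct path read wait events.

Within one hour, direct reads accounted for 3.4 TB of I/O, an average of about 1 GB read per second. This is probably the main reason the system's I/O response time degraded sharply:

From the statistics, direct path reads caused by LOBs were fewer than 10 per second, which shows that the large volume of direct path reads was not caused by LOBs.

From the segment statistics, the segment mainly responsible for direct path reads was the partitioned table PBMS_SYNEST_INF_REC1.

With this, the problem was initially resolved.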

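This part covers the main points of handling common ORA errors. Users are advised to build up experience as they handle them.

ORA error handling principles

The general principles for ORA errors are: protect the scene, collect information, plan before acting, and do not operate on the database or the RAC cluster rashly.

Protecting the scene and collecting information

Protecting the scene mainly means saving the relevant logs and generating trace files, so that the failure environment is preserved and support staff can locate and fix the fault.

Logs to save:

Database alert log: alert_<SID>.log
background_dump directory
core_dump directory
user_dump directory
CRS_HOME log directory

# Save the dump trace files that the alert log refers to.

If the database hangs because of slow performance, do not restart it rashly; first use hang analyze to generate a trace file.

Usage is described in the oradebug section.

If the database has already crashed, do not start it rashly; review the logs and confirm the cause before trying to start it. After a failure, save the OS Watcher data for the corresponding period.

RAC cluster failures fall into different categories, and the data to collect differs for each. The recommended handling procedure is:

Collect data -> submit an SR -> SR analysis -> obtain a recommendation or result

Data collection for node reboots and evictions:

General cluster failures can be collected with the following scripts:

11gR2, run as root:

10gR2, run as root:

If ASM is used, package the ASM logs and upload them to the SR as well.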

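Two error diagnosis cases are presented, one from testing and one from later maintenance. They show that with the logs collected as described above the problem can be found, which also helps later diagnosis and repair.

High availability test error repair case

A customer hit the following failure during high availability testing:

The test was run on the third node. The state at the time was:

The customer killed the ocssd process.

The VIP of the third node failed over to node 1 and connections were not affected. The third node began to reboot and all of its nodeapps went offline.

After the third node rebooted, the crs log recorded that the VIP and nodeapps started successfully, but the database failed to start.

Checking the logs located the problem quickly, because the instance started normally when it was started manually.

The logs showed that during the automatic restart the instance was started before ASM; the instance cannot start until ASM has finished starting.

With the problem located, the fix was straightforward.

The REQUIRED_RESOURCES setting of the instances had to be changed. After the change, the OCSSD crash test was repeated, and after the server rebooted the database started after ASM with no manual intervention. The dependency was added as follows:

srvctl modify instance -d gwmngdb -i gwmngdb1 -s +ASM1
srvctl modify instance -d gwmngdb -i gwmngdb2 -s +ASM2
srvctl modify instance -d gwmngdb -i gwmngdb3 -s +ASM3

The configuration after the change is shown below. Node reboots now start the instance automatically with no manual intervention, which met the test objective.

An ORA-600 case

An instance on one node of a customer's 2-node RAC database restarted. Collecting the relevant logs, trace files and hang analyze output showed the following. The first error reported was:

It was followed by this error:

And by a large number of errors like this:

Analysis of the logs and the related dump files showed that once the ORA-600 errors stopped appearing, the database was already hung: a large number of row cache enqueue waits had hung the database, and the MMON process could not restart normally. Several data dictionary objects were also locked at that point, which is why node 1 could not start normally (node 1 could only start after node 2 was stopped).

The following excerpt from the system dump also shows that many processes were waiting on row cache enqueues.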


Permanent link to this article: https://www.askmac.cn/archives/oracle-dba-%E6%93%8D%E4%BD%9C%E6%89%8B%E5%86%8C-handbook.html


Database SGA

Database size	Minimum SGA size
1TB	2GB
3TB	3GB
5TB	3GB
10TB	4GB

Memory reserved for the operating system

Database size	Memory
1TB	6GB
3TB	8GB
5TB	8GB
10TB	10GB

# /usr/bin/ndd /dev/tcp tcp_smallest_anon_port tcp_largest_anon_port
49152
65535

TRANSPORT_NAME[0]=tcp
NDD_NAME[0]=tcp_largest_anon_port
NDD_VALUE[0]=65500

TRANSPORT_NAME[1]=tcp
NDD_NAME[1]=tcp_smallest_anon_port
NDD_VALUE[1]=9000

TRANSPORT_NAME[0]=udp
NDD_NAME[0]=udp_largest_anon_port
NDD_VALUE[0]=65500

Parameter	Replaced by Resource Control	Minimum Value
noexec_user_stack	NA (can be set in /etc/system only)	1
semsys:seminfo_semmni	project.max-sem-ids	100
semsys:seminfo_semmsl	process.max-sem-nsems	256
shmsys:shminfo_shmmax	project.max-shm-memory	4294967295
shmsys:shminfo_shmmni	project.max-shm-ids	100

Shell Limit	Recommended Value
TIME	-1 (Unlimited)
FILE	-1 (Unlimited)
DATA	Minimum value: 1048576
STACK	Minimum value: 32768
NOFILES	Minimum value: 4096
VMEMORY	Minimum value: 4194304

Product Home	Patch
Oracle Database home	Database 11.2.0.3 SPU Patch 14390252, or Database 11.2.0.3.4 PSU Patch 14275605, or GI 11.2.0.3.4 PSU Patch 14275572, or Quarterly Database patch for Exadata - October 2012 11.2.0.3.11 BP Patch 14474780, or Quarterly Full Stack download for Exadata (October 2012) BP Patch 14621036, or Microsoft Windows (32-Bit) BP 11 Patch 14613222, or Microsoft Windows x64 (64-Bit) BP 11 Patch 14613223
Oracle Database home	CPU Patch 13705478

Product Home	Patch
Oracle Database home	Database 11.2.0.2 SPU Patch 14390377, or Database 11.2.0.2.8 PSU Patch 14275621, or GI 11.2.0.2.8 PSU Patch 14390437, or Exadata Database Recommended BP 18 Patch 14461970, or Microsoft Windows (32-Bit) BP 22 Patch 14672267, or Microsoft Windows x64 (64-Bit) BP 22 Patch 14672268
Oracle Database home	CPU Patch 13705478

Product Home	Patch
Oracle Database home	Database 11.1.0.7 SPU Patch 14390384, or Database 11.1.0.7.13 PSU Patch 14275623, or Microsoft Windows (32-Bit) BP 50 Patch 14672312, or Microsoft Windows x64 (64-Bit) BP 50 Patch 14672313
Oracle Database home	CPU Patch 13705478
Oracle CRS home	CRS 11.1.0.7.7 PSU Patch 11724953
Oracle Database home	CPU Patch 9288120
Oracle Database home	CPU Patch 10073948
Oracle Database home	CPU Patch 11738232

Product Home	Patch
Oracle Database home	Database 10.2.0.5 SPU Patch 14390396, or Database 10.2.0.5.9 PSU Patch 14275629, or Microsoft Windows (32-Bit) BP 19 Patch 14553356, or Microsoft Windows x64 (64-Bit) BP 19 Patch 14553358, or Microsoft Windows Itanium (64-Bit) BP 19 Patch 14553357
Oracle Database home	CPU Patch 13705478
Oracle Database home	CPU Patch 12536181
Oracle Warehouse Builder home	CPU Patch 11738172
Oracle CRS home	CRS 10.2.0.5.2 PSU Patch 9952245

Test #	Test	Procedure	Expected Results	Measures
Test 1	CRSD Process Failure	• For AIX, HPUX, Linux and Solaris: Obtain the PID for the CRSD process: # ps -ef | grep crsd Kill the CRSD process: # kill -9 <crsd pid> • For Windows: Use Process Explorer to identify the crsd.exe process. Once the crsd.exe process is identified kill the process by right clicking the executable and choosing "Kill Process".	• CRSD process failure is detected by the orarootagent and CRSD is restarted. Review the following logs: o $GI_HOME/log/<nodename>/crsd/crsd.log o $GI_HOME/log/<nodename>/agent/ohasd/orarootagent_root/orarootagent_root.log	• Time to restart CRSD process
Test 2	EVMD Process Failure	• For AIX, HPUX, Linux and Solaris: Obtain the PID for the EVMD process: # ps -ef | grep evmd Kill the EVMD process: # kill -9 <evmd pid> • For Windows: Use Process Explorer to identify the evmd.exe process. Once the evmd.exe process is identified kill the process by right clicking the executable and choosing "Kill Process".	• EVMD process failure is detected by the OHASD oraagent and EVMD is restarted. Review the following logs: o $GI_HOME/log/<nodename>/evmd/evmd.log o $GI_HOME/log/<nodename>/agent/ohasd/oraagent_grid/oraagent_grid.log	• Time to restart the EVMD process
Test 3	CSSD Process Failure	• For AIX, HPUX, Linux and Solaris: Obtain the PID for the CSSD process: # ps -ef | grep cssd Kill the CSSD process: # kill -9 <cssd pid> • For Windows: Use Process Explorer to identify the ocssd.exe process. Once the ocssd.exe process is identified kill the process by right clicking the executable and choosing "Kill Process".	• The node will reboot. • Cluster reconfiguration will take place. • Windows ONLY: On the system console a Blue Screen will show with a stop code of 0x0000ffff, which indicates that the OraFence driver rebooted the box due to a CSSD failure.	• Time for the eviction and cluster reconfiguration on the surviving nodes • Time for the node to come back online and reconfiguration to complete to add the node as an active member of the cluster.
Test 4	CRSD ORAAGENT RDBMS Process Failure NOTE: Test Valid for Only Multi User Installations.	• For AIX, HPUX, Linux and Solaris: Obtain the PID for the CRSD oraagent for the RDBMS software owner: # cat $GI_HOME/log/<nodename>/agent/crsd/oraagent_<rdbms_owner>/oraagent_<rdbms_owner>.pid # kill -9 <pid for RDBMS oraagent process>	• The ORAAGENT process failure is detected by CRSD and is automatically restarted. Review the following logs: o $GI_HOME/log/<nodename>/crsd/crsd.log o $GI_HOME/log/<nodename>/agent/crsd/oraagent_<rdbms_owner>/oraagent_<rdbms_owner>.log	• Time to restart the ORAAGENT process
Test 5	CRSD ORAAGENT Grid Infrastructure Process Failure	• For AIX, HPUX, Linux and Solaris: Obtain the PID for the CRSD oraagent for the GI software owner: # cat $GI_HOME/log/<nodename>/agent/crsd/oraagent_<GI_owner>/oraagent_<GI_owner>.pid # kill -9 <pid for GI oraagent process> • For Windows: Use Process Explorer to identify the crsd oraagent.exe process that is a child process of crsd.exe (or obtain the pid for the crsd oraagent.exe as shown in the Unix/Linux instructions above). Once the proper oraagent.exe process is identified kill the process by right clicking the executable and choosing "Kill Process".	• The Grid Infrastructure ORAAGENT process failure is detected by CRSD and is automatically restarted. Review the following logs: o $GI_HOME/log/<nodename>/crsd/crsd.log o $GI_HOME/log/<nodename>/agent/crsd/oraagent_<GI_owner>/oraagent_<GI_owner>.log	• Time to restart the ORAAGENT process
Test 6	CRSD ORAROOTAGENT Process Failure	• For AIX, HPUX, Linux and Solaris: Obtain the PID for the CRSD orarootagent: # cat $GI_HOME/log/<nodename>/agent/crsd/orarootagent_root/orarootagent_root.pid # kill -9 <pid for orarootagent process> • For Windows: Use Process Explorer to identify the crsd orarootagent.exe process that is a child process of crsd.exe (or obtain the pid for the crsd orarootagent.exe as shown in the Unix/Linux instructions above). Once the proper orarootagent.exe process is identified kill the process by right clicking the executable and choosing "Kill Process".	• The ORAROOTAGENT process failure is detected by CRSD and is automatically restarted. Review the following logs: o $GI_HOME/log/<nodename>/crsd/crsd.log o $GI_HOME/log/<nodename>/agent/crsd/orarootagent_root/orarootagent_root.log	• Time to restart the ORAROOTAGENT process
Test 7	OHASD ORAAGENT Process Failure	• For AIX, HPUX, Linux and Solaris: Obtain the PID for the OHASD oraagent: # cat $GI_HOME/log/<nodename>/agent/ohasd/oraagent_<GI_owner>/oraagent_<GI_owner>.pid # kill -9 <pid for oraagent process> • For Windows: Use Process Explorer to identify the ohasd oraagent.exe process that is a child process of ohasd.exe (or obtain the pid for the ohasd oraagent.exe as shown in the Unix/Linux instructions above). Once the proper oraagent.exe process is identified kill the process by right clicking the executable and choosing "Kill Process".	• The ORAAGENT process failure is detected by OHASD and is automatically restarted. Review the following logs: o $GI_HOME/log/<nodename>/ohasd/ohasd.log o $GI_HOME/log/<nodename>/agent/ohasd/oraagent_<GI_owner>/oraagent_<GI_owner>.log	• Time to restart the ORAAGENT process
Test 8	OHASD ORAROOTAGENT Process Failure	• For AIX, HPUX, Linux and Solaris: Obtain the PID for the OHASD orarootagent: # cat $GI_HOME/log/<nodename>/agent/ohasd/orarootagent_root/orarootagent_root.pid # kill -9 <pid for orarootagent process> • For Windows: Use Process Explorer to identify the ohasd orarootagent.exe process that is a child process of ohasd.exe (or obtain the pid for the ohasd orarootagent.exe as shown in the Unix/Linux instructions above). Once the proper orarootagent.exe process is identified kill the process by right clicking the executable and choosing "Kill Process".	• The ORAROOTAGENT process failure is detected by OHASD and is automatically restarted. Review the following logs: o $GI_HOME/log/<nodename>/ohasd/ohasd.log o $GI_HOME/log/<nodename>/agent/ohasd/orarootagent_root/orarootagent_root.log	• Time to restart the ORAROOTAGENT process
Test 9	CSSDAGENT Process Failure	• For AIX, HPUX, Linux and Solaris: Obtain the PID for the CSSDAGENT: # ps -ef | grep cssdagent # kill -9 <pid for cssdagent process> • For Windows: Use Process Explorer to identify the cssdagent.exe process. Once the cssdagent.exe process is identified kill the process by right clicking the executable and choosing "Kill Process".	• The CSSDAGENT process failure is detected by OHASD and is automatically restarted. Review the following logs: o $GI_HOME/log/<nodename>/ohasd/ohasd.log o $GI_HOME/log/<nodename>/agent/ohasd/oracssdagent_root/oracssdagent_root.log	• Time to restart the CSSDAGENT process
Test 10	CSSMONITOR Process Failure	• For AIX, HPUX, Linux and Solaris: Obtain the PID for the CSSDMONITOR: # ps -ef | grep cssdmonitor # kill -9 <pid for cssdmonitor process> • For Windows: Use Process Explorer to identify the cssdmonitor.exe process. Once the cssdmonitor.exe process is identified kill the process by right clicking the executable and choosing "Kill Process".	• The CSSDMONITOR process failure is detected by OHASD and is automatically restarted. Review the following logs: o $GI_HOME/log/<nodename>/ohasd/ohasd.log o $GI_HOME/log/<nodename>/agent/ohasd/oracssdmonitor_root/oracssdmonitor_root.log	• Time to restart the CSSMONITOR process

Test #	Test	Procedure	Expected Results/Measures
Test 1	Verify that candidate disks are available.	• Add a Disk/LUN to the RAC nodes and configure the Disk/LUN for use by ASM. • Login to ASM via SQL*Plus and run: "select name, group_number, path, state, header_status, mode_status, label from v$asm_disk"	• The newly added LUN will appear as a candidate disk within ASM.
Test 2	Create an external redundancy ASM diskgroup using SQL*Plus	• Login to ASM via SQL*Plus and run: "create diskgroup <dg name> external redundancy disk '<candidate path>';"	• A successfully created diskgroup. This diskgroup should also be listed in v$asm_diskgroup. • The diskgroup will be registered as a Clusterware resource (crsctl stat res -t)
Test 3	Create a normal or high redundancy ASM diskgroup using SQL*Plus	• Login to ASM via SQL*Plus and run: "create diskgroup <dg name> normal redundancy disk '<candidate1 path>', '<candidate2 path>';"	• A successfully created diskgroup with normal redundancy and two failure groups. For high redundancy, it will create three failure groups. • The diskgroup will be registered as a Clusterware resource (crsctl stat res -t)
Test 4	Add a disk to an ASM disk group using SQL*Plus	• Login to ASM via SQL*Plus and run: "alter diskgroup <dg name> add disk '<candidate1 path>';" NOTE: Progress can be monitored by querying v$asm_operation	• The disk will be added to the diskgroup and the data will be rebalanced evenly across all disks in the diskgroup.
Test 5	Drop an ASM disk from a diskgroup using SQL*Plus	• Login to ASM via SQL*Plus and run: "alter diskgroup <dg name> drop disk <disk name>;" NOTE: Progress can be monitored by querying v$asm_operation	• The data from the removed disk will be rebalanced across the remaining disks in the diskgroup. Once the rebalance is complete the disk will have a header_status of "FORMER" (v$asm_disk) and will be a candidate to be added to another diskgroup.
Test 6	Undrop an ASM disk that is currently being dropped using SQL*Plus	• Login to ASM via SQL*Plus and run: "alter diskgroup <dg name> drop disk <disk name>;" • Before the rebalance completes run the following command via SQL*Plus: "alter diskgroup <dg name> undrop disk <disk name>;" NOTE: Progress can be monitored by querying v$asm_operation	• The undrop operation will roll back the drop operation (assuming it has not completed). The disk entry will remain in v$asm_disk as a MEMBER.
Test 7	Drop an ASM diskgroup using SQL*Plus	• Login to ASM via SQL*Plus and run: "drop diskgroup <dg name>;"	• The diskgroup will be successfully dropped. • The diskgroup will be unregistered as a Clusterware resource (crsctl stat res -t)
Test 8	Modify rebalance power of an active operation using SQL*Plus	• Login to ASM via SQL*Plus and run: "alter diskgroup <dg name> add disk '<candidate1 path>';" • Before the rebalance completes run the following command via SQL*Plus: "alter diskgroup <dg name> rebalance power <1-11>;". 1 is the default rebalance power. NOTE: Progress can be monitored by querying v$asm_operation	• The rebalance power of the current operation will be increased to the specified value. This is visible in the v$asm_operation view.
Test 9	Verify CSS-database communication and ASM files access.	• Start all the database instances and query the v$asm_client view in the ASM instances.	• Each database instance should be listed in the v$asm_client view.
Test 10	Check the internal consistency of disk group metadata using SQL*Plus	• Login to ASM via SQL*Plus and run: "alter diskgroup <name> check all"	• If there are no internal inconsistencies, the statement "Diskgroup altered" will be returned (asmcmd will return back to the asmcmd prompt). If inconsistencies are discovered, then appropriate

Test#

Test

Procedure

Expected Results/Measures

Test 1

Verify that candidate disks are available.

• Add a Disk/LUN to the RAC nodes and configure the Disk/LUN for use by ASM.• Login to ASM via ASMCMD and run:

“lsdsk –candidate

• The newly added LUN will appear as a candidate disk within ASM.

Test 2

Create an external redundancy ASM diskgroup using ASMCMD

• Identify the candidate disks for the diskgroup by running:

“lsdsk –candidate”

• Create a XML config file to define the diskgroup e.g.

<dg name=”<dg name>” redundancy=”external”>

<dsk string=”<disk path>”

</dg>

• Login to ASM via ASMCMD and run:

“mkdg <config file>.xml”

• A successfully created diskgroup. This diskgroup can be viewed using the “lsdg” ASMCMD command.• The diskgroup will be registered as a Clusterware resource (crsctl stat res –t)


Test 3	Create a normal or high redundancy ASM diskgroup using ASMCMD	• Identify the candidate disks forthe diskgroup by running: “lsdsk –candidate” • Create a XML config file to define the diskgroup e.g. <dg name=”<dg_name>” redundancy=”normal”> <fg name=”fg1″> <dsk string=”<disk path>” /> </fg> <fg name=”fg2″> <dsk string=”<disk path>” /> </fg> <a name=”compatible.asm” value=”11.1″/> <a name=”compatible.rdbms” value=”11.1″/> </dg> • Login to ASM via ASMCMD and run: “mkdg <config file>.xml”	• A successfully created diskgroup. Thisdiskgroup can be viewed using the “lsdg” ASMCMD command. • The diskgroup will be registered as a Clusterware resource (crsctl stat res –t)
Test 4	Add a disk to a ASM disk group using ASMCMD	• Identify the candidate disk tobe added by running: “lsdsk –candidate” • Create a XML config file to define the diskgroup change e.g. <chdg name=”<dg name>”> <add> <dsk string=”<disk path>”/> </add> </chdg> • Login to ASM via ASMCMD and run: “chdg <config file>.xml” NOTE: Progress can be monitored by running “lsop”	• The disk will be added to the diskgroup andthe data will be rebalanced evenly across all disks in the diskgroup. Progress of the rebalance can be monitored by running the “lsop” ASMCMD command.

Test 5

Drop an ASM disk from a diskgroup using ASMCMD

• Identify the ASM name for the disk to be dropped from the given diskgroup:

"lsdsk -G <dg name> -k"

• Create an XML config file to define the diskgroup change, e.g.

<chdg name="<dg name>">
<drop>
<dsk name="<disk name>"/>
</drop>
</chdg>

• Log in to ASM via ASMCMD and run:

"chdg <config file>.xml"

NOTE: Progress can be monitored by running "lsop"

• The data from the removed disk will be rebalanced across the remaining disks in the diskgroup. Once the rebalance is complete the disk will be listed as a candidate (lsdsk --candidate) to be added to another diskgroup. Progress can be monitored by running "lsop"

• The diskgroup remains registered as a Clusterware resource (crsctl stat res -t)

Test 6

Modify rebalance power of an active operation using ASMCMD

• Add a disk to a diskgroup (as shown above).

• Identify the rebalance operation by running "lsop" via ASMCMD.

• Before the rebalance completes, run the following command via ASMCMD:

"rebal --power <1-11> <dg name>"

NOTE: Progress can be monitored by running "lsop"

• The rebalance power of the current operation will be changed to the specified value. This is visible with the lsop command.

Test 7

Drop an ASM diskgroup using ASMCMD

• Log in to ASM via ASMCMD and run:

"dropdg <dg name>"

• The diskgroup will be successfully dropped.

• The diskgroup will be unregistered as a Clusterware resource (crsctl stat res -t)
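
The XML config files used by mkdg in Tests 2 and 3 can also be generated rather than typed by hand. The following is a minimal sketch in Python, not part of the test plan itself. The function name build_mkdg_config, the diskgroup name DATA_TEST, and the disk paths are hypothetical and only illustrate the element layout shown in the tests (dg, fg, dsk, a).

```python
import xml.etree.ElementTree as ET


def build_mkdg_config(dg_name, redundancy, failure_groups, attributes=None):
    """Return XML text shaped like the mkdg config examples in Tests 2 and 3.

    failure_groups maps a failure group name to a list of disk paths.
    Use the key None for disks placed directly under <dg>, as in the
    external redundancy example.
    """
    dg = ET.Element("dg", {"name": dg_name, "redundancy": redundancy})
    for fg_name, disk_paths in failure_groups.items():
        # Disks without a failure group hang directly off <dg>.
        parent = dg if fg_name is None else ET.SubElement(dg, "fg", {"name": fg_name})
        for path in disk_paths:
            ET.SubElement(parent, "dsk", {"string": path})
    for name, value in (attributes or {}).items():
        ET.SubElement(dg, "a", {"name": name, "value": value})
    return ET.tostring(dg, encoding="unicode")


# Hypothetical diskgroup name and disk paths, for illustration only.
config_xml = build_mkdg_config(
    "DATA_TEST",
    "normal",
    {"fg1": ["/dev/rhdisk10"], "fg2": ["/dev/rhdisk11"]},
    {"compatible.asm": "11.1", "compatible.rdbms": "11.1"},
)
print(config_xml)
```

The printed text can be saved as <config file>.xml and passed to mkdg. Building the document with an XML library avoids unclosed elements, which are easy to introduce when editing the file by hand.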

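Some of the tools used below rely on oradebug. If anything is unclear, consult an Oracle engineer promptly.

Guide to capturing diagnostics for a hung database with oradebug

This section describes basic diagnosis of slow performance problems. The usual steps are: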

141	1125	15315	SYS	4	sqlplus@coehq2 (TNS V1-V3)
147	575	10577	SCOTT		SQL*Plus

The following SQL identifies the sessions that consume the most DB Time.

-- sessions with highest DB Time usage
SELECT s.sid, s.serial#, p.spid AS "OS PID", s.username, s.module,
       st.value/100 AS "DB Time (sec)",
       stcpu.value/100 AS "CPU Time (sec)",
       round(stcpu.value / st.value * 100, 2) AS "% CPU"
FROM v$sesstat st, v$statname sn, v$session s, v$sesstat stcpu, v$statname sncpu, v$process p
WHERE sn.name = 'DB time' -- DB time statistic
AND st.statistic# = sn.statistic#
AND st.sid = s.sid
AND sncpu.name = 'CPU used by this session' -- CPU statistic
AND stcpu.statistic# = sncpu.statistic#
AND stcpu.sid = st.sid
AND s.paddr = p.addr
AND s.last_call_et < 1800 -- active within last 1/2 hour
AND s.logon_time > (SYSDATE - 240/1440) -- sessions logged on within 4 hours
AND st.value > 0;

SID	SERIAL#	OS PID	USERNAME	MODULE	DB Time (sec)	CPU Time (sec)	% CPU
141	1125	15315	SYS	sqlplus@coehq2 (TNS V1-V3)	12.92	9.34	72.29
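
The derived columns in this output are simple arithmetic on the two session counters, which the query treats as centiseconds (hence the division by 100). The sketch below reproduces the calculation for the sample row. The function name cpu_share is hypothetical, and the raw counter values 1292 and 934 are back-computed from the displayed 12.92 and 9.34 seconds rather than taken from the source.

```python
def cpu_share(db_time_cs, cpu_time_cs):
    """Return (DB time in s, CPU time in s, % CPU) from centisecond counters."""
    db_sec = db_time_cs / 100
    cpu_sec = cpu_time_cs / 100
    # Same expression as round(stcpu.value / st.value * 100, 2) in the query.
    pct_cpu = round(cpu_time_cs / db_time_cs * 100, 2)
    return db_sec, cpu_sec, pct_cpu


# Counter values back-computed from the sample row (an assumption).
db_sec, cpu_sec, pct_cpu = cpu_share(1292, 934)
print(db_sec, cpu_sec, pct_cpu)
```

A high "% CPU" suggests the session spends most of its DB Time on CPU. A low value suggests that most of the time goes to waits, which is where the wait event analysis becomes relevant.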

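Gather Operating System (OS) Performance Data

Use OS Watcher to collect OS performance data for the period in which the performance problem occurred. Use RDA to generate a detailed report of the overall system.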

Event	Waits	Time(s)	Avg wait (ms)	% DB time	Wait Class
direct path read	1,771,336	127,013	72	32.66	User I/O
db file sequential read	2,467,303	118,254	48	30.41	User I/O
log file sync	260,061	45,419	175	11.68	Commit
DB CPU		26,005		6.69
read by other session	443,530	25,180	57	6.48	User I/O

Function Name	Reads: Data	Reqs per sec	Data per sec	Writes: Data	Reqs per sec	Data per sec	Waits: Count	Avg Tm(ms)
Direct Reads	3.4T	1010.15	984.709	2.4G	2.81	.674200	0
Buffer Cache Reads	29.3G	790.53	8.27680	0M	0.00	0M	2617.2K	48.11
Direct Writes	10.4G	2.95	2.93477	14.1G	16.48	3.97571	0
DBWR	0M	0.00	0M	2.3G	60.12	.640007	0
Others	984M	6.95	.271334	659M	1.18	.181717	26.4K	50.61
LGWR	95M	1.68	.026195	1.3G	38.66	.361503	72.5K	50.06
Streams AQ	0M	0.00	0M	0M	0.00	0M	11	54.09
TOTAL:	3.4T	1812.28	996.218	20.7G	119.26	5.83314	2716.1K	48.19

Statistic	Total	per Second	per Trans
physical reads direct	454,503,809	125,327.88	1,520.63
physical reads direct (lob)	35,116	9.68	0.12
physical reads direct temporary tablespace	1,478,297	407.64	4.95

Owner	Tablespace Name	Object Name	Subobject Name	Obj. Type	Direct Reads	%Total
PBMS	PBMS_DATA07	PBMS_SYNEST_INF_REC1	P19	TABLE PARTITION	90,038,967	19.81
PBMS	PBMS_DATA01	PBMS_SYNEST_INF_REC1	P01	TABLE PARTITION	81,899,865	18.02
PBMS	PBMS_DATA05	PBMS_SYNEST_INF_REC1	P17	TABLE PARTITION	53,098,234	11.68
PBMS	PBMS_DATA06	PBMS_SYNEST_INF_REC1	P06	TABLE PARTITION	39,525,197	8.70
PBMS	PBMS_DATA09	PBMS_SYNEST_INF_REC1	P21	TABLE PARTITION	37,971,246	8.35

Function Name	Reads: Data	Reqs per sec	Data per sec	Writes: Data	Reqs per sec	Data per sec	Waits: Count	Avg Tm(ms)
Direct Reads	289.8G	424.49	126.774	2G	7.78	.883528	0
Buffer Cache Reads	201.7G	5328.22	88.2310	0M	0.00	0M	11.6M	6.40
Direct Writes	3.3G	1.45	1.43936	126.3G	378.81	55.2606	0
DBWR	0M	0.00	0M	3.6G	164.82	1.57821	5	6.20
LGWR	63M	1.75	.026916	2.3G	247.86	1.00571	224.8K	2.33
Others	1G	6.68	.457999	912M	1.35	.389641	15.9K	9.15
Streams AQ	1M	0.01	.000427	0M	0.00	0M	27	23.63
TOTAL:	495.9G	5762.60	216.930	135.1G	800.61	59.1177	11.9M	6.32
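
The two IOStat by Function tables can be compared numerically. The sketch below assumes that the first table is the baseline and the second was captured after the change under test, which the tables themselves do not state. It compares per-second rates and average wait times rather than totals, because the snapshot lengths may differ. The helper name pct_change is hypothetical, and the figures are copied from the two tables.

```python
def pct_change(before, after):
    """Percentage change from before to after (negative means a drop)."""
    return (after - before) / before * 100.0


# (before, after) pairs copied from the two IOStat by Function tables.
metrics = {
    "Direct Reads, read data per sec": (984.709, 126.774),
    "Buffer Cache Reads, read data per sec": (8.27680, 88.2310),
    "Buffer Cache Reads, avg wait (ms)": (48.11, 6.40),
    "TOTAL, avg wait (ms)": (48.19, 6.32),
}

for name, (before, after) in metrics.items():
    change = pct_change(before, after)
    print(f"{name}: {before} -> {after} ({change:+.1f}%)")
```

Under that assumption, the direct read rate falls sharply while the buffer cache read rate rises, and the average I/O wait time drops to a fraction of its earlier value.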

2012-06-14 14:38:00.991: [ CRSRES][1403169088] Attempting to stop `ora.zjhz-bjiagw-mdsp-rac03.vip` on member `zjhz-bjiagw-mdsp-rac01`
2012-06-14 14:38:01.005: [ CRSRES][1407371584] startRunnable: setting CLI values
2012-06-14 14:38:01.016: [ CRSRES][1409472832] startRunnable: setting CLI values
2012-06-14 14:38:01.020: [ CRSRES][1407371584] Attempting to start `ora.gwmngdb.gwmngdb3.inst` on member `zjhz-bjiagw-mdsp-rac03`
2012-06-14 14:38:01.023: [ CRSRES][1409472832] Attempting to start `ora.zjhz-bjiagw-mdsp-rac03.ASM3.asm` on member `zjhz-bjiagw-mdsp-rac03`
2012-06-14 14:38:01.256: [ CRSRES][1403169088] Stop of `ora.zjhz-bjiagw-mdsp-rac03.vip` on member `zjhz-bjiagw-mdsp-rac01` succeeded.
2012-06-14 14:38:01.270: [ CRSRES][1403169088] startRunnable: setting CLI values
2012-06-14 14:38:01.270: [ CRSRES][1403169088] Attempting to start `ora.zjhz-bjiagw-mdsp-rac03.vip` on member `zjhz-bjiagw-mdsp-rac03`
2012-06-14 14:38:05.834: [ CRSRES][1403169088] Start of `ora.zjhz-bjiagw-mdsp-rac03.vip` on member `zjhz-bjiagw-mdsp-rac03` succeeded.
2012-06-14 14:38:05.864: [ CRSRES][1403169088] startRunnable: setting CLI values
2012-06-14 14:38:05.870: [ CRSRES][1403169088] Attempting to start `ora.zjhz-bjiagw-mdsp-rac03.LISTENER_ZJHZ-BJIAGW-MDSP-RAC03.lsnr` on member `zjhz-bjiagw-mdsp-rac03`
2012-06-14 14:38:06.157: [ CRSRES][1440954688] CRS-1002: Resource 'ora.zjhz-bjiagw-mdsp-rac03.vip' is already running on member 'zjhz-bjiagw-mdsp-rac03'
2012-06-14 14:38:07.410: [ CRSAPP][1407371584] StartResource error for ora.gwmngdb.gwmngdb3.inst error code = 1
2012-06-14 14:38:08.925: [ CRSRES][1407371584] Start of `ora.gwmngdb.gwmngdb3.inst` on member `zjhz-bjiagw-mdsp-rac03` failed.
2012-06-14 14:38:09.264: [ CRSRES][1403169088] Start of `ora.zjhz-bjiagw-mdsp-rac03.LISTENER_ZJHZ-BJIAGW-MDSP-RAC03.lsnr` on member `zjhz-bjiagw-mdsp-rac03` succeeded.
2012-06-14 14:38:09.952: [ CRSRES][1405270336] CRS-1002: Resource 'ora.zjhz-bjiagw-mdsp-rac03.LISTENER_ZJHZ-BJIAGW-MDSP-RAC03.lsnr' is already running on member 'zjhz-bjiagw-mdsp-rac03'
2012-06-14 14:38:10.939: [ CRSRES][1405270336] startRunnable: setting CLI values
2012-06-14 14:38:10.957: [ CRSRES][1405270336] Attempting to start `ora.zjhz-bjiagw-mdsp-rac03.ons` on member `zjhz-bjiagw-mdsp-rac03`
2012-06-14 14:38:12.433: [ CRSRES][1405270336] Start of `ora.zjhz-bjiagw-mdsp-rac03.ons` on member `zjhz-bjiagw-mdsp-rac03` succeeded.
2012-06-14 14:38:13.462: [ CRSRES][1409472832] Start of `ora.zjhz-bjiagw-mdsp-rac03.ASM3.asm` on member `zjhz-bjiagw-mdsp-rac03` succeeded.
2012-06-14 14:38:13.463: [ CRSRES][1409472832] Skip online resource: ora.zjhz-bjiagw-mdsp-rac03.ons
2012-06-14 14:38:13.481: [ CRSRES][1411574080] startRunnable: setting CLI values
2012-06-14 14:38:13.484: [ CRSRES][1411574080] Attempting to start `ora.zjhz-bjiagw-mdsp-rac03.gsd` on member `zjhz-bjiagw-mdsp-rac03`
2012-06-14 14:38:13.794: [ CRSRES][1411574080] Start of `ora.zjhz-bjiagw-mdsp-rac03.gsd` on member `zjhz-bjiagw-mdsp-rac03` succeeded.
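
In a long CRS log the one failed start is easy to miss among the successful ones. The following Python sketch filters such a log for resources whose start failed. It assumes each log entry sits on a single line and follows the "Start of ... on member ... failed." wording seen in this excerpt. The names START_RESULT and failed_starts are hypothetical, and the sample lines are copied from the log above.

```python
import re

# Matches "Start of `<resource>` on member `<node>` succeeded." or "... failed."
START_RESULT = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}): "
    r"\[\s*CRSRES\]\[\d+\]\s*Start of `(?P<resource>[^`]+)` "
    r"on member `(?P<member>[^`]+)` (?P<result>succeeded|failed)\."
)


def failed_starts(lines):
    """Return (timestamp, resource, member) for every failed resource start."""
    failures = []
    for line in lines:
        match = START_RESULT.match(line.strip())
        if match and match.group("result") == "failed":
            failures.append(
                (match.group("ts"), match.group("resource"), match.group("member"))
            )
    return failures


# Sample entries copied from the log excerpt.
sample = [
    "2012-06-14 14:38:05.834: [ CRSRES][1403169088] Start of `ora.zjhz-bjiagw-mdsp-rac03.vip` on member `zjhz-bjiagw-mdsp-rac03` succeeded.",
    "2012-06-14 14:38:07.410: [ CRSAPP][1407371584] StartResource error for ora.gwmngdb.gwmngdb3.inst error code = 1",
    "2012-06-14 14:38:08.925: [ CRSRES][1407371584] Start of `ora.gwmngdb.gwmngdb3.inst` on member `zjhz-bjiagw-mdsp-rac03` failed.",
]

for ts, resource, member in failed_starts(sample):
    print(ts, resource, member)
```

For this excerpt the only match is the instance resource ora.gwmngdb.gwmngdb3.inst on zjhz-bjiagw-mdsp-rac03. The VIP, listener, ONS, ASM and GSD resources on that node all report a successful start.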

	Instance name: dsjz2
	Redo thread mounted by this instance: 2
	Oracle process number: 458
	Unix process pid: 197620, image: oracle@zjjdjz02 (J070)
	*** 2012-08-25 10:51:51.702
	>>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! <<<
	row cache enqueue: session: 700001b89af6180, mode: N, request: S
	……
	row cache enqueue: count=1 session=700001b8bae2ac0 object=70000195d71c300, request=S
	savepoint=0x116904
	row cache parent object: address=70000195d71c300 cid=11(dc_object_ids)
	PROCESS 458:
	----------------------------------------
	SO: 700001b8c6c5d80, type: 2, owner: 0, flag: INIT/-/-/0x00
	(process) Oracle pid=458, calls cur/top: 70000184e9eae68/700001876fa1258, flag: (0) -
	int error: 0, call error: 0, sess error: 0, txn error 0

ZJHZ-BJIAGW-MDSP-RAC02:oracle:gwmngdb2 > crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora.gwmngdb.db application    ONLINE    ONLINE    zjhz...ac01
ora....b1.inst application    ONLINE    ONLINE    zjhz...ac01
ora....b2.inst application    ONLINE    ONLINE    zjhz...ac02
ora....b3.inst application    ONLINE    OFFLINE
ora....SM1.asm application    ONLINE    ONLINE    zjhz...ac01
ora....01.lsnr application    ONLINE    ONLINE    zjhz...ac01
ora....c01.gsd application    ONLINE    ONLINE    zjhz...ac01
ora....c01.ons application    ONLINE    ONLINE    zjhz...ac01
ora....c01.vip application    ONLINE    ONLINE    zjhz...ac01
ora....SM2.asm application    ONLINE    ONLINE    zjhz...ac02
ora....02.lsnr application    ONLINE    ONLINE    zjhz...ac02
ora....c02.gsd application    ONLINE    ONLINE    zjhz...ac02
ora....c02.ons application    ONLINE    ONLINE    zjhz...ac02
ora....c02.vip application    ONLINE    ONLINE    zjhz...ac02
ora....SM3.asm application    ONLINE    OFFLINE
ora....03.lsnr application    ONLINE    OFFLINE
ora....c03.gsd application    ONLINE    OFFLINE
ora....c03.ons application    ONLINE    OFFLINE
ora....c03.vip application

2012-06-14 14:25:14.269: [ CRSRES][1415764288] Attempting to start `ora.zjhz-bjiagw-mdsp-rac03.vip` on member `zjhz-bjiagw-mdsp-rac01`
2012-06-14 14:25:14.624: [ CRSRES][1415764288] Start of `ora.zjhz-bjiagw-mdsp-rac03.vip` on member `zjhz-bjiagw-mdsp-rac01` succeeded.
2012-06-14 14:25:14.637: [ CRSEVT][1405270336] Post recovery done evmd event for: zjhz-bjiagw-mdsp-rac03
2012-06-14 14:25:14.637: [ CRSD][1405270336] SM: recoveryDone: 0