Grid Control OMS Agent: How It Works (Architecture Diagram)

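When Grid Control is used to centrally manage operating systems and Oracle databases, an Agent must be installed on each host. The Agent periodically collects OS and Oracle information, sends it to the Oracle Grid Control Management Server (OMS), and executes the instructions that the OMS issues.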

Most people's knowledge of the Agent stops at how to install and start it. The figure below shows the architecture of the OMS Agent:

 

 

The Agent consists mainly of two components: the Collector and the Metric Engine.

 

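The Collector is a key subsystem of the Agent. It is responsible for collecting metric data and uploading it to the OMS (which ultimately stores the data in the repository database). The Collector uses the information in the collection files to decide which targets need metric data collected and how often. To obtain the data, the Collector submits queries to the Metric Engine, which performs the actual collection. The Metric Engine uses fetchlets, the metadata files (defined in OH/sysman/admin/metadata), and the discovered-target file (targets defined in OH/sysman/emd/targets.xml) to obtain the metrics for each target. The metadata files also define how each metric is actually computed.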

 

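Based on this information, the Metric Engine uses the appropriate fetchlet to retrieve data from the monitored target. A fetchlet is the access mechanism for a particular kind of data: for example, database performance data is retrieved through the SQL fetchlet, while OS data is retrieved through the OS fetchlets.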
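
The selection of a fetchlet by data type can be pictured as a simple lookup table. The following is only a minimal sketch of that idea in Python: the function names, the `FETCHLETS` registry, and the string results are hypothetical stand-ins, not the Agent's real implementation.

```python
from typing import Callable, Dict


# Hypothetical stand-ins: a real SQL fetchlet would run a query against the
# database, and a real OS fetchlet would launch a process and parse its output.
def sql_fetchlet(descriptor: str) -> str:
    return f"SQL result for: {descriptor}"


def os_fetchlet(descriptor: str) -> str:
    return f"OS output for: {descriptor}"


# Registry mapping a fetchlet type to the function that implements it.
FETCHLETS: Dict[str, Callable[[str], str]] = {
    "SQL": sql_fetchlet,
    "OS": os_fetchlet,
}


def fetch_metric(fetchlet_type: str, descriptor: str) -> str:
    """Look up the fetchlet for the requested type and run it."""
    try:
        fetchlet = FETCHLETS[fetchlet_type]
    except KeyError:
        raise ValueError(f"no fetchlet registered for type {fetchlet_type!r}")
    return fetchlet(descriptor)
```

Adding a new access method in this sketch only means registering one more entry, which mirrors how the metadata files name the fetchlet to use for each metric.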

 

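Once the Collector has collected metric data, it compares the values against the defined thresholds to decide whether an alert (warning) must be raised, and it saves the metric data to the local file system (the $OH/sysman/emd/upload directory). These files are eventually sent over HTTP or HTTPS to a designated URL on the OMS server. That URL is set by the REPOSITORY_URL parameter in the $OH/sysman/config/emd.properties configuration file, as in the following example: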

 

 

[root@nas ~]# cat /w01/wls/agent/core/12.1.0.1.0/stage/sysman/config/emd.properties
#
#   emd Root directory(read-only location). Metrics should not create files
#   under this directory
#
#
emdRoot=/w01/wls/agent/core/12.1.0.1.0

#
#   agent Root directory(writeable).s
#   Use this property to base any temporary file creation.
#
#
agentStateDir=%EMSTATE%

#  perl executable directory  
#
perlBin=/w01/wls/agent/core/12.1.0.1.0/perl/bin

#
# script directory
#
scriptsDir=/w01/wls/agent/core/12.1.0.1.0/sysman/admin/scripts

#
# stage directory for provisioning
#
emStageDir=/tmp

#
#  EMD main servlet URL
#
EMD_URL=http://nas:%EM_SERVLET_PORT%/emd/main/

#
#  OMS Upload URL
#
#  if there is no receiving OMS or if you wish to disable the UploadManager
#  please set this value to empty or comment out below line
#
REPOSITORY_URL=https://:4900/empbs/upload/

#
#The following properties are advanced read-only properties
#

#
# The location of the file that contains the root certificate.
#
emdRootCertLoc=/w01/wls/agent/core/12.1.0.1.0/sysman/config/b64LocalCertificate.txt
internetCertLoc=/w01/wls/agent/core/12.1.0.1.0/sysman/config/b64InternetCertificate.txt

#
# The download URL for the EMD Oracle Wallet and its local file location.
#
# Note: Ensure that this URL references a valid port number at which the
# console is available on http
#
emdWalletSrcUrl=https://:4900/em/wallets/emd
emdWalletDest=/w01/wls/agent/core/12.1.0.1.0/sysman/config/server

# JAVA HOME required for agent operations
#
JAVA_HOME=/w01/wls/agent/core/12.1.0.1.0/jdk

#
# This string is used by the agent to determine which algorithm to use for encrypted data
# The string value will be same as the release version
#
agentVersion=12.1.0.1.0

#
# To enable the metric browser, uncomment the following line
# This is a reloadable parameter
#
#_enableMetricBrowser=true

#
# These are the optional Java flags for the agent
#
agentJavaDefines=-Xmx128m

#
#   The agent base directory.
#
agentBaseDir=/w01/wls/agent

#
############################################################################
########################### Modifiable Properties ##########################
############################################################################
#

#
#### Tracing related properties
#

#
# emagent perl tracing levels
# supported levels: DEBUG, INFO, WARN, ERROR
# default level is WARN
#
#
EMAGENT_PERL_TRACE_LEVEL=INFO

# logging properties
Logger.log4j.appender.Rolling=org.apache.log4j.RollingFileAppender
Logger.log4j.appender.Rolling.File=%EMSTATE%/sysman/log/gcagent.log
Logger.log4j.appender.Rolling.Append=true
Logger.log4j.appender.Rolling.MaxFileSize=5000000
Logger.log4j.appender.Rolling.MaxBackupIndex=10
Logger.log4j.appender.Rolling.layout=oracle.sysman.gcagent.util.logging.GCPattern
# FOR NOW add a nother log for errors
Logger.log4j.appender.Errors=org.apache.log4j.RollingFileAppender
Logger.log4j.appender.Errors.File=%EMSTATE%/sysman/log/gcagent_errors.log
Logger.log4j.appender.Errors.Append=true
Logger.log4j.appender.Errors.Threshold=ERROR
Logger.log4j.appender.Errors.layout=oracle.sysman.gcagent.util.logging.GCPattern
Logger.log4j.appender.Errors.MaxFileSize=50000000
Logger.log4j.appender.Errors.MaxBackupIndex=3
# Add a test appender for individual tests
Logger.log4j.appender.Test=org.apache.log4j.FileAppender
Logger.log4j.appender.Test.File=/dev/null
Logger.log4j.appender.Test.Append=true
Logger.log4j.appender.Test.Threshold=DEBUG
Logger.log4j.appender.Test.layout=oracle.sysman.gcagent.util.logging.GCPattern

#
# If you increase the maximum file size for the Mdu and Errors logs, you
# should consider setting _maxFileSizeToCopy to a value that is higher then the
# new number (please note that this will potnetially increase the size of your
# incidents)
#

#
# Set root category priority to INFO and its only appender to Rolling.
Logger.log4j.rootCategory=INFO, Rolling, Errors, Test

#
# Enable HTTPListener (jetty) at INFO level.
# TODO: remove this when true trace is supported
Logger.log4j.category.oracle.sysman.gcagent.comm.agent.http.HTTPListener=INFO

Logger.log4j.appender.stdout=org.apache.log4j.ConsoleAppender
Logger.log4j.appender.stdout.layout=oracle.sysman.gcagent.util.logging.GCPattern

# Set the class loaders to level INFO
Logger.log4j.category.oracle.sysman.gcagent.metadata.impl.ChainedClassLoader=INFO
Logger.log4j.category.oracle.sysman.gcagent.metadata.impl.ReverseDelegationClassLoader=INFO
Logger.log4j.category.oracle.sysman.gcagent.metadata.impl.PluginLibraryClassLoader=INFO
Logger.log4j.category.oracle.sysman.gcagent.metadata.impl.PluginClassLoader=INFO

# Add an appender for MetaData Updates
Logger.log4j.appender.Mdu=org.apache.log4j.RollingFileAppender
Logger.log4j.appender.Mdu.File=%EMSTATE%/sysman/log/gcagent_mdu.log
Logger.log4j.appender.Mdu.Append=true
Logger.log4j.appender.Mdu.Threshold=INFO
Logger.log4j.appender.Mdu.layout=org.apache.log4j.PatternLayout
Logger.log4j.appender.Mdu.layout.ConversionPattern=%d [%t] - %m%n
Logger.log4j.appender.Mdu.MaxFileSize=50000000
Logger.log4j.appender.Mdu.MaxBackupIndex=3

Logger.log4j.category.oracle.sysman.gcagent.dispatch.MetadataUpdater=INFO, Mdu
Logger.log4j.additivity.oracle.sysman.gcagent.dispatch.MetadataUpdater=false

# Turn off QA log by default
Logger.log4j.category.QA=FATAL, QA

#Logger._enableTrace=true

#
#### Scalability related properties
#

#List of ora errors which can be ignored and need not be uploaded to repos
IgnoreDownOraErrors=12541,01033,01034,12505,03134,12170,12500,01219,1089,12560,12514,12528,12545

################################
#
# Put all additional properties here
#
################################

# uncomment for ease of debugging
#MaxThreads=1

# Set the server's graceful shutdown delay.
GracefulShutdownDelay=3

# Dump the dispatcher when overloaded
_dumpDispatcherWhenOverloaded=true

# Whether the EMD should listen on all NICs on the current host (the default)
# or just the NIC associated with the hostname in EMD_URL
AgentListenOnAllNICs=true

# Dump each request
#_dumpEveryDispatcherRequest=true

# Dynamic properties timeout for specific target types
dynamicPropsComputeTimeout_rac_database=180
dynamicPropsComputeTimeout_cluster=180
dynamicPropsComputeTimeout_has=180
dynamicPropsComputeTimeout_oracle_database=180
dynamicPropsComputeTimeout_oc4jjvm=180
dynamicPropsComputeTimeout_microsoft_sqlserver_database=180
dynamicPropsComputeTimeout_host=180
dynamicPropsComputeTimeout_osm_instance=180

_disableLoadDPFromCacheNormal=true

#Enable jobsystem streams tracing
_enableJobSystemStreamsTracing=true

# Allow beacon aplication to have 500 megabytes of space. Primarily for ATS collections.
# 500 * 1024 * 1024 = 524288000
applicationMetadataQuota_BEACON=524288000

#Enable auto tuning out of the box
enableAutoTuning=true
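
In the listing above the host part of REPOSITORY_URL is blank (possibly removed before the file was published). A quick sanity check of that parameter can be scripted. The sketch below is an illustration in Python, not an Oracle tool: `parse_properties` and `check_repository_url` are hypothetical helper names, and the parser only handles the simple `key=value` and `#` comment lines seen in this file.

```python
from urllib.parse import urlsplit


def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blanks and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def check_repository_url(props: dict) -> list:
    """Return a list of human-readable problems found in REPOSITORY_URL."""
    url = props.get("REPOSITORY_URL", "")
    if not url:
        return ["REPOSITORY_URL is empty or commented out"]
    problems = []
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        problems.append("REPOSITORY_URL must use http or https")
    if parts.hostname is None:
        problems.append("REPOSITORY_URL has no host name")
    return problems


# Two lines copied from the listing above, plus a comment line.
SAMPLE = """\
# OMS Upload URL
REPOSITORY_URL=https://:4900/empbs/upload/
agentVersion=12.1.0.1.0
"""
```

Calling `check_repository_url(parse_properties(SAMPLE))` reports the missing host name, while a fully specified URL produces an empty list.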

 

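The files produced by the Collector are actually sent to the OMS only when any one of the following conditions is met:

1) An alert needs to be sent.
2) The collected files exceed a preset size (20 MB, that is 20480 KB, by default). The limit is set by the UploadFileSize parameter in $OH/sysman/config/emd.properties.
3) More than 30 minutes (the default) have passed since the last upload. The limit is set by the UploadInterval parameter in $OH/sysman/config/emd.properties.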
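
The three conditions above combine with a logical OR. The Python sketch below restates them as a single predicate; the function and argument names are hypothetical, and only the two default values (20480 KB and 30 minutes) come from the text.

```python
DEFAULT_UPLOAD_FILE_SIZE_KB = 20480   # default for UploadFileSize, per the text
DEFAULT_UPLOAD_INTERVAL_MIN = 30      # default for UploadInterval, per the text


def should_upload(
    has_pending_alert: bool,
    pending_kb: float,
    minutes_since_last_upload: float,
    upload_file_size_kb: float = DEFAULT_UPLOAD_FILE_SIZE_KB,
    upload_interval_min: float = DEFAULT_UPLOAD_INTERVAL_MIN,
) -> bool:
    """Return True when any one of the three upload conditions holds."""
    return (
        has_pending_alert
        or pending_kb > upload_file_size_kb
        or minutes_since_last_upload > upload_interval_min
    )
```

For example, 100 KB of pending data collected 5 minutes after the last upload stays on disk, but the same data is sent at once if an alert is waiting.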

 

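Note that the OMS handles these differently from the Agent: alert severities sent by the Agent are written by the OMS directly into the EM Repository database rather than being staged as temporary files.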

 

Besides its two main modules, the Metric Engine and the Collector, the Agent has other subsystems that handle different tasks:

 

  • Target Manager
    • Target Manager holds monitored targets
    • Target data in $EM/sysman/emd/targets.xml
    • lists managed targets, each with name, type, and other properties
    • Credential properties are encrypted
    • Targets can be marked broken
      • Required properties not provided
      • Dynamic properties take too long to compute
    • Discovery of new target instances possible by running perl scripts that list unmonitored instances.
  • Metric Engine
    • Driven by XML target metadata
    • one file per target-type, found in $OH/sysman/admin/metadata/*.xml
    • defines metrics; each may have multiple columns
    • for each metric, defines how data is collected:
      • QueryDescriptor : by fetchlet
      • PushDescriptor: by recvlet
      • ExecutionDescriptor: aggregation from other metrics
    • Supports multiple target versions with ValidIf
    • Defines properties for target type
      • Instance properties: specified in targets.xml
      • Dynamic properties: computed by metric engine
    • Metric Engine holds target-type metadata
      • given a target and a metric name, calls fetchlet manager and/or metric cache and returns a metric result
    • Metric Cache caches last-collected data for use in computing expressions
    • Aggregate metric support allows metrics to be computed via views, joins and group bys over other metrics
      • GetView: select columns or rows from a MetricResult
      • GroupBy: compute aggregation information (SUM, COUNT, MIN, MAX)
      • Union: add rows returned by multiple MetricResults
      • JoinTables: combine multiple metrics’ columns
  • Fetchlet Manager
    • A fetchlet is a data-access mechanism available to compute metric data
      • OS fetchlets : launch an OS process and interpret output
        • OS Fetchlet
        • OSLine Fetchlet
        • OSLineToken
        • UDM : User Defined Metric
      • SQL fetchlet : run a SQL or PL/SQL statement
      • URL fetchlets
        • HTTP data
        • URLTiming Fetchlet
      • and more…
  • Collection Manager
    • Holds all collections, both default and per-target
    • CollectionItem is the basic unit of scheduled collection
    • multiple metrics collected from the same target at the same interval can be collected in the same thread (MetricColl)
    • Once data is collected for a CollectionItem, any Conditions are evaluated
      • possible states: Clear, Warning, Critical, or Unknown
      • last evaluated Condition states are stored in $EM/sysman/emd/state/*
    • Collection XML files
      • default collections defined for all targets of a type in $OH/sysman/admin/default_collection/*.xml
      • additional collections for a particular target in $EM/sysman/emd/collection/*.xml
      • specifies, by metric, schedule for collection and thresholds to be applied to columns
  • Blackout Manager
    • Manage blackout information stored in $EM/sysman/emd/blackouts.xml
    • Scheduled collections consult Blackout Manager; if target is currently blacked-out, collection does not proceed
    • Targets may be affected by multiple blackouts; if any blackout is effective on a target, the target is blacked-out
    • Node blackouts affect all targets monitored by the agent
    • Blackouts file :
      • blackouts in $EM/sysman/emd/blackouts.xml
      • each blackout can be applied to one or more targets; if target is node, blackout applies to all targets
      • blackout can be immediate or scheduled; if scheduled, can be one-time or repeated
  • Scheduler
    • Schedules activities in order of next run time
      • multiple schedule formats:
        • Once: happens only once
        • Interval: happens every n minutes/hours/days
        • Week: happens on certain day of week
        • Month: happens on certain day of month
      • can specify begin time/end time
    • Spawns threads to do work whose time has arrived
    • Used by Collector and Blackout Manager
    • Health Monitor checks that the scheduler is doing its work
    • emctl status agent scheduler
      • Dumps out all the scheduled elements
  • Upload Manager
    • As data is collected by other agent components, serializes writing of  intermediary .dat files (stored in $AS/sysman/emd/upload)
    • .dat files merged into .xml files on five priority channels
    • XML files sent to OMS as HTTP requests
    • maintains statistics on pending xml files; will disable collections based on number of files, aggregate size of files, and percentage free disk space on upload filesystem
    • Upload interval dynamic, based on properties and previous upload status
  • Ping Manager
    • Periodically, sends HTTP heartbeat request to OMS and verifies response
    • OMS response dictates interval before next ping
    • Exchange timezone information
    • A successful ping from the agent to the OMS is required before any uploads will occur
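
The Blackout Manager rules listed above (a target is blacked out if any blackout covering it is effective, and a node blackout covers every target) can be sketched as follows. This is a simplified Python model under stated assumptions: the `Blackout` class and its fields are hypothetical, repeated schedules are not modeled, and nothing here reads blackouts.xml.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class Blackout:
    start: datetime
    end: Optional[datetime]              # None means no scheduled end
    targets: FrozenSet[str] = frozenset()
    node_level: bool = False             # node blackouts cover all targets

    def is_effective(self, now: datetime) -> bool:
        """A blackout is effective from its start until its end (exclusive)."""
        return self.start <= now and (self.end is None or now < self.end)

    def covers(self, target: str) -> bool:
        return self.node_level or target in self.targets


def is_blacked_out(target: str, blackouts: Iterable[Blackout], now: datetime) -> bool:
    """True if at least one effective blackout covers the target."""
    return any(b.is_effective(now) and b.covers(target) for b in blackouts)
```

A scheduled collection would call `is_blacked_out` before running and skip the collection when it returns True.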

Comments

  1. REPOSITORY_URL value is Changing Back to the OMS Name Instead of SLB Name.

    Applies to:
    Enterprise Manager Base Platform – Version: 11.1.0.1 and later [Release: 11.1 and later ]
    Information in this document applies to any platform.
    Symptoms
    Configured multiple 11.1.0.1 OMS behind Server Load Balancer (SLB) (as per Note 866732.1) and all OMSs are secured against SLB name using the following command.
    $cd /bin
    $./emctl secure oms -host myslb.em.com
    OR
    $./emctl secure oms -host [-slb_console_port ] [-slb_port ]

    Securing 11.1.0.1 agents against SLB name goes fine and agents upload as well.

    $/bin/emctl secure agent -emdWalletSrcUrl https://myslb.em.com:1159/em
    $/bin/emctl upload agent

    After securing completed, ./emctl status agent and the AGENT_HOME/sysman/config/emd.properties file were checked: REPOSITORY_URL had reverted to the name of one of the physical OMS servers instead of the SLB name.

    1. Securing the agent changes the REPOSITORY_URL value back to that of the OMS name.

    2. Changing the repository URL to the load balancer and then starting the agent without re-securing it
    produces the error:
    Common Name = “usto-tapp-oem01.amgn.com” Does not Match Hostname = “usto-oem2.amgn.com”

    3. In another scenario the Agent log says the agent is blocked and needs to be resynced from the console.
    When re-sync is done from the console the REPOSITORY_URL value is getting changed to the OMS name.

    Example:
    Multi OMS setup, all the OMS’s secured with SLB Hostname using
    $ cd $OMS_HOME/bin
    $ ./emctl secure oms -host oem-prd.example.com -secure_port 1159 -slb_port 1159 -slb_console_port 443 -lock

    $ ./emctl status oms -details
    Oracle Enterprise Manager 11g Release 1 Grid Control
    Copyright (c) 1996, 2010 Oracle Corporation. All rights reserved.
    Enter Enterprise Manager Root (SYSMAN) Password :
    Console Server Host : oem-prd01.exapmple.com
    HTTP Console Port : 7788
    HTTPS Console Port : 7799
    HTTP Upload Port : 4889
    HTTPS Upload Port : 1159
    SLB or virtual hostname: oem-prd.exapmple.com
    HTTPS SLB Upload Port : 1159
    HTTPS SLB Console Port : 7799
    Agent Upload is locked.
    OMS Console is unlocked.
    Active CA ID: 1
    oem-prd01 3:

    Once this was done, the agent was secured using:
    $ cd $AGENT_HOME/bin
    $ ./emctl secure agent -emdWalletSrcUrl https://oem-prd.exapmple.com:1159/em

    $ ./emctl status agent
    Oracle Enterprise Manager 10g Release 5 Grid Control 10.2.0.5.0.
    Copyright (c) 1996, 2009 Oracle Corporation. All rights reserved.
    —————————————————————
    Agent Version : 10.2.0.5.0
    OMS Version : 11.1.0.1.0
    Protocol Version : 11.1.0.0.0
    Agent Home : /u01/oracle/oem/agent10g
    Agent binaries : /u01/oracle/oem/agent10g
    Agent Process ID : 4929
    Parent Process ID : 4922
    Agent URL : https://agent.exapmple.com:3872/emd/main/
    Repository URL : https://oem-prd01.exapmple.com:1159/em/upload
    Started at : 2011-03-21 16:56:53
    Started by user : oracle
    Last Reload : 2011-03-21 16:56:53
    Last successful upload : 2011-03-21 18:38:37
    Total Megabytes of XML files uploaded so far : 12.99
    Number of XML files pending upload : 0
    Size of XML files pending upload(MB) : 0.00
    Available disk space on upload filesystem : 66.32%
    Last successful heartbeat to OMS : 2011-03-21 18:38:24
    —————————————————————
    Agent is Running and Ready

    Cause
    OMS’s were not re-started after securing them with SLB Hostname.

    Solution
    1. Restart OMS
    $/bin
    $./emctl stop oms -all

    Wait a few minutes and, after all the processes have stopped, start the OMS
    $/bin
    $./emctl start oms
    $./emctl status oms -details

    2. At agent side , Backup and edit emd.properties file.
    REPOSITORY_URL=https://myslb.em.com:1159/em/upload
    emdWalletSrcUrl=http://myslb.em.com:4889/em/wallets/emd
    OR
    Re-secure all Management Agents to upload to SLB.
    $cd $AGENT_HOME/bin
    $./emctl secure agent -emdWalletSrcUrl https://oem-prd.exapmple.com:1159/em

    3. Start agent and verify the REPOSITORY_URL.
    It will show SLB name for REPOSITORY_URL .
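
The verification in step 3 can also be scripted by reading the "Repository URL" line from the `emctl status agent` output. The sketch below is written in Python and assumes the "key : value" layout shown in the status listing earlier in this note; `parse_status` and `uploads_to` are hypothetical helper names, and the sample text is copied from that listing.

```python
from urllib.parse import urlsplit


def parse_status(output: str) -> dict:
    """Split each 'key : value' line of the status output into a dict entry."""
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.partition(" : ")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def uploads_to(output: str, expected_host: str) -> bool:
    """True if the Repository URL in the status output names expected_host."""
    url = parse_status(output).get("Repository URL", "")
    return urlsplit(url).hostname == expected_host.lower()


# Two lines taken from the status listing shown earlier in this note.
SAMPLE_STATUS = """\
    Agent URL : https://agent.exapmple.com:3872/emd/main/
    Repository URL : https://oem-prd01.exapmple.com:1159/em/upload
"""
```

With this sample, `uploads_to(SAMPLE_STATUS, "oem-prd.exapmple.com")` is False, which is exactly the symptom described: the agent still uploads to a physical OMS host rather than the SLB name.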

    References
    NOTE:866732.1 – How to Configure 2 OMSs Using F5 BigIP SLB
    http://download.oracle.com/docs/cd/E11857_01/em.111/e16790/security3.htm#sthref255

  2. Communication: Agent to OMS Communication Fails if the Agent’s REPOSITORY_URL Parameter has Incorrect Value

    Applies to:
    Enterprise Manager Grid Control – Version: 10.1.0.2 to 10.2.0.4 – Release: 10.1 to 10.2
    Information in this document applies to any platform.
    Symptoms
    Agent upload to OMS fails with:

    cd /bin
    emctl upload agent
    Oracle Enterprise Manager 10g Release 3 Grid Control 10.2.0.3.0.
    Copyright (c) 1996, 2007 Oracle Corporation. All rights reserved.
    EMD upload error: uploadXMLFiles skipped :: OMS version not checked yet..

    The ’emctl status agent’ command shows a backlog of files waiting to be uploaded and may not have any value for the ‘OMS Version’ field.
    Cause
    The REPOSITORY_URL parameter mentioned in /sysman/config/emd.properties file has incorrect value for the OMS hostname and upload port.

    Often the current value of REPOSITORY_URL is the one it defaulted to during installation, http://agentmachine.domain:4889/em/upload. This is incorrect: REPOSITORY_URL should be http://omsmachine.domain:4889/em/upload.

    Solution
    To identify the correct hostname and upload port details for the REPOSITORY_URL parameter, use the steps in Note 358953.1: What ports are used in communication between the Grid Control OMS and a Management Agent?

    Once the hostname and upload port for the OMS are correctly identified, follow the steps below on the Agent:

    – Stop the agent

    cd /bin
    emctl stop agent

    – Take a backup of the /sysman/config/emd.properties

    – Edit the emd.properties file and correct the REPOSITORY_URL parameter to have the correct hostname and port value, on which the OMS is configured.

    REPOSITORY_URL=http://omsmachine.domain:4889/em/upload/

    – Delete the /sysman/emd/lastupld.xml file.

    – Start the agent

    cd /bin
    emctl start agent

    – If the OMS is secured and locked (accepts uploads only in the https mode), then secure the Agent also. Refer:
    Note 428874.1: How to tell if the EM OMS is locked or unlocked?
    Note 283091.1: How To Secure / Unsecure The Grid Control Components (Agent / OMS) In 10g

    – Check the agent upload:

    cd /bin
    emctl upload

    This should work fine now.
    References
    NOTE:362199.1 – Communication: Agent to OMS Communication Fails with “Common Name = “omsmachine.domain” Does not Match Hostname = “omsmachine” ” in the emagent.trc
    NOTE:729327.1 – How to ReConfigure 10g Agents if Wrong Management Service ‘Hostname’ or ‘Port’ Specified during Installation

  3. Problem: Agent, Host and Monitored Targets Do Not Appear in Grid Control Due to ‘REPOSITORY_URL’ Specifying Hostname That Is Not Resolved In DNS

    Applies to:
    Enterprise Manager Grid Control – Version: 10.1.0.3
    This problem can occur on any platform.
    Symptoms
    The Management Agent is installed on a remote node, during the install of the agent the following message was displayed:

    “The specified Management Service on host at is unreachable. Check the connection details for the Management Service to ensure the that the values you entered for host name correctly”

    The install was continued, however, after the install completed the agent is not displayed as a target in the Grid Control.
    Cause

    The Management Service’s hostname cannot be resolved using DNS or the local etc/hosts file during the install.
    Solution

    Method 1 (Allow Installer to make changes):

    1. Uninstall the agent.
    2. Either configure DNS to resolve the name, or configure the ‘etc/hosts’ file to resolve the IP address, fully qualified domain name (FQDN), and short name of the OMS server.
    3. Reinstall the agent

    Method 2 (Manual):

    1. Either configure DNS to resolve the name, or configure the ‘etc/hosts’ file to resolve the IP address, fully qualified domain name (FQDN), and short name of the OMS server.
    2. Modify the emd.properties file (located in the $AGENT_HOME/sysman/config directory) and ensure that the REPOSITORY_URL is correctly defined for the OMS server and the EMD_URL is correctly defined for the host running the Management agent.
    3. Delete the runtime and any pending upload files from the agent home:

    rm -r $AGENT_HOME/sysman/emd/state/*
    rm -r $AGENT_HOME/sysman/emd/collection/*
    rm -r $AGENT_HOME/sysman/emd/upload/*
    rm $AGENT_HOME/sysman/emd/lastupld.xml
    rm $AGENT_HOME/sysman/emd/agntstmp.txt
    rm $AGENT_HOME/sysman/emd/blackouts.xml
    rm $AGENT_HOME/sysman/emd/protocol.ini

    NOTE: Normally you should not remove these files. However, since this is a new install and the agent has not communicated to the OMS, the files can safely be removed.

    4. Restart the agent.

    ’emctl start agent’
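
Because the root cause here is name resolution, it can help to confirm that the OMS host name resolves from the agent host before editing emd.properties. The following Python sketch uses the standard `socket.getaddrinfo` call, which consults the system resolver (DNS or the hosts file, depending on the host's configuration); the `resolves` helper and its `resolver` parameter are hypothetical and exist only to make the check easy to exercise.

```python
import socket


def resolves(hostname: str, resolver=socket.getaddrinfo) -> bool:
    """Return True if the system resolver can resolve hostname."""
    try:
        resolver(hostname, None)
    except socket.gaierror:
        # Raised by getaddrinfo when the name cannot be resolved.
        return False
    return True
```

Running `resolves("omsmachine.domain")` on the agent host (with the real OMS host name substituted) returns False in the situation this note describes.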

  4. Problem: Agent, Host and Monitored Targets Do Not Appear in Grid Control Due to ‘REPOSITORY_URL’ Specifying The Wrong Hostname

    Applies to:
    Enterprise Manager Grid Control – Version: 10.1.0.2 to 10.2.0.3
    This problem can occur on any platform.
    Symptoms

    After installation of Oracle Management Agent on a host, the host is not discovered by Grid Control.

    The following error is shown in emagent.trc

    2005-06-01 14:20:06 Thread-1 ERROR pingManager: nmepm_pingReposURL: Cannot connect to http://somehost:4889/em/upload/

    Cause
    This issue can occur if during the installation of the Agent, the hostname of the Agent is specified instead of the hostname of the OMS, when the OUI prompts for the OMS hostname.

    The OUI takes the value specified during the install process and uses it to configure the Agent’s files.

    Solution
    To implement the solution, please execute the following steps:

    1. Edit the file /sysman/config/emd.properties and change the parameter to refer to the hostname of the OMS. The parameter is:

    REPOSITORY_URL=

    2. Issue the command:

    /bin/emctl reload

    This will force the agent to attempt a new upload to the URL specified in the REPOSITORY_URL parameter.

    3. Verify the upload was successful by using the command;

    /bin/emctl status agent

  5. Successive Metadata Upload Requests Have Failed: “Agent should retry later. ORA-20609: Error obtaining upload lock on emd_url”

    Applies to:
    Enterprise Manager Grid Control – Version: 10.2.0.5 and later [Release: 10.2 and later ]
    Information in this document applies to any platform.
    Symptoms
    Multi 10.2.0.5.2 OMS with 11.1.0.7.0 Repository Database.
    The host is shown as Agent Unreachable and the following error message is seen in the
    OMS_HOME/sysman/emoms.trc file
    “9 successive metadata upload requests have failed. Last metadata upload error is OMS already processing a Metadata or Severity file from this Agent. Agent should retry later.Error obtaining upload lock in OMS on emd_url:http://agent_name.domain:3872/emd/main/
    Metric=Consecutive metadata upload failure count ”

    10.2.0.5 RAC Agents upload is failing with following message:
    2010-06-21 17:57:14,010 Thread-3751730064 ERROR upload: Failed to upload file B0004370.xml: HTTP error.
    Response received: ERROR-400|OMS already processing a Metadata or Severity file from this Agent. Agent should retry later.ORA-20609: Error obtaining upload lock on emd_url:http://agent_name.domain:3872/emd/main/
    ORA-06512: at “SYSMAN.EMD_LOADER”, line 4564
    ORA-06512: at line 1

    From OMS_HOME/sysman/log/emoms.trc
    2010-06-18 22:32:12,714 [AJPRequestHandler-ApplicationServerThread-31] ERROR eml.FxferRecv doGet.969 – Agent with url http://agent_name.domain:3872/emd/main/ is out-of-sync with repository (2/20) !
    2010-06-18 23:21:53,234 [AJPRequestHandler-ApplicationServerThread-12] ERROR eml.FxferRecv doGet.969 – Agent with url http://agent_name.domain:3872/emd/main/ is out-of-sync with repository (2/5) !

    A deadlock was also noticed on the repository database:
    From alert_oemdb2.log
    Mon Jun 21 18:56:10 2010
    Global Enqueue Services Deadlock detected. More info in file
    /oracle/logfiles/diag/rdbms/oemdb/oemdb2/trace/oemdb2_lmd0_2543.trc.
    Mon Jun 21 18:59:37 2010
    Incremental checkpoint up to RBA [0x8f9b.15102.0], current log tail at RBA
    [0x8f9b.1d211.0]
    Mon Jun 21 19:02:30 2010
    AUD: Audit Commit Delay exceeded, written a copy to OS Audit Trail

    From oemdb2_lmd0_2543.trc
    valblk : 0x3a423135202c204e554d5f4348494c44 :B15 , NUM_CHILD
    DUMP LOCAL BLOCKER: initiate state dump for DEADLOCK
    possible owner[119.3018] on resource TX-002E0004-000288DB

    *** 2010-06-21 18:56:09.611
    Submitting asynchronized dump request [28]
    Global blockers dump end:———————————–
    Global Wait-For-Graph(WFG) at ddTS[0.355] :
    BLOCKED 0x3501a11d0 5 wq 2 cvtops x1 TX 0x270005.0x54af0
    [3D000-0002-00000326] 1
    BLOCKER 0x3501a1020 5 wq 1 cvtops x28 TX 0x270005.0x54af0
    [77000-0002-00000054] 1
    BLOCKED 0x351575af8 5 wq 2 cvtops x1 TX 0x2e0004.0x288db
    [77000-0002-00000054] 1
    BLOCKER 0x350b85748 5 wq 1 cvtops x28 TX 0x2e0004.0x288db
    [3D000-0002-00000326] 1
    * Cancel deadlock victim lockp 0x3501a11d0
    kjddt2vb: valblk [0.357] > local ts [0.356]
    *********************

    From oemdb2_lmd0_2543.trc
    *** 2010-06-18 23:26:44.403
    user session for deadlock lock 0x353a51d90
    sid: 3250 ser: 6 audsid: 6956551 user: 53/SYSMAN flags: 0x8000045
    pid: 55 O/S info: user: oracle, term: UNKNOWN, ospid: 4573
    image: oracleorkxpoedb02.esp.aur.national.com.au
    client details:
    O/S info: user: , term: , ospid: 1234
    machine: OMS_name.domainname program: OMS
    client info: OMS_Name.somain:4889_Management_Service
    application name: OEM.SystemPool, hash value=2960518376
    action name: JobDispatcher, hash value=875884737
    current SQL:
    UPDATE MGMT_JOB_HISTORY SET JOB_ID=:B29 , EXECUTION_ID=:B28 , STEP_ID=:B27 ,
    SOURCE_STEP_ID=:B26 , ORIGINAL_STEP_ID=:B25 , RESTART_MODE=:B24 ,
    STEP_NAME=:B23 , STEP_TYPE=:B22 , COMMAND_TYPE=:B21 , ITERATE_PARAM=:B20 ,
    ITERATE_PARAM_INDEX=:B19 , PARENT_STEP_ID=:B18 , STEP_STATUS=:B17 ,
    STEP_STATUS_CODE=:B16 , STEP_STATUS_CODE_CATEGORY=:B15 , NUM_CHILDREN=:B14 ,
    NUM_CHILDREN_COMPLETED=:B13 , OUTPUT_ID=:B12 , ERROR_ID=:B11 ,
    START_TIME=:B10 , END_TIME=:B9 , SEQUENCE_NUMBER=:B8 , DISPATCHER_ID=:B7 ,
    OMS_NAME=:B6 , STATUS_DETAIL= :B5 , STOP_START_TIME=:B4 ,
    STOP_LAST_SCHEDULED_TIME= :B3 , STOP_ERROR_ID=:B2 , STOPPED_BY=:B1 WHERE
    STEP_ID=:B27
    .
    *** 2010-06-19 05:45:03.046
    user session for deadlock lock 0x3515707b8
    sid: 3194 ser: 4582 audsid: 6958621 user: 53/SYSMAN flags: 0x8000045
    pid: 29 O/S info: user: oracle, term: UNKNOWN, ospid: 10408
    image: oracleorkxpoedb02.esp.aur.national.com.au
    client details:
    O/S info: user: , term: , ospid: 1234
    machine: OMS_name.domain program: OMS
    client info: OMS_name.domain:4889_Management_Service
    application name: OEM.CacheModeWaitPool, hash value=796036576
    current SQL:

    DELETE FROM MGMT_JOB_EXECUTION WHERE STEP_ID = :B1

    Cause
    BUG 9850704 SUCCESSIVE METADATA UPLOAD FAILED ‘ORA-20609: ERROR OBTAINING UPLOAD LOCK ON EMD.
    BUG 9131512 DEADLOCK IN JOB ENGINE WHILE DELETING RECORD FROM MGMT_JOB_HISTORY.
    Solution
    1. Download and apply PSU3 (Patch 9282397) on top of the 10.2.0.5 OMS.

    2. Download and apply Patch 9576271 (p9576271_102053_Generic.zip) on top of PSU3.

    Note: read the patch readme before applying.

    References
    BUG:9850704 – SUCCESSIVE METADATA UPLOAD FAILED ‘ORA-20609: ERROR OBTAINING UPLOAD LOCK ON EMD
    NOTE:1078864.1 – 10.2.0.5.3 Grid Control Patch Set Update (PSU)

  6. Problem: Startup: Emctl Start Dbconsole Fails with Agent port missing in EMD_URL

    The information in this article applies to:

    Enterprise Manager for RDBMS – Version: 10.1.0.2
    Linux x86 SUSE SLES 8
    United Linux

    Symptoms
    Unable to start dbconsole

    1. emctl start dbconsole fails with error:
    Unable to determine local host from URL
    EMD_URL=http://myhost.us.com:/emd/main

    2. emagent.nohup log shows the following errors:

    Property ‘agentTZRegion’ is either missing or contains invalid value
    emagent now exiting abnormally – initialization failure. Consult ‘.trc’ and ‘.log’ files
    EMAgent exited at

    3. emd.properties file is missing two key parameters:
    a) EMD_URL should have a port
    b) agentTZRegion= is missing altogether (should be the last line of the file)

    4. dbca.log shows errors on reading /etc/services

    5. $ORACLE_HOME/install/portlist.ini shows empty value for Enterprise Manager Agent Port

    6. $ORACLE_HOME/host_sid/sysman/log contains no logs

    Changes
    The above errors occur after the initial install of a 10g DB with database monitoring set to standalone.

    Cause
    The /etc/services file has all ports assigned beginning with 1830 – 1849. The DB Console Management Agent tries to use port 1830 during installation. When it is unable to find an unused port in the /etc/services file, the installation does not complete successfully. This issue has been logged in Bug:3513603 EMCTL START DBCONSOLE FAILS WITH ERROR IN STANDALONE AGENT URL

    Fix

    1. Locate a free port for the DB Console Agent. Begin with port 1830 and move up one port at a time until you find a free port. To check for a free port, use the command:
    netstat -a | grep 1830
    netstat -a | grep 1831
    etc

    2. Edit the /etc/services file and comment out the line for ports 1830, 1831, … 1849. This will ensure subsequent 10g database creations on this box will succeed.

    3. Rerun emca with -r option (because the sysman repository has already been created)

    or

    Manually update the $ORACLE_HOME/sysman/config/emd.properties file and insert the unused port (from Step 1 above) into the EMD_URL parameter.
    For example: EMD_URL=http://myhost.domain.com:1830/emd/main

    4. Verify the $ORACLE_SID is set
    echo $ORACLE_SID

    5. Start the dbconsole. This step will start both the DB Control Application and the DB Control Agent.
    $ emctl start dbconsole
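
Step 1 of the fix walks upward from port 1830 by hand with netstat. The same search can be sketched in Python. Note the swapped-in technique: instead of reading netstat output, this version tries to bind each port, which only tells you that nothing is listening on it right now and does not look at /etc/services. The helper names are hypothetical.

```python
import socket
from typing import Callable, Iterable, Optional


def port_is_free(port: int) -> bool:
    """Return True if a TCP socket can currently be bound to the port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind(("", port))
        except OSError:
            return False
    return True


def first_free_port(
    candidates: Iterable[int],
    is_free: Callable[[int], bool] = port_is_free,
) -> Optional[int]:
    """Return the first candidate port that is free, or None if none is."""
    for port in candidates:
        if is_free(port):
            return port
    return None
```

Calling `first_free_port(range(1830, 1850))` scans the same range mentioned in the Cause section; the resulting port is the value to put into EMD_URL.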

    References:
    BUG:3513603 – EMCTL START DBCONSOLE FAILS WITH ERROR IN STANDALONE AGENT URL

  7. Problem: Startup: Emctl Start Dbconsole Reports Problems with EMD_URL

    Applies to:
    Oracle Enterprise Manager – Version: 10.1.0.2 and later [Release: 10.1 and later ]
    Linux x86

    Symptoms
    emctl start dbconsole reports

    Unable to determine local host from URL EMD_URL=http://hostname:/emd/main

    Cause
    The servlet port definition in the $ORACLE_HOME/hostname_sid/sysman/config/emd.properties file is incorrect. In the message above, the port between the colon and /emd/main is empty.

    Solution
    Set a valid port definition in $ORACLE_HOME/hostname_sid/sysman/config/emd.properties.

    Check file for EMD_URL definition and change port settings

    EMD_URL=http://hostname:port/emd/main

    Example

    EMD_URL=http://oracleserver.coracle.com:1830/emd/main
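
    A quick way to spot this condition before starting the console is to check the EMD_URL line for a numeric port. The Python sketch below is an illustration only. The function name find_emd_url_problem is hypothetical, and the check assumes the URL has the scheme://host:port/path shape shown above.

```python
import re

# scheme://host:port/path, where the port part may be empty or a placeholder
EMD_URL_RE = re.compile(
    r"^\s*EMD_URL\s*=\s*(https?)://([^:/\s]+):([^/\s]*)(/\S*)?\s*$"
)


def find_emd_url_problem(properties_text):
    """Return a description of the problem with EMD_URL, or None if it looks valid."""
    for line in properties_text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # skip comment lines
        match = EMD_URL_RE.match(line)
        if not match:
            continue
        port = match.group(3)
        if not port.isdigit():
            return "EMD_URL has no numeric port: " + line.strip()
        if not 1 <= int(port) <= 65535:
            return "EMD_URL port is out of range: " + line.strip()
        return None
    return "no EMD_URL line of the form scheme://host:port/path was found"
```

    Passing the contents of emd.properties to find_emd_url_problem() returns None for the corrected example above and a message for the URL shown under Symptoms.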

  8. Grid Control Target Maintenance: Steps to Diagnose Issues Related to “Agent Unreachable” Status

    Applies to:
    Enterprise Manager Base Platform – Version: 10.1.0.2 to 10.2.0.5 – Release: 10.1 to 10.2
    Information in this document applies to any platform.
    Enterprise Manager Grid Control – Version: 10.1.0.2 to 10.2.0.5 – Release: 10.1 to 10.2

    Purpose
    This document provides steps to diagnose issues where the Agent target is shown with “Agent Unreachable” status in its homepage.
    – In a 10.1 Grid Console, the Agent homepage can be accessed from Management System -> Agents -> click on the Agent name.
    – In a 10.2 Grid console, the Agent homepage can be accessed from the Setup -> Agents -> click on the Agent name.

    The Agent homepage may also show an error such as:

    Communication between the Oracle Management Service host to the Agent host is unavailable. Any functions or displayed information requiring this communication link will be unavailable. For example: deleting/configuring/adding targets, uploading metric data, or displaying Agent home page information such as Agent to Management Service Response Time (ms).

    Targets monitored by this Agent will also have a status: Agent Unreachable instead of the actual status of the target.

    Additional references:
    Note 1084777.1: Description of Important Communication Components in a 10g Enterprise Manager Grid Control Agent
    Note 1097545.1: Description of Important Java Threads in a 10g Grid Control Oracle Management Service (OMS)

    Last Review Date
    October 5, 2010

    Instructions for the Reader
    A Troubleshooting Guide is provided to assist in debugging a specific issue. When possible, diagnostic tools are included in the document to assist in troubleshooting.

    Troubleshooting Details

    Background

    Whenever the OMS has to update the Agent configuration or transmit information to the Agent regarding a change in monitoring for one of the targets, it must initiate contact with the Agent. During the initial Agent to OMS communication, the Agent uploads its ‘EMD_URL’ value which enables the OMS to initiate communications back to the Agent as needed. During subsequent communication attempts from the OMS to the Agent, if the connection cannot be established based on the Agent’s EMD_URL, the OMS will flag the Agent as ‘Unreachable’.

    Some reasons why the OMS needs to contact the Agent:

    – Managed Target Configuration Change: If a target needs a configuration change (like a password change), the modified metadata is submitted by the administrator through the UI and is then transmitted to the Agent by the OMS.
    – Adding / Removing Targets: If a target is no longer valid or a new target needs to be added, these target changes need to be sent to the Agent.
    – Real-time Statistics: If a user selects real-time metric details, the OMS will contact the Agent to get the current metric data. For example, the ‘Top 10 Processes’ details in the Host Performance page of a Unix / Linux machine.
    – Job Operations: All scheduled jobs and all job updates need to be transmitted to the Agent.
    – OMS pings: If the OMS detects that the Agent is no longer uploading Data and Severities, it will try to contact the Agent in two phases:
      a) Ping the EMD_URL
      b) Do an ICMP ping to the host machine, to see if the host is responsive.
    – Blackout Operations: All scheduled blackouts and all blackout updates need to be sent to the Agent.
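
    The two-phase OMS ping can be imitated from any machine to see which phase fails. The Python sketch below is an approximation under stated assumptions, not the OMS implementation: phase (a) is replaced by a plain TCP connection to the host and port in the EMD_URL, phase (b) calls the system ping command with the Unix-style -c flag, and the function names and the returned strings are made up for illustration.

```python
import socket
import subprocess
from urllib.parse import urlsplit


def tcp_probe(host, port, timeout=5.0):
    """Stand-in for phase (a): can a TCP connection to the Agent port be opened?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def icmp_probe(host):
    """Phase (b): send one ICMP echo request using the system ping command."""
    try:
        result = subprocess.run(
            ["ping", "-c", "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError:  # ping is not installed or cannot be executed
        return False
    return result.returncode == 0


def two_phase_check(emd_url, url_probe=tcp_probe, host_probe=icmp_probe):
    """Try the Agent port first, and fall back to a host ping only if that fails."""
    parts = urlsplit(emd_url)
    if url_probe(parts.hostname, parts.port):
        return "agent port reachable"
    if host_probe(parts.hostname):
        return "host responds, agent port does not"
    return "host not responding"
```

    The probes are passed in as parameters so that the decision logic can be exercised without touching the network.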

    There are several reasons why the Management Agent will show a status of “Agent Unreachable”.
    1. The Agent is not running.
    2. The Agent cannot resolve the OMS hostname after the initial successful heartbeat.
    3. The Agent is running and has files to upload but cannot upload files to the OMS.
    4. The OMS has been locked down to receive only HTTPS connections from the Agents but this particular Agent is not configured for HTTPS communications.

    Troubleshooting Steps:
    1. Verify the Agent Status

    Login to the Agent machine and execute:

    cd <AGENT_HOME>/bin
    emctl status agent

    The output should resemble the following:

    Oracle Enterprise Manager 10g Release 5 Grid Control 10.2.0.5.0.
    Copyright (c) 1996, 2009 Oracle Corporation. All rights reserved.
    ---------------------------------------------------------------
    Agent Version : 10.2.0.5.0
    OMS Version : 10.2.0.5.0
    Protocol Version : 10.2.0.5.0
    Agent Home : /home/em/oracle/gc102/agent10g
    Agent binaries : /home/em/oracle/gc102/agent10g
    Agent Process ID : 6560
    Parent Process ID : 6544
    Agent URL : https://agentmachine.domain:3872/emd/main/
    Repository URL : https://omsmachine.domain:1159/em/upload
    Started at : 2010-10-01 09:28:21
    Started by user : em
    Last Reload : 2010-10-01 14:47:45
    Last successful upload : 2010-10-05 11:34:47
    Total Megabytes of XML files uploaded so far : 201.65
    Number of XML files pending upload : 0
    Size of XML files pending upload(MB) : 0.00
    Available disk space on upload filesystem : 84.87%
    Last successful heartbeat to OMS : 2010-10-05 11:36:10
    ---------------------------------------------------------------
    Agent is Running and Ready

    Note:
    – If the first line in the output does not indicate the correct version or does not mention ‘Grid Control’, then you are looking at the output of an Agent that is part of AS Control or DB Control. Such an Agent cannot communicate with the OMS.
    – If the Agent is not running, you will need to start it using: emctl start agent
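
    When many Agents have to be checked, the status output can be turned into a dictionary. The Python sketch below is a minimal illustration that assumes the "name : value" layout and the closing ‘Agent is Running and Ready’ line shown above. The function names are hypothetical.

```python
def parse_emctl_status(output):
    """Map each 'name : value' line of the status output to a dict entry."""
    fields = {}
    for line in output.splitlines():
        # Split on the first ' : ' only, so URLs and timestamps stay intact.
        name, separator, value = line.partition(" : ")
        if separator:
            fields[name.strip()] = value.strip()
    return fields


def agent_is_running(output):
    """True when the output ends with the 'Running and Ready' status line."""
    return "Agent is Running and Ready" in output
```

    With the sample output above, parse_emctl_status(text)["Agent URL"] gives the URL that the OMS uses to contact the Agent, and a non-zero ‘Number of XML files pending upload’ value points to an upload backlog.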

    2. Verify the Agent to OMS Communication

    If the Agent is running, force an upload by executing:

    cd <AGENT_HOME>/bin
    emctl upload

    Output should be similar to the following:

    Oracle Enterprise Manager 10g Release 5 Grid Control 10.2.0.5.0.
    Copyright (c) 1996, 2009 Oracle Corporation. All rights reserved.
    ---------------------------------------------------------------
    EMD upload completed successfully

    If the uploads are failing, refer to Note 550617.1: How To Effectively Investigate & Diagnose 10g Grid Agent Upload Problems to the Oracle Management Service (OMS)

    3. Verify the OMS to Agent Communication

    Through the Grid Control UI, check the Management System -> Agent homepage.

    The Agent homepage will list the details about the last successful communication with the Agent and the current state of the Agent. If communication with the Agent is possible, the ‘Upload Now’ button will be enabled.
    You can also check real-time statistics, which is the easiest way to force a connection from the OMS to the Agent. From the Management System tab, go to the Agent homepage and, in the ‘General’ section, select the host. On the host homepage, select the ‘Performance’ tab.
    If the real-time statistics are available, the OMS is capable of communicating with the Agent.

    If the Agent is running but the homepage in the Grid Console shows that the Agent is unreachable, verify the OMS to Agent connectivity using the steps in Note 1088414.1: How to Troubleshoot Communication From the Oracle Management Service (OMS) to a Grid Agent in 10g Enterprise Manager Grid Control?

    If many or all Agents are shown as Unreachable, this could be a potential problem at the OMS / Repository Database end.

    – Download and install the latest EMDiag kit (repvfy tool) available in Note 421053.1. Execute:

    cd /bin
    repvfy verify
    repvfy verify -detail

    – Check for any errors reported for the EM-level DBMS_Job’s in the Setup > Management Services and Repository > Repository Operations page of the Grid Console. Refer to

    Note 1178258.1: Overview of the ‘Management Services and Repository’ / Monitor-the-Monitor (MTM) Pages in Grid Console
    Note 1164855.1: Overview of the 10g Grid Control Management Repository,
    Section: 3. MGMT_VIEW user, Repository Views and DBMS_Jobs

    – Check the OMS log/trace files to verify whether the OMS is having any problems staying up.
    Note 1161003.1: Master Note for 10g Grid Control OMS Performance Issues

    Compare the EMD_URL that is shown in the Grid Console with that of the Agent running on the target machine. To get the full EMD_URL, log on as SYSMAN to the repository database using SQL*Plus, and issue the following SQL statement:

    SELECT emd_url FROM mgmt_targets
    WHERE target_name = '<hostname>'
    AND target_type = 'host';

    EMD_URL
    -----------------------------------------------------------------------
    https://agentmachine.domain:3872/emd/main/

    This should match the value of the ‘Agent URL’ in the ‘emctl status agent’ output at the Agent side. For example:

    Agent URL : https://agentmachine.domain:3872/emd/main/
    This value is stored in the <AGENT_HOME>/sysman/config/emd.properties file as the EMD_URL parameter.
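
    When the two values are compared by eye, a short hostname versus a fully qualified one, or http versus https, is easy to miss. The Python sketch below reports which URL components differ. The function name endpoint_differences is hypothetical, and treating a trailing slash as insignificant is an assumption of this sketch, not something the note states.

```python
from urllib.parse import urlsplit


def endpoint_differences(repository_url, agent_url):
    """Return the names of the URL components that differ between the two values."""
    repo = urlsplit(repository_url.strip())
    agent = urlsplit(agent_url.strip())
    components = {
        "scheme": (repo.scheme, agent.scheme),
        "host": (repo.hostname, agent.hostname),  # urlsplit lowercases the host
        "port": (repo.port, agent.port),
        "path": (repo.path.rstrip("/"), agent.path.rstrip("/")),
    }
    return [name for name, (left, right) in components.items() if left != right]
```

    An empty list means the repository and the Agent agree on the URL. A result such as ['host'] points to a hostname mismatch.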

    4. Obtain an Availability Dump of the Agent

    Using the EMDiag kit (repvfy tool), we can obtain a dump of the target availability details as stored in the Repository Database. For more details, refer to Note 399899.1: Grid Control Target Maintenance: Troubleshooting Script for Target Availability in Enterprise Manager Grid Control

    For the Agent target, execute:

    cd /bin
    repvfy dump availability -name agentmachine.domain:3872 -type oracle_emd
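
    In the command above, the -name value is the host:port part of the Agent URL. The Python sketch below derives it from an EMD_URL and assembles the same argument list. It assumes the Agent target is always named host:port, as in this example, and both function names are hypothetical.

```python
from urllib.parse import urlsplit


def agent_target_name(emd_url):
    """Return the host:port part of an EMD_URL, e.g. agentmachine.domain:3872."""
    parts = urlsplit(emd_url.strip())
    if not parts.hostname or parts.port is None:
        raise ValueError("EMD_URL has no host or no port: " + emd_url)
    # Use netloc rather than hostname so the original letter case is kept.
    return parts.netloc.rpartition("@")[2]


def repvfy_availability_args(emd_url):
    """Build the argument list for the availability dump of the Agent target."""
    return [
        "repvfy", "dump", "availability",
        "-name", agent_target_name(emd_url),
        "-type", "oracle_emd",
    ]
```

    The list can be passed to subprocess.run() from the directory that contains repvfy.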

  9. Adnan Ismail says

    I have a question related to OMS 11g.
    I found a directory named “sldl”, i.e. /u01/app/Oracle/Middleware/oms11g/sldl
    This directory has many subdirectories, listed below, and each of them contains multiple binary files that are generated on a daily basis. When I open any of the files, it is in binary format. They are eating up storage day by day, and I have not been able to find a single clue on Google or Metalink. Maybe you can help.
    0 11 14 17 2 22 25 28 30 33 36 39 41 44 47 5 52 55 58 60 63 66 69 71 9
    1 12 15 18 20 23 26 29 31 34 37 4 42 45 48 50 53 56 59 61 64 67 7 72
    10 13 16 19 21 24 27 3 32 35 38 40 43 46 49 51 54 57 6 62 65 68 70 8
    [oms11g ] cd 11
    [oms11g ] ls -l
    total 228520
    -rw-r----- 1 oracle oinstall 2153 Jul 11 2011 A7C333CBD2F362FBE040130A18085150
    -rw-r----- 1 oracle oinstall 3502 Jul 11 2011 A7C333CBD6A262FBE040130A18085150
    -rw-r----- 1 oracle oinstall 1711 Jul 11 2011 A7C333CBD73862FBE040130A18085150
    -rw-r----- 1 oracle oinstall 47 Jul 11 2011 A7C333CBD74862FBE040130A18085150
    -rw-r----- 1 oracle oinstall 5658 Jul 11 2011 A7C333CBD91F62FBE040130A18085150
    -rw-r----- 1 oracle oinstall 2025 Jul 11 2011 A7C333CBD92F62FBE040130A18085150
    -rw-r----- 1 oracle oinstall 5346 Jul 11 2011 A7C39C8398AC68AEE040130A180861FE
    -rw-r----- 1 oracle oinstall 2865 Jul 11 2011 A7C39C8398F568AEE040130A180861FE
    -rw-r----- 1 oracle oinstall 3685 Jul 11 2011 A7C39C83990868AEE040130A180861FE
    -rw-r----- 1 oracle oinstall 9608 Jul 11 2011 A7C3D5BC500C2F9FE040130A18086A9E
    -rw-r----- 1 oracle oinstall 71262 Jul 11 2011 A7C41E110ED70525E040130A1808779A
    -rw-r----- 1 oracle oinstall 362291 Mar 15 14:22 B98A3724236ED113E040130A790E621A
    -rw-r----- 1 oracle oinstall 15613765 Mar 20 00:09 BB9446EC8203D3F5E040130A780E3319
    -rw-r----- 1 oracle oinstall 15516595 May 9 00:07 BF767E816E115356E040130A780E6148
    -rw-r----- 1 oracle oinstall 10320739 May 20 00:01 C039860CDC42B3D3E040130A780E7AEE
    -rw-r----- 1 oracle oinstall 10320739 Jun 17 00:04 C0A9F884325F1897E040130A790E553C
    -rw-r----- 1 oracle oinstall 37385816 Jun 16 00:19 C0A9F89484ED0D16E040130A790E559B
    -rw-r----- 1 oracle oinstall 37385816 May 30 00:04 C0A9F8982C041565E040130A780E32C9
    -rw-r----- 1 oracle oinstall 37385816 Jun 10 00:13 C0A9F8982EE02F43E040130A780E32CE
    -rw-r----- 1 oracle oinstall 15523588 Jun 12 00:29 C0A9F898DFD7804FE040130A780E32D2
    -rw-r----- 1 oracle oinstall 37385816 Jun 26 00:05 C2E50E34217E9ADAE040130A780E0F66
    -rw-r----- 1 oracle oinstall 16364981 Jul 16 00:05 C49EA2F7DAC8B8B2E040130AC91D628B
