【Maclean Liu技术分享】拨开Oracle优化器迷雾探究Histogram之秘

【Maclean Liu技术分享】拨开Oracle CBO 优化器迷雾, 探究Histogram直方图之秘,讲座文档正式版已上传


http://t.askmac.cn/thread-2172-1-1.html

预计时长: 1.5个小时

适合参与成员: 对于性能调优和CBO优化器有兴趣的同学,或急于提升SQL调优技能的同学。

教学视频已上传 , 收看请猛击下面的地址:

 

 

 

【Maclean Liu技术分享】Histogram直方图技术演示脚本如下:

【Maclean Liu技术分享】Histogram直方图技术演示脚本.txt (9.93 KB, 下载次数: 65)

讲座材料presentation 当前正式版本下载:

【Maclean Liu技术分享】拨开Oracle CBO优化器迷雾,探究Histogram直方图之秘 20130429.pdf (1.11 MB, 下载次数: 3646)

 
 

 

Oracle CBO术语大集合

最近准备写点Histogram和density相关的文章,先把术语给大家理一理:

cardinality (CDN)
Legend
CBQT – cost-based query transformation
JPPD – join predicate push-down
OJPPD – old-style (non-cost-based) JPPD
FPD – filter push-down
PM – predicate move-around
CVM – complex view merging
SPJ – select-project-join
SJC – set join conversion
SU – subquery unnesting
OBYE – order by elimination
OST – old style star transformation
ST – new (cbqt) star transformation
CNT – count(col) to count(*) transformation
JE – Join Elimination
JF – join factorization
SLP – select list pruning
DP – distinct placement
qb – query block
LB – leaf blocks
DK – distinct keys
LB/K – average number of leaf blocks per key
DB/K – average number of data blocks per key
CLUF – clustering factor
NDV – number of distinct values
Resp – response cost
Card – cardinality
Resc – resource cost
NL – nested loops (join)
SM – sort merge (join)
HA – hash (join)
CPUSPEED – CPU Speed
IOTFRSPEED – I/O transfer speed
IOSEEKTIM – I/O seek time
SREADTIM – average single block read time
MREADTIM – average multiblock read time
MBRC – average multiblock read count
MAXTHR – maximum I/O system throughput
SLAVETHR – average slave I/O throughput
dmeth – distribution method
1: no partitioning required
2: value partitioned
4: right is random (round-robin)
128: left is random (round-robin)
8: broadcast right and partition left
16: broadcast left and partition right
32: partition left using partitioning of right
64: partition right using partitioning of left
256: run the join in serial
0: invalid distribution method
sel – selectivity
ptn – partition
adop Automatic degree of parallelism

TABLE: Table Name
ALIAS: Table Alias
QBS: Query Block Signature
#ROWS: Number of Rows
#BLKS: Number of Blocks
ARL: Average Row Length
COR: Cardinality Original
CRD: Cardinality Rounded
CCM: Cardinality Computed
CNA: Cardinality Non Adjusted

AVGLEN: Average Column Length
NDV: Number of Distinct Values
NULLS: Number of Nulls in Column
DEN: Column Density
MIN: Minimum Column Value
MAX: Maximum Column Value
TYPE: Histogram Type
#BKTS: Histogram Buckets
UNCOMPBKTS: Histogram Uncompressed Buckets   
ENDPTVALS: Histogram End Point Values 
OOR: Out-of-Range Predicate

TABLE: Table Name
ALIAS: Table Alias
INDEX: Index Name
QBS: Query Block Signature
LVLS: Index Levels
#LB: Number of Leaf Blocks
#DK: Number of Distinct Keys
LB/K: Average Number of Leaf Blocks Per Key
DB/K: Average Number of Data Blocks Per Key
CLUF: Clustering Factor
INDEX_COLS: Index Column Numbers

COST: Cost of the Join
CARD: Cardinality of the Join
BC: Best Cost
LINE#: Line Number in the 10053 Trace File Where Cost Value is Located
JOIN#: Join Number in the 10053 Trace File Associated With Key
STATUS: If Permutation was Computed for all Table Joins the Status = COMPL. If Not, status = ABORT
*: In ANY Column Indicates Value Not Found in File

Freq 频率直方图
HtBal 高度平衡直方图

 

 

关于 UNCOMPBKTS和ENDPTVALS

 

当直方图类型为frequency histograms( Histogram: Freq)时UncompBkts  等于统计信息中采样的总行数-NULLS(Card: Original- NULLS,因为dbms_stats默认是auto_sample_size采样,所以这栏其实是采样到的原始Card-NULLS), 而EndPtVals 等于bucket总数,或者说NDV,因为frequency histograms中 NDV=number of buckets 

当直方图类型为height balanced histograms (Histogram: HtBal) UncompBkts  等于bucket的数目(其实也等于10053 trace中#Bkts的数目),而EndPtVals 等于已经被压缩的Histogram的大小,其实是等于: select count(*) from dba_tab_histograms where table_name=’YOUR_TABLE_NAME’ and column_name=’YOUR_COLUMN_NAME’的实际总和。  通过这2个值对比,可以了解到popular值的多少以及数据的倾斜度, 是有多个大量重复的值(popular value)还是仅有一个巨大的重复值。

 

 

CBO为什么不走索引?

原帖是Itpub上的网友提出一个CBO为什么不走索引的问题, 该问题的演示如下:

 

SQL> create table maclean1 as select * from dba_objects;

Table created.

SQL> update maclean1 set status='INVALID' where owner='MACLEAN';

2 rows updated.

SQL> commit;                                       

Commit complete.

SQL> create index ind_maclean1 on maclean1(status);

Index created.

SQL> exec dbms_stats.gather_table_stats('SYS','MACLEAN1',cascade=>true);

PL/SQL procedure successfully completed.

SQL> explain plan for select * from maclean1 where status='INVALID';

Explained.

SQL> set linesize 140 pagesize 1400
SQL> select * from table(dbms_xplan.display());

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------
Plan hash value: 987568083

------------------------------------------------------------------------------
| Id  | Operation         | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |          | 11320 |  1028K|    85   (0)| 00:00:02 |
|*  1 |  TABLE ACCESS FULL| MACLEAN1 | 11320 |  1028K|    85   (0)| 00:00:02 |
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("STATUS"='INVALID')

13 rows selected.

10053 trace 

Access path analysis for MACLEAN1
***************************************
SINGLE TABLE ACCESS PATH
  Single Table Cardinality Estimation for MACLEAN1[MACLEAN1]
  Column (#10): STATUS(
    AvgLen: 7 NDV: 2 Nulls: 0 Density: 0.500000
  Table: MACLEAN1  Alias: MACLEAN1
    Card: Original: 22639.000000  Rounded: 11320  Computed: 11319.50  Non Adjusted: 11319.50
  Access Path: TableScan
    Cost:  85.33  Resp: 85.33  Degree: 0
      Cost_io: 85.00  Cost_cpu: 11935345
      Resp_io: 85.00  Resp_cpu: 11935345
  Access Path: index (AllEqRange)
    Index: IND_MACLEAN1
    resc_io: 185.00  resc_cpu: 8449916
    ix_sel: 0.500000  ix_sel_with_filters: 0.500000
    Cost: 185.24  Resp: 185.24  Degree: 1
  Best:: AccessPath: TableScan
         Cost: 85.33  Degree: 1  Resp: 85.33  Card: 11319.50  Bytes: 0

可以从以上10053中看到因为没有直方图存在,所以这里的Density = 0.5 是从 1/ NDV 算得的
也就意味着粗糙的统计信息显示STATUS='INVALID"的数据行占总行数的一半,
所以优化器选择做全表扫描是有道理的

 

 

以上符合”STATUS”=’INVALID’ condition的只有2行,且status列上建有索引,同时也使用了dbms_stats包收集表和索引上的统计信息,照理说CBO因该选择INDEX Range ind_maclean1,而避免全表扫描,但实际优化器opitimizer没有这样做。

 

 

实际上这个问题和统计信息收集时是否收集直方图有关系,只要收集了直方图,那么优化器就会了解到status=’INVALID’条件仅有少量的card满足,具有良好的选择性:

 

[oracle@vrh4 ~]$ sqlplus  / as sysdba

SQL*Plus: Release 11.2.0.2.0 Production on Mon Oct 17 19:15:45 2011

Copyright (c) 1982, 2010, Oracle.  All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
PL/SQL Release 11.2.0.2.0 - Production
CORE    11.2.0.2.0      Production
TNS for Linux: Version 11.2.0.2.0 - Production
NLSRTL Version 11.2.0.2.0 - Production

SQL> show parameter optimizer_fea

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
optimizer_features_enable            string      11.2.0.2

SQL> select * from global_name;

GLOBAL_NAME
--------------------------------------------------------------------------------
www.askmac.cn  & www.askmac.cn

SQL> drop table maclean;

Table dropped.

SQL>  create table maclean as select * from dba_objects;

Table created.

SQL> update maclean set status='INVALID' where owner='MACLEAN';

2 rows updated.

SQL>  commit;

Commit complete.

SQL> create index ind_maclean on maclean(status);

Index created.

SQL> exec dbms_stats.gather_table_stats('SYS','MACLEAN',cascade=>true, method_opt=>'FOR ALL COLUMNS SIZE 2');

PL/SQL procedure successfully completed.

 

这里我们仅收集2个bucket的直方图, 就足以让优化器做出正确选择了。

得益于Quest公司的Guy Harrison所写的一个列出FREQUENCY直方图信息的脚本,以下为该脚本:

 

rem
rem Generate a histogram of data distribution in a column as recorded
rem  in dba_tab_histograms
rem
rem Guy Harrison Jan 2010 : www.guyharrison.net
rem
rem hexstr function is from From http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:707586567563 

set pagesize 10000
set lines 120
set verify off

col char_value format a10 heading "Endpoint|value"
col bucket_count format 99,999,999 heading "bucket|count"
col pct format 999.99 heading "Pct"
col pct_of_max format a62 heading "Pct of|Max value"
rem col endpoint_value format 9999999999999 heading "endpoint|value" 

CREATE OR REPLACE FUNCTION hexstr (p_number IN NUMBER)
    RETURN VARCHAR2
AS
    l_str      LONG := TO_CHAR (p_number, 'fm' || RPAD ('x', 50, 'x'));
    l_return   VARCHAR2 (4000);
BEGIN
    WHILE (l_str IS NOT NULL)
    LOOP
        l_return := l_return || CHR (TO_NUMBER (SUBSTR (l_str, 1, 2), 'xx'));
        l_str := SUBSTR (l_str, 3);
    END LOOP;

    RETURN (SUBSTR (l_return, 1, 6));
END;
/

WITH hist_data AS (
SELECT endpoint_value,endpoint_actual_value,
       NVL(LAG (endpoint_value) OVER (ORDER BY endpoint_value),' ') prev_value,
       endpoint_number,
       endpoint_number,
       endpoint_number
       - NVL (LAG (endpoint_number) OVER (ORDER BY endpoint_value), 0)
           bucket_count
FROM dba_tab_histograms
JOIN dba_tab_col_statistics USING (owner, table_name,column_name)
WHERE     owner = '&owner'
      AND table_name = '&table'
      AND column_name = '&column'
      AND histogram='FREQUENCY')
SELECT nvl(endpoint_actual_value,endpoint_value) endpoint_value ,
       bucket_count,
       ROUND(bucket_count*100/SUM(bucket_count) OVER(),2) PCT,
       RPAD(' ',ROUND(bucket_count*50/MAX(bucket_count) OVER()),'*') pct_of_max
  FROM hist_data;

WITH hist_data AS (
SELECT endpoint_value,endpoint_actual_value,
       NVL(LAG (endpoint_value) OVER (ORDER BY endpoint_value),' ') prev_value,
       endpoint_number,
       endpoint_number,
       endpoint_number
       - NVL (LAG (endpoint_number) OVER (ORDER BY endpoint_value), 0)
           bucket_count
FROM dba_tab_histograms
JOIN dba_tab_col_statistics USING (owner, table_name,column_name)
WHERE     owner = '&owner'
      AND table_name = '&table'
      AND column_name = '&column'
      AND histogram='FREQUENCY')
SELECT hexstr(endpoint_value) char_value,
       bucket_count,
       ROUND(bucket_count*100/SUM(bucket_count) OVER(),2) PCT,
       RPAD(' ',ROUND(bucket_count*50/MAX(bucket_count) OVER()),'*') pct_of_max
  FROM hist_data
ORDER BY endpoint_value;

 

使用该脚本,我们可以更直观的感受FREQUENCY直方图信息:

 

 

这里dbms_stats 包收集到的STATUS=’INVALID’ bucket count=9 percent = 0.04 ,可以和之后的10053 trace中的信息对比以下:

 

SQL> explain plan for select * from maclean where status='INVALID';

Explained.

SQL>  select * from table(dbms_xplan.display());

PLAN_TABLE_OUTPUT
-------------------------------------
Plan hash value: 3087014066

-------------------------------------------------------------------------------------------
| Id  | Operation                   | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |             |     9 |   837 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| MACLEAN     |     9 |   837 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_MACLEAN |     9 |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("STATUS"='INVALID')

以上可以看到只要收集了直方图CBO就会认识到满足STATUS=’INVALID’的cardnality很少 , 该条件具有良好的选择性 ,使用index range scan而非Full table scan。

我们进一步来看看有直方图情况下的10053 trace:

SQL> alter system flush shared_pool;

System altered.

SQL> oradebug setmypid;
Statement processed.

SQL> oradebug event 10053 trace name context forever ,level 1;
Statement processed.

SQL> explain plan for select * from maclean where status='INVALID';

Explained.

SINGLE TABLE ACCESS PATH
  Single Table Cardinality Estimation for MACLEAN[MACLEAN]
  Column (#10):
    NewDensity:0.000199, OldDensity:0.000022 BktCnt:22640, PopBktCnt:22640, PopValCnt:2, NDV:2

  这里的NewDensity= bucket_count / SUM(bucket_count) /2

   Column (#10): STATUS(
    AvgLen: 7 NDV: 2 Nulls: 0 Density: 0.000199
    Histogram: Freq  #Bkts: 2  UncompBkts: 22640  EndPtVals: 2
  Table: MACLEAN  Alias: MACLEAN
 Card: Original: 22640.000000 Rounded: 9 Computed: 9.00 Non Adjusted: 9.00
  Access Path: TableScan
    Cost:  85.30  Resp: 85.30  Degree: 0
      Cost_io: 85.00  Cost_cpu: 10804625
      Resp_io: 85.00  Resp_cpu: 10804625
  Access Path: index (AllEqRange)
    Index: IND_MACLEAN
    resc_io: 2.00  resc_cpu: 20763
    ix_sel: 0.000398  ix_sel_with_filters: 0.000398
    Cost: 2.00  Resp: 2.00  Degree: 1
  Best:: AccessPath: IndexRange
  Index: IND_MACLEAN
         Cost: 2.00  Degree: 1  Resp: 2.00  Card: 9.00  Bytes: 0

 

上例中我们手动指定收集2 bucket的直方图后CBO优化器才能做出正确的选择,那么岂不是要人工干预来收集列的统计信息,默认的dbms_stats.DEFAULT_METHOD_OPT方式不能为我们提供有效的直方图收集方式吗?

实际上dbms_stats的自动决定直方图的收集与否及收集的桶数受到col_usage$基本中列充当predicate的历史记录影像,关于col_usage$详见<了解你所不知道的SMON功能(四):维护col_usage$字典基表>

 

假设在统计表上信息的dbms_stats存储过程执行之前,col_usage$中已经存有表上相关列充当predicate的记录,那么dbms_stats存储过程就会考虑为该列收集直方图信息, 如:

 

SQL> drop table maclean;

Table dropped.

SQL>  create table maclean as select * from dba_objects;

Table created.

SQL> update maclean set status='INVALID' where owner='MACLEAN';

2 rows updated.

SQL> commit;

Commit complete.

SQL> create index ind_maclean on maclean(status);

Index created.

使用dbms_stats默认method_opt收集maclean表

SQL> exec dbms_stats.gather_table_stats('SYS','MACLEAN');

PL/SQL procedure successfully completed.

@histogram.sql 

Enter value for owner: SYS
old  12:    WHERE owner = '&owner'
new  12:    WHERE owner = 'SYS'
Enter value for table: MACLEAN
old  13:      AND table_name = '&table'
new  13:      AND table_name = 'MACLEAN'
Enter value for column: STATUS
old  14:      AND column_name = '&column'
new  14:      AND column_name = 'STATUS'

no rows selected

因为缺少col_usage$列使用信息,所以依然没有收集status列的直方图

    declare
    begin
    for i in 1..500 loop
	execute immediate ' alter system flush shared_pool';
	DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO;
    execute immediate 'select count(*)  from maclean where status=''INVALID'' ' ;
    end loop;
    end;
    /

PL/SQL procedure successfully completed.

SQL> select obj# from obj$ where name='MACLEAN';

      OBJ#
----------
     97215
SQL> select * from  col_usage$ where  OBJ#=97215;

       OBJ#    INTCOL# EQUALITY_PREDS EQUIJOIN_PREDS NONEQUIJOIN_PREDS RANGE_PREDS LIKE_PREDS NULL_PREDS TIMESTAMP
---------- ---------- -------------- -------------- ----------------- ----------- ---------- ---------- ---------
     97215          1              1              0                 0           0          0          0 17-OCT-11
     97215         10            499              0                 0           0          0          0 17-OCT-11

SQL> exec dbms_stats.gather_table_stats('SYS','MACLEAN');

PL/SQL procedure successfully completed.

@histogram.sql 

Enter value for owner: SYS
Enter value for table: MACLEAN
Enter value for column: STATUS

Endpoint        bucket         Pct of
value            count     Pct Max value
---------- ----------- ------- --------------------------------------------------------------
INVALI               2     .04
VALIC3           5,453   99.96  *************************************************

11g Multi-Column Correlation Stats and Dynamic Sampling

Oracle CBO优化模式中列的统计信息是一个十分重要的概念,但在11g之前我们所讨论的都是基于单列的统计信息或直方图,也就是说基于成本的优化器总是假设where子句后的谓词中列与列之间不存在联系。但是有的查询包含一个表的多个列,而每个列又都与不同的选择度。这些列中有的是相关的,但优化器并不知道这些关系。在这种情况下,优化器如果要估计出真实的基数(card),必须要了解增加另一列到某个给定列是否会引起结果集的减少。多列上的相关统计数据能提供比单列统计数据或直方图更好的基数估计。当2个列紧密相关时,增加额外的谓词可以减少结果集。Oracle database 11g中引入了扩展统计(也叫多列统计,multicolumn statistics),可以收集一组列上的统计数据,从而让优化器能准确地计算多个单列谓词的选择性。因为把紧密相关的列作为一个组才能正确地放映其组合选择性,所以把相关列作为一组,在其上(列祖)收集统计数据,这些信息足以让优化器能准确地进行选择性估计,在包含使用相关列的谓词查询中,这是我们实际关心的问题。多列统计的引入意味着,在11g中cbo优化器可以对具有多列复杂谓词判断的SQL语句做出更准确的成本估算,许多原本”误用”全表扫描的查询现在可以使用索引扫描的执行计划,语句将运行地更快速。

我们试看下例:
[Read more…]

Histograms: An Overview

NOTE: This article was originally created for version 7.3. Conceptually the ideas presented are
the same as in later versions but implementation details may have changed or have
been enhanced.

Histograms
==========

For uniformly distributed data, the cost-based approach makes fairly accurate
guesses at the cost of executing a particular statement. However, when the
data is not uniformly distributed, the optimizer cannot accurately estimate
the selectivity of a query. Beginning in release 7.3, for columns which do not
have uniform data distribution, Oracle will allow you to store histograms
describing the data distribution of a particular column.

When to Use Histograms
———————-

Histograms are stored in the dictionary and computed by using the DBMS_STATS
command on a particular column. Therefore, there is a maintenance and space
cost for using histograms. You should only compute histograms for columns
which you know have highly-skewed data distribution.

When to Not Use Histograms
————————–

Also, be aware that histograms, as well as all optimizer statistics, are
static. If the data distribution of a column changes frequently, it is
necessary to recompute the histogram for a given column. Histograms are not
useful for columns with the following characteristics:

  o all predicates on the column use bind variables
  o the column data is uniformly distributed
  o the column is not used in WHERE clauses of queries
  o the column is unique and is used only with equality predicates

How to Use Histograms
———————

Create histograms on columns that are frequently used in WHERE clauses of
queries and have a highly-skewed data distribution. You create a histogram
by using the ANALYZE and DBMS_STATS TABLE command for later versions. For example, if you want to create a 10-bucket histogram on the SAL column of the EMP table, issue the following
statement:

    DBMS_STATS.GATHER_TABLE_STATS (NULL,’EMP’, method_opt => ‘FOR COLUMNS sal SIZE 10’);
    ANALYZE TABLE emp COMPUTE STATISTICS FOR COLUMNS sal SIZE 10;

The SIZE keyword states the maximum number of buckets for the histogram.
You would create a histogram on the SAL column if there were an unusual
number of employees with the same salary and few employees with other
salaries.

The ‘FOR’ clause can be used with either COMPUTE STATISTICS or ESTIMATE
STATISTICS.  The following clauses can be used with the ANALYZE TABLE command:

    FOR TABLE  
        collect table statistics for the table
    FOR ALL COLUMNS
        collect column statistics for all columns in the table
    FOR ALL INDEXED COLUMNMS   
        collect column statistics for all indexed columns in the table
    FOR COLUMNS
        collect column statistics for the specified columns
    FOR ALL INDEXES
        all indexes associated with the table will be analyzed
    SIZE
        specifies the maximum number of partitions (buckets) in the
        histogram.
         Default value:    75
         Range of values:  1 – 254

 For DBMS_STATS, syntax is the following:

method_opt

Accepts:

      FOR ALL [INDEXED | HIDDEN] COLUMNS [size_clause]
    *

      FOR COLUMNS [size clause] column|attribute [size_clause] [,column|attribute [size_clause]…]

size_clause is defined as size_clause := SIZE {integer | REPEAT | AUTO | SKEWONLY}

– integer : Number of histogram buckets. Must be in the range [1,254].
– REPEAT : Collects histograms only on the columns that already have histograms.
– AUTO : Oracle determines the columns to collect histograms based on data distribution and the workload of the columns.
– SKEWONLY : Oracle determines the columns to collect histograms based on the data distribution of the columns.

For 11.2, there is following additional parameter:

– extension : can be either a column group in the format of (column_name, Colume_name [, …]) or an expression

In 10g and 11g, the default is FOR ALL COLUMNS SIZE AUTO. The default value can be changed using the SET_DATABASE_PREFS Procedure, SET_GLOBAL_PREFS Procedure, SET_SCHEMA_PREFS Procedure and SET_TABLE_PREFS Procedure.

Choosing the Number of Buckets for a Histogram
———————————————-

The default number of buckets is 75. 10g onwards, the default value is FOR ALL COLUMNS SIZE AUTO.  This value provides an appropriate level of detail for most data distributions. However, since the number of buckets in the histogram, the sampling rate, and the data distribution all affect
the usefulness of a histogram, you may need to experiment with different
numbers of buckets to obtain the best results.

If the number of frequently occurring distinct values in a column is relatively
small, then it is useful to set the number of buckets to be greater than the
number of frequently occurring distinct values.

Viewing Histograms
——————

You can find information about existing histograms in the database through the
following data dictionary views:

    USER_TAB_HISTOGRAMS, ALL_TAB_HISTOGRAMS, and DBA_TAB_HISTOGRAMS.
    USER_PART_HISTOGRAMS, ALL_PART_HISTOGRAMS, and DBA_PART_HISTOGRAMS.
    USER_SUBPART_HISTOGRAMS, ALL_SUBPART_HISTOGRAMS, and DBA_SUBPART_HISTOGRAMS.

The number of buckets in each column’s histogram is found in these dictionary views :

 o USER_TAB_COL_STATISTICS, ALL_TAB_COL_STATISTICS,DBA_TAB_COL_STATISTICS
   (extracted from USER_TAB_COLUMNS, ALL_TAB_COLUMNS, and DBA_TAB_COLUMNS)
 o USER_PART_COL_STATISTICS,ALL_PART_COL_STATISTICS, DBA_PART_COL_STATISTICS,
 o USER_SUBPART_COL_STATISTICS, ALL_SUBPART_COL_STATISTICS, DBA_SUBPART_COL_STATISTICS

These views have the same definition.

DBA_TAB_HISTOGRAMS

This view lists histograms on columns of all tables.

Column name           Represents This
———————————————————
OWNER                 Owner of table
TABLE_NAME            Table name
COLUMN_NAME           Column name
ENDPOINT_NUMBER       Endpoint number
ENDPOINT_VALUE        Normalized endpoint values for this bucket

DBA_TAB_COLUMNS

This view contains information which describes columns of all tables.
(NOTE: Views and clusters, although included in this view are not relevant
 to histograms.)

Column Name           Represents This
———————————————————-
OWNER                 Owner of table
TABLE_NAME            Table name
COLUMN_NAME           Column name
DATA_TYPE             Datatype of the column
DATA_LENGTH           Length of the column
DATA_PRECISION        Precision for NUMBER or FLOAT datatypes
DATA_SCALE            Digits to right of decimal
NULLABLE              NULL allowable?
COLUMN_ID             Sequence no. of column
DEFAULT_LENGTH        Length of default value
DATA_DEFAULT          Default value
NUM_DISTINCT          Number of distinct values for the column
LOW_VALUE             Smallest value for the column, expressed in hex
                        for the internal representation ofthe first 32
                        bytes of the value
HIGH_VALUE            Highest value for the column, expressed in hex for
                        the internal representation of the first 32 bytes
                        of the value
DENSITY               Density of the column (a measure of how distinct
                        the values are)
NUM_NULLS             The number of columns with null value
NUM_BUCKETS           The number of buckets in the histogram
LAST_ANALYZED         The date that analyze was last run on the table
SAMPLE_SIZE           The amount of data sampled

The column LAST_ANALYZED is useful in determining the last time
statistics, with or without histograms, were computed.  This is
often important to assess the reason for cost-based optimizer’s
choices of execution paths.  All tables involved in a query must be
regularly analyzed as data changes.

沪ICP备14014813号-2

沪公网安备 31010802001379号