I. 통계정보 조작

I.1. 개념 설명

  • 통계정보를 조작하여 Cardinality를 제어하는 기법
  • Cardinality는 Selectivity에 의해 결정되고, Selectivity는 주로 Density에 의해 결정됨
  • 따라서, Density를 조작할 수 있으면 Cardinality를 조작할 수 있으며 Histogram 또한 마찬가지임

I.2. 테스트 Object 생성


DROP TABLE T1 PURGE;
DROP TABLE T2 PURGE;

CREATE TABLE T1(ID1 INT, STATUS1 CHAR(5));
CREATE TABLE T2(ID2 INT);

CREATE INDEX T1_IDX ON T1(ID1);
CREATE INDEX T2_IDX ON T2(ID2);

INSERT INTO T1
SELECT -- column id1 is uniform
       MOD(R, 1000),
       -- column status1 is skewed!!!
       CASE WHEN R BETWEEN 1 AND 9000 THEN '-1'
            ELSE TO_CHAR(MOD(R, 1000)) END
FROM   (SELECT LEVEL AS R
        FROM   DUAL
        CONNECT BY LEVEL <= 10000)
ORDER BY DBMS_RANDOM.RANDOM
;

INSERT INTO T2
SELECT MOD(R, 100)
FROM   (SELECT LEVEL AS R
        FROM   DUAL
        CONNECT BY LEVEL <= 10000)
;

BEGIN
    DBMS_STATS.GATHER_TABLE_STATS(USER,
                                  'T1',
                                  CASCADE          => TRUE,
                                  ESTIMATE_PERCENT => 100,
                                  METHOD_OPT       => 'FOR ALL COLUMNS SIZE 1',
                                  NO_INVALIDATE    => FALSE
                                  );
END;
/

BEGIN
    DBMS_STATS.GATHER_TABLE_STATS(USER,
                                  'T2',
                                  CASCADE          => TRUE,
                                  ESTIMATE_PERCENT => 100,
                                  METHOD_OPT       => 'FOR ALL COLUMNS SIZE 1',
                                  NO_INVALIDATE    => FALSE
                                  );
END;
/

I.3. 통계정보 확인


-- T1 테이블의 컬럼 통계정보
SELECT S.TABLE_NAME,
       S.COLUMN_NAME,
       S.NUM_DISTINCT,
       S.NUM_NULLS,
       S.DENSITY,
       S.LOW_VALUE,
       S.HIGH_VALUE,
       S.HISTOGRAM
FROM   USER_TAB_COLS S
WHERE  S.TABLE_NAME = UPPER('T1')
;

TABLE_NAMECOLUMN_NAMENUM_DISTINCTNUM_NULLSDENSITYLOW_VALUEHIGH_VALUEHISTOGRAM
T1STATUS110010.0009990012D312020203939392020NONE
T1ID110010.00180C20A64NONE

I.4. 테스트


SELECT /*+ GATHER_PLAN_STATISTICS */
       COUNT(*)
FROM   T1 T1,
       T2 T2
WHERE  T1.ID1 = T2.ID2
AND    T1.ID1 BETWEEN 1 
              AND     100
AND    T1.STATUS1 = '-1'
;

  COUNT(*)          
----------          
     89100          

1 row selected.
Elapsed: 00:00:00.12

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'IOSTATS COST LAST'));
              
-----------------------------------------------------------------------------------------------------
| Id  | Operation           | Name   | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------------------
|   1 |  SORT AGGREGATE     |        |      1 |      1 |            |      1 |00:00:00.10 |    1221 |
|   2 |   NESTED LOOPS      |        |      1 |    101 |    10   (0)|  89100 |00:00:00.18 |    1221 |
|*  3 |    TABLE ACCESS FULL| T1     |      1 |      1 |     9   (0)|    900 |00:00:00.01 |      31 |
|*  4 |    INDEX RANGE SCAN | T2_IDX |    900 |    100 |     1   (0)|  89100 |00:00:00.10 |    1190 |
-----------------------------------------------------------------------------------------------------
                                                                                                     
Predicate Information (identified by operation id):                                                  
---------------------------------------------------                                                  
                                                                                                     
   3 - filter(("T1"."ID1"<=100 AND "T1"."STATUS1"='-1' AND "T1"."ID1">=1))                           
   4 - access("T1"."ID1"="T2"."ID2")                                                                 
       filter(("T2"."ID2">=1 AND "T2"."ID2"<=100))                                                   

  • 'T1' 테이블에 -1 값이 9000로우가 있음에도 불구하고 통계정보에서는 이를 모르기 때문에 E-Rows를 1로 계산을 했다.
    만약 정확히 9000로우를 알고 있었다면 Hash Join을 사용했을 것이다.
    이를 해결하기 위해서는 히스토그램 생성과 통계정보 조작 2가지 방법이 있다.

I.5. 히스토그램 생성

1) 히스토그램 생성 전


SELECT TABLE_NAME,
       COLUMN_NAME,
       ENDPOINT_NUMBER,
       ENDPOINT_VALUE || '(' || ENDPOINT_ACTUAL_VALUE || ')' AS ENDPOINT_VALUE
FROM   USER_TAB_HISTOGRAMS
WHERE  TABLE_NAME = UPPER('T1')
ORDER  BY COLUMN_NAME,
          ENDPOINT_NUMBER
;

TABLE_NAMECOLUMN_NAMEENDPOINT_NUMBERENDPOINT_VALUE
T1ID100()
T1ID11999()
T1STATUS10234649741948204000000000000000000000()
T1STATUS11297121544231514000000000000000000000()

2) 히스토그램 생성


BEGIN
    DBMS_STATS.GATHER_TABLE_STATS(USER,
                                  'T1',
                                  ESTIMATE_PERCENT => 100,
                                  CASCADE          => TRUE,
                                  METHOD_OPT       => 'FOR COLUMNS STATUS1 SIZE SKEWONLY',
                                  NO_INVALIDATE    => FALSE);
END;
/

3) 히스토그램 생성 후


SELECT TABLE_NAME,
       COLUMN_NAME,
       ENDPOINT_NUMBER,
       ENDPOINT_VALUE || '(' || ENDPOINT_ACTUAL_VALUE || ')' AS ENDPOINT_VALUE
FROM   USER_TAB_HISTOGRAMS
WHERE  TABLE_NAME = UPPER('T1')
ORDER  BY COLUMN_NAME,
          ENDPOINT_NUMBER
;

TABLE_NAMECOLUMN_NAMEENDPOINT_NUMBERENDPOINT_VALUE
T1ID100()
T1ID11999()
T1STATUS1228234649741948204000000000000000000000()
T1STATUS1229255400627676803000000000000000000000()
T1STATUS1230255481361174405000000000000000000000()
T1STATUS1231255562173900169000000000000000000000()
T1STATUS1232260612890032291000000000000000000000()
T1STATUS1233260693702758056000000000000000000000()
T1STATUS1234260755025355842000000000000000000000()
T1STATUS1235265825231615942000000000000000000000()
T1STATUS1236265886554213728000000000000000000000()
T1STATUS1237265967366939493000000000000000000000()
T1STATUS1238271018083071615000000000000000000000()
T1STATUS1239271098895797379000000000000000000000()
T1STATUS1240276167834406879000000000000000000000()
T1STATUS1241276230424655266000000000000000000000()
T1STATUS1242276311158152868000000000000000000000()
T1STATUS1243281361953513152000000000000000000000()
T1STATUS1244281442687010754000000000000000000000()
T1STATUS1245281504088836703000000000000000000000()
T1STATUS1246286574215868640000000000000000000000()
T1STATUS1247286653840171967000000000000000000000()
T1STATUS1248286716351192191000000000000000000000()
T1STATUS1249291786557452291000000000000000000000()
T1STATUS1250291847880050077000000000000000000000()
T1STATUS1251291928692775842000000000000000000000()
T1STATUS1252296979408907964000000000000000000000()
T1STATUS1253297060221633728000000000000000000000()
T1STATUS1254297121544231514000000000000000000000()

4) 플랜 다시 확인


SELECT /*+ GATHER_PLAN_STATISTICS */
       COUNT(*)
FROM   T1 T1,
       T2 T2
WHERE  T1.ID1 = T2.ID2
AND    T1.ID1 BETWEEN 1 
              AND     100
AND    T1.STATUS1 = '-1'
;

  COUNT(*)           
----------           
     89100           

1 row selected.
Elapsed: 00:00:00.07

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'IOSTATS COST LAST'));
          
--------------------------------------------------------------------------------------------------- 
| Id  | Operation           | Name | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | 
--------------------------------------------------------------------------------------------------- 
|   1 |  SORT AGGREGATE     |      |      1 |      1 |            |      1 |00:00:00.04 |      62 | 
|*  2 |   HASH JOIN         |      |      1 |  14779 |    19   (6)|  89100 |00:00:00.01 |      62 | 
|*  3 |    TABLE ACCESS FULL| T1   |      1 |    908 |     9   (0)|    900 |00:00:00.01 |      31 | 
|*  4 |    TABLE ACCESS FULL| T2   |      1 |   9999 |     9   (0)|   9900 |00:00:00.01 |      31 | 
--------------------------------------------------------------------------------------------------- 
                                                                                                    
Predicate Information (identified by operation id):                                                 
---------------------------------------------------                                                 
                                                                                                    
   2 - access("T1"."ID1"="T2"."ID2")                                                                
   3 - filter(("T1"."ID1"<=100 AND "T1"."STATUS1"='-1' AND "T1"."ID1">=1))                          
   4 - filter(("T2"."ID2">=1 AND "T2"."ID2"<=100))                                                  

I.6. 통계정보 변경

1) Density 조작 전


SELECT S.TABLE_NAME,
       S.COLUMN_NAME,
       S.NUM_DISTINCT,
       S.NUM_NULLS,
       S.DENSITY,
       S.LOW_VALUE,
       S.HIGH_VALUE,
       S.HISTOGRAM
FROM   USER_TAB_COLS S
WHERE  S.TABLE_NAME = UPPER('T1')
;

TABLE_NAMECOLUMN_NAMENUM_DISTINCTNUM_NULLSDENSITYLOW_VALUEHIGH_VALUEHISTOGRAM
T1STATUS110010.0009990012D312020203939392020NONE
T1ID110010.00180C20A64NONE

2) Density 조작


BEGIN
    DBMS_STATS.SET_COLUMN_STATS(OWNNAME => USER,                       
                                TABNAME => 'T1',                         
                                COLNAME => 'STATUS1',                      
                                DENSITY => 1
                                );
END;
/

3) Density 조작 후


SELECT S.TABLE_NAME,
       S.COLUMN_NAME,
       S.NUM_DISTINCT,
       S.NUM_NULLS,
       S.DENSITY,
       S.LOW_VALUE,
       S.HIGH_VALUE,
       S.HISTOGRAM
FROM   USER_TAB_COLS S
WHERE  S.TABLE_NAME = UPPER('T1')
;

TABLE_NAMECOLUMN_NAMENUM_DISTINCTNUM_NULLSDENSITYLOW_VALUEHIGH_VALUEHISTOGRAM
T1STATUS11001012D312020203939392020NONE
T1ID110010.00180C20A64NONE

4) 플랜 다시 확인


SELECT /*+ GATHER_PLAN_STATISTICS */
       COUNT(*)
FROM   T1 T1,
       T2 T2
WHERE  T1.ID1 = T2.ID2
AND    T1.ID1 BETWEEN 1 
              AND     100
AND    T1.STATUS1 = '-1'
;

  COUNT(*)           
----------           
     89100           

1 row selected.
Elapsed: 00:00:00.07

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'IOSTATS COST LAST'));
          
---------------------------------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------
|   1 |  SORT AGGREGATE     |      |      1 |      1 |            |      1 |00:00:00.04 |      62 |
|*  2 |   HASH JOIN         |      |      1 |  15410 |    19   (6)|  89100 |00:00:00.01 |      62 |
|*  3 |    TABLE ACCESS FULL| T1   |      1 |   1011 |     9   (0)|    900 |00:00:00.01 |      31 |
|*  4 |    TABLE ACCESS FULL| T2   |      1 |   9999 |     9   (0)|   9900 |00:00:00.01 |      31 |
---------------------------------------------------------------------------------------------------
                                                                                                   
Predicate Information (identified by operation id):                                                
---------------------------------------------------                                                
                                                                                                   
   2 - access("T1"."ID1"="T2"."ID2")                                                               
   3 - filter(("T1"."ID1"<=100 AND "T1"."ID1">=1 AND "T1"."STATUS1"='-1'))                         
   4 - filter(("T2"."ID2">=1 AND "T2"."ID2"<=100))                                                 
;

I.7. 통계정보 조작 시 주의할 점

  • 변화가 전역적이다. 통계 정보를 참조하는 Query가 여러 개 존재할 수 있으며,
    통계정보의 변경은 이들 Query 모두에 대해 영향을 미칠 수 있다.
  • 통계 정보에 대한 지식이 필수적이다.
  • Oracle 10g 부터는 통계 정보를 변경해도 그 결과가 Query에 즉시 반영되지 않는다.
    DBMS_STATS.GATHER_XXX_STTS Procedure의 NO_INVALIDATE Parameter 때문이다.
    Oracle 9i까지는 FALSE, 즉 통계 정보 변경시 Query가 모두 Invalidate 된다.
    하지만 Oracle 10g 부터는 AUTO로 변경되었다. 이 경우 Oracle은 통계 정보 변경 후
    5시간(_OPTIMIZER_INVALIDATION_PERIOD Parameter)에 걸쳐 서서히 Query들이 Invalidation 되게끔 제어한다.
    만일 즉시 반영을 원한다면 NO_INVALIDATE Parameter의 값을 FALSE로 변경해야 한다.
    (급격한 hard parsing의 가능성)