Correlated Columns

  • Oracle이 Cardinality를 계산할 때 사용하는 공식들의 기본 전제는 Column이 서로 독립적이라는 것이다. Oracle은 이 가정에 위배되는 경우 잘 해석하지 못한다.

Test


-- create object
DROP TABLE T1 PURGE;

CREATE TABLE T1(C1 VARCHAR2(1), C2 INT);

-- create data
INSERT INTO T1
SELECT 'A', 1
FROM   DUAL
CONNECT BY LEVEL <= 1000
UNION ALL
SELECT 'B', 2
FROM   DUAL
CONNECT BY LEVEL <= 100
UNION ALL
SELECT 'C', 2
FROM   DUAL
CONNECT BY LEVEL <= 100
;

COMMIT;

-- gather stats "with" histogram
EXEC DBMS_STATS.GATHER_TABLE_STATS(USER, 'T1', METHOD_OPT => 'FOR ALL COLUMNS SIZE SKEWONLY', NO_INVALIDATE => FALSE);

  • 통계정보를 확인한다.

-- Table
SELECT TABLE_NAME,
       NUM_ROWS,
       BLOCKS,
       SAMPLE_SIZE,
       TO_CHAR(LAST_ANALYZED, 'YYYY/MM/DD HH24:MI:SS') AS LAST_ANAL
FROM   USER_TAB_STATISTICS
WHERE  TABLE_NAME = UPPER('T1')
;

TABLE_NAME	NUM_ROWS	BLOCKS	SAMPLE_SIZE
T1	        1,200	42	1,200



-- Column
SELECT S.TABLE_NAME,
       S.COLUMN_NAME,
       S.NUM_DISTINCT,
       S.NUM_NULLS,
       S.DENSITY,
       S.LOW_VALUE,
       S.HIGH_VALUE,
       S.HISTOGRAM
FROM   USER_TAB_COLS S
WHERE  S.TABLE_NAME = UPPER('T1')
;

TABLE_NAME	COLUMN_NAME	NUM_DISTINCT	NUM_NULLS	DENSITY	               LOW_VALUE 	HIGH_VALUE	HISTOGRAM

T1	       C1	        3	        0	        0.000416666666666667    41	43	FREQUENCY
T1	       C2	        2	        0	        0.000416666666666667    C102	C103	FREQUENCY



-- Histogram
SELECT TABLE_NAME,
       COLUMN_NAME,
       ENDPOINT_NUMBER,
       ENDPOINT_VALUE || '(' || ENDPOINT_ACTUAL_VALUE || ')' AS ENDPOINT_VALUE
FROM   USER_TAB_HISTOGRAMS
WHERE  TABLE_NAME = UPPER('T1')
ORDER  BY COLUMN_NAME,
          ENDPOINT_NUMBER
;

TABLE_NAME	COLUMN_NAME	ENDPOINT_NUMBER	ENDPOINT_VALUE

T1	        C1	        1000	337499295804764000000000000000000000()
T1	        C1	        1100	342691592663299000000000000000000000()
T1              C1	        1200	347883889521833000000000000000000000()
T1	        C2	        1000	1()
T1	        C2	        1200	2()


  • Histogram이 있기 때문에 정확한 Cardinality를 계산한다.

SELECT /*+ GATHER_PLAN_STATISTICS */
       COUNT(*)
FROM   T1
WHERE  C1 = 'A'
;

select * from table(dbms_xplan.display_cursor(null, null, 'iostats cost last'));

PLAN_TABLE_OUTPUT

Plan hash value: 3693069535
 
--------------------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------
|   1 |  SORT AGGREGATE    |      |      1 |      1 |            |      1 |00:00:00.01 |      37 |
|*  2 |   TABLE ACCESS FULL| T1   |      1 |   1000 |    11   (0)|   1000 |00:00:00.01 |      37 |
--------------------------------------------------------------------------------------------------
 
   2 - filter("C1"='A')
 


*
= Base Cardinality * Selectivity
= Base Cardinality * Selectivity(C1 = 'A' AND C2 = 1)
= Base Cardinality * Selectivity(C1 = 'A') * Selectivity(C2 = 1)
= 1200 * 1000/1200 * 1000/1200
= 833


SELECT /*+ GATHER_PLAN_STATISTICS */
       COUNT(*)
FROM   T1
WHERE  C1 = 'A'
AND    C2 = 1
;

select * from table(dbms_xplan.display_cursor(null, null, 'iostats cost last'));

PLAN_TABLE_OUTPUT

Plan hash value: 3693069535
 
--------------------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------
|   1 |  SORT AGGREGATE    |      |      1 |      1 |            |      1 |00:00:00.01 |      37 |
|*  2 |   TABLE ACCESS FULL| T1   |      1 |    833 |    11   (0)|   1000 |00:00:00.01 |      37 |
--------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - filter(("C1"='A' AND "C2"=1))
 


*
= Base Cardinality * Selectivity
= Base Cardinality * Selectivity(C1 = 'A' AND C2 = 2)
= Base Cardinality * Selectivity(C1 = 'A') * Selectivity(C2 = 2)
= 1200 * 1000/1200 * 200/1200
= 166


SELECT /*+ GATHER_PLAN_STATISTICS */
       COUNT(*)
FROM   T1
WHERE  C1 = 'A'
AND    C2 = 2
;

select * from table(dbms_xplan.display_cursor(null, null, 'iostats cost last'));

PLAN_TABLE_OUTPUT

 
Plan hash value: 3693069535
 
--------------------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------
|   1 |  SORT AGGREGATE    |      |      1 |      1 |            |      1 |00:00:00.01 |      37 |
|*  2 |   TABLE ACCESS FULL| T1   |      1 |    166 |    11   (0)|      0 |00:00:00.01 |      37 |
--------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - filter(("C2"=2 AND "C1"='A'))

Exetended Statistics

  • 10g까지는 Column들간에 정확한 상호 의존성을 알 수가 없다.
  • 11g부터는 Extended Statistics를 이용해 Column들간의 상호 의존성을 통계 정보에 저장 가능하다.
    • EXEC DBMS_STATS.GATHER_TABLE_STATS(USER, 'T1', METHOD_OPT => 'FOR COLUMNS (C1, C2) SIZE SKEWONLY', NO_INVALIDATE => FALSE);