05 카디널러티

(1) 선택도

  • 선택도 (Selectivity) : 전체 대상 레코드 중 특정 조건에 의해 선택될 것으로 예상되는 레코드 비율
  • 선택도 -> 카디널러티 -> 비용 -> 엑세스 방식, 조인순서, 조인 방법 등 결정
  • 히스토그램이 없거나, 조건절에 바인드 변수 사용 시 옵티마이저는 데이터 분포가 균일하다고 가정한 상태에서 선택도 구함
  • 히스토그램이 없을 때 등치(=) 조건에 대한 선택도 = 1/Distinct Value 개수 = 1/num_distinct
  • 히스토그램이 없을때 범위검색(ex. between) 조건에 대한 선택도 = 조건절에서 요청한 값 범위/전체 값 범위

* num_rows : 1,000,000
* num_distinct : 100
* low_value : 1
* high_value : 1,000

 - no > 500 : (1,000-500) / (1,000-1) = 0.5
 - no >= 500 : (1,000-500) / (1,000-1) + 1/100 = 0.51
 - no bwtwwen 500 and 1000 : (1,000-500) / (1,000-1) + 1/100 + 1/100 = 0.52

* no > 500
선택도 = 1,000 - 500 ( high_value - 비교값 )/ 1,000 - 1 ( high_value - low_value ) = 0.5             

* no < 500
선택도 = 500 - 1 ( 비교값 - 1 )/1,000 - 1 ( high_value - low_value ) = 0.499              

  • 컬럼 히스토그램이 없을 떄 옵티마이저는 '조건절에서 요청한 값 범위'에 속한 값들이 '전체 값 범위'에 고르게 분포돼 있음을 가정하고 선택도 구함

(2) 카디널러티

  • 카디널러티 :특정 액세스 단계를 거치고 나서 출력될 것으로 예상되는 결과 건수
  • 카디널리티 = 총 로우수 * 선택도 = num_rows/num_distinct
  • Distinct Value = 10 이면 선택도 = 0.1이고, 총 로우 수가 1,000이라면 카디널리티 100이 됨.
  • 선택도 및 카디널리티 계산식 테스트

SQL> CREATE TABLE T_EMP AS
  2  SELECT B.NO, A.*
  3    FROM (SELECT * FROM SCOTT.EMP WHERE ROWNUM <= 10 ) A
  4       , (SELECT ROWNUM NO FROM DUAL CONNECT BY LEVEL <= 100) B;

테이블이 생성되었습니다.

SQL> begin
  2     dbms_stats.gather_table_stats( user, 'T_EMP', method_opt=> 'for all columns size 1' );
  3  end;
  4  /

PL/SQL 처리가 정상적으로 완료되었습니다.

SQL> select job, count(*) from t_emp group by job order by job;

JOB         COUNT(*)
--------- ----------
ANALYST          100
CLERK            100
MANAGER          300
PRESIDENT        100
SALESMAN         400
SQL> SELECT COUNT(*) FROM T_EMP WHERE JOB = 'SALESMAN';

  COUNT(*)
----------
       400

SQL> select * from table( dbms_xplan.display_cursor( null, null, 'ALLSTATS LAST'));

--------------------------------------------------------------------------------------
| Id  | Operation          | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |       |      1 |        |      1 |00:00:00.01 |       8 |
|   1 |  SORT AGGREGATE    |       |      1 |      1 |      1 |00:00:00.01 |       8 |
|*  2 |   TABLE ACCESS FULL| T_EMP |      1 |    200 |    400 |00:00:00.01 |       8 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("JOB"='SALESMAN')

SQL> SELECT T.NUM_ROWS, C.NUM_NULLS, C.NUM_DISTINCT
  2       , 1/C.NUM_DISTINCT SELECTIVITY
  3       , NUM_ROWS/C.NUM_DISTINCT AS CRADINALITY
  4    FROM USER_TABLES T, USER_TAB_COLUMNS C
  5  WHERE T.TABLE_NAME = 'T_EMP'
  6     AND C.TABLE_NAME = T.TABLE_NAME
  7     AND C.COLUMN_NAME = 'JOB'
  8  ;

  NUM_ROWS NUM_NULLS NUM_DISTINCT SELECTIVITY CRADINALITY
---------- --------- ------------ ----------- -----------
      1000         0            5          .2         200


* 히스토그램이 있는 경우

SQL> begin
  2     dbms_stats.gather_table_stats( user, 'T_EMP', method_opt=> 'for all columns size 5' ); --도수 분포 히스토그램
  3  end;
  4  /

SQL> SELECT COUNT(*) FROM T_EMP WHERE JOB = 'CLERK';

  COUNT(*)
----------
       100
SQL> @XPLAN

--------------------------------------------------------------------------------------
| Id  | Operation          | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |       |      1 |        |      1 |00:00:00.01 |       8 |
|   1 |  SORT AGGREGATE    |       |      1 |      1 |      1 |00:00:00.01 |       8 |
|*  2 |   TABLE ACCESS FULL| T_EMP |      1 |    100 |    100 |00:00:00.01 |       8 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("JOB"='CLERK')
SQL> SELECT COUNT(*) FROM T_EMP WHERE JOB = 'SALESMAN';

  COUNT(*)
----------
       400

SQL> @XPLAN

--------------------------------------------------------------------------------------
| Id  | Operation          | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |       |      1 |        |      1 |00:00:00.01 |       8 |
|   1 |  SORT AGGREGATE    |       |      1 |      1 |      1 |00:00:00.01 |       8 |
|*  2 |   TABLE ACCESS FULL| T_EMP |      1 |    400 |    400 |00:00:00.01 |       8 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("JOB"='SALESMAN')

 -- 히스토그램 바인드
 SQL> SELECT COUNT(*) FROM T_EMP WHERE JOB = :JOB;

  COUNT(*)
----------
         0

SQL> @XPLAN

----------------------------------------------------------------------------
| Id  | Operation          | Name  | Starts | E-Rows | A-Rows |   A-Time   |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |       |      1 |        |      1 |00:00:00.01 |
|   1 |  SORT AGGREGATE    |       |      1 |      1 |      1 |00:00:00.01 |
|*  2 |   TABLE ACCESS FULL| T_EMP |      1 |    200 |      0 |00:00:00.01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("JOB"=:JOB)

(3) NULL 값을 포함할 때


SQL> update t_emp set job = null where no <= 50;

500 행이 갱신되었습니다.

SQL> commit;

커밋이 완료되었습니다.

SQL> select count(*) from t_emp where job is null;

  COUNT(*)
----------
       500

SQL> begin
  2     dbms_stats.gather_table_stats( user, 'T_EMP', method_opt=> 'for all columns size 1' );  --히스토그램 제거
  3  end;
  4  /

PL/SQL 처리가 정상적으로 완료되었습니다.

SQL> COLUMN NUM_NULLS FORMAT 9999999
SQL>
SQL> SELECT NUM_NULLS, NUM_DISTINCT
  2    FROM USER_TAB_COLUMNS
  3   WHERE TABLE_NAME = 'T_EMP'
  4     AND COLUMN_NAME = 'JOB';

NUM_NULLS NUM_DISTINCT
--------- ------------
      500            5


SQL>

  • Null 값을 고려해 조정된 선택도 공식

		 1                       NULL 값을 제외한 로우 수
선택도 = --------------------  * -------------------- 
          Distinct Value 개수         총 로우 수
	 
                 1                 (NUM_ROWS - NUM_NULLS)
       = -------------------- *  ------------------------
            NUM_DISTINCT                  NUM_ROWS
  
          1 - ( NUM_NULLS / NUM_ROWS )
       = ----------------------------
                 NUM_DISTINCT

              1 - ( 500 / 1000 )
       = ----------------------------  = 0.1
                     5
		     

카디널리티 = 1000 * 0.1 = 100

SQL> SELECT COUNT(*) FROM T_EMP WHERE JOB = :JOB;

  COUNT(*)
----------
         0

SQL> @XPLAN

----------------------------------------------------------------------------
| Id  | Operation          | Name  | Starts | E-Rows | A-Rows |   A-Time   |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |       |      1 |        |      1 |00:00:00.01 |
|   1 |  SORT AGGREGATE    |       |      1 |      1 |      1 |00:00:00.01 |
|*  2 |   TABLE ACCESS FULL| T_EMP |      1 |    100 |      0 |00:00:00.01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("JOB"=:JOB)
{CODE}


h3. (4) 조건절이 두개 이상일 때
- 카디널리티 : 각 컬럼의 선택도 * 전체 로우수
{code:sql}

SQL> SELECT T.NUM_ROWS, C.NUM_NULLS, C.NUM_DISTINCT
  2       , ( 1-C.NUM_NULLS/T.NUM_ROWS)/C.NUM_DISTINCT SELECTIVITY
  3    FROM USER_TABLES T, USER_TAB_COLUMNS C
  4  WHERE T.TABLE_NAME = 'T_EMP'
  5     AND C.TABLE_NAME = T.TABLE_NAME
  6     AND C.COLUMN_NAME IN(  'JOB', 'DEPTNO' )
  7  ;

  NUM_ROWS NUM_NULLS NUM_DISTINCT SELECTIVITY
---------- --------- ------------ -----------
      1000       500            5          .1
      1000         0            3  .333333333

SQL> SELECT COUNT(*) FROM T_EMP WHERE JOB = :JOB AND DEPTNO = :DEPTNO;

  COUNT(*)
----------
         0

SQL> @XPLAN

----------------------------------------------------------------------------
| Id  | Operation          | Name  | Starts | E-Rows | A-Rows |   A-Time   |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |       |      1 |        |      1 |00:00:00.01 |
|   1 |  SORT AGGREGATE    |       |      1 |      1 |      1 |00:00:00.01 |
|*  2 |   TABLE ACCESS FULL| T_EMP |      1 |     33 |      0 |00:00:00.01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(("JOB"=:JOB AND "DEPTNO"=:DEPTNO))



(5) 범위 검색 조건일 때


선택도 = 조건절에서 요청한 값 범위/전체 값 범위    

SQL> CREATE TABLE T
  2  AS
  3  SELECT ROWNUM NO1
  4       , CASE WHEN ROWNUM <= 1000 OR ROWNUM > 9000 THEN ROWNUM ELSE 5000 END NO2
  5   FROM DUAL
  6  CONNECT BY LEVEL <= 10000;

테이블이 생성되었습니다.

SQL> begin
  2     dbms_stats.gather_table_stats( user, 'T', method_opt=> 'for all columns size 1' );  --히스토그램 제거
  3  end;
  4  /

PL/SQL 처리가 정상적으로 완료되었습니다.

SQL> SELECT MAX( NO1 ) AS MAX_1, MIN( NO1 ) AS MIN_1
  2        , MAX( NO2 ) AS MAX_2, MIN( NO2 ) AS MIN_2
  3   FROM T;

     MAX_1      MIN_1      MAX_2      MIN_2
---------- ---------- ---------- ----------
     10000          1      10000          1

-- NO1 : 1 ~ 10000
-- NO2 : 1~1000, 9001 ~ 10000

SQL> SELECT COUNT(*) FROM T WHERE NO1 > 3000;

  COUNT(*)
----------
      7000

SQL> @XPLAN

-------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |      1 |00:00:00.01 |      19 |
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |00:00:00.01 |      19 |
|*  2 |   TABLE ACCESS FULL| T    |      1 |   7001 |   7000 |00:00:00.01 |      19 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("NO1">3000)


19 개의 행이 선택되었습니다.

SQL> SELECT COUNT(*) FROM T WHERE NO2 > 3000;

  COUNT(*)
----------
      9000

SQL> @XPLAN

-------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |      1 |00:00:00.01 |      19 |
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |00:00:00.01 |      19 |
|*  2 |   TABLE ACCESS FULL| T    |      1 |   7001 |   9000 |00:00:00.01 |      19 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("NO2">3000)

SQL> begin
  2     dbms_stats.gather_table_stats( user, 'T', method_opt=> 'for all columns size 254' );  --히스토그램 생성
  3  end;
  4  /

PL/SQL 처리가 정상적으로 완료되었습니다.

SQL> SELECT COUNT(*) FROM T WHERE NO1 > 3000;

  COUNT(*)
----------
      7000

SQL> @XPLAN

-------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |      1 |00:00:00.01 |      19 |
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |00:00:00.01 |      19 |
|*  2 |   TABLE ACCESS FULL| T    |      1 |   7001 |   7000 |00:00:00.01 |      19 |
-------------------------------------------------------------------------------------

SQL> SELECT COUNT(*) FROM T WHERE NO2 > 3000;

-------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |      1 |00:00:00.01 |      19 |
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |00:00:00.01 |      19 |
|*  2 |   TABLE ACCESS FULL| T    |      1 |   7001 |   9000 |00:00:00.01 |      19 |
-------------------------------------------------------------------------------------


SQL> SELECT COUNT(*) FROM T WHERE NO1 > 5000;

-------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |      1 |00:00:00.01 |      19 |
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |00:00:00.01 |      19 |
|*  2 |   TABLE ACCESS FULL| T    |      1 |   5047 |   5000 |00:00:00.01 |      19 |
-------------------------------------------------------------------------------------

SQL> SELECT COUNT(*) FROM T WHERE NO2 > 5000;

-------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |      1 |00:00:00.01 |      19 |
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |00:00:00.01 |      19 |
|*  2 |   TABLE ACCESS FULL| T    |      1 |   1004 |   1000 |00:00:00.01 |      19 |
-------------------------------------------------------------------------------------



{CODE}

h3. (6)cardinality 힌트를 이용한 실행계획 제어
{code:sql}

SQL> SELECT /*+ USE_HASH( D E ) */ * FROM SCOTT.DEPT D, SCOTT.EMP E
  2   WHERE D.DEPTNO = E.DEPTNO
  3  ;
...

----------------------------------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |     14 |00:00:00.01 |      14 |       |       |          |
|*  1 |  HASH JOIN         |      |      1 |     14 |     14 |00:00:00.01 |      14 |  1000K|  1000K|  747K (0)|
|   2 |   TABLE ACCESS FULL| DEPT |      1 |      4 |      4 |00:00:00.01 |       7 |       |       |          |
|   3 |   TABLE ACCESS FULL| EMP  |      1 |     14 |     14 |00:00:00.01 |       7 |       |       |          |
----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("D"."DEPTNO"="E"."DEPTNO")

SQL> SELECT /*+ USE_HASH( D E ) CARDINALITY(D 16) */ * FROM SCOTT.DEPT D, SCOTT.EMP E
  2   WHERE D.DEPTNO = E.DEPTNO
  3   ;

...

----------------------------------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |     14 |00:00:00.01 |      14 |       |       |          |
|*  1 |  HASH JOIN         |      |      1 |     56 |     14 |00:00:00.01 |      14 |   865K|   865K|  664K (0)|
|   2 |   TABLE ACCESS FULL| EMP  |      1 |     14 |     14 |00:00:00.01 |       6 |       |       |          |
|   3 |   TABLE ACCESS FULL| DEPT |      1 |     16 |      4 |00:00:00.01 |       8 |       |       |          |
----------------------------------------------------------------------------------------------------------------

SQL> SELECT /*+ USE_HASH( D E ) OPT_ESTIMATE( TABLE, D, SCALE_ROWS=4 )  */ * FROM SCOTT.DEPT D, SCOTT.EMP E
  2   WHERE D.DEPTNO = E.DEPTNO  ;

...

----------------------------------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |     14 |00:00:00.01 |      14 |       |       |          |
|*  1 |  HASH JOIN         |      |      1 |   5726 |     14 |00:00:00.01 |      14 |   865K|   865K|  632K (0)|
|   2 |   TABLE ACCESS FULL| EMP  |      1 |     14 |     14 |00:00:00.01 |       6 |       |       |          |
|   3 |   TABLE ACCESS FULL| DEPT |      1 |   1636 |      4 |00:00:00.01 |       8 |       |       |          |
----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("D"."DEPTNO"="E"."DEPTNO")