꿈꾸는 개발자, DBA 커뮤니티 구루비

4.6 정규형과 성능

정규화를 하면 중복 데이터가 최소화되고 인스턴스의 크기는 작아짐 이는 한 블록에 저장하는 인스턴스를 많아지게 함
한 블록에 많은 인스턴스가 존재하면 한 번에 메모리에 올라온 블록이 다시 사용될 확률이 높아짐
하지만 이렇게 정규화 하는 것이 무조건 좋은 성능을 보장하는 것은 아님(조인 횟수의 증가로 인한 성능 저하 등등)
화면 구성에 따라 비정규화하는 것이 성능을 높여줄 수도 있음

예시

한 고객에 대해 1월부터 12월까지 집계를 한 줄에 횡으로 전부 보여주어야할 경우
- 왼쪽 모델이 성능에 유리
한 고객에 대해 종으로 떨어뜨려서 보여주어야할 경우
- 오른쪽 모델이 성능에 유리

되도록 모델링 단계에서 성능 관련 문제를 도출해 공론화 하고 이에 대한 해결책을 만들어야 재작업이 줄어들어 비용을 최소화할 수 있음
정규화를 충실하게 수행한 데이터 모델은 성능을 개선하기도 수월함

정규화 성능 테스트


--테스트 테이블 생성
create table testtb1
(
acol varchar2(10),
bcol varchar2(10),
ccol varchar2(4000)
)

create table testtb2
(
acol varchar2(10),
bcol varchar2(10)
)

create table testtb3
(
acol varchar2(10),
ccol varchar2(4000)
)

-------------------------------------------------------
--case 1 : 사이즈가 큰 컬럼에 간혹 데이터가 존재할 때--
-------------------------------------------------------

--테스트 데이터 삽입
insert into testtb1
 select DBMS_RANDOM.STRING('X', 10) acol
      , DBMS_RANDOM.STRING('X', 10) bcol
      , case when mod(rownum,100) = 0 then DBMS_RANDOM.STRING('X', 4000) end ccol
   from dual
connect by level <= 50000

insert into testtb2
select acol, bcol 
  from testtb1

insert into testtb3
select acol, ccol 
  from testtb1
 where ccol is not null

--테스트 테이블 analyze
analyze table testtb1 compute statistics

analyze table testtb2 compute statistics

analyze table testtb3 compute statistics

--테스트 테이블 block 수 측정
select blocks from all_tables where table_name = 'testtb1'
--> 2727

select blocks from all_tables where table_name = 'testtb2'
--> 187

select blocks from all_tables where table_name = 'testtb3'
--> 503

---------------------------------------------------------
--case 1 : 사이즈가 큰 컬럼에 대부분 데이터가 존재할 때--
---------------------------------------------------------

--테이블 truncate
truncate table testtb1

truncate table testtb2

truncate table testtb3

--테스트 데이터 삽입
insert into testtb1
 select DBMS_RANDOM.STRING('X', 10) acol
      , DBMS_RANDOM.STRING('X', 10) bcol
      , (select DBMS_RANDOM.STRING('X', 4000) from dual) ccol
   from dual
connect by level <= 50000

insert into testtb2
select acol, bcol 
  from testtb1

insert into testtb3
select acol, ccol 
  from testtb1
  
--테스트 테이블 analyze 
analyze table testtb1 compute statistics

analyze table testtb2 compute statistics

analyze table testtb3 compute statistics  
 
--테스트 테이블 block 수 측정
select blocks from all_tables where table_name = 'testtb1'
-->50001

select blocks from all_tables where table_name = 'testtb2'
-->187

select blocks from all_tables where table_name = 'testtb3'
-->50001

-------------------------------------------------------
------------------------ 정  리 -----------------------
-------------------------------------------------------

case 1의 경우(간혹 ccol에 데이터가 입력되는 경우)

비정규화된 테이블 testtb1 의 블록 사용수는 2727이지만

정규화된 테이블 testtb2,3의 블록 사용수는 690인 것을 알 수 있음

똑같은 데이터이더라도 정규화했을 때 block 사용량이 줄어들고

따라서 성능에 어느정도 도움이 될 것 같음

case 2의 경우(대부분 ccol에 데이터가 입력되는 경우)

비정규화된 테이블 testtb1 의 블록 사용수는 50001

testtb2, 3 의 블록 사용수는 50188인 것을 알 수 있음

정규화하더라도 block 사용량이 줄어드는 것이 아니라

오히려 늘어났음

-------------------------------------------------------
------------------------ 결  론 -----------------------
------------------------------------------------------- 

데이터 입력 패턴에 따라 정규화하는 것이 성능에 유리할 수도 있지만

비정규화하는 것이 성능에 유리할 수도 있음