안녕하세요
새해 복 많이들 받으세요.
다음과 같은 쿼리를 실행하는데 테이블에는 약4억건 정도 들어 있습니다.
SELECT
AVG([oneday_use_cnt])
,AVG([oneday_avg_price])
FROM
(
SELECT
[usedate]
,COUNT(DISTINCT([idi])) AS [oneday_use_cnt]
,SUM([use_price]) / COUNT(DISTINCT([idi])) AS [oneday_avg_price]
FROM
(
SELECT
A.*
FROM [s_train_tbl] A INNER JOIN [station_mst] B
ON A.[station_1] = B.[station_name]
WHERE
A.[usedate] BETWEEN '2017-12-01' AND '2018-01-10'
AND
A.[process_name] = 'OUT'
AND
A.[process_flg] <> '06'
AND
A.[use_price] <> 0
UNION
SELECT
A.*
FROM [s_train_tbl] A INNER JOIN [station_mst] B
ON A.[station_2] = B.[station_name]
WHERE
A.[usedate] BETWEEN '2017-12-01' AND '2018-01-10'
AND
A.[process_name] = 'OUT'
AND
A.[process_flg] <> '06'
AND
A.[use_price] <> 0
) C
GROUP BY
[usedate]
) D
상기 SQL을 SQL Server2012에서 실행했을 때 18분정도 걸리네요.
s_train_tbl에는 usedate에 인덱스가 생성되어 있는 상태입니다.
성능개선의 방향이 있는지요?
마농님 아찌님 대단히 감사합니다.
B와의 조인이 필요합니다. B.[station_name] IN (A.[station_1],A.[station_2])과 같이 where에 설정하면
레코드가 중복되어 되돌아 오기에 SUM([use_price])가 영향을 받습니다.
SELECT
A.*
FROM [s_train_tbl] A, [station_mst] B
WHERE
A.[usedate] BETWEEN '2017-12-01' AND '2018-01-10'
AND
A.[process_name] = 'OUT'
AND
A.[process_flg] <> '06'
AND
A.[use_price] <> 0
AND
B.[station_name] IN (A.[station_1], A.[station_2])
따라서 join후에 union으로 해당 레코드의 중복제거를 하고 있습니다.
왜냐하면 station_1와 station_2에는 B.[station_name] 에 동시에 존재하는 레코드가 있을수 있기에요
-- 조인이 필요 없다면? SELECT AVG(oneday_use_cnt ) avg_oneday_use_cnt , AVG(oneday_avg_price) avg_oneday_avg_price FROM (SELECT a.usedate , COUNT(DISTINCT a.idi) AS oneday_use_cnt , SUM(a.use_price) / COUNT(DISTINCT a.idi) AS oneday_avg_price FROM s_train_tbl a WHERE a.usedate BETWEEN '2017-12-01' AND '2018-01-10' AND a.process_name = 'OUT' AND a.process_flg <> '06' AND a.use_price <> 0 AND ( a.station_1 IS NOT NULL OR a.station_2 IS NOT NULL ) GROUP BY a.usedate ) d ; -- 조인이 필요 하다면? -- 1. Join & Distinct -- SELECT AVG(oneday_use_cnt ) avg_oneday_use_cnt , AVG(oneday_avg_price) avg_oneday_avg_price FROM (SELECT a.usedate , COUNT(DISTINCT a.idi) AS oneday_use_cnt , SUM(a.use_price) / COUNT(DISTINCT a.idi) AS oneday_avg_price FROM (SELECT DISTINCT a.pk , a.usedate , a.idi , a.use_price FROM s_train_tbl a INNER JOIN station_mst b ON b.station_name IN (a.station_1, a.station_2) WHERE a.usedate BETWEEN '2017-12-01' AND '2018-01-10' AND a.process_name = 'OUT' AND a.process_flg <> '06' AND a.use_price <> 0 ) a GROUP BY a.usedate ) d ; -- 2. Outer Join & Is Not Null -- SELECT AVG(oneday_use_cnt ) avg_oneday_use_cnt , AVG(oneday_avg_price) avg_oneday_avg_price FROM (SELECT a.usedate , COUNT(DISTINCT a.idi) AS oneday_use_cnt , SUM(a.use_price) / COUNT(DISTINCT a.idi) AS oneday_avg_price FROM s_train_tbl a LEFT OUTER JOIN station_mst b ON a.station_1 = b.station_name LEFT OUTER JOIN station_mst c ON a.station_2 = c.station_name WHERE a.usedate BETWEEN '2017-12-01' AND '2018-01-10' AND a.process_name = 'OUT' AND a.process_flg <> '06' AND a.use_price <> 0 AND ( b.station_name IS NOT NULL OR c.station_name IS NOT NULL ) GROUP BY a.usedate ) d ; -- 3. Exists -- SELECT AVG(oneday_use_cnt ) avg_oneday_use_cnt , AVG(oneday_avg_price) avg_oneday_avg_price FROM (SELECT a.usedate , COUNT(DISTINCT a.idi) AS oneday_use_cnt , SUM(a.use_price) / COUNT(DISTINCT a.idi) AS oneday_avg_price FROM s_train_tbl a WHERE a.usedate BETWEEN '2017-12-01' AND '2018-01-10' AND a.process_name = 'OUT' AND a.process_flg <> '06' AND a.use_price <> 0 AND EXISTS (SELECT 0 FROM station_mst b WHERE b.station_name IN (a.station_1, a.station_2) ) GROUP BY a.usedate ) d ;