어제 LAG 함수를 pajama 님께서 알려주셔서 유용하게 사용하고 있습니다.
그런데, 여기서 질문이 더 있는데요, 아래과 같이 데이터가 있다고 가정합니다.
cust_key는 C00000001 뿐만아니라 C00000002, C00000003 ,,,, 쭉 있을 것이고
seq는 'reserv_date' 를 기준으로 오름차순으로 정렬했습니다.
seq | reserv_date | visit_date | cancel_date | cust_key |
1 | 2010-08-01 | 2010-08-07 | null | C00000001 |
2 | 2010-09-11 | 2010-09-13 | null | C00000001 |
3 | 2016-08-14 | null | 2016-09-01 | C00000001 |
4 | 2017-08-16 | null | 2017-08-18 | C00000001 |
5 | 2020-11-10 | 2020-11-15 | null | C00000001 |
6 | 2020-11-11 | 2020-11-15 | null | C00000001 |
7 | 2021-10-01 | 2021-10-04 | null | C00000001 |
1 | 2015-11-05 | 2015-11-06 | null | C00000002 |
2 | 2016-11-01 | 2016-11-30 | null | C00000002 |
이렇게 visit_date가 다 들어있는 경우에는
lag를 사용하여, 직전에 방문한날짜(visit_date)를 찾으면 아래와 같이 결과가 나옵니다.
lag(visit_date, 1, null) over(partition by cust_key order by visit_date)
seq | reserv_date | visit_date | cancel_date | visit_date_prv | cust_key |
1 | 2010-08-01 | 2010-08-07 | null | null | C00000001 |
2 | 2010-09-11 | 2010-09-13 | null | 2010-08-07 | C00000001 |
3 | 2016-08-14 | null | 2016-09-01 | null | C00000001 |
4 | 2017-08-16 | null | 2017-08-18 | null | C00000001 |
5 | 2020-11-10 | 2020-11-15 | null | 2010-09-13 | C00000001 |
6 | 2020-11-11 | 2020-11-15 | null | 2020-11-15 | C00000001 |
7 | 2021-10-01 | 2021-10-04 | null | 2020-11-15 | C00000001 |
1 | 2015-11-05 | 2015-11-06 | null | null | C00000002 |
2 | 2016-11-01 | 2016-1130 | null | 2015-11-06 | C00000002 |
이렇게 나옵니다.
3번째에 visit_date_prv 컬럼에 null 값을 '2010-09-13' 으로 채우려고 하는데,,, 어떻게 할 수 있을까요??
visit_date_prv 컬럼의 3번쨰, 4번째 행이 null로 나오게 됩니다.
원하는 결과값은 아래와 같이 나와야하는데, 우주민님께서 알려주신 쿼리로는 null이 여러개인경우 하나만 null 값이 변경이 됩니다.
seq | reserv_date | visit_date | cancel_date | visit_date_prv | cust_key |
1 | 2010-08-01 | 2010-08-07 | null | null | C00000001 |
2 | 2010-09-11 | 2010-09-13 | null | 2010-08-07 | C00000001 |
3 | 2016-08-14 | null | 2016-09-01 | 2010-09-13 | C00000001 |
4 | 2017-08-16 | null | 2017-08-18 | 2010-09-13 | C00000001 |
5 | 2020-11-10 | 2020-11-15 | 2010-09-13 | C00000001 | |
6 | 2020-11-11 | 2020-11-15 | 2020-11-15 | C00000001 | |
7 | 2021-10-01 | 2021-10-04 | 2020-11-15 | C00000001 |
*****************수정했습니다.
WITH TABLE_T AS ( SELECT '1' AS seq, '2010-08-07' AS visit_date, 'C00000001' AS cust_key UNION ALL SELECT '2' AS seq, '2010-09-13' AS visit_date, 'C00000001' AS cust_key UNION ALL SELECT '3' AS seq, null AS visit_date, 'C00000001' AS cust_key UNION ALL SELECT '4' AS seq, '2020-11-15' AS visit_date, 'C00000001' AS cust_key UNION ALL SELECT '5' AS seq, '2020-11-15' AS visit_date, 'C00000001' AS cust_key UNION ALL SELECT '6' AS seq, '2021-10-04' AS visit_date, 'C00000001' AS cust_key ) , TABLE_E AS ( SELECT seq, cust_key ,CASE WHEN visit_date IS NULL THEN LAG(visit_date) OVER (PARTITION BY cust_key ORDER BY seq) ELSE visit_date END AS visit_date FROM TABLE_T T1 ) SELECT seq, cust_key, visit_date ,LAG(visit_date) OVER (PARTITION BY cust_key ORDER BY seq) FROM TABLE_E ORDER BY seq
날짜 부분에서...
- null 인 부분에 전 라인의 날짜를 넣는 한 단계가 추가 되면 될거 같습니다.
+ order by 를 visit_date 가 아닌 seq 로 설정해야 할듯 하네요.
WITH TABLE_1 AS ( SELECT '1' AS seq, '2010-08-07' AS visit_date, 'C00000001' AS cust_key UNION ALL SELECT '2' AS seq, '2010-09-13' AS visit_date, 'C00000001' AS cust_key UNION ALL SELECT '3' AS seq, null AS visit_date, 'C00000001' AS cust_key UNION ALL SELECT '4' AS seq, null AS visit_date, 'C00000001' AS cust_key UNION ALL SELECT '5' AS seq, '2020-11-15' AS visit_date, 'C00000001' AS cust_key UNION ALL SELECT '6' AS seq, '2021-10-04' AS visit_date, 'C00000001' AS cust_key ) , TABLE_2 AS ( SELECT T1.seq , MAX(T2.seq) as seq_2 FROM TABLE_1 T1 , TABLE_1 T2 WHERE T2.visit_date is not null AND T1.seq >= T2.seq AND T1.cust_key = T2.cust_key GROUP BY T1.seq ) , TABLE_3 AS ( SELECT T2.seq, T2.visit_date, T3.visit_date AS visit_date_2 , T2.cust_key FROM TABLE_2 T1 ,TABLE_1 T2 ,TABLE_1 T3 WHERE T1.seq = T2.seq AND T1.SEQ_2 = T3.seq ) SELECT seq, cust_key, visit_date,visit_date_2 ,LAG(visit_date_2) OVER (PARTITION BY cust_key ORDER BY seq) AS visit_date_prv FROM TABLE_3 ORDER BY seq
단계가 많이 생기긴 했는데....
CROSS JOIN 을 이용해서 처리한 방법 입니다.
조건이 늘어날 수록 복잡해지네요.
... 급하게 만든거라 정확한지 검수가 부족할 수 있습니다... 컨셉만 확인해주시길...
WITH AA (seq,reserv_date,visit_date, cancel_date, cust_key) AS ( SELECT 1, '2010-08-01', '2010-08-07', NULL, 'C00000001' UNION ALL SELECT 2, '2010-09-11', '2010-09-13', NULL, 'C00000001' UNION ALL SELECT 3, '2016-08-14', NULL ,'2016-09-01','C00000001' UNION ALL SELECT 4, '2017-08-16', NULL, '2017-08-18','C00000001' UNION ALL SELECT 5, '2020-11-10', '2020-11-15', NULL, 'C00000001' UNION ALL SELECT 6, '2020-11-11', '2020-11-15', NULL, 'C00000001' UNION ALL SELECT 7, '2021-10-01', '2021-10-04', NULL, 'C00000001' UNION ALL SELECT 1, '2015-11-05', '2015-11-06', NULL, 'C00000002' UNION ALL SELECT 2, '2016-11-01', '2016-11-30', NULL, 'C00000002' ) SELECT seq,reserv_date , visit_date , cancel_date , LAG(visit_date2) OVER(PARTITION BY cust_key ORDER BY cust_key, reserv_date) visit_date_prv , cust_key FROM ( SELECT seq,reserv_date , visit_date , cancel_date, cust_key , CASE WHEN visit_date IS NULL THEN MAX(visit_date) OVER (PARTITION BY cust_key ORDER BY cust_key, reserv_date) ELSE visit_date END AS visit_date2 FROM AA ) BB ORDER BY cust_key, reserv_date
WITH t (seq, reserv_date, visit_date, cancel_date, cust_key) AS ( SELECT 1, '2010-08-01', '2010-08-07', null , 'C00000001' UNION ALL SELECT 2, '2010-09-11', '2010-09-13', null , 'C00000001' UNION ALL SELECT 3, '2016-08-14', null , '2016-09-01', 'C00000001' UNION ALL SELECT 4, '2017-08-16', null , '2017-08-18', 'C00000001' UNION ALL SELECT 5, '2020-11-10', '2020-11-15', null , 'C00000001' UNION ALL SELECT 6, '2020-11-11', '2020-11-15', null , 'C00000001' UNION ALL SELECT 7, '2021-10-01', '2021-10-04', null , 'C00000001' UNION ALL SELECT 1, '2015-11-05', '2015-11-06', null , 'C00000002' UNION ALL SELECT 2, '2016-11-01', '2016-11-30', null , 'C00000002' ) SELECT seq , cust_key , reserv_date , visit_date , cancel_date , MAX(dt) OVER(PARTITION BY cust_key, grp) reserv_date_pre FROM (SELECT seq , cust_key , reserv_date , visit_date , cancel_date , LAG(visit_date) OVER(PARTITION BY cust_key ORDER BY reserv_date) dt , COUNT(visit_date) OVER(PARTITION BY cust_key ORDER BY reserv_date ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) grp FROM t ) a ;