Revisiting the Basics #5 - SQL
Intro
- I've written up three Hard-level LeetCode problems that can be solved for free.
- All solutions are written in MSSQL.
Contents
Human Traffic of Stadium
Input:
Stadium table:
+------+------------+-----------+
| id   | visit_date | people    |
+------+------------+-----------+
| 1    | 2017-01-01 | 10        |
| 2    | 2017-01-02 | 109       |
| 3    | 2017-01-03 | 150       |
| 4    | 2017-01-04 | 99        |
| 5    | 2017-01-05 | 145       |
| 6    | 2017-01-06 | 1455      |
| 7    | 2017-01-07 | 199       |
| 8    | 2017-01-09 | 188       |
+------+------------+-----------+
Output:
+------+------------+-----------+
| id   | visit_date | people    |
+------+------------+-----------+
| 5    | 2017-01-05 | 145       |
| 6    | 2017-01-06 | 1455      |
| 7    | 2017-01-07 | 199       |
| 8    | 2017-01-09 | 188       |
+------+------------+-----------+
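- The task is to return every row that belongs to a run of three or more consecutive ids in which every row has 100 or more people, ordered by visit_date. The run is defined by consecutive ids, not consecutive dates: id 8 (2017-01-09) still counts even though 2017-01-08 is missing.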
WITH ConsecutiveIDs AS (
    -- S2 is the row one id before S1, S3 the row two ids before S1
    SELECT S1.id AS id1, S3.id AS id2
    FROM Stadium AS S1
    LEFT JOIN Stadium AS S2 ON S1.id = S2.id + 1
    LEFT JOIN Stadium AS S3 ON S1.id = S3.id + 2
    WHERE S1.people >= 100
      AND S2.people >= 100
      AND S3.people >= 100
)
SELECT DISTINCT S1.*
FROM Stadium AS S1, ConsecutiveIDs AS CIDs
WHERE S1.id BETWEEN CIDs.id2 AND CIDs.id1
-- the problem asks for the result ordered by visit_date
ORDER BY S1.visit_date
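- First, the WITH clause self-joins Stadium twice to find every window of three consecutive ids whose people values are all 100 or more.
- For each window, ConsecutiveIDs stores only the largest id (id1) and the smallest id (id2).
- The outer query then returns every row whose id lies between id2 and id1 of some window.
- Overlapping windows return the same row more than once, so DISTINCT removes the duplicates.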
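To check the self-join logic against the sample data, here is a minimal sketch that loads the Stadium example into an in-memory SQLite database through Python's standard `sqlite3` module and runs the same query. SQLite is only a stand-in for MSSQL here (it accepts this particular query unchanged, but other T-SQL features may differ), and the names `STADIUM_ROWS`, `QUERY`, and `human_traffic` are hypothetical helpers for illustration.

```python
import sqlite3

# Sample rows copied from the Stadium example above: (id, visit_date, people).
STADIUM_ROWS = [
    (1, "2017-01-01", 10),
    (2, "2017-01-02", 109),
    (3, "2017-01-03", 150),
    (4, "2017-01-04", 99),
    (5, "2017-01-05", 145),
    (6, "2017-01-06", 1455),
    (7, "2017-01-07", 199),
    (8, "2017-01-09", 188),
]

# The self-join query: each window stores its largest (id1) and smallest (id2) id.
QUERY = """
WITH ConsecutiveIDs AS (
    SELECT S1.id AS id1, S3.id AS id2
    FROM Stadium AS S1
    LEFT JOIN Stadium AS S2 ON S1.id = S2.id + 1
    LEFT JOIN Stadium AS S3 ON S1.id = S3.id + 2
    WHERE S1.people >= 100 AND S2.people >= 100 AND S3.people >= 100
)
SELECT DISTINCT S1.*
FROM Stadium AS S1, ConsecutiveIDs AS CIDs
WHERE S1.id BETWEEN CIDs.id2 AND CIDs.id1
ORDER BY S1.visit_date
"""


def human_traffic(rows):
    """Load rows into an in-memory SQLite table and run the self-join query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE Stadium (id INTEGER, visit_date TEXT, people INTEGER)"
        )
        conn.executemany("INSERT INTO Stadium VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in human_traffic(STADIUM_ROWS):
        print(row)
```

On the sample data the qualifying windows are ids (5, 6, 7) and (6, 7, 8); they overlap on ids 6 and 7, which is exactly where DISTINCT is needed, and the result is the four rows with ids 5 to 8.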
Department Top Three Salaries
Input:
Employee table:
+----+-------+--------+--------------+
| id | name  | salary | departmentId |
+----+-------+--------+--------------+
| 1  | Joe   | 85000  | 1            |
| 2  | Henry | 80000  | 2            |
| 3  | Sam   | 60000  | 2            |
| 4  | Max   | 90000  | 1            |
| 5  | Janet | 69000  | 1            |
| 6  | Randy | 85000  | 1            |
| 7  | Will  | 70000  | 1            |
+----+-------+--------+--------------+
Department table:
+----+-------+
| id | name  |
+----+-------+
| 1  | IT    |
| 2  | Sales |
+----+-------+
Output:
+------------+----------+--------+
| Department | Employee | Salary |
+------------+----------+--------+
| IT         | Max      | 90000  |
| IT         | Joe      | 85000  |
| IT         | Randy    | 85000  |
| IT         | Will     | 70000  |
| Sales      | Henry    | 80000  |
| Sales      | Sam      | 60000  |
+------------+----------+--------+
- Return only the employees whose salary is among the top three unique salaries in their department.
WITH EMP_DE AS (
    SELECT E.name AS EName, E.salary, D.name AS DName
    FROM Employee AS E
    LEFT JOIN Department AS D
        ON E.departmentId = D.id
)
SELECT Department, Employee, Salary
FROM (
    SELECT DName AS Department,
           EName AS Employee,
           salary AS Salary,
           DENSE_RANK() OVER (
               PARTITION BY DName ORDER BY salary DESC
           ) AS rnk
    FROM EMP_DE
) AS list
WHERE rnk <= 3
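- EMP_DE is a CTE (WITH clause) that joins the two tables.
- The rank is computed inside the subquery in the FROM clause.
- DENSE_RANK: used because tied salaries share the same rank and no rank numbers are skipped after a tie (Joe and Randy are both 2, and Will is 3, not 4).
- PARTITION BY: the rank must be computed separately for each department.
- The WHERE clause keeps only the rows whose rank is 3 or lower.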
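The difference between RANK and DENSE_RANK is easiest to see side by side. The sketch below, again a Python `sqlite3` stand-in for MSSQL with hypothetical names (`EMPLOYEES`, `DEPARTMENTS`, `compare_ranks`), computes both over the sample data; it assumes the bundled SQLite is 3.25 or newer, the first release with window functions.

```python
import sqlite3

# Sample data copied from the Employee and Department tables above.
EMPLOYEES = [
    (1, "Joe", 85000, 1),
    (2, "Henry", 80000, 2),
    (3, "Sam", 60000, 2),
    (4, "Max", 90000, 1),
    (5, "Janet", 69000, 1),
    (6, "Randy", 85000, 1),
    (7, "Will", 70000, 1),
]
DEPARTMENTS = [(1, "IT"), (2, "Sales")]

# RANK and DENSE_RANK over the same partition (department name) and ordering.
QUERY = """
SELECT D.name, E.name, E.salary,
       RANK()       OVER (PARTITION BY D.name ORDER BY E.salary DESC) AS rnk,
       DENSE_RANK() OVER (PARTITION BY D.name ORDER BY E.salary DESC) AS dense_rnk
FROM Employee AS E
LEFT JOIN Department AS D ON E.departmentId = D.id
ORDER BY D.name, E.salary DESC, E.name
"""


def compare_ranks():
    """Return (department, employee, salary, rank, dense_rank) rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE Employee "
            "(id INTEGER, name TEXT, salary INTEGER, departmentId INTEGER)"
        )
        conn.execute("CREATE TABLE Department (id INTEGER, name TEXT)")
        conn.executemany("INSERT INTO Employee VALUES (?, ?, ?, ?)", EMPLOYEES)
        conn.executemany("INSERT INTO Department VALUES (?, ?)", DEPARTMENTS)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in compare_ranks():
        print(row)
```

With RANK, the tie between Joe and Randy at 85000 pushes Will to rank 4, so `rnk <= 3` would drop him; DENSE_RANK gives Will rank 3, which is why the expected output includes him.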
Trips and Users
Input:
Trips table:
+----+-----------+-----------+---------+---------------------+------------+
| id | client_id | driver_id | city_id | status              | request_at |
+----+-----------+-----------+---------+---------------------+------------+
| 1  | 1         | 10        | 1       | completed           | 2013-10-01 |
| 2  | 2         | 11        | 1       | cancelled_by_driver | 2013-10-01 |
| 3  | 3         | 12        | 6       | completed           | 2013-10-01 |
| 4  | 4         | 13        | 6       | cancelled_by_client | 2013-10-01 |
| 5  | 1         | 10        | 1       | completed           | 2013-10-02 |
| 6  | 2         | 11        | 6       | completed           | 2013-10-02 |
| 7  | 3         | 12        | 6       | completed           | 2013-10-02 |
| 8  | 2         | 12        | 12      | completed           | 2013-10-03 |
| 9  | 3         | 10        | 12      | completed           | 2013-10-03 |
| 10 | 4         | 13        | 12      | cancelled_by_driver | 2013-10-03 |
+----+-----------+-----------+---------+---------------------+------------+
Users table:
+----------+--------+--------+
| users_id | banned | role   |
+----------+--------+--------+
| 1        | No     | client |
| 2        | Yes    | client |
| 3        | No     | client |
| 4        | No     | client |
| 10       | No     | driver |
| 11       | No     | driver |
| 12       | No     | driver |
| 13       | No     | driver |
+----------+--------+--------+
Output:
+------------+-------------------+
| Day        | Cancellation Rate |
+------------+-------------------+
| 2013-10-01 | 0.33              |
| 2013-10-02 | 0.00              |
| 2013-10-03 | 0.50              |
+------------+-------------------+
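- Compute the daily cancellation rate for the period "2013-10-01" to "2013-10-03".
- Trips involving a banned user (client or driver) are not counted.
- Round the rate to two decimal places.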
WITH UnbannedClient AS (
    SELECT *
    FROM Users
    WHERE banned != 'Yes'
      AND role = 'client'
),
UnbannedDriver AS (
    SELECT *
    FROM Users
    WHERE banned != 'Yes'
      AND role = 'driver'
)
SELECT
    T.request_at AS Day,
    -- NUMERIC(3,2) keeps two decimal places; SQL Server rounds on this conversion
    CONVERT(
        NUMERIC(3, 2),
        AVG(CASE WHEN T.status LIKE 'cancelled%' THEN 1.0 ELSE 0 END)
    ) AS "Cancellation Rate"
FROM Trips AS T
LEFT JOIN UnbannedClient AS UC
    ON T.client_id = UC.users_id
LEFT JOIN UnbannedDriver AS UD
    ON T.driver_id = UD.users_id
WHERE UC.users_id IS NOT NULL
  AND UD.users_id IS NOT NULL
  AND T.request_at BETWEEN '2013-10-01' AND '2013-10-03'
GROUP BY T.request_at
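- The WITH clause collects the unbanned clients and the unbanned drivers.
- The FROM and WHERE clauses join Trips to both CTEs and keep only the trips whose client and driver are both unbanned (a LEFT JOIN followed by IS NOT NULL behaves like an INNER JOIN), then keep only the trips inside the date range.
- GROUP BY groups the trips by date.
- Converting to NUMERIC(3,2) with CONVERT rounds the rate to two decimal places (the precision comes from NUMERIC(3,2) itself; CONVERT's optional third argument is a style code, not a number of decimal places).
- The rate is computed with AVG: each trip maps to 1 if its status starts with "cancelled" and to 0 otherwise, so the average of these values is the fraction of cancelled trips.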
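A minimal sketch of the same cancellation-rate query on the sample data, run through Python's `sqlite3` module as a stand-in for MSSQL. Because SQLite has no T-SQL style CONVERT to NUMERIC(3,2), this version swaps in `ROUND(..., 2)`, which returns a floating-point value rather than a fixed two-decimal number; the `ORDER BY` is added only to make the output order predictable. `TRIPS`, `USERS`, `QUERY`, and `cancellation_rates` are hypothetical names.

```python
import sqlite3

# Sample rows copied from the Trips and Users tables above.
TRIPS = [
    (1, 1, 10, 1, "completed", "2013-10-01"),
    (2, 2, 11, 1, "cancelled_by_driver", "2013-10-01"),
    (3, 3, 12, 6, "completed", "2013-10-01"),
    (4, 4, 13, 6, "cancelled_by_client", "2013-10-01"),
    (5, 1, 10, 1, "completed", "2013-10-02"),
    (6, 2, 11, 6, "completed", "2013-10-02"),
    (7, 3, 12, 6, "completed", "2013-10-02"),
    (8, 2, 12, 12, "completed", "2013-10-03"),
    (9, 3, 10, 12, "completed", "2013-10-03"),
    (10, 4, 13, 12, "cancelled_by_driver", "2013-10-03"),
]
USERS = [
    (1, "No", "client"), (2, "Yes", "client"), (3, "No", "client"),
    (4, "No", "client"), (10, "No", "driver"), (11, "No", "driver"),
    (12, "No", "driver"), (13, "No", "driver"),
]

QUERY = """
WITH UnbannedClient AS (
    SELECT * FROM Users WHERE banned != 'Yes' AND role = 'client'
),
UnbannedDriver AS (
    SELECT * FROM Users WHERE banned != 'Yes' AND role = 'driver'
)
SELECT T.request_at AS Day,
       -- 1 per cancelled trip, 0 otherwise; the average is the cancelled fraction
       ROUND(AVG(CASE WHEN T.status LIKE 'cancelled%' THEN 1.0 ELSE 0 END), 2)
           AS "Cancellation Rate"
FROM Trips AS T
LEFT JOIN UnbannedClient AS UC ON T.client_id = UC.users_id
LEFT JOIN UnbannedDriver AS UD ON T.driver_id = UD.users_id
WHERE UC.users_id IS NOT NULL
  AND UD.users_id IS NOT NULL
  AND T.request_at BETWEEN '2013-10-01' AND '2013-10-03'
GROUP BY T.request_at
ORDER BY T.request_at
"""


def cancellation_rates():
    """Return (day, rounded cancellation rate) pairs for the sample data."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE Trips (id INTEGER, client_id INTEGER, driver_id INTEGER, "
            "city_id INTEGER, status TEXT, request_at TEXT)"
        )
        conn.execute("CREATE TABLE Users (users_id INTEGER, banned TEXT, role TEXT)")
        conn.executemany("INSERT INTO Trips VALUES (?, ?, ?, ?, ?, ?)", TRIPS)
        conn.executemany("INSERT INTO Users VALUES (?, ?, ?)", USERS)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for day, rate in cancellation_rates():
        print(day, rate)
```

On 2013-10-01 the trip by banned client 2 is excluded, leaving three trips with one cancellation, so the average of the 1/0 values is 1/3, which rounds to 0.33.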
Outro
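- My solutions may not be the best possible ones.
- Unlike algorithm problems, SQL was hard to practice for free.
- On HackerRank, as I recall, some problems were hard because their conditions were not clearly stated.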
Author And Source
About this article (기본기 톺아보기 #5 - SQL): more material is available at the original post.
https://velog.io/@ehddnr/기본기-톺아보기-5-SQL
Attribution: author information is included at the original URL, and copyright belongs to the original author.
(Collection and Share based on the CC Protocol.)