2021.07.31 Project1 | A project using real data from Sparta Coding Club (SQL queries for hypothesis testing)
Tools used: DBeaver, MySQL, Excel
If you spot a mistake or have any additional advice, please feel free to leave feedback. 🙏
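1. Checking the tables and variables
2. Hypothesis 1
Preferred courses will differ by age, occupation, and gender.
(We look at which courses each age group and occupation prefers, and whether there is a difference between men and women.)
⭐Course title: 오늘의 책 → excluded from the analysis (outlier) / missing values are recorded as 0
⭐Age: users under 10 and users aged 70 or older are excluded from the analysis
1) Gender distribution by course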
#Total number of students per course
select c.title, p.course_id, COUNT(p.user_id) as count from prequestions p
inner join courses c on c.course_id = p.course_id
group by c.title, p.course_id
#Number of female students per course
select c.title, p.course_id, p.gender, COUNT(p.gender) as count from prequestions p
inner join courses c on c.course_id = p.course_id
where p.gender = '여'
group by c.title, p.course_id, p.gender
#Number of male students per course
select c.title, p.course_id, p.gender, COUNT(p.gender) as count from prequestions p
inner join courses c on c.course_id = p.course_id
where p.gender = '남'
group by c.title, p.course_id, p.gender
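The three gender queries above can also be collapsed into a single pass with conditional aggregation. The following is a minimal sketch, not the query actually used in the project: it runs against an in-memory SQLite database so it can be tried without the real data. The table and column names follow the queries above; the sample rows and the helper names `build_gender_db` and `gender_counts` are made up for illustration. The `SUM(CASE ...)` pattern itself should also work unchanged in MySQL.

```python
import sqlite3


def build_gender_db():
    """Create an in-memory database with a few hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE courses (course_id TEXT PRIMARY KEY, title TEXT);
        CREATE TABLE prequestions (user_id TEXT, course_id TEXT, gender TEXT);
        INSERT INTO courses VALUES ('c1', 'SQL'), ('c2', 'Web');
        INSERT INTO prequestions VALUES
            ('u1', 'c1', '여'), ('u2', 'c1', '남'), ('u3', 'c1', '여'),
            ('u4', 'c2', '남');
        """
    )
    return conn


# Total, female and male head counts per course in one query.
GENDER_BY_COURSE = """
    SELECT c.title,
           p.course_id,
           COUNT(p.user_id) AS total,
           SUM(CASE WHEN p.gender = '여' THEN 1 ELSE 0 END) AS female,
           SUM(CASE WHEN p.gender = '남' THEN 1 ELSE 0 END) AS male
    FROM prequestions p
    INNER JOIN courses c ON c.course_id = p.course_id
    GROUP BY c.title, p.course_id
    ORDER BY p.course_id
"""


def gender_counts(conn):
    """Return (title, course_id, total, female, male) for each course."""
    return conn.execute(GENDER_BY_COURSE).fetchall()


if __name__ == "__main__":
    for row in gender_counts(build_gender_db()):
        print(row)
```

Having all three counts in one row per course also makes it easier to paste the result into Excel and compute ratios there.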
2) Occupation distribution by course
#Office workers (직장인)
select c.title, p.course_id, COUNT(p.job) as count from prequestions p
inner join courses c on c.course_id = p.course_id
where p.job like '%직장인%'
group by c.title, p.course_id
#Students (학생)
select c.title, p.course_id, COUNT(p.job) as count from prequestions p
inner join courses c on c.course_id = p.course_id
where p.job like '%학생%'
group by c.title, p.course_id
#Startup / business owners (창업)
select c.title, p.course_id, COUNT(p.job) as count from prequestions p
inner join courses c on c.course_id = p.course_id
where p.job like '%창업%'
group by c.title, p.course_id
#Unemployed / preparing (무직)
select c.title, p.course_id, COUNT(p.job) as count from prequestions p
inner join courses c on c.course_id = p.course_id
where p.job like '%무직%'
group by c.title, p.course_id
#Other (기타)
select c.title, p.course_id, COUNT(p.job) as count from prequestions p
inner join courses c on c.course_id = p.course_id
where p.job like '%기타%'
group by c.title, p.course_id
#Occupation + gender distribution by course (the remaining combinations follow the same pattern)
select c.title, p.course_id, COUNT(p.job) as count from prequestions p
inner join courses c on c.course_id = p.course_id
where p.job like '%직장인%' and p.gender = '남'
group by c.title, p.course_id
3) Age preference by course
#Youngest student per course
select c.title, p.course_id, min(p.age) from prequestions p
inner join courses c on c.course_id = p.course_id
group by c.title, p.course_id
#Oldest student per course
select c.title, p.course_id, max(p.age) from prequestions p
inner join courses c on c.course_id = p.course_id
group by c.title, p.course_id
#Average age per course
select c.title, p.course_id, ROUND(avg(p.age),0) as avg_age from prequestions p
inner join courses c on c.course_id = p.course_id
group by c.title, p.course_id
#Age distribution by course (the remaining decades follow the same pattern)
#Teens (10-19)
select c.title, p.course_id, COUNT(p.age) as `10대` from prequestions p
inner join courses c on c.course_id = p.course_id
where p.age BETWEEN 10 and 19
group by c.title, p.course_id
#20s
select c.title, p.course_id, COUNT(p.age) as `20대` from prequestions p
inner join courses c on c.course_id = p.course_id
where p.age BETWEEN 20 and 29
group by c.title, p.course_id
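Rather than running one query per decade, the age bands can be derived inside a single query. The sketch below is only an illustration under assumptions: it uses an in-memory SQLite database with made-up rows, and the helper names `build_age_db` and `age_band_counts` are hypothetical. The band is computed with integer division, `(age / 10) * 10`; in MySQL, where `/` returns a decimal, the equivalent would be `FLOOR(age / 10) * 10`. The `BETWEEN 10 AND 69` filter mirrors the rule above of excluding users under 10 and 70 or older.

```python
import sqlite3


def build_age_db():
    """Create an in-memory database with a few hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE courses (course_id TEXT PRIMARY KEY, title TEXT);
        CREATE TABLE prequestions (user_id TEXT, course_id TEXT, age INTEGER);
        INSERT INTO courses VALUES ('c1', 'SQL'), ('c2', 'Web');
        INSERT INTO prequestions VALUES
            ('u1', 'c1', 15), ('u2', 'c1', 22), ('u3', 'c1', 27),
            ('u4', 'c1', 71),  -- excluded: 70 or older
            ('u5', 'c2', 34), ('u6', 'c2', 38),
            ('u7', 'c2', 8);   -- excluded: under 10
        """
    )
    return conn


# One row per (course, decade) instead of one query per decade.
AGE_BANDS_BY_COURSE = """
    SELECT c.title,
           (p.age / 10) * 10 AS age_band,  -- integer division in SQLite
           COUNT(*) AS students
    FROM prequestions p
    INNER JOIN courses c ON c.course_id = p.course_id
    WHERE p.age BETWEEN 10 AND 69
    GROUP BY c.title, age_band
    ORDER BY c.title, age_band
"""


def age_band_counts(conn):
    """Return (title, age_band, students) rows."""
    return conn.execute(AGE_BANDS_BY_COURSE).fetchall()


if __name__ == "__main__":
    for row in age_band_counts(build_age_db()):
        print(row)
```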
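3. Hypothesis 2
The number of days taken to complete a course will differ by age and occupation.
⭐(Course) 오늘의 책 → excluded from the analysis (outlier)
⭐(Age) only the 20s, 30s, and 40s are analyzed
⭐Missing values are recorded as 0
1) The average number of days to complete each course will differ by occupation.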
#Average number of days taken per course
SELECT c.title,
e.course_id,
round(avg(TIMESTAMPDIFF(day, start_date, end_date)),0) as avg_date from enrolleds e
inner join courses c on c.course_id = e.course_id
where is_registered = 1
group by c.title, e.course_id
#Office workers (직장인) only
SELECT c.title,
e.course_id,
round(avg(TIMESTAMPDIFF(day, start_date, end_date)),0) as avg_date from enrolleds e
inner join courses c on c.course_id = e.course_id
inner join prequestions p on p.user_id = e.user_id
where is_registered = 1 and p.job like '%직장인%'
group by c.title, e.course_id
#Students (학생) only
SELECT c.title,
e.course_id,
round(avg(TIMESTAMPDIFF(day, start_date, end_date)),0) as avg_date from enrolleds e
inner join courses c on c.course_id = e.course_id
inner join prequestions p on p.user_id = e.user_id
where is_registered = 1 and p.job like '%학생%'
group by c.title, e.course_id
#Unemployed / preparing (무직/준비중) only
SELECT c.title,
e.course_id,
round(avg(TIMESTAMPDIFF(day, start_date, end_date)),0) as avg_date from enrolleds e
inner join courses c on c.course_id = e.course_id
inner join prequestions p on p.user_id = e.user_id
where is_registered = 1 and p.job like '%무직%'
group by c.title, e.course_id
#Startup / business owners (창업/사업) only
SELECT c.title,
e.course_id,
round(avg(TIMESTAMPDIFF(day, start_date, end_date)),0) as avg_date from enrolleds e
inner join courses c on c.course_id = e.course_id
inner join prequestions p on p.user_id = e.user_id
where is_registered = 1 and p.job like '%창업%'
group by c.title, e.course_id
#Other (기타) only
SELECT c.title,
e.course_id,
round(avg(TIMESTAMPDIFF(day, start_date, end_date)),0) as avg_date from enrolleds e
inner join courses c on c.course_id = e.course_id
inner join prequestions p on p.user_id = e.user_id
where is_registered = 1 and p.job like '%기타%'
group by c.title, e.course_id
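The per-occupation queries above differ only in the `job` filter, so the comparison can also be produced in one result set by grouping on course and job together. The sketch below is an assumption-laden illustration rather than the project's query: it uses in-memory SQLite with made-up rows, assumes `is_registered`, `start_date` and `end_date` live on `enrolleds`, and assumes `job` holds one clean label per user. Because SQLite has no `TIMESTAMPDIFF`, the day difference is computed with `julianday(end_date) - julianday(start_date)`; in MySQL the original `TIMESTAMPDIFF(day, start_date, end_date)` would stay. The helper names are hypothetical.

```python
import sqlite3


def build_completion_db():
    """Create an in-memory database with a few hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE courses (course_id TEXT PRIMARY KEY, title TEXT);
        CREATE TABLE prequestions (user_id TEXT, job TEXT);
        CREATE TABLE enrolleds (
            enrolled_id TEXT, user_id TEXT, course_id TEXT,
            is_registered INTEGER, start_date TEXT, end_date TEXT
        );
        INSERT INTO courses VALUES ('c1', 'SQL');
        INSERT INTO prequestions VALUES
            ('u1', '직장인'), ('u2', '직장인'), ('u3', '학생'), ('u4', '학생');
        INSERT INTO enrolleds VALUES
            ('e1', 'u1', 'c1', 1, '2021-01-01', '2021-01-11'),  -- 10 days
            ('e2', 'u2', 'c1', 1, '2021-01-01', '2021-01-21'),  -- 20 days
            ('e3', 'u3', 'c1', 1, '2021-01-01', '2021-01-08'),  -- 7 days
            ('e4', 'u4', 'c1', 0, '2021-01-01', '2021-03-01');  -- not registered
        """
    )
    return conn


# Average days from start to end, per course and per occupation.
AVG_DAYS_BY_COURSE_AND_JOB = """
    SELECT c.title,
           p.job,
           ROUND(AVG(julianday(e.end_date) - julianday(e.start_date)), 0)
               AS avg_date
    FROM enrolleds e
    INNER JOIN courses c ON c.course_id = e.course_id
    INNER JOIN prequestions p ON p.user_id = e.user_id
    WHERE e.is_registered = 1
    GROUP BY c.title, p.job
"""


def avg_days_by_job(conn):
    """Return {(title, job): avg_days} for registered enrollments."""
    rows = conn.execute(AVG_DAYS_BY_COURSE_AND_JOB).fetchall()
    return {(title, job): days for title, job, days in rows}


if __name__ == "__main__":
    for key, days in avg_days_by_job(build_completion_db()).items():
        print(key, days)
```

If the real `job` column contains free-text or combined labels, which the `like '%...%'` filters above suggest, the grouping key would first need to be normalized, for example with a `CASE WHEN job LIKE ...` expression.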
2) The average number of days to complete each course will differ by age group.
#20s
SELECT c.title,
e.course_id,
round(avg(TIMESTAMPDIFF(day, start_date, end_date)),0) as avg_date from enrolleds e
inner join courses c on c.course_id = e.course_id
inner join prequestions p on p.user_id = e.user_id
where is_registered = 1 and p.age BETWEEN 20 and 29
group by c.title, e.course_id
#30s
SELECT c.title,
e.course_id,
round(avg(TIMESTAMPDIFF(day, start_date, end_date)),0) as avg_date from enrolleds e
inner join courses c on c.course_id = e.course_id
inner join prequestions p on p.user_id = e.user_id
where is_registered = 1 and p.age BETWEEN 30 and 39
group by c.title, e.course_id
#40s
SELECT c.title,
e.course_id,
round(avg(TIMESTAMPDIFF(day, start_date, end_date)),0) as avg_date from enrolleds e
inner join courses c on c.course_id = e.course_id
inner join prequestions p on p.user_id = e.user_id
where is_registered = 1 and p.age BETWEEN 40 and 49
group by c.title, e.course_id
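4. Hypothesis 3
1) Weekly completion rates will differ by course.
(For example, the completion rate of the 대학생 불꽃반 SQL course will drop sharply in week 3. → By checking patterns like this, I want to work out in which week a management or encouragement mechanism should be added.)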
#Case where everyone completes week 1 (finding the baseline)
select c.title, ed.week, count(ed.done) from enrolleds_detail ed
inner join enrolleds e on e.enrolled_id = ed.enrolled_id
inner join courses c on c.course_id = e.course_id
where ed.week = 1
group by c.title, ed.week
#Weekly completion status (weeks 1 through 5; nothing from week 6 onward). Repeat with week = 2 ... 5
select c.title, ed.week, count(ed.done) from enrolleds_detail ed
inner join enrolleds e on e.enrolled_id = ed.enrolled_id
inner join courses c on c.course_id = e.course_id
where ed.done = 1 and ed.week = 1
group by c.title, ed.week
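The baseline and the per-week counts above can be combined so that the completion rate for every week comes out of one query. The following is a minimal sketch under assumptions, not the query used in the project: it runs on in-memory SQLite with made-up rows, assumes `done` is stored as 0 or 1, and takes the number of week-1 rows per course as the baseline, as in the first query above. The helper names `build_weekly_db` and `weekly_completion` are hypothetical. The `WITH` clause is also available in MySQL 8.0 and later.

```python
import sqlite3


def build_weekly_db():
    """Create an in-memory database with a few hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE courses (course_id TEXT PRIMARY KEY, title TEXT);
        CREATE TABLE enrolleds (enrolled_id TEXT, course_id TEXT);
        CREATE TABLE enrolleds_detail (
            enrolled_id TEXT, week INTEGER, done INTEGER
        );
        INSERT INTO courses VALUES ('c1', 'SQL');
        INSERT INTO enrolleds VALUES
            ('e1', 'c1'), ('e2', 'c1'), ('e3', 'c1'), ('e4', 'c1');
        INSERT INTO enrolleds_detail VALUES
            ('e1', 1, 1), ('e2', 1, 1), ('e3', 1, 1), ('e4', 1, 0),
            ('e1', 2, 1), ('e2', 2, 1), ('e3', 2, 0), ('e4', 2, 0),
            ('e1', 3, 1), ('e2', 3, 0), ('e3', 3, 0), ('e4', 3, 0);
        """
    )
    return conn


# Baseline = number of week-1 rows per course; rate = completed / baseline.
WEEKLY_COMPLETION = """
    WITH baseline AS (
        SELECT e.course_id, COUNT(ed.done) AS enrolled
        FROM enrolleds_detail ed
        INNER JOIN enrolleds e ON e.enrolled_id = ed.enrolled_id
        WHERE ed.week = 1
        GROUP BY e.course_id
    )
    SELECT c.title,
           ed.week,
           SUM(ed.done) AS completed,
           b.enrolled,
           ROUND(100.0 * SUM(ed.done) / b.enrolled, 1) AS completion_pct
    FROM enrolleds_detail ed
    INNER JOIN enrolleds e ON e.enrolled_id = ed.enrolled_id
    INNER JOIN courses c ON c.course_id = e.course_id
    INNER JOIN baseline b ON b.course_id = e.course_id
    GROUP BY c.title, ed.week, b.enrolled
    ORDER BY c.title, ed.week
"""


def weekly_completion(conn):
    """Return (title, week, completed, enrolled, completion_pct) rows."""
    return conn.execute(WEEKLY_COMPLETION).fetchall()


if __name__ == "__main__":
    for row in weekly_completion(build_weekly_db()):
        print(row)
```

Laid out this way, a sharp drop between two consecutive weeks of the same course is visible directly in the `completion_pct` column.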
Author And Source
Original post: https://velog.io/@dkswldp95/2021.07.31-Project1-스파르타-코팅-클럽의-실제-데이터를-활용한-프로젝트-SQL을-활용한-가설-검증-및-분석쿼리-정리
Author attribution: the original author's information is included in the original URL, and copyright belongs to the original author.