2021.07.31 Project1 | A project using real data from Sparta Coding Club (a summary of SQL queries for hypothesis testing)

Tools used: DBeaver, MySQL, Excel
If you spot any mistakes or have further advice, please leave feedback anytime. 🙏

1. Checking the tables and variables


2. Hypothesis 1

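Preferred courses will differ by age, job, and gender.
(Look at which courses each age group and each job group prefers, and whether men and women differ.)
⭐Course: 오늘의 책 ("Today's Book") → excluded from the analysis (outlier) / missing values are recorded as 0
⭐Age: respondents under 10 and aged 70 or over are excluded from the analysis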

1) Checking the gender distribution by course

#Total number of respondents per course
select c.title, p.course_id, COUNT(p.user_id) as count from prequestions p
inner join courses c on c.course_id = p.course_id
group by p.course_id, c.title;

#Female respondents per course ('여' = female)
select c.title, p.course_id, p.gender, COUNT(p.user_id) as count from prequestions p
inner join courses c on c.course_id = p.course_id
where p.gender = '여'
group by p.course_id, c.title, p.gender;

#Male respondents per course ('남' = male)
select c.title, p.course_id, p.gender, COUNT(p.user_id) as count from prequestions p
inner join courses c on c.course_id = p.course_id
where p.gender = '남'
group by p.course_id, c.title, p.gender;
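The three counts above can also be produced in a single pass with conditional aggregation. The sketch below runs that query through Python's built-in `sqlite3` module on a tiny in-memory table; the rows, the titles `Course A`/`Course B`, and the column types are hypothetical stand-ins, not the real Sparta Coding Club data. The SQL string uses only `CASE`, `SUM`, and `COUNT`, so it should also run in MySQL as written.

```python
import sqlite3

# Hypothetical toy tables containing only the columns used in this section.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE courses (course_id TEXT PRIMARY KEY, title TEXT);
CREATE TABLE prequestions (user_id TEXT, course_id TEXT, gender TEXT);
INSERT INTO courses VALUES ('c1', 'Course A'), ('c2', 'Course B');
INSERT INTO prequestions VALUES
  ('u1', 'c1', '여'), ('u2', 'c1', '남'), ('u3', 'c1', '여'), ('u4', 'c2', '남');
""")

# Total, female ('여') and male ('남') counts per course in one query:
# each SUM(CASE ...) adds 1 only for rows that match its condition.
GENDER_BY_COURSE = """
SELECT c.title,
       p.course_id,
       COUNT(p.user_id)                                 AS total,
       SUM(CASE WHEN p.gender = '여' THEN 1 ELSE 0 END) AS female,
       SUM(CASE WHEN p.gender = '남' THEN 1 ELSE 0 END) AS male
FROM prequestions p
INNER JOIN courses c ON c.course_id = p.course_id
GROUP BY p.course_id, c.title
ORDER BY p.course_id
"""

rows = conn.execute(GENDER_BY_COURSE).fetchall()
for row in rows:
    print(row)
```

Because each course ends up in one row, the female and male shares can be read off directly (for example `female / total`) instead of lining up three separate result sets.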

2) Checking the job distribution by course

#Office workers (직장인)
select c.title, p.course_id, COUNT(p.user_id) as count from prequestions p
inner join courses c on c.course_id = p.course_id
where p.job like '%직장인%'
group by p.course_id, c.title;

#Students (학생)
select c.title, p.course_id, COUNT(p.user_id) as count from prequestions p
inner join courses c on c.course_id = p.course_id
where p.job like '%학생%'
group by p.course_id, c.title;

#Founders / business owners (창업)
select c.title, p.course_id, COUNT(p.user_id) as count from prequestions p
inner join courses c on c.course_id = p.course_id
where p.job like '%창업%'
group by p.course_id, c.title;

#Unemployed / job seekers (무직)
select c.title, p.course_id, COUNT(p.user_id) as count from prequestions p
inner join courses c on c.course_id = p.course_id
where p.job like '%무직%'
group by p.course_id, c.title;

#Other (기타)
select c.title, p.course_id, COUNT(p.user_id) as count from prequestions p
inner join courses c on c.course_id = p.course_id
where p.job like '%기타%'
group by p.course_id, c.title;

#Job + gender distribution by course (repeat the same pattern for the other combinations)
select c.title, p.course_id, p.gender, COUNT(p.user_id) as count from prequestions p
inner join courses c on c.course_id = p.course_id
where p.job like '%직장인%' and p.gender = '남'
group by p.course_id, c.title, p.gender;
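The five job queries and the job + gender combinations can be collapsed into one result with one row per (course, gender) and one column per job group. This is a sketch on hypothetical toy rows, run with Python's `sqlite3`; the English column names (`office_worker`, `student`, ...) are illustrative labels, while the `LIKE` patterns are the ones used above. Note that a job string matching several patterns is counted in every matching column.

```python
import sqlite3

# Hypothetical toy data; job values are free text, so matching uses LIKE.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE courses (course_id TEXT PRIMARY KEY, title TEXT);
CREATE TABLE prequestions (user_id TEXT, course_id TEXT, gender TEXT, job TEXT);
INSERT INTO courses VALUES ('c1', 'Course A'), ('c2', 'Course B');
INSERT INTO prequestions VALUES
  ('u1', 'c1', '여', '직장인'),
  ('u2', 'c1', '남', '학생'),
  ('u3', 'c1', '여', '무직/준비중'),
  ('u4', 'c2', '남', '직장인'),
  ('u5', 'c1', '여', '대학생');
""")

# One row per (course, gender), one column per job group.
# '대학생' also matches '%학생%', so it is counted as a student.
JOB_BY_COURSE_AND_GENDER = """
SELECT c.title,
       p.course_id,
       p.gender,
       SUM(CASE WHEN p.job LIKE '%직장인%' THEN 1 ELSE 0 END) AS office_worker,
       SUM(CASE WHEN p.job LIKE '%학생%'   THEN 1 ELSE 0 END) AS student,
       SUM(CASE WHEN p.job LIKE '%창업%'   THEN 1 ELSE 0 END) AS founder,
       SUM(CASE WHEN p.job LIKE '%무직%'   THEN 1 ELSE 0 END) AS unemployed,
       SUM(CASE WHEN p.job LIKE '%기타%'   THEN 1 ELSE 0 END) AS other
FROM prequestions p
INNER JOIN courses c ON c.course_id = p.course_id
GROUP BY p.course_id, c.title, p.gender
ORDER BY p.course_id, p.gender
"""

rows = conn.execute(JOB_BY_COURSE_AND_GENDER).fetchall()
for row in rows:
    print(row)
```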

3) Checking age preferences by course

#Youngest age per course
select c.title, p.course_id, min(p.age) as min_age from prequestions p
inner join courses c on c.course_id = p.course_id
group by p.course_id, c.title;

#Oldest age per course
select c.title, p.course_id, max(p.age) as max_age from prequestions p
inner join courses c on c.course_id = p.course_id
group by p.course_id, c.title;

#Average age per course
select c.title, p.course_id, ROUND(avg(p.age),0) as avg_age from prequestions p
inner join courses c on c.course_id = p.course_id
group by p.course_id, c.title;

#Age distribution by course (repeat the same pattern for the other age groups)
#Teens (10대)
select c.title, p.course_id, COUNT(p.age) as `10대` from prequestions p
inner join courses c on c.course_id = p.course_id
where p.age BETWEEN 10 and 19
group by p.course_id, c.title;

#20s (20대)
select c.title, p.course_id, COUNT(p.age) as `20대` from prequestions p
inner join courses c on c.course_id = p.course_id
where p.age BETWEEN 20 and 29
group by p.course_id, c.title;
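A possible shortcut for this step: compute min, max, and average age in one query, and derive the decade from the age itself so that a single `GROUP BY` covers every age group instead of one `BETWEEN` query per decade. The sketch assumes integer ages, applies the 10 to 69 range stated for Hypothesis 1, and runs on hypothetical toy rows with Python's `sqlite3`. The expression `age - age % 10` is used because it behaves the same in SQLite and MySQL (in MySQL, `age / 10` would return a decimal).

```python
import sqlite3

# Hypothetical toy data; ages 72 and 8 fall outside the analysed range.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE courses (course_id TEXT PRIMARY KEY, title TEXT);
CREATE TABLE prequestions (user_id TEXT, course_id TEXT, age INTEGER);
INSERT INTO courses VALUES ('c1', 'Course A'), ('c2', 'Course B');
INSERT INTO prequestions VALUES
  ('u1', 'c1', 25), ('u2', 'c1', 21), ('u3', 'c1', 34), ('u4', 'c1', 19),
  ('u5', 'c1', 72), ('u6', 'c2', 41), ('u7', 'c2', 8);
""")

# Min / max / rounded average age per course, ages 10-69 only.
AGE_SUMMARY = """
SELECT c.title, p.course_id,
       MIN(p.age) AS min_age,
       MAX(p.age) AS max_age,
       ROUND(AVG(p.age), 0) AS avg_age
FROM prequestions p
INNER JOIN courses c ON c.course_id = p.course_id
WHERE p.age BETWEEN 10 AND 69
GROUP BY p.course_id, c.title
ORDER BY p.course_id
"""

# age - age % 10 maps 19 -> 10, 25 -> 20, 34 -> 30, ... (one group per decade).
AGE_BANDS = """
SELECT c.title, p.course_id,
       p.age - p.age % 10 AS age_band,
       COUNT(*) AS n
FROM prequestions p
INNER JOIN courses c ON c.course_id = p.course_id
WHERE p.age BETWEEN 10 AND 69
GROUP BY p.course_id, c.title, age_band
ORDER BY p.course_id, age_band
"""

summary = conn.execute(AGE_SUMMARY).fetchall()
bands = conn.execute(AGE_BANDS).fetchall()
print(summary)
print(bands)
```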

3. Hypothesis 2

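The number of days taken to complete each course will differ by age group and by job.
⭐(Course) 오늘의 책 ("Today's Book") → excluded from the analysis (outlier)
⭐(Age) only respondents in their 20s, 30s, and 40s are analyzed
⭐Missing values are recorded as 0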

1) The average number of days to complete each course will differ by job.

#Average number of enrolled days per course
SELECT c.title,
       e.course_id,
       round(avg(TIMESTAMPDIFF(day, start_date, end_date)),0) as avg_date from enrolleds e
inner join courses c on c.course_id = e.course_id
where is_registered = 1
group by e.course_id, c.title;

#Office workers (직장인) only
SELECT c.title,
       e.course_id,
       round(avg(TIMESTAMPDIFF(day, start_date, end_date)),0) as avg_date from enrolleds e
inner join courses c on c.course_id = e.course_id
inner join prequestions p on p.user_id = e.user_id
where is_registered = 1 and p.job like '%직장인%'
group by e.course_id, c.title;

#Students (학생) only
SELECT c.title,
       e.course_id,
       round(avg(TIMESTAMPDIFF(day, start_date, end_date)),0) as avg_date from enrolleds e
inner join courses c on c.course_id = e.course_id
inner join prequestions p on p.user_id = e.user_id
where is_registered = 1 and p.job like '%학생%'
group by e.course_id, c.title;

#Unemployed / preparing (무직/준비중) only
SELECT c.title,
       e.course_id,
       round(avg(TIMESTAMPDIFF(day, start_date, end_date)),0) as avg_date from enrolleds e
inner join courses c on c.course_id = e.course_id
inner join prequestions p on p.user_id = e.user_id
where is_registered = 1 and p.job like '%무직%'
group by e.course_id, c.title;

#Founders / business owners (창업/사업) only
SELECT c.title,
       e.course_id,
       round(avg(TIMESTAMPDIFF(day, start_date, end_date)),0) as avg_date from enrolleds e
inner join courses c on c.course_id = e.course_id
inner join prequestions p on p.user_id = e.user_id
where is_registered = 1 and p.job like '%창업%'
group by e.course_id, c.title;

#Other (기타) only
SELECT c.title,
       e.course_id,
       round(avg(TIMESTAMPDIFF(day, start_date, end_date)),0) as avg_date from enrolleds e
inner join courses c on c.course_id = e.course_id
inner join prequestions p on p.user_id = e.user_id
where is_registered = 1 and p.job like '%기타%'
group by e.course_id, c.title;
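The five per-job queries could also be replaced by one query that labels each respondent with a job group via `CASE` and groups on that label. This sketch runs on hypothetical toy rows with Python's `sqlite3` and assumes `start_date` and `end_date` live in `enrolleds` (the original queries leave them unqualified). SQLite has no `TIMESTAMPDIFF`, so the day count is taken as the `julianday()` difference cast to an integer; in MySQL that expression would be `TIMESTAMPDIFF(DAY, e.start_date, e.end_date)`. A `CASE` assigns each job string to the first pattern it matches, so the order of the branches matters.

```python
import sqlite3

# Hypothetical schema and rows; only the columns used above are modelled.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE courses (course_id TEXT PRIMARY KEY, title TEXT);
CREATE TABLE prequestions (user_id TEXT, job TEXT, age INTEGER);
CREATE TABLE enrolleds (enrolled_id TEXT, user_id TEXT, course_id TEXT,
                        is_registered INTEGER, start_date TEXT, end_date TEXT);
INSERT INTO courses VALUES ('c1', 'Course A'), ('c2', 'Course B');
INSERT INTO prequestions VALUES
  ('u1', '직장인', 25), ('u2', '학생', 21), ('u3', '무직/준비중', 34), ('u4', '직장인', 41);
INSERT INTO enrolleds VALUES
  ('e1', 'u1', 'c1', 1, '2021-06-01', '2021-06-11'),
  ('e2', 'u2', 'c1', 1, '2021-06-01', '2021-06-21'),
  ('e3', 'u3', 'c1', 0, '2021-06-01', '2021-06-30'),
  ('e4', 'u4', 'c1', 1, '2021-06-01', '2021-06-15'),
  ('e5', 'u4', 'c2', 1, '2021-06-01', '2021-06-08');
""")

# julianday() difference cast to INTEGER stands in for MySQL's TIMESTAMPDIFF(DAY, ...).
DAYS_BY_JOB = """
SELECT c.title, e.course_id,
       CASE WHEN p.job LIKE '%직장인%' THEN 'office_worker'
            WHEN p.job LIKE '%학생%'   THEN 'student'
            WHEN p.job LIKE '%무직%'   THEN 'unemployed'
            WHEN p.job LIKE '%창업%'   THEN 'founder'
            WHEN p.job LIKE '%기타%'   THEN 'other'
       END AS job_group,
       ROUND(AVG(CAST(julianday(e.end_date) - julianday(e.start_date) AS INTEGER)), 0) AS avg_days,
       COUNT(*) AS n
FROM enrolleds e
INNER JOIN courses c ON c.course_id = e.course_id
INNER JOIN prequestions p ON p.user_id = e.user_id
WHERE e.is_registered = 1
GROUP BY e.course_id, c.title, job_group
ORDER BY e.course_id, job_group
"""

rows = conn.execute(DAYS_BY_JOB).fetchall()
for row in rows:
    print(row)
```

The extra `n` column shows how many enrollments each average is based on, which helps judge whether a difference between job groups rests on enough people.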

2) The average number of days to complete each course will differ by age group.

#20s (20대)
SELECT c.title,
       e.course_id,
       round(avg(TIMESTAMPDIFF(day, start_date, end_date)),0) as avg_date from enrolleds e
inner join courses c on c.course_id = e.course_id
inner join prequestions p on p.user_id = e.user_id
where is_registered = 1 and p.age BETWEEN 20 and 29
group by e.course_id, c.title;

#30s (30대)
SELECT c.title,
       e.course_id,
       round(avg(TIMESTAMPDIFF(day, start_date, end_date)),0) as avg_date from enrolleds e
inner join courses c on c.course_id = e.course_id
inner join prequestions p on p.user_id = e.user_id
where is_registered = 1 and p.age BETWEEN 30 and 39
group by e.course_id, c.title;

#40s (40대)
SELECT c.title,
       e.course_id,
       round(avg(TIMESTAMPDIFF(day, start_date, end_date)),0) as avg_date from enrolleds e
inner join courses c on c.course_id = e.course_id
inner join prequestions p on p.user_id = e.user_id
where is_registered = 1 and p.age BETWEEN 40 and 49
group by e.course_id, c.title;
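The three age-group queries follow the same shape, so they could be merged by grouping on the decade (`age - age % 10`) and restricting ages to 20 to 49. This is a sketch on hypothetical toy rows, run with Python's `sqlite3`; as in the job-group example, `start_date`/`end_date` are assumed to be columns of `enrolleds`, and the `julianday()` difference stands in for MySQL's `TIMESTAMPDIFF(DAY, ...)`.

```python
import sqlite3

# Hypothetical schema and rows; u5 (age 55) falls outside the 20-49 range.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE courses (course_id TEXT PRIMARY KEY, title TEXT);
CREATE TABLE prequestions (user_id TEXT, age INTEGER);
CREATE TABLE enrolleds (enrolled_id TEXT, user_id TEXT, course_id TEXT,
                        is_registered INTEGER, start_date TEXT, end_date TEXT);
INSERT INTO courses VALUES ('c1', 'Course A'), ('c2', 'Course B');
INSERT INTO prequestions VALUES
  ('u1', 25), ('u2', 21), ('u3', 33), ('u4', 41), ('u5', 55);
INSERT INTO enrolleds VALUES
  ('e1', 'u1', 'c1', 1, '2021-06-01', '2021-06-11'),
  ('e2', 'u2', 'c1', 1, '2021-06-01', '2021-06-21'),
  ('e3', 'u3', 'c1', 1, '2021-06-01', '2021-06-13'),
  ('e4', 'u4', 'c1', 1, '2021-06-01', '2021-06-15'),
  ('e5', 'u4', 'c2', 1, '2021-06-01', '2021-06-08'),
  ('e6', 'u5', 'c1', 1, '2021-06-01', '2021-06-30');
""")

# One row per (course, decade) for registered enrollments of 20-49 year olds.
DAYS_BY_AGE_BAND = """
SELECT c.title, e.course_id,
       p.age - p.age % 10 AS age_band,
       ROUND(AVG(CAST(julianday(e.end_date) - julianday(e.start_date) AS INTEGER)), 0) AS avg_days,
       COUNT(*) AS n
FROM enrolleds e
INNER JOIN courses c ON c.course_id = e.course_id
INNER JOIN prequestions p ON p.user_id = e.user_id
WHERE e.is_registered = 1 AND p.age BETWEEN 20 AND 49
GROUP BY e.course_id, c.title, age_band
ORDER BY e.course_id, age_band
"""

rows = conn.execute(DAYS_BY_AGE_BAND).fetchall()
for row in rows:
    print(row)
```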

4. Hypothesis 3

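1) Weekly completion rates will differ by course.

(For example, the completion rate of 대학생 불꽃반 SQL may drop sharply in week 3. The goal is to spot patterns like this and decide in which week to add a mechanism for managing or encouraging learners.)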

#Baseline: number of week-1 records per course (the count if everyone completed week 1)
select c.title, ed.week, count(ed.done) as total_records from enrolleds_detail ed
inner join enrolleds e on e.enrolled_id = ed.enrolled_id
inner join courses c on c.course_id = e.course_id
where ed.week = 1
group by c.title, ed.week;

#Completions per week (weeks 1 to 5; weeks 6 and later are not included), change week = 1 for each week
select c.title, ed.week, count(ed.done) as completed from enrolleds_detail ed
inner join enrolleds e on e.enrolled_id = ed.enrolled_id
inner join courses c on c.course_id = e.course_id
where ed.done = 1 and ed.week = 1
group by c.title, ed.week;
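The baseline and the weekly completion counts can be combined to give the completion rate for every week in one result: `COUNT(*)` plays the role of the baseline and `SUM(done)` the number of completions. The sketch below assumes `done` is stored as 0/1 (the queries above filter on `done = 1`) and runs on hypothetical toy rows with Python's `sqlite3`; the SQL itself should also work in MySQL.

```python
import sqlite3

# Hypothetical toy data: two enrollments in Course A, one in Course B.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE courses (course_id TEXT PRIMARY KEY, title TEXT);
CREATE TABLE enrolleds (enrolled_id TEXT PRIMARY KEY, course_id TEXT);
CREATE TABLE enrolleds_detail (enrolled_id TEXT, week INTEGER, done INTEGER);
INSERT INTO courses VALUES ('c1', 'Course A'), ('c2', 'Course B');
INSERT INTO enrolleds VALUES ('e1', 'c1'), ('e2', 'c1'), ('e3', 'c2');
INSERT INTO enrolleds_detail VALUES
  ('e1', 1, 1), ('e1', 2, 1), ('e1', 3, 0),
  ('e2', 1, 1), ('e2', 2, 0), ('e2', 3, 0),
  ('e3', 1, 1), ('e3', 2, 1);
""")

# Completion rate (%) per course and week, weeks 1-5 only.
WEEKLY_COMPLETION = """
SELECT c.title, ed.week,
       COUNT(*)     AS enrolled,
       SUM(ed.done) AS completed,
       ROUND(100.0 * SUM(ed.done) / COUNT(*), 1) AS completion_rate
FROM enrolleds_detail ed
INNER JOIN enrolleds e ON e.enrolled_id = ed.enrolled_id
INNER JOIN courses c ON c.course_id = e.course_id
WHERE ed.week BETWEEN 1 AND 5
GROUP BY c.title, ed.week
ORDER BY c.title, ed.week
"""

rows = conn.execute(WEEKLY_COMPLETION).fetchall()
for row in rows:
    print(row)
```

Reading the rows for one course in week order, a sharp fall between consecutive weeks (here Course A drops from 100% in week 1 to 50% in week 2) marks the kind of week where a reminder or encouragement step could be considered.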
