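Correcting week numbers taken from a datetime index
Introduction
When working with time-series data in pandas, it is convenient to use a datetime index. However, when the week number is extracted from that index, dates in the last days of the year, such as 2019/12/30 and 2019/12/31, are labeled as week 1 of 2019. This memo records how I looked into correcting that.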
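Problem
The last week of December and the first week of January of the following year can be one and the same week. There are several conventions for numbering weeks, but pandas follows the ISO 8601 (European) convention: weeks start on Monday, and week 1 is the week that contains the first Thursday of the year.
Reference: 2019 weekly calendar and list of week numbers
As a result, for 2019/12/30 and 2019/12/31, df.index.year -> 2019 and df.index.week -> 1. These dates are treated as week 1 of 2019, which is inconvenient for processing such as aggregating data by week.
The following datetime index is used as an example.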
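To make the mismatch concrete, here is a minimal sketch using only the Python standard library rather than pandas. The helper name iso_year_week is hypothetical and not part of the original article. It shows that the ISO week number of 2019-12-30 belongs to ISO year 2020, so pairing it with the calendar year 2019 produces the misleading "2019, week 1".

```python
from datetime import date


def iso_year_week(d):
    """Return (ISO year, ISO week) for a date. Hypothetical helper for illustration."""
    iso = d.isocalendar()
    return (iso[0], iso[1])


for d in (date(2019, 12, 29), date(2019, 12, 30), date(2019, 12, 31)):
    # d.year is the calendar year; the ISO year can differ near year boundaries.
    print(d, "calendar year:", d.year, "ISO (year, week):", iso_year_week(d))
```

The week number is only unambiguous together with the ISO year; combining it with the calendar year is what produces the problem described above.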
df.index
DatetimeIndex(['2019-12-29 22:00:00+00:00', '2019-12-29 22:00:00+00:00',
'2019-12-29 22:00:00+00:00', '2019-12-29 22:00:00+00:00',
'2019-12-29 22:00:00+00:00', '2019-12-29 22:08:00+00:00',
'2019-12-29 23:00:00+00:00', '2019-12-30 01:47:00+00:00',
'2019-12-30 02:48:00+00:00', '2019-12-30 02:48:00+00:00',
'2019-12-30 12:34:00+00:00', '2019-12-30 14:51:00+00:00',
'2019-12-30 14:53:00+00:00', '2019-12-30 14:56:00+00:00',
'2019-12-31 04:50:00+00:00', '2019-12-31 13:41:00+00:00',
'2019-12-31 14:42:00+00:00', '2019-12-31 14:45:00+00:00',
'2019-12-31 15:56:00+00:00', '2019-12-31 15:56:00+00:00',
'2019-12-31 15:58:00+00:00', '2019-12-31 15:58:00+00:00'],
dtype='datetime64[ns, UTC]', name='date', freq=None)
If a MultiIndex is set on this DataFrame as shown below, you can confirm that 12/29 falls in week 52, whereas 12/30 and 12/31 end up in week 1 of 2019.
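# Build a (year, month, week, date) MultiIndex from the datetime index.
# Note: DatetimeIndex.week was deprecated in pandas 1.1 and removed in 2.0;
# isocalendar().week is its replacement on newer versions.
df_w = df.set_index([df.index.year, df.index.month,
                     df.index.week, df.index])
df_w.index.names = ['year', 'month', 'week', 'date']
df_w.sort_index(inplace=True)
df_w.index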
MultiIndex([(2019, 12, 1, '2019-12-30 01:47:00+00:00'),
(2019, 12, 1, '2019-12-30 02:48:00+00:00'),
(2019, 12, 1, '2019-12-30 02:48:00+00:00'),
(2019, 12, 1, '2019-12-30 12:34:00+00:00'),
(2019, 12, 1, '2019-12-30 14:51:00+00:00'),
(2019, 12, 1, '2019-12-30 14:53:00+00:00'),
(2019, 12, 1, '2019-12-30 14:56:00+00:00'),
(2019, 12, 1, '2019-12-31 04:50:00+00:00'),
(2019, 12, 1, '2019-12-31 13:41:00+00:00'),
(2019, 12, 1, '2019-12-31 14:42:00+00:00'),
(2019, 12, 1, '2019-12-31 14:45:00+00:00'),
(2019, 12, 1, '2019-12-31 15:56:00+00:00'),
(2019, 12, 1, '2019-12-31 15:56:00+00:00'),
(2019, 12, 1, '2019-12-31 15:58:00+00:00'),
(2019, 12, 1, '2019-12-31 15:58:00+00:00'),
(2019, 12, 52, '2019-12-29 22:00:00+00:00'),
(2019, 12, 52, '2019-12-29 22:00:00+00:00'),
(2019, 12, 52, '2019-12-29 22:00:00+00:00'),
(2019, 12, 52, '2019-12-29 22:00:00+00:00'),
(2019, 12, 52, '2019-12-29 22:00:00+00:00'),
(2019, 12, 52, '2019-12-29 22:08:00+00:00'),
(2019, 12, 52, '2019-12-29 23:00:00+00:00')],
names=['year', 'month', 'week', 'date'])
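The effect on aggregation can be sketched without pandas. The example below is illustrative only: the function group_by_year_and_week and the sample date 2019-01-02 are assumptions added here, not data from the article. It groups dates by (calendar year, ISO week), which mirrors the year and week levels of the MultiIndex above.

```python
from collections import defaultdict
from datetime import date


def group_by_year_and_week(dates):
    """Group dates by (calendar year, ISO week number). Hypothetical helper."""
    groups = defaultdict(list)
    for d in dates:
        groups[(d.year, d.isocalendar()[1])].append(d)
    return dict(groups)


# 2019-01-02 is an assumed extra sample from the real first week of 2019.
sample = [date(2019, 1, 2), date(2019, 12, 29), date(2019, 12, 30)]
groups = group_by_year_and_week(sample)
for key in sorted(groups):
    print(key, groups[key])
```

With this grouping key, a date from early January 2019 and a date from the end of December 2019 share the key (2019, 1), so a weekly total for "week 1" would mix data that is almost a year apart.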
Solution
The reference site had the solution I was looking for, and it fit my case.
1. Temporarily reset the index
2. Extract the week with dt.week
3. Force the week to 52
4. Set the index again
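# 1. Reset the index so that 'date' becomes an ordinary column.
df.reset_index(inplace=True)
df["year"] = df["date"].dt.year
df["month"] = df["date"].dt.month
# 2. Extract the week number (Series.dt.week was deprecated in pandas 1.1
#    and removed in 2.0; dt.isocalendar().week is its replacement there).
df["week"] = df["date"].dt.week
# 3. Force 2019-12-30 and 2019-12-31 into week 52; keep the week otherwise.
df["week"] = df["date"].apply(
    lambda x: 52 if x.year == 2019 and x.month == 12 and x.day in [30, 31] else x.week)
# 4. Set the index again as (year, month, week, date).
df_w = df.set_index([df["year"], df["month"],
                     df["week"], df["date"]])
df_w.index.names = ['year', 'month', 'week', 'date']
df_w.sort_index(inplace=True)
df_w.index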
MultiIndex([(2019, 12, 52, '2019-12-29 22:00:00+00:00'),
(2019, 12, 52, '2019-12-29 22:00:00+00:00'),
(2019, 12, 52, '2019-12-29 22:00:00+00:00'),
(2019, 12, 52, '2019-12-29 22:00:00+00:00'),
(2019, 12, 52, '2019-12-29 22:00:00+00:00'),
(2019, 12, 52, '2019-12-29 22:08:00+00:00'),
(2019, 12, 52, '2019-12-29 23:00:00+00:00'),
(2019, 12, 52, '2019-12-30 01:47:00+00:00'),
(2019, 12, 52, '2019-12-30 02:48:00+00:00'),
(2019, 12, 52, '2019-12-30 02:48:00+00:00'),
(2019, 12, 52, '2019-12-30 12:34:00+00:00'),
(2019, 12, 52, '2019-12-30 14:51:00+00:00'),
(2019, 12, 52, '2019-12-30 14:53:00+00:00'),
(2019, 12, 52, '2019-12-30 14:56:00+00:00'),
(2019, 12, 52, '2019-12-31 04:50:00+00:00'),
(2019, 12, 52, '2019-12-31 13:41:00+00:00'),
(2019, 12, 52, '2019-12-31 14:42:00+00:00'),
(2019, 12, 52, '2019-12-31 14:45:00+00:00'),
(2019, 12, 52, '2019-12-31 15:56:00+00:00'),
(2019, 12, 52, '2019-12-31 15:56:00+00:00'),
(2019, 12, 52, '2019-12-31 15:58:00+00:00'),
(2019, 12, 52, '2019-12-31 15:58:00+00:00')],
names=['year', 'month', 'week', 'date'])
With this, weekly aggregation within each year becomes possible.
Reference site
The solution to the same problem was very helpful. Thanks!
Pandas - wrong week extracted week from date
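The rule above is hard-coded for 2019-12-30 and 2019-12-31. As a rough sketch of how the same idea might be generalized to other years, the function below is a hypothetical helper written with the standard library. It is not the method from the reference site. Whenever a December date has ISO week 1, it returns the ISO week of the date seven days earlier, which reproduces the value 52 for the 2019 case.

```python
from datetime import date, timedelta


def week_in_calendar_year(d):
    """Week number that stays within the calendar year of d (hypothetical helper).

    Assumption: a December date whose ISO week is 1 is merged into the
    preceding week, mirroring the "force to week 52" rule of the article.
    """
    iso_week = d.isocalendar()[1]
    if d.month == 12 and iso_week == 1:
        # Same weekday one week earlier still belongs to this calendar year.
        return (d - timedelta(days=7)).isocalendar()[1]
    return iso_week


for d in (date(2019, 12, 29), date(2019, 12, 30), date(2019, 12, 31), date(2020, 1, 2)):
    print(d, week_in_calendar_year(d))
```

Since pandas Timestamp objects are datetime subclasses, a function like this could presumably be passed to df["date"].apply in place of the lambda, though that has not been verified against the article's data. It only handles the December side; early-January dates that belong to the last ISO week of the previous year are left unchanged.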
Reference
Regarding this topic (Correcting week numbers taken from a datetime index), more material can be found at the following link:
https://qiita.com/Shin-Bass/items/2ec3f92856fad1d42651
You are free to share or copy the text, but please keep this document's URL as the reference URL.
(Collection and Share based on the CC Protocol.)