Downloading files from kaggle.com with wget

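Datasets on kaggle.com are sometimes fairly large, and Kaggle does not offer a cloud-drive mirror, so downloading them domestically is very slow. The download also requires authentication, which rules out the usual download accelerators.
I came across a description of how to download with wget on the Kaggle forum [1]:
The approach is to log in to kaggle.com first, export the browser's cookies to cookies.txt, and then run the following command: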
wget -x --load-cookies cookies.txt -P data -nH --cut-dirs=5 http://www.kaggle.com/c/avazu-ctr-prediction/download/test.gz
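
Since the command above depends entirely on cookies.txt, it may be worth checking that file before running wget. The sketch below assumes the cookies were exported in the Netscape cookie-file format that --load-cookies reads (seven tab-separated fields per line, with the domain first). The helper name has_cookie_for is made up for this example.

```shell
# has_cookie_for FILE DOMAIN (hypothetical helper)
# Succeeds if FILE has at least one 7-field cookie line for DOMAIN
# or for one of its subdomains.
has_cookie_for() {
    awk -F '\t' -v d="$2" '
        /^#/ || NF != 7 { next }                 # skip comments and malformed lines
        {
            host = $1
            sub(/^\./, "", host)                  # ".kaggle.com" -> "kaggle.com"
            tail = substr(host, length(host) - length(d))
            if (host == d || tail == "." d) found = 1
        }
        END { exit !found }
    ' "$1"
}

if has_cookie_for cookies.txt kaggle.com; then
    echo "cookies.txt has an entry for kaggle.com"
else
    echo "no kaggle.com cookie found; re-export cookies.txt"
fi
```

A passing check only shows that a cookie line exists. It does not show that the session behind it is still valid.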

However, the wget command finished almost immediately and downloaded only 14 KB, so something had clearly gone wrong:
[zhf@localhost ~]$ wget -x --load-cookies cookies.txt https://www.kaggle.com/c/avazu-ctr-prediction/download/test.gz
--2015-11-02 23:35:29--  https://www.kaggle.com/c/avazu-ctr-prediction/download/test.gz
Resolving www.kaggle.com (www.kaggle.com)... 168.62.224.124
Connecting to www.kaggle.com (www.kaggle.com)|168.62.224.124|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: /account/login?ReturnUrl=%2fc%2favazu-ctr-prediction%2fdownload%2ftest.gz [following]
--2015-11-02 23:35:32--  https://www.kaggle.com/account/login?ReturnUrl=%2fc%2favazu-ctr-prediction%2fdownload%2ftest.gz
Reusing existing connection to www.kaggle.com:443.
HTTP request sent, awaiting response... 200 OK
Length: 14687 (14K) [text/html]
Saving to: ‘www.kaggle.com/c/avazu-ctr-prediction/download/test.gz’

100%[===========================================================================================>] 14,687      --.-K/s   in 0.03s   

2015-11-02 23:35:33 (450 KB/s) - ‘www.kaggle.com/c/avazu-ctr-prediction/download/test.gz’ saved [14687/14687]

As the log above shows, the request was redirected to the login page:
https://www.kaggle.com/account/login?ReturnUrl=%2fc%2favazu-ctr-prediction%2fdownload%2ftest.gz
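
A silent failure like this one, where an HTML login page is saved under the name test.gz, can be caught by looking at the first bytes of the saved file. The following is a minimal sketch: is_gzip is a hypothetical helper name, and the check relies only on the gzip magic bytes 1f 8b.

```shell
# is_gzip FILE (hypothetical helper)
# Succeeds if FILE starts with the two gzip magic bytes 1f 8b.
is_gzip() {
    # Dump the first two bytes as hex and strip spaces and newlines.
    magic=$(head -c 2 "$1" 2>/dev/null | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "1f8b" ]
}

# test.gz is assumed to be the file wget just saved.
if is_gzip test.gz; then
    echo "test.gz looks like gzip data"
else
    echo "test.gz is not gzip data; it may be the HTML login page"
fi
```

Running gzip -t test.gz gives a more thorough integrity check, but it has to read the whole file.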
So instead we submit the username and password to the login page using wget's --post-data option:
[zhf@localhost ~]$ wget https://www.kaggle.com/account/login?ReturnUrl=%2fc%2favazu-ctr-prediction%2fdownload%2ftest.gz --post-data 'username=login_name&password=login_password'

The download now proceeds normally:
--2015-11-02 23:37:18--  https://www.kaggle.com/account/login?ReturnUrl=%2fc%2favazu-ctr-prediction%2fdownload%2ftest.gz
Resolving www.kaggle.com (www.kaggle.com)... 168.62.224.124
Connecting to www.kaggle.com (www.kaggle.com)|168.62.224.124|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: /c/avazu-ctr-prediction/download/test.gz [following]
--2015-11-02 23:37:19--  https://www.kaggle.com/c/avazu-ctr-prediction/download/test.gz
Reusing existing connection to www.kaggle.com:443.
HTTP request sent, awaiting response... 302 Found
Location: https://kaggle2.blob.core.windows.net/competitions-data/kaggle/4120/test.gz?sv=2012-02-12&se=2015-11-05T07%3A39%3A03Z&sr=b&sp=r&sig=rKgKT2uZE6B4sLTirB1qdR8o262a9BgQPh233olSedg%3D [following]
--2015-11-02 23:37:20--  https://kaggle2.blob.core.windows.net/competitions-data/kaggle/4120/test.gz?sv=2012-02-12&se=2015-11-05T07%3A39%3A03Z&sr=b&sp=r&sig=rKgKT2uZE6B4sLTirB1qdR8o262a9BgQPh233olSedg%3D
Resolving kaggle2.blob.core.windows.net (kaggle2.blob.core.windows.net)... 23.98.55.152
Connecting to kaggle2.blob.core.windows.net (kaggle2.blob.core.windows.net)|23.98.55.152|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 123803952 (118M) [application/x-gzip]
Saving to: ‘login?ReturnUrl=%2Fc%2Favazu-ctr-prediction%2Fdownload%2Ftest.gz’

 7% [======>                                                                                     ] 9,773,056   28.2KB/s  eta 36m 24s^C

Downloading this way is still slow, but it can run in the background.
References:
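
One way to run such a slow download in the background is with wget's own -b option, shown here together with -c so that an interrupted transfer can be resumed. This is only a sketch: fetch_in_background is a hypothetical wrapper, wget.log is an arbitrary log file name, and the authentication options are passed through unchanged.

```shell
# fetch_in_background URL [extra wget options...] (hypothetical wrapper)
fetch_in_background() {
    url=$1
    shift
    # -b  go to the background right after startup
    # -c  continue a partially downloaded file instead of starting over
    # -o  write wget's progress messages to wget.log
    wget -b -c -o wget.log "$@" "$url"
}
```

For example, fetch_in_background 'https://www.kaggle.com/c/avazu-ctr-prediction/download/test.gz' --load-cookies cookies.txt would start the transfer and return to the prompt, assuming cookies.txt holds a valid session. Progress can then be followed with tail -f wget.log.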
[1]  https://www.kaggle.com/forums/f/15/kaggle-forum/t/6604/downloading-data-via-command-line
