I tried running Dancing To Music.
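Overview
I tried out the source code of the paper Dancing To Music, published in 2020.
https://github.com/NVlabs/Dancing2Music
The module versions are not published anywhere, so it took a long time to find a combination that works. I wish the authors had also published a requirement.txt with pinned versions. This time I got as far as running the demo. I did not try training.
Environment setup
My requirement.txt looks like this. I ran everything in Docker on a local Ubuntu 18.04 machine (GTX 1080 Ti). Besides the modules below, I confirmed that the demo works with Python 3.6.3. Note that it does not work with Python 3.6.0.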
numpy==1.18.5
matplotlib
torch==1.7.0
torchvision==0.8.1
librosa==0.8.0
jupyter==1.0.0
opencv-python==4.4.0
tensorflow==2.3.1
ffmpeg is also required.
apt install ffmpeg
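Because finding a working combination was the hard part, it may help to check an environment against the pins before running the demo. The following is a minimal sketch, not part of the original repository: the helper names (`parse_pins`, `installed_version`, `matches`, `find_problems`) are hypothetical, and the prefix match is an assumption so that builds such as `4.4.0.46` or `1.7.0+cu110` count as matching `4.4.0` and `1.7.0`.

```python
import shutil
import sys

# Pins copied from the requirement list above.
PINNED = """\
numpy==1.18.5
matplotlib
torch==1.7.0
torchvision==0.8.1
librosa==0.8.0
jupyter==1.0.0
opencv-python==4.4.0
tensorflow==2.3.1
"""


def parse_pins(text):
    """Map package name -> pinned version (None when the line has no pin)."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, sep, version = line.partition("==")
        pins[name] = version if sep else None
    return pins


def installed_version(name):
    """Return the installed version of a distribution, or None if missing."""
    try:
        from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
    except ImportError:
        import pkg_resources  # fallback for Python 3.6/3.7 (needs setuptools)
        try:
            return pkg_resources.get_distribution(name).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def matches(have, want):
    """Assumed rule: exact match, or the pin followed by '.' or '+'."""
    return have == want or have.startswith(want + ".") or have.startswith(want + "+")


def find_problems(pins, lookup=installed_version):
    """Return one message per package that is missing or off its pin."""
    problems = []
    for name, want in pins.items():
        have = lookup(name)
        if have is None:
            problems.append(f"{name}: not installed")
        elif want is not None and not matches(have, want):
            problems.append(f"{name}: found {have}, expected {want}")
    return problems


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    if shutil.which("ffmpeg") is None:
        print("ffmpeg: not found on PATH")
    for problem in find_problems(parse_pins(PINNED)):
        print(problem)
```

The `lookup` parameter exists only so the comparison can be exercised without installing anything; in normal use the default is enough.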
Downloading the data and models
It is rather hard to find, but there is a link under Project in readme.md.
http://vllab.ucmerced.edu/hylee/Dancing2Music/script.txt
## Dataset
### Content
#### 3 zip files containing data of three dancing categories: Zumba, ballet, and hiphop.
####1 zip files containing data statistics and data path lists for trainint usage.
URL=http://vllab.ucmerced.edu/hylee/Dancing2Music/ballet.zip
wget -N $URL -O ./ballet.zip
unzip ./ballet.zip -d .
rm ./ballet.zip
...(rest omitted)
Running this in a shell starts the download.
(middle omitted)
./data.zip 100%[=========================================================================================================================================================================>] 1.33M 541KB/s in 2.5s
2020-12-06 15:06:55 (541 KB/s) - './data.zip' saved [1394787/1394787]
Archive: ./data.zip
inflating: ./stats/all_aud_mean.npy
inflating: ./stats/all_aud_std.npy
inflating: ./stats/all_onbeat_mean.npy
inflating: ./stats/all_onbeat_std.npy
inflating: ./stats/aud_3cls.ckpt
inflating: ./unitList/ballet_unitseq3.txt
inflating: ./unitList/ballet_unitseq4.txt
inflating: ./unitList/ballet_unit.txt
inflating: ./unitList/hiphop_unitseq3.txt
inflating: ./unitList/hiphop_unitseq4.txt
inflating: ./unitList/hiphop_unit.txt
inflating: ./unitList/zumba_unitseq3.txt
inflating: ./unitList/zumba_unitseq4.txt
inflating: ./unitList/zumba_unit.txt
--2020-12-06 15:06:55-- http://vllab.ucmerced.edu/hylee/Dancing2Music/Stage1.ckpt
Resolving vllab.ucmerced.edu (vllab.ucmerced.edu)... 169.236.184.69
Connecting to vllab.ucmerced.edu (vllab.ucmerced.edu)|169.236.184.69|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 185511583 (177M) [text/plain]
Saving to: 'Stage1.ckpt'
(rest omitted)
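If wget and unzip are not available, the same fetch, extract, and delete steps can be approximated with the Python standard library. This is a sketch under assumptions: `fetch` and `fetch_and_unzip` are hypothetical helpers, the `opener` parameter is there only so the logic can be tried offline, and only file names that appear in the excerpt and log above are used in the example.

```python
import os
import shutil
import urllib.request
import zipfile

# Base URL taken from the script excerpt above.
BASE_URL = "http://vllab.ucmerced.edu/hylee/Dancing2Music/"


def fetch(url, dest_path, opener=urllib.request.urlopen):
    """Stream url to dest_path; opener is injectable for offline testing."""
    with opener(url) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest_path


def fetch_and_unzip(name, dest_dir=".", opener=urllib.request.urlopen):
    """Mirror the wget / unzip / rm steps of script.txt for one archive."""
    zip_path = os.path.join(dest_dir, name)
    fetch(BASE_URL + name, zip_path, opener)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
    os.remove(zip_path)  # same as "rm ./ballet.zip" in the script


# Example (needs network access; the remaining entries of script.txt
# are not guessed at here):
#   fetch_and_unzip("ballet.zip")
#   fetch(BASE_URL + "Stage1.ckpt", "Stage1.ckpt")
```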
Downloading and running the source code
The source you get by running git clone on the repository below is missing files, so it does not run!
https://github.com/NVlabs/Dancing2Music
Instead, download the version the authors publish on their own page (this is awful).
http://vllab.ucmerced.edu/hylee/Dancing2Music/demo.zip
Create a checkpoint folder inside the demo folder and put the downloaded checkpoint files in it.
%mkdir demo/checkpoint
%cp Stage1.ckpt demo/checkpoint
%cp Stage2.ckpt demo/checkpoint
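The same copy step can be scripted so that a missing checkpoint is reported before the demo starts. This is a minimal sketch; `install_checkpoints` is a hypothetical helper, and it assumes both checkpoints were downloaded into one directory.

```python
import shutil
from pathlib import Path


def install_checkpoints(src_dir, demo_dir, names=("Stage1.ckpt", "Stage2.ckpt")):
    """Copy the checkpoints into <demo_dir>/checkpoint and return the new paths."""
    target = Path(demo_dir) / "checkpoint"
    target.mkdir(parents=True, exist_ok=True)  # like "mkdir -p demo/checkpoint"
    copied = []
    for name in names:
        source = Path(src_dir) / name
        if not source.is_file():
            raise FileNotFoundError(f"missing checkpoint: {source}")
        copied.append(Path(shutil.copy(str(source), str(target / name))))
    return copied


# Example: install_checkpoints(".", "demo")
```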
Run demo.py. Specify the input audio file with --aud_path and the file name of the output dance video with --out_file. The second checkpoint is specified with --resume (note that this differs from the GitHub documentation).
%Dancing2Music/demo# cat demo.sh
python demo.py --decomp_snapshot checkpoint/Stage1.ckpt --resume checkpoint/Stage2.ckpt --aud_path demo/ChillingMusic.wav --out_file demo/output.mp4 --out_dir demo/out_frame
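For repeated runs with different audio files, the command line can be assembled in Python. This is a sketch only: `build_demo_command` and `run_demo` are hypothetical helpers, and the flags are exactly the ones shown in demo.sh above, nothing more.

```python
import subprocess
import sys


def build_demo_command(stage1, stage2, audio, out_file, out_dir, python=sys.executable):
    """Return the argument list for demo.py using the flags from demo.sh."""
    return [
        python, "demo.py",
        "--decomp_snapshot", stage1,  # first-stage checkpoint
        "--resume", stage2,           # second-stage checkpoint
        "--aud_path", audio,          # input audio file
        "--out_file", out_file,       # output video file
        "--out_dir", out_dir,         # directory for the rendered frames
    ]


def run_demo(cwd=".", **paths):
    """Run demo.py in cwd and raise CalledProcessError if it fails."""
    subprocess.run(build_demo_command(**paths), cwd=cwd, check=True)


# Example, mirroring demo.sh:
#   run_demo(stage1="checkpoint/Stage1.ckpt", stage2="checkpoint/Stage2.ckpt",
#            audio="demo/ChillingMusic.wav", out_file="demo/output.mp4",
#            out_dir="demo/out_frame")
```

Passing the arguments as a list avoids shell quoting problems when an audio path contains spaces.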
If all goes well, output.mp4 is written after output like the following.
%sh demo.sh
2020-12-14 12:12:08.409682: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2020-12-14 12:12:08.409714: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
ffmpeg version 3.4.8-0ubuntu0.2 Copyright (c) 2000-2020 the FFmpeg developers
built with gcc 7 (Ubuntu 7.5.0-3ubuntu1~18.04)
configuration: --prefix=/usr --extra-version=0ubuntu0.2 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --enable-gpl --disable-stripping --enable-avresample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librubberband --enable-librsvg --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-omx --enable-openal --enable-opengl --enable-sdl2 --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libopencv --enable-libx264 --enable-shared
libavutil 55. 78.100 / 55. 78.100
libavcodec 57.107.100 / 57.107.100
libavformat 57. 83.100 / 57. 83.100
libavdevice 57. 10.100 / 57. 10.100
libavfilter 6.107.100 / 6.107.100
libavresample 3. 7. 0 / 3. 7. 0
libswscale 4. 8.100 / 4. 8.100
libswresample 2. 9.100 / 2. 9.100
libpostproc 54. 7.100 / 54. 7.100
Guessed Channel Layout for Input Stream #0.0 : stereo
Input #0, wav, from 'demo/ChillingMusic.wav':
Duration: 00:00:27.41, bitrate: 1411 kb/s
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, stereo, s16, 1411 kb/s
Stream mapping:
Stream #0:0 -> #0:0 (pcm_s16le (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, wav, to 'demo/ChillingMusic-formatted.wav':
Metadata:
ISFT : Lavf57.83.100
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 22050 Hz, mono, s16, 352 kb/s
Metadata:
encoder : Lavc57.107.100 pcm_s16le
size= 1180kB time=00:00:27.40 bitrate= 352.8kbits/s speed=1.26e+03x
video:0kB audio:1180kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.006453%
WARNING: The sample rate will automatically be set to 192 kHz by the loudnorm filter. Specify -ar/--sample-rate to override it.
Loading Done
process 0/5
process 1/5
process 2/5
process 3/5
process 4/5
ffmpeg version 3.4.8-0ubuntu0.2 Copyright (c) 2000-2020 the FFmpeg developers
built with gcc 7 (Ubuntu 7.5.0-3ubuntu1~18.04)
configuration: --prefix=/usr --extra-version=0ubuntu0.2 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --enable-gpl --disable-stripping --enable-avresample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librubberband --enable-librsvg --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-omx --enable-openal --enable-opengl --enable-sdl2 --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libopencv --enable-libx264 --enable-shared
libavutil 55. 78.100 / 55. 78.100
libavcodec 57.107.100 / 57.107.100
libavformat 57. 83.100 / 57. 83.100
libavdevice 57. 10.100 / 57. 10.100
libavfilter 6.107.100 / 6.107.100
libavresample 3. 7. 0 / 3. 7. 0
libswscale 4. 8.100 / 4. 8.100
libswresample 2. 9.100 / 2. 9.100
libpostproc 54. 7.100 / 54. 7.100
Input #0, image2, from 'demo_output/frame%03d.png':
Duration: 00:00:19.20, start: 0.000000, bitrate: N/A
Stream #0:0: Video: png, rgb24(pc), 500x256, 25 fps, 25 tbr, 25 tbn, 25 tbc
Guessed Channel Layout for Input Stream #1.0 : stereo
Input #1, wav, from 'demo/ChillingMusic.wav':
Duration: 00:00:27.41, bitrate: 1411 kb/s
Stream #1:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, stereo, s16, 1411 kb/s
Stream mapping:
Stream #0:0 -> #0:0 (png (native) -> h264 (libx264))
Stream #1:0 -> #0:1 (pcm_s16le (native) -> aac (native))
Press [q] to stop, [?] for help
[image2 @ 0x561b882ddb20] Thread message queue blocking; consider raising the thread_queue_size option (current value: 8)
[libx264 @ 0x561b883bb060] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2
[libx264 @ 0x561b883bb060] profile High, level 2.1
[libx264 @ 0x561b883bb060] 264 - core 152 r2854 e9a5903 - H.264/MPEG-4 AVC codec - Copyleft 2003-2017 - http://www.videolan.org/x264.html - options: cabac=1 ref=3 deblock=1:0:0 analyse=0x3:0x113 me=hex subme=7 psy=1 psy_rd=1.00:0.00 mixed_ref=1 me_range=16 chroma_me=1 trellis=1 8x8dct=1 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=-2 threads=8 lookahead_threads=1 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1 b_bias=0 direct=1 weightb=1 open_gop=0 weightp=2 keyint=250 keyint_min=25 scenecut=40 intra_refresh=0 rc_lookahead=40 rc=crf mbtree=1 crf=23.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 ip_ratio=1.40 aq=1:1.00
Output #0, mp4, to 'demo/output.mp4':
Metadata:
encoder : Lavf57.83.100
Stream #0:0: Video: h264 (libx264) (avc1 / 0x31637661), yuv420p(progressive), 500x256, q=-1--1, 30 fps, 15360 tbn, 30 tbc
Metadata:
encoder : Lavc57.107.100 libx264
Side data:
cpb: bitrate max/min/avg: 0/0/0 buffer size: 0 vbv_delay: -1
Stream #0:1: Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 128 kb/s
Metadata:
encoder : Lavc57.107.100 aac
frame= 959 fps=671 q=-1.0 Lsize= 747kB time=00:00:31.86 bitrate= 192.0kbits/s dup=479 drop=0 speed=22.3x
video:277kB audio:438kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 4.482303%
[libx264 @ 0x561b883bb060] frame I:4 Avg QP: 6.57 size: 2596
[libx264 @ 0x561b883bb060] frame P:251 Avg QP:22.27 size: 736
[libx264 @ 0x561b883bb060] frame B:704 Avg QP:18.87 size: 125
[libx264 @ 0x561b883bb060] consecutive B-frames: 0.5% 4.6% 0.6% 94.3%
[libx264 @ 0x561b883bb060] mb I I16..4: 90.4% 1.6% 8.1%
[libx264 @ 0x561b883bb060] mb P I16..4: 0.4% 0.9% 0.2% P16..4: 3.2% 3.0% 2.5% 0.0% 0.0% skip:89.9%
[libx264 @ 0x561b883bb060] mb B I16..4: 0.1% 0.0% 0.0% B16..8: 5.5% 0.9% 0.3% direct: 0.1% skip:93.0% L0:28.4% L1:69.4% BI: 2.2%
[libx264 @ 0x561b883bb060] 8x8 transform intra:28.0% inter:6.3%
[libx264 @ 0x561b883bb060] coded y,uvDC,uvAC intra: 4.4% 16.2% 13.6% inter: 0.7% 2.2% 2.0%
[libx264 @ 0x561b883bb060] i16 v,h,dc,p: 89% 5% 6% 0%
[libx264 @ 0x561b883bb060] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 4% 2% 94% 0% 0% 0% 0% 0% 0%
[libx264 @ 0x561b883bb060] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 33% 11% 41% 3% 2% 4% 2% 5% 1%
[libx264 @ 0x561b883bb060] i8c dc,h,v,p: 71% 9% 19% 1%
[libx264 @ 0x561b883bb060] Weighted P-Frames: Y:0.0% UV:0.0%
[libx264 @ 0x561b883bb060] ref P L0: 58.2% 3.0% 20.7% 18.1%
[libx264 @ 0x561b883bb060] ref B L0: 69.6% 20.0% 10.4%
[libx264 @ 0x561b883bb060] ref B L1: 97.4% 2.6%
[libx264 @ 0x561b883bb060] kb/s:70.94
[aac @ 0x561b883bea60] Qavg: 397.796
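One way to sanity-check a log like this is to pull the counters out of ffmpeg's final progress line. The sketch below uses a hypothetical helper, `parse_ffmpeg_stats`, and hard-codes the progress line and the 19.20 s / 25 fps input figures from the log above. In this run, 959 encoded frames minus 479 duplicated frames leaves 480, which equals the frame count of the input image sequence.

```python
import re

# Final progress line copied from the log above.
STATS_LINE = (
    "frame= 959 fps=671 q=-1.0 Lsize= 747kB time=00:00:31.86 "
    "bitrate= 192.0kbits/s dup=479 drop=0 speed=22.3x"
)


def parse_ffmpeg_stats(line):
    """Return the key=value fields of an ffmpeg progress line as strings."""
    return dict(re.findall(r"(\w+)=\s*(\S+)", line))


stats = parse_ffmpeg_stats(STATS_LINE)
source_frames = int(stats["frame"]) - int(stats["dup"])
expected_frames = round(19.20 * 25)  # input duration x input frame rate
print(source_frames, expected_frames)  # prints: 480 480
```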
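Results
A video like this is produced (it is shown as a gif here because of Qiita's restrictions, but the actual output is an mp4 with sound).
As for whether the dance matches the sound, it looks like it does, but it is a close call. With this kind of paper, nothing beats actually running it yourself.
Reference
More material on this topic (I tried running Dancing To Music.) can be found at the following link:
https://qiita.com/tatefuku_hiroshi/items/ea5035361d569a259a03
You may freely share or copy this text, but please keep the URL of this document as the reference URL.
Dedicated to discovering excellent developer content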
(Collection and Share based on the CC Protocol.)