I tried running Dancing To Music

Overview



I tried out the source code of the paper Dancing To Music, published in 2020.
https://github.com/NVlabs/Dancing2Music

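The module versions are not published anywhere, so it took a lot of time to find a combination that works. I wish the authors had also published a requirement.txt with pinned versions. This time I got as far as running the demo and confirming that it works. I did not try training.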

Environment setup



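My requirement.txt looks like the list below. I ran everything in Docker on a local Ubuntu 18.04 machine (GTX 1080 Ti). In addition to the modules below, I confirmed that the demo works with Python 3.6.3. Note that it does not run on Python 3.6.0.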
numpy==1.18.5
matplotlib
torch==1.7.0
torchvision==0.8.1
librosa==0.8.0
jupyter==1.0.0
opencv-python==4.4.0.*
tensorflow==2.3.1
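
Since finding a working combination was the hard part, a quick check that an environment really matches the pins can save time. The following is a minimal sketch, not part of the Dancing2Music repository: the helper names (`installed_version`, `find_mismatches`, `python_is_new_enough`) are made up for illustration, and the rule "3.6.3 or newer" is an assumption based only on the two Python versions reported above.

```python
import sys

# Pins copied from the list above; packages listed without a version are skipped.
PINS = {
    "numpy": "1.18.5",
    "torch": "1.7.0",
    "torchvision": "0.8.1",
    "librosa": "0.8.0",
    "tensorflow": "2.3.1",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        from importlib import metadata  # available from Python 3.8
    except ImportError:
        metadata = None
    if metadata is not None:
        try:
            return metadata.version(name)
        except metadata.PackageNotFoundError:
            return None
    import pkg_resources  # fallback for Python 3.6/3.7 (ships with setuptools)
    try:
        return pkg_resources.get_distribution(name).version
    except pkg_resources.DistributionNotFound:
        return None


def find_mismatches(pins, lookup=installed_version):
    """Return {name: (wanted, found)} for every pin that is not satisfied."""
    mismatches = {}
    for name, wanted in pins.items():
        found = lookup(name)
        # torch wheels may carry a local suffix such as "+cu110"; ignore it.
        if found is None or found.split("+")[0] != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


def python_is_new_enough(version_info=sys.version_info):
    """Assumption: anything from 3.6.3 upward is acceptable."""
    return tuple(version_info[:3]) >= (3, 6, 3)


if __name__ == "__main__":
    print("python ok:", python_is_new_enough())
    for name, (wanted, found) in find_mismatches(PINS).items():
        print(f"{name}: wanted {wanted}, found {found}")
```

Running the file inside the container prints one line per package whose installed version differs from the pin, and nothing beyond the Python check when everything matches.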

ffmpeg is also required.
apt install ffmpeg
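
To confirm that ffmpeg is actually visible from the Python environment that will run the demo, something like the sketch below can be used. The function names are hypothetical; only `shutil.which` and `subprocess.run` from the standard library are involved.

```python
import shutil
import subprocess


def tool_path(name):
    """Return the full path of an executable on PATH, or None if it is absent."""
    return shutil.which(name)


def ffmpeg_banner():
    """Return the first line of `ffmpeg -version`, or None if ffmpeg is missing."""
    path = tool_path("ffmpeg")
    if path is None:
        return None
    result = subprocess.run(
        [path, "-version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,  # text mode; works on Python 3.6 as well
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    print(ffmpeg_banner() or "ffmpeg not found on PATH")
```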

Downloading the data and models



It is rather hard to spot, but there is a link in the Project section of readme.md.
http://vllab.ucmerced.edu/hylee/Dancing2Music/script.txt
## Dataset
### Content
#### 3 zip files containing data of three dancing categories: Zumba, ballet, and hiphop.
####1 zip files containing data statistics and data path lists for trainint usage.

URL=http://vllab.ucmerced.edu/hylee/Dancing2Music/ballet.zip
wget -N $URL -O ./ballet.zip
unzip ./ballet.zip -d .
rm ./ballet.zip

... (rest omitted)


Running this script in a shell starts the download.
(middle part omitted)
./data.zip                                                            100%[=========================================================================================================================================================================>]   1.33M   541KB/s    in 2.5s

2020-12-06 15:06:55 (541 KB/s) - './data.zip' saved [1394787/1394787]

Archive:  ./data.zip
  inflating: ./stats/all_aud_mean.npy
  inflating: ./stats/all_aud_std.npy
  inflating: ./stats/all_onbeat_mean.npy
  inflating: ./stats/all_onbeat_std.npy
  inflating: ./stats/aud_3cls.ckpt
  inflating: ./unitList/ballet_unitseq3.txt
  inflating: ./unitList/ballet_unitseq4.txt
  inflating: ./unitList/ballet_unit.txt
  inflating: ./unitList/hiphop_unitseq3.txt
  inflating: ./unitList/hiphop_unitseq4.txt
  inflating: ./unitList/hiphop_unit.txt
  inflating: ./unitList/zumba_unitseq3.txt
  inflating: ./unitList/zumba_unitseq4.txt
  inflating: ./unitList/zumba_unit.txt
--2020-12-06 15:06:55--  http://vllab.ucmerced.edu/hylee/Dancing2Music/Stage1.ckpt
Resolving vllab.ucmerced.edu (vllab.ucmerced.edu)... 169.236.184.69
Connecting to vllab.ucmerced.edu (vllab.ucmerced.edu)|169.236.184.69|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 185511583 (177M) [text/plain]
Saving to: 'Stage1.ckpt'

(rest omitted)
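
The wget / unzip / rm sequence in script.txt can also be reproduced from Python when wget is not available in the container. This is only a sketch of the same three steps: `fetch_and_unpack` is a hypothetical helper, and the only archive names known from this article are ballet.zip and data.zip.

```python
import os
import urllib.request
import zipfile

# Base URL as it appears in script.txt.
BASE_URL = "http://vllab.ucmerced.edu/hylee/Dancing2Music/"


def fetch_and_unpack(name, dest=".", fetch=urllib.request.urlretrieve):
    """Download BASE_URL + name into dest, unzip it there, then delete the zip.

    `fetch` is injectable so the unpacking logic can be exercised offline.
    Returns the list of member names found in the archive.
    """
    os.makedirs(dest, exist_ok=True)
    zip_path = os.path.join(dest, name)
    fetch(BASE_URL + name, zip_path)           # wget -N $URL -O ./name
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)               # unzip ./name -d .
        members = archive.namelist()
    os.remove(zip_path)                        # rm ./name
    return members


if __name__ == "__main__":
    for member in fetch_and_unpack("data.zip"):
        print("extracted:", member)
```

The checkpoints (Stage1.ckpt and so on) are plain files rather than archives, so for those a bare `urllib.request.urlretrieve` call without the unzip step would be the equivalent.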


Downloading and running the source code



The source you get by running git clone on the repository below is missing files, so it does not run!
https://github.com/NVlabs/Dancing2Music

Instead, download the archive the authors publish on their own page (this is terrible).
http://vllab.ucmerced.edu/hylee/Dancing2Music/demo.zip

Create a checkpoint folder inside the demo folder and put the downloaded checkpoint files in it.
%mkdir demo/checkpoint
%cp Stage1.ckpt demo/checkpoint
%cp Stage2.ckpt demo/checkpoint
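
The same copy step, written so that a missing checkpoint is reported up front rather than surfacing later inside demo.py. This is a sketch under the assumption that both checkpoints sit in one download directory; `install_checkpoints` is a hypothetical helper name.

```python
import shutil
from pathlib import Path


def install_checkpoints(src_dir, demo_dir, names=("Stage1.ckpt", "Stage2.ckpt")):
    """Copy the downloaded checkpoints into <demo_dir>/checkpoint."""
    target = Path(demo_dir) / "checkpoint"
    target.mkdir(parents=True, exist_ok=True)  # mkdir demo/checkpoint
    copied = []
    for name in names:
        source = Path(src_dir) / name
        if not source.is_file():
            raise FileNotFoundError(f"missing checkpoint: {source}")
        shutil.copy(str(source), str(target / name))  # cp StageN.ckpt demo/checkpoint
        copied.append(target / name)
    return copied


if __name__ == "__main__":
    for path in install_checkpoints(".", "demo"):
        print("installed:", path)
```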

Run demo.py. Pass the input audio file with --aud_path and the file name of the dance video to write with --out_file. The second checkpoint is passed with --resume (note that this differs from the GitHub documentation).
%Dancing2Music/demo# cat demo.sh
python demo.py --decomp_snapshot checkpoint/Stage1.ckpt --resume checkpoint/Stage2.ckpt --aud_path demo/ChillingMusic.wav --out_file demo/output.mp4 --out_dir demo/out_frame
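
For running the demo over several audio files, the command line in demo.sh can be assembled as an argument list and launched with subprocess. This is a sketch only: the flags are exactly the ones shown in demo.sh, while `build_demo_command` and `run_demo` are hypothetical wrapper names and the default checkpoint paths simply repeat the ones used above.

```python
import subprocess
import sys


def build_demo_command(audio, out_file, out_dir,
                       stage1="checkpoint/Stage1.ckpt",
                       stage2="checkpoint/Stage2.ckpt"):
    """Return the demo.py invocation as an argument list (no shell quoting needed)."""
    return [
        sys.executable, "demo.py",
        "--decomp_snapshot", stage1,   # first checkpoint
        "--resume", stage2,            # second checkpoint
        "--aud_path", audio,           # input audio file
        "--out_file", out_file,        # output video file
        "--out_dir", out_dir,          # directory for intermediate frames
    ]


def run_demo(demo_dir, **kwargs):
    """Run demo.py from inside demo_dir; raises CalledProcessError on failure."""
    subprocess.run(build_demo_command(**kwargs), cwd=demo_dir, check=True)


if __name__ == "__main__":
    print(" ".join(build_demo_command(
        audio="demo/ChillingMusic.wav",
        out_file="demo/output.mp4",
        out_dir="demo/out_frame",
    )))
```

Passing a list instead of a single string keeps file names containing spaces intact, which matters once the input is something other than the bundled sample.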


If all goes well, output.mp4 is written after output like the following.
%sh demo.sh
2020-12-14 12:12:08.409682: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2020-12-14 12:12:08.409714: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
ffmpeg version 3.4.8-0ubuntu0.2 Copyright (c) 2000-2020 the FFmpeg developers
  built with gcc 7 (Ubuntu 7.5.0-3ubuntu1~18.04)
  configuration: --prefix=/usr --extra-version=0ubuntu0.2 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --enable-gpl --disable-stripping --enable-avresample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librubberband --enable-librsvg --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-omx --enable-openal --enable-opengl --enable-sdl2 --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libopencv --enable-libx264 --enable-shared
  libavutil      55. 78.100 / 55. 78.100
  libavcodec     57.107.100 / 57.107.100
  libavformat    57. 83.100 / 57. 83.100
  libavdevice    57. 10.100 / 57. 10.100
  libavfilter     6.107.100 /  6.107.100
  libavresample   3.  7.  0 /  3.  7.  0
  libswscale      4.  8.100 /  4.  8.100
  libswresample   2.  9.100 /  2.  9.100
  libpostproc    54.  7.100 / 54.  7.100
Guessed Channel Layout for Input Stream #0.0 : stereo
Input #0, wav, from 'demo/ChillingMusic.wav':
  Duration: 00:00:27.41, bitrate: 1411 kb/s
    Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, stereo, s16, 1411 kb/s
Stream mapping:
  Stream #0:0 -> #0:0 (pcm_s16le (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, wav, to 'demo/ChillingMusic-formatted.wav':
  Metadata:
    ISFT            : Lavf57.83.100
    Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 22050 Hz, mono, s16, 352 kb/s
    Metadata:
      encoder         : Lavc57.107.100 pcm_s16le
size=    1180kB time=00:00:27.40 bitrate= 352.8kbits/s speed=1.26e+03x
video:0kB audio:1180kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.006453%
WARNING: The sample rate will automatically be set to 192 kHz by the loudnorm filter. Specify -ar/--sample-rate to override it.
Loading Done
process 0/5
process 1/5
process 2/5
process 3/5
process 4/5
ffmpeg version 3.4.8-0ubuntu0.2 Copyright (c) 2000-2020 the FFmpeg developers
  built with gcc 7 (Ubuntu 7.5.0-3ubuntu1~18.04)
  configuration: --prefix=/usr --extra-version=0ubuntu0.2 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --enable-gpl --disable-stripping --enable-avresample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librubberband --enable-librsvg --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-omx --enable-openal --enable-opengl --enable-sdl2 --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libopencv --enable-libx264 --enable-shared
  libavutil      55. 78.100 / 55. 78.100
  libavcodec     57.107.100 / 57.107.100
  libavformat    57. 83.100 / 57. 83.100
  libavdevice    57. 10.100 / 57. 10.100
  libavfilter     6.107.100 /  6.107.100
  libavresample   3.  7.  0 /  3.  7.  0
  libswscale      4.  8.100 /  4.  8.100
  libswresample   2.  9.100 /  2.  9.100
  libpostproc    54.  7.100 / 54.  7.100
Input #0, image2, from 'demo_output/frame%03d.png':
  Duration: 00:00:19.20, start: 0.000000, bitrate: N/A
    Stream #0:0: Video: png, rgb24(pc), 500x256, 25 fps, 25 tbr, 25 tbn, 25 tbc
Guessed Channel Layout for Input Stream #1.0 : stereo
Input #1, wav, from 'demo/ChillingMusic.wav':
  Duration: 00:00:27.41, bitrate: 1411 kb/s
    Stream #1:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, stereo, s16, 1411 kb/s
Stream mapping:
  Stream #0:0 -> #0:0 (png (native) -> h264 (libx264))
  Stream #1:0 -> #0:1 (pcm_s16le (native) -> aac (native))
Press [q] to stop, [?] for help
[image2 @ 0x561b882ddb20] Thread message queue blocking; consider raising the thread_queue_size option (current value: 8)
[libx264 @ 0x561b883bb060] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2
[libx264 @ 0x561b883bb060] profile High, level 2.1
[libx264 @ 0x561b883bb060] 264 - core 152 r2854 e9a5903 - H.264/MPEG-4 AVC codec - Copyleft 2003-2017 - http://www.videolan.org/x264.html - options: cabac=1 ref=3 deblock=1:0:0 analyse=0x3:0x113 me=hex subme=7 psy=1 psy_rd=1.00:0.00 mixed_ref=1 me_range=16 chroma_me=1 trellis=1 8x8dct=1 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=-2 threads=8 lookahead_threads=1 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1 b_bias=0 direct=1 weightb=1 open_gop=0 weightp=2 keyint=250 keyint_min=25 scenecut=40 intra_refresh=0 rc_lookahead=40 rc=crf mbtree=1 crf=23.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 ip_ratio=1.40 aq=1:1.00
Output #0, mp4, to 'demo/output.mp4':
  Metadata:
    encoder         : Lavf57.83.100
    Stream #0:0: Video: h264 (libx264) (avc1 / 0x31637661), yuv420p(progressive), 500x256, q=-1--1, 30 fps, 15360 tbn, 30 tbc
    Metadata:
      encoder         : Lavc57.107.100 libx264
    Side data:
      cpb: bitrate max/min/avg: 0/0/0 buffer size: 0 vbv_delay: -1
    Stream #0:1: Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 128 kb/s
    Metadata:
      encoder         : Lavc57.107.100 aac
frame=  959 fps=671 q=-1.0 Lsize=     747kB time=00:00:31.86 bitrate= 192.0kbits/s dup=479 drop=0 speed=22.3x
video:277kB audio:438kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 4.482303%
[libx264 @ 0x561b883bb060] frame I:4     Avg QP: 6.57  size:  2596
[libx264 @ 0x561b883bb060] frame P:251   Avg QP:22.27  size:   736
[libx264 @ 0x561b883bb060] frame B:704   Avg QP:18.87  size:   125
[libx264 @ 0x561b883bb060] consecutive B-frames:  0.5%  4.6%  0.6% 94.3%
[libx264 @ 0x561b883bb060] mb I  I16..4: 90.4%  1.6%  8.1%
[libx264 @ 0x561b883bb060] mb P  I16..4:  0.4%  0.9%  0.2%  P16..4:  3.2%  3.0%  2.5%  0.0%  0.0%    skip:89.9%
[libx264 @ 0x561b883bb060] mb B  I16..4:  0.1%  0.0%  0.0%  B16..8:  5.5%  0.9%  0.3%  direct: 0.1%  skip:93.0%  L0:28.4% L1:69.4% BI: 2.2%
[libx264 @ 0x561b883bb060] 8x8 transform intra:28.0% inter:6.3%
[libx264 @ 0x561b883bb060] coded y,uvDC,uvAC intra: 4.4% 16.2% 13.6% inter: 0.7% 2.2% 2.0%
[libx264 @ 0x561b883bb060] i16 v,h,dc,p: 89%  5%  6%  0%
[libx264 @ 0x561b883bb060] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu:  4%  2% 94%  0%  0%  0%  0%  0%  0%
[libx264 @ 0x561b883bb060] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 33% 11% 41%  3%  2%  4%  2%  5%  1%
[libx264 @ 0x561b883bb060] i8c dc,h,v,p: 71%  9% 19%  1%
[libx264 @ 0x561b883bb060] Weighted P-Frames: Y:0.0% UV:0.0%
[libx264 @ 0x561b883bb060] ref P L0: 58.2%  3.0% 20.7% 18.1%
[libx264 @ 0x561b883bb060] ref B L0: 69.6% 20.0% 10.4%
[libx264 @ 0x561b883bb060] ref B L1: 97.4%  2.6%
[libx264 @ 0x561b883bb060] kb/s:70.94
[aac @ 0x561b883bea60] Qavg: 397.796
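
A few figures in the second ffmpeg run can be cross-checked with simple arithmetic. The snippet below only recombines numbers printed in the log above; how demo.py arrives at them has not been verified against its source, so treat any interpretation as an assumption.

```python
# Figures copied from the ffmpeg log above.
INPUT_DURATION_S = 19.20   # image2 input: "Duration: 00:00:19.20"
INPUT_FPS = 25             # image2 input: "25 fps"
OUTPUT_FRAMES = 959        # final progress line: "frame=  959"
DUPLICATED = 479           # final progress line: "dup=479"
OUTPUT_FPS = 30            # output video stream: "30 fps"
AUDIO_DURATION_S = 27.41   # demo/ChillingMusic.wav: "Duration: 00:00:27.41"

# Number of PNG frames ffmpeg read from the frame directory.
input_frames = round(INPUT_DURATION_S * INPUT_FPS)

# Length of the encoded video stream implied by the frame count.
video_seconds = OUTPUT_FRAMES / OUTPUT_FPS

print("frames read:", input_frames)
print("read + duplicated:", input_frames + DUPLICATED)
print("video length (s):", round(video_seconds, 2))
print("video minus audio (s):", round(video_seconds - AUDIO_DURATION_S, 2))
```

By these figures, the frames read plus the duplicated ones account for all 959 encoded frames, and the encoded video stream comes out a few seconds longer than the 27.41-second audio. Whether that has any effect on how well the dance appears to follow the music is not something the log alone can answer.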


Results



This is the kind of video that comes out (it is shown here as a GIF because of Qiita's limitations, but the actual output is an mp4 with sound).


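As for whether the dance matches the sound: it looks as if it does, but it is hard to say for sure. With papers like this one, nothing beats actually running the code yourself.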
