[FB6657] Update ephemeraldisk recipe to prevent reboot failures

Description

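Description of your patch
Fixes instances that refuse to boot, or that boot into maintenance mode, when restarted.
Recommended Release Notes
Fix the fstab format for ephemeral disks
Estimated risk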
Medium risk - instances with an ephemeral disk mounted cannot be safely rebooted later without manual intervention
Components involved
ephemeraldisk
Dependencies
ephemeraldisk
Description of testing done
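  • Rebooted an instance with an ephemeral disk mounted
QA Instructions
  • Boot an instance that has an ephemeral disk
  • SSH into the instance
  • Ensure the disk is mounted: df -h
  • Check fstab: cat /etc/fstab
  • Ensure nofail is present
  • Reboot the instance
  • Verify that the instance rebooted correctly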
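
The fstab check in the QA steps could be scripted roughly as follows. This is a minimal sketch, not part of the patch: the function name has_nofail is hypothetical, and it assumes the standard six-field fstab layout in which the second field is the mount point and the fourth is the comma-separated option list.

```shell
#!/bin/sh
# Sketch (hypothetical helper, not from the cookbook): succeed if the fstab
# entry for a mount point carries the nofail option.
# Usage: has_nofail <mount_point> [fstab_path]
has_nofail() {
    mount_point="$1"
    fstab="${2:-/etc/fstab}"   # defaults to the system fstab
    awk -v mp="$mount_point" '
        $1 ~ /^#/ { next }                 # skip comment lines
        $2 == mp {
            n = split($4, opts, ",")       # field 4 holds the mount options
            for (i = 1; i <= n; i++)
                if (opts[i] == "nofail") found = 1
        }
        END { exit (found ? 0 : 1) }
    ' "$fstab"
}
```

On an instance, something like `has_nofail /tmp/eph1 && echo "nofail present"` would cover the "Ensure nofail is present" step. The mount point /tmp/eph1 is taken from the df output in the discussion and may differ on other instances.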

    Discussion #1

    Used a Compute Optimized (C5d) Large instance on ioannis_v6 and edited ephemeraldisk/libraries/ephemeral_helpers.rb to match the changes, since an overlay could not fix the issue.
  • The chef run fails to restart sshd:
    ================================================================================
    Error executing action `restart` on resource 'service[sshd]'
    ================================================================================
    
    Mixlib::ShellOut::ShellCommandFailed
    ------------------------------------
    Expected process to exit with [0], but received '1'
    ---- Begin output of /bin/systemctl --system restart sshd ----
    STDOUT: 
    STDERR: Job for ssh.service canceled.
    ---- End output of /bin/systemctl --system restart sshd ----
    Ran /bin/systemctl --system restart sshd returned 1
    
    Resource Declaration:
    ---------------------
    # In /etc/chef/recipes/cookbooks/ey-core/recipes/sshd.rb
    
     17: service "sshd" do
     18:   action :restart
     19: end
    
    Compiled Resource:
    ------------------
    # Declared in /etc/chef/recipes/cookbooks/ey-core/recipes/sshd.rb:17:in `from_file'
    
    service("sshd") do
      action [:restart]
      supports {:restart=>nil, :reload=>nil, :status=>nil}
      retries 0
      retry_delay 2
      default_guard_interpreter :default
      service_name "sshd"
      enabled nil
      running nil
      masked nil
      pattern "sshd"
      declared_type :service
      cookbook_name "ey-core"
      recipe_name "sshd"
    end
    
    System Info:
    ------------
    chef_version=12.22.5
    platform=ubuntu
    platform_version=18.04
    ruby=ruby 2.3.6p384 (2017-12-14 revision 61254) [x86_64-linux]
    program_name=chef-solo worker: ppid=3322;start=08:45:02;
    executable=/opt/chef/embedded/bin/chef-solo
    
    root@ip-10-0-5-254:~# /etc/init.d/ssh status
    ● ssh.service - OpenBSD Secure Shell server
       Loaded: loaded (/lib/systemd/system/ssh.service; enabled; vendor preset: enabled)
       Active: inactive (dead) since Tue 2021-05-18 08:45:07 UTC; 5min ago
      Process: 1030 ExecStart=/usr/sbin/sshd -D $SSHD_OPTS (code=exited, status=0/SUCCESS)
     Main PID: 1030 (code=exited, status=0/SUCCESS)
        Tasks: 4 (limit: 4451)
       CGroup: /system.slice/ssh.service
               ├─2207 sshd: root@pts/0
               ├─2275 -bash
               ├─4198 /bin/sh /etc/init.d/ssh status
               └─4207 /bin/systemctl --no-pager --job-mode=ignore-dependencies status ssh.service
    
    May 18 08:44:59 ip-10-0-5-254 sshd[3201]: pam_unix(sshd:session): session opened for user root by (uid=0)
    May 18 08:44:59 ip-10-0-5-254 sshd[3201]: pam_systemd(sshd:session): Failed to connect to system bus: No such file or directory
    May 18 08:44:59 ip-10-0-5-254 sshd[3201]: Starting session: command for root from 3.92.26.95 port 2633 id 0
    May 18 08:45:07 ip-10-0-5-254 systemd[1]: Stopping OpenBSD Secure Shell server...
    May 18 08:45:07 ip-10-0-5-254 systemd[1]: Stopped OpenBSD Secure Shell server.
    May 18 08:46:38 ip-10-0-5-254 sshd[3201]: Connection closed by 3.92.26.95 port 2633
    May 18 08:46:38 ip-10-0-5-254 sshd[3201]: Close session: user root from 3.92.26.95 port 2633 id 0
    May 18 08:46:38 ip-10-0-5-254 sshd[3201]: pam_unix(sshd:session): session closed for user root
    May 18 08:46:38 ip-10-0-5-254 sshd[3201]: Transferred: sent 3088, received 6864 bytes
    May 18 08:46:38 ip-10-0-5-254 sshd[3201]: Closing connection to 3.92.26.95 port 2633
    
  • syslog-ng fails:
    root@ip-10-0-5-254:~# /etc/init.d/syslog-ng status
    ● syslog-ng.service - System Logger Daemon
       Loaded: loaded (/lib/systemd/system/syslog-ng.service; enabled; vendor preset: enabled)
       Active: inactive (dead) since Tue 2021-05-18 08:46:37 UTC; 4min 35s ago
         Docs: man:syslog-ng(8)
      Process: 2848 ExecStart=/usr/sbin/syslog-ng -F $SYSLOGNG_OPTS (code=exited, status=0/SUCCESS)
     Main PID: 2848 (code=exited, status=0/SUCCESS)
       Status: "Shutting down... (Tue May 18 08:46:37 2021"
    
    May 18 08:44:37 ip-10-0-5-254 systemd[1]: Starting System Logger Daemon...
    May 18 08:44:38 ip-10-0-5-254 systemd[1]: Started System Logger Daemon.
    May 18 08:46:37 ip-10-0-5-254 systemd[1]: Stopping System Logger Daemon...
    May 18 08:46:37 ip-10-0-5-254 systemd[1]: Stopped System Logger Daemon.
    
  • ntp fails:
    root@ip-10-0-5-254:~# /etc/init.d/ntp status
    ● ntp.service - Network Time Service
       Loaded: loaded (/lib/systemd/system/ntp.service; enabled; vendor preset: enabled)
       Active: inactive (dead) since Tue 2021-05-18 08:46:37 UTC; 26min ago
         Docs: man:ntpd(8)
      Process: 2763 ExecStart=/usr/lib/ntp/ntp-systemd-wrapper (code=exited, status=0/SUCCESS)
     Main PID: 2783 (code=exited, status=0/SUCCESS)
    
    May 18 08:44:11 ip-10-0-5-254 systemd[1]: Starting Network Time Service...
    May 18 08:44:12 ip-10-0-5-254 systemd[1]: Started Network Time Service.
    May 18 08:46:37 ip-10-0-5-254 systemd[1]: Stopping Network Time Service...
    May 18 08:46:37 ip-10-0-5-254 systemd[1]: Stopped Network Time Service.
    
  • /tmp/eph1 is not mounted after the reboot, compared with the initial boot:
    root@ip-10-0-5-254:~# df -h
    Filesystem      Size  Used Avail Use% Mounted on
    udev            1.9G     0  1.9G   0% /dev
    tmpfs           374M  888K  373M   1% /run
    /dev/nvme3n1p1   30G  8.2G   21G  29% /
    tmpfs           1.9G   28K  1.9G   1% /dev/shm
    tmpfs           5.0M     0  5.0M   0% /run/lock
    tmpfs           1.9G     0  1.9G   0% /sys/fs/cgroup
    /dev/loop1       90M   90M     0 100% /snap/core/8039
    /dev/loop0       18M   18M     0 100% /snap/amazon-ssm-agent/1480
    /dev/nvme2n1     25G   97M   24G   1% /mnt
    /dev/loop2       99M   99M     0 100% /snap/core/11081
    /dev/loop3       56M   56M     0 100% /snap/core18/1997
    /dev/loop4       34M   34M     0 100% /snap/amazon-ssm-agent/3552
    /dev/nvme4n1     15G  378M   14G   3% /data
    /dev/nvme0n1     46G   53M   44G   1% /tmp/eph1
    /dev/nvme5n1     15G  727M   14G   6% /db
    tmpfs           374M     0  374M   0% /run/user/1000
    tmpfs           374M     0  374M   0% /run/user/0
    root@ip-10-0-5-254:~# cd /tmp/eph1
    root@ip-10-0-5-254:/tmp/eph1# ls
    lost+found
    
    root@ip-10-0-5-254:~# df -h
    Filesystem      Size  Used Avail Use% Mounted on
    udev            1.9G     0  1.9G   0% /dev
    tmpfs           374M  712K  373M   1% /run
    /dev/nvme5n1p1   30G  8.2G   21G  29% /
    tmpfs           1.9G     0  1.9G   0% /dev/shm
    tmpfs           5.0M     0  5.0M   0% /run/lock
    tmpfs           1.9G     0  1.9G   0% /sys/fs/cgroup
    /dev/loop1       56M   56M     0 100% /snap/core18/1997
    /dev/loop0       99M   99M     0 100% /snap/core/11081
    /dev/loop2       90M   90M     0 100% /snap/core/8039
    /dev/loop3       34M   34M     0 100% /snap/amazon-ssm-agent/3552
    /dev/loop4       18M   18M     0 100% /snap/amazon-ssm-agent/1480
    /dev/nvme2n1     15G  743M   14G   6% /db
    /dev/nvme0n1     25G  106M   24G   1% /mnt
    /dev/nvme1n1     15G  378M   14G   3% /data
    root@ip-10-0-5-254:~# cd /tmp/eph1
    -bash: cd: /tmp/eph1: No such file or directory
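
The comparison above was done by eye with df -h. A scripted version of the same check might look like the sketch below. The function name is_mounted is hypothetical, and it assumes the usual /proc/mounts layout in which the second field of each line is the mount point.

```shell
#!/bin/sh
# Sketch (hypothetical helper): succeed if <mount_point> is listed as a
# mount target in the mounts table.
# Usage: is_mounted <mount_point> [mounts_file]
is_mounted() {
    mount_point="$1"
    mounts_file="${2:-/proc/mounts}"   # kernel view of current mounts
    awk -v mp="$mount_point" '
        $2 == mp { found = 1 }         # field 2 is the mount point
        END { exit (found ? 0 : 1) }
    ' "$mounts_file"
}
```

After a reboot, `is_mounted /tmp/eph1 || echo "/tmp/eph1 missing"` would flag the situation shown in the second df listing. Mount points containing spaces are escaped in /proc/mounts and are not handled by this sketch.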
    

    Discussion #2

    Per internal communication with @mushyyy, the issue is resolved once the instance is stopped and rebuilt.
    root@ip-10-0-3-24:~# df -h
    Filesystem      Size  Used Avail Use% Mounted on
    udev            1.8G     0  1.8G   0% /dev
    tmpfs           371M  888K  370M   1% /run
    /dev/nvme3n1p1   30G  8.2G   21G  29% /
    tmpfs           1.9G   28K  1.9G   1% /dev/shm
    tmpfs           5.0M     0  5.0M   0% /run/lock
    tmpfs           1.9G     0  1.9G   0% /sys/fs/cgroup
    /dev/loop0       90M   90M     0 100% /snap/core/8039
    /dev/loop1       18M   18M     0 100% /snap/amazon-ssm-agent/1480
    /dev/nvme2n1     25G   97M   24G   1% /mnt
    /dev/loop2       99M   99M     0 100% /snap/core/11081
    /dev/loop3       56M   56M     0 100% /snap/core18/2066
    /dev/loop4       34M   34M     0 100% /snap/amazon-ssm-agent/3552
    /dev/nvme4n1     15G  341M   14G   3% /data
    /dev/nvme0n1     46G   53M   44G   1% /tmp/eph1
    /dev/nvme5n1     15G   87M   14G   1% /db
    tmpfs           371M     0  371M   0% /run/user/1000
    tmpfs           371M     0  371M   0% /run/user/0
    
    Broken instances need to be replaced; new instances will work as expected.

    Discussion #3

    Per the comments, this does fix the issue for new instances. Broken instances need to be replaced (stop and rebuild).
