Prometheus Alert Configuration

Environment

Prometheus runs in a Docker container.
Cloud environment: Azure
Docker host: CentOS 7.3
Docker container: CentOS 7.3 (Prometheus server)

<Monitored targets>
Docker host: CentOS 7.3
Docker container: CentOS 7.3 (running Apache to stand in for a web server)

Prerequisites

· The Prometheus server is already installed.
See: I tried installing Prometheus on CentOS 7.3 & Docker

Installing AlertManager

1. Copy the AlertManager URL

Download AlertManager from the official Prometheus site.

For this environment, select the following.
Operating system: linux
Architecture: amd64

Find alertmanager in the list and copy the link address.

2. Download

<Prometheus server>
## cd /usr/local/src
## wget https://github.com/prometheus/alertmanager/releases/download/v0.5.1/alertmanager-0.5.1.linux-amd64.tar.gz
## tar xfvz alertmanager-0.5.1.linux-amd64.tar.gz
## cd alertmanager-0.5.1.linux-amd64/
## cp -p alertmanager /usr/bin/.
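
The download link follows a fixed pattern, so the operating system and architecture chosen in step 1 can be turned into the URL with a small helper. This is only a sketch: the function name alertmanager_url is made up for illustration, and it assumes that the release naming seen in the v0.5.1 link above also applies to whichever version you pick.

```shell
#!/bin/sh
# Hypothetical helper (not part of AlertManager): builds the release URL
# from version, operating system and architecture, following the pattern
# of the v0.5.1 link used in this article.
alertmanager_url() {
    version="$1"   # e.g. 0.5.1
    os="$2"        # e.g. linux
    arch="$3"      # e.g. amd64
    echo "https://github.com/prometheus/alertmanager/releases/download/v${version}/alertmanager-${version}.${os}-${arch}.tar.gz"
}

# Print the URL for the combination selected in step 1.
alertmanager_url 0.5.1 linux amd64
```

The printed URL can be passed to wget in place of the hand-copied link.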

3. Place the configuration file

<Prometheus server>
## cd /etc/prometheus
## wget https://raw.githubusercontent.com/alerta/prometheus-config/master/alertmanager.yml
(default state)
## cat /etc/prometheus/alertmanager.yml
global:
  # The smarthost and SMTP sender used for mail notifications.
  smtp_smarthost: 'localhost:25'                  
  smtp_from: '[email protected]'           

route:
  receiver: "alerta"
  group_by: ['alertname']
  group_wait:      30s
  group_interval:  5m
  repeat_interval: 2h

receivers:
- name: "alerta"
  webhook_configs:
  - url: 'http://localhost:8080/webhooks/prometheus'
    send_resolved: true

4. Configure automatic startup
Make sure AlertManager also starts automatically at boot.

<Prometheus server>
## vi /etc/default/alertmanager
OPTIONS="-config.file /etc/prometheus/alertmanager.yml"

## vi /usr/lib/systemd/system/alertmanager.service

[Unit]
Description=Prometheus alertmanager Service
After=syslog.target prometheus.service

[Service]
Type=simple
EnvironmentFile=-/etc/default/alertmanager
ExecStart=/usr/bin/alertmanager $OPTIONS
PrivateTmp=true

[Install]
WantedBy=multi-user.target


## systemctl enable alertmanager.service
Created symlink from /etc/systemd/system/multi-user.target.wants/alertmanager.service to /usr/lib/systemd/system/alertmanager.service.
## systemctl start alertmanager

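5. Preparation before configuring notifications (mail setup)

Set up mail delivery in whatever way suits your environment.
Because this environment is built on an Azure VM, mail sending was set up with reference to the following article.
Sending mail on Azure with SendGrid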

6. Alert configuration

Edit the config file from "3. Place the configuration file".
This time we add mail notification settings. The values are changed from the defaults.

<Prometheus server>
## cat alertmanager.yml
global:
# The smarthost and SMTP sender used for mail notifications.
  smtp_smarthost: 'smtp.sendgrid.net:25'    # ★ SendGrid SMTP endpoint
  smtp_from: '****************@******'      # ★ Mail address registered with SendGrid
  smtp_auth_username: '****@azure.com'      # ★ UserName issued by SendGrid
  smtp_auth_password: '*******'             # ★ Password set in SendGrid (storing it in plain text is not ideal)
  smtp_auth_secret: '*********'             # ★ API key issued by SendGrid

route:
  receiver: "mail"
  group_by: ['alertname', 'instance', 'severity']   # ★ Group alerts that share the same alert name, instance, and severity
  group_wait: 30s                                   # ★ Alerts arriving within 30 seconds are sent as one notification
  group_interval: 10m                               # ★ Notify about a group at most every 10 minutes
  repeat_interval: 1h                               # ★ Re-send an already notified alert after 1 hour

#  receiver: "slack-notifications"
#  group_by: ['alertname', 'instance']

receivers:
 - name: 'mail'
   email_configs:
   - to: '*****@********,####@######'      # ★ Alert recipient addresses (comma-separated when there are several)
                                           # ★ I tried hard to set Cc as well but could not get it to work.
                                           # ★ To split recipients, add another "- to:" entry in the same way.

inhibit_rules:
 - source_match:
     severity: 'critical'                  # ★ While an alert with severity critical is firing,
   target_match:                           # ★ warning alerts with the same alert name are not notified.
     severity: 'warning'
   equal: ['alertname']
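
To exercise the route and the mail receiver without waiting for a real failure, one option is to hand a hand-made alert to AlertManager. The sketch below only builds the JSON payload; the function name test_alert_payload and the instance web01:9100 are made up for illustration, and the labels were chosen to match the group_by keys above.

```shell
#!/bin/sh
# Hypothetical helper: prints a JSON array with one alert whose labels
# match the group_by keys (alertname, instance, severity) used above.
test_alert_payload() {
    alertname="$1"
    instance="$2"
    severity="$3"
    printf '[{"labels":{"alertname":"%s","instance":"%s","severity":"%s"},"annotations":{"summary":"manual test alert"}}]\n' \
        "$alertname" "$instance" "$severity"
}

# Show the payload; nothing is sent at this point.
test_alert_payload instance_down web01:9100 critical

# To actually send it (assumptions: AlertManager listens on its default
# port 9093 on this host and this version exposes the v1 alerts endpoint):
#   test_alert_payload instance_down web01:9100 critical \
#     | curl -s -XPOST -d @- http://localhost:9093/api/v1/alerts
```

If the mail settings are correct, the test alert should produce a notification once group_wait has elapsed.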

7. Rule configuration

Decide for yourself which rules you need for your environment.

<Prometheus server>
## cat /etc/prometheus/alert.rules
ALERT instance_down
  IF up == 0
  FOR 2m
  LABELS { severity = "critical" }
  ANNOTATIONS {
    summary = "Instance {{ $labels.instance }} down",
    description = "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 2 minutes.",
  }

ALERT cpu_threshold_exceeded
  IF (100 * (1 - avg by(instance)(irate(node_cpu{job='node',mode='idle'}[5m])))) > THRESHOLD_CPU
  ANNOTATIONS {
    summary = "Instance {{ $labels.instance }} CPU usage is dangerously high",
    description = "This device's cpu usage has exceeded the threshold with a value of {{ $value }}.",
  }

ALERT mem_threshold_exceeded
  IF (node_memory_MemFree{job='node'} + node_memory_Cached{job='node'} + node_memory_Buffers{job='node'})/1000000 < THRESHOLD_MEM
  ANNOTATIONS {
    summary = "Instance {{ $labels.instance }} memory usage is dangerously high",
    description = "This device's memory usage has exceeded the threshold with a value of {{ $value }}.",
  }

ALERT filesystem_threshold_exceeded
  IF node_filesystem_avail{job='node',mountpoint='/'} / node_filesystem_size{job='node'} * 100 < THRESHOLD_FS
  ANNOTATIONS {
    summary = "Instance {{ $labels.instance }} filesystem usage is dangerously high",
    description = "This device's filesystem usage has exceeded the threshold with a value of {{ $value }}.",
  }

ALERT node_high_loadaverage
  IF node_load1 > 2
  FOR 10s
  LABELS { severity = "warning" }
  ANNOTATIONS {
    summary = "High load average on {{$labels.instance}}",
    description = "{{$labels.instance}} has had a load average above 2 for more than 10s (current value: {{$value}})"
  }
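
THRESHOLD_CPU, THRESHOLD_MEM and THRESHOLD_FS in the rules above are placeholders rather than valid expression values, so they have to be replaced with numbers before Prometheus can load the file. A minimal sketch with sed follows; the function name fill_thresholds and the values 80, 500 and 20 are assumptions chosen for illustration, not recommendations from the original article.

```shell
#!/bin/sh
# Hypothetical helper: reads rule text on stdin and replaces the three
# placeholders with concrete numbers (example values, adjust as needed).
fill_thresholds() {
    cpu="$1"   # percent CPU usage, e.g. 80
    mem="$2"   # free + cached + buffers in MB, e.g. 500
    fs="$3"    # percent of / still available, e.g. 20
    sed -e "s/THRESHOLD_CPU/${cpu}/g" \
        -e "s/THRESHOLD_MEM/${mem}/g" \
        -e "s/THRESHOLD_FS/${fs}/g"
}

# Demonstrate on a single rule line instead of the real file.
echo "IF node_filesystem_avail / node_filesystem_size * 100 < THRESHOLD_FS" | fill_thresholds 80 500 20

# For a real file (alert.rules.template is a hypothetical name):
#   fill_thresholds 80 500 20 < alert.rules.template > /etc/prometheus/alert.rules
```

Note the units: the CPU and filesystem expressions are percentages, while the memory expression divides bytes by 1000000 and is therefore compared in megabytes.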

8. Integrate with Prometheus

Connect AlertManager to Prometheus.
Add the following to the end of /etc/prometheus/prometheus.yml.

alerting:
  alertmanagers:
  - scheme: http
    static_configs:
    - targets: ['<hostname>.japaneast.cloudapp.azure.com:9093']
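
The alerting block only tells Prometheus where AlertManager is. The rules from step 7 are evaluated only if prometheus.yml also lists the rules file under rule_files. If that was not already done during the Prometheus installation, a quick check along these lines may help; the function name references_rule_file is made up, and the check is a plain text match rather than a YAML parse.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given Prometheus config mentions
# the given rules file anywhere (plain text match, not a YAML parse).
references_rule_file() {
    config="$1"
    rules="$2"
    grep -F -q -- "$rules" "$config"
}

# Demonstrate on a scratch file rather than the live configuration.
scratch="$(mktemp)"
cat > "$scratch" <<'EOF'
rule_files:
  - /etc/prometheus/alert.rules
EOF

if references_rule_file "$scratch" /etc/prometheus/alert.rules; then
    echo "rule file is referenced"
else
    echo "rule file is missing from rule_files"
fi
rm -f "$scratch"
```

Against the real setup, the same function would be called with /etc/prometheus/prometheus.yml as the first argument.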

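9. Finally

Check that the configuration files are written correctly. promtool validates the Prometheus configuration and the rule file; it does not parse alertmanager.yml, so review that file by hand.

<Prometheus server>
## promtool check-config /etc/prometheus/prometheus.yml
## promtool check-rules /etc/prometheus/alert.rules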

Restart alertmanager and Prometheus to finish.

<Prometheus server>
## systemctl restart alertmanager
## systemctl restart prometheus

10. Verify operation

Stop one of the monitored servers.
An alert mail should be sent.
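
The mail does not arrive immediately. With the settings in this article, instance_down has to stay true for its FOR duration of 2m before it fires, and AlertManager then waits for group_wait (30s) before sending the first notification. Scrape and rule evaluation intervals add to that. The arithmetic can be sketched as follows; the function name min_notify_delay is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: lower bound, in seconds, between the target going
# down and the first notification (FOR duration + group_wait).
# Scrape and evaluation intervals are not included.
min_notify_delay() {
    for_seconds="$1"
    group_wait_seconds="$2"
    echo $(( for_seconds + group_wait_seconds ))
}

# FOR 2m (120s) from instance_down, group_wait 30s from alertmanager.yml.
min_notify_delay 120 30
```

So expect to wait at least about two and a half minutes before concluding that the notification is not working.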

Reference sites

Tech-Sketch
Prometheus environment setup procedure
Sending mail on Azure with SendGrid

Reference

Original article (Prometheus Alert Configuration): https://qiita.com/miwato/items/b26d82fdf5324936ed8b
