Prometheus Alert Configuration
Tags: AlertManager, centos7, prometheus
Environment
Prometheus runs in a Docker container.
Cloud environment: Azure
Docker host: CentOS7.3
Docker container: CentOS7.3 (Prometheus server)
<Monitoring target>
Docker host: CentOS7.3
Docker container: CentOS7.3 (running Apache to stand in for a web server)
Prerequisites
· The Prometheus server is already installed.
See: "I tried installing Prometheus on CentOS7.3 & Docker"
Installing AlertManager
1. Copy the AlertManager URL
Download AlertManager from the official Prometheus site.
For this environment, select the following:
Operating system: linux
Architecture: amd64
Find alertmanager in the list and copy its link address.
2. Download
<Prometheus server>
# cd /usr/local/src
# wget https://github.com/prometheus/alertmanager/releases/download/v0.5.1/alertmanager-0.5.1.linux-amd64.tar.gz
# tar xfvz alertmanager-0.5.1.linux-amd64.tar.gz
# cd alertmanager-0.5.1.linux-amd64/
# cp -p alertmanager /usr/bin/.
3. Place the configuration file
<Prometheus server>
# cd /etc/prometheus
# wget https://raw.githubusercontent.com/alerta/prometheus-config/master/alertmanager.yml
(Default state)
# cat /etc/prometheus/alertmanager.yml
global:
  # The smarthost and SMTP sender used for mail notifications.
  smtp_smarthost: 'localhost:25'
  smtp_from: 'alertmanager@example.org'
route:
  receiver: "alerta"
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 2h
receivers:
  - name: "alerta"
    webhook_configs:
      - url: 'http://localhost:8080/webhooks/prometheus'
        send_resolved: true
4. Configure automatic startup
Set AlertManager to start automatically as well.
<Prometheus server>
# vi /etc/default/alertmanager
OPTIONS="-config.file /etc/prometheus/alertmanager.yml"
# vi /usr/lib/systemd/system/alertmanager.service
[Unit]
Description=Prometheus alertmanager Service
After=syslog.target network.target
[Service]
Type=simple
EnvironmentFile=-/etc/default/alertmanager
ExecStart=/usr/bin/alertmanager $OPTIONS
PrivateTmp=true
[Install]
WantedBy=multi-user.target
# systemctl enable alertmanager.service
Created symlink from /etc/systemd/system/multi-user.target.wants/alertmanager.service to /usr/lib/systemd/system/alertmanager.service.
# systemctl start alertmanager
5. Prepare for notifications (mail setup)
Set up a mail delivery mechanism that suits your environment.
Because this environment is built on an Azure VM, mail delivery was set up by following the article below.
"For sending mail from Azure, use SendGrid"
6. Alert configuration
Edit the config file from "3. Place the configuration file".
This time we add mail notification settings. The values are changed from the defaults.
<Prometheus server>
# cat alertmanager.yml
global:
  # The smarthost and SMTP sender used for mail notifications.
  smtp_smarthost: 'smtp.sendgrid.net:25'   # SendGrid SMTP endpoint
  smtp_from: '****************@******'     # email address registered with SendGrid
  smtp_auth_username: '****@azure.com'     # username issued by SendGrid
  smtp_auth_password: '*******'            # password set in SendGrid (storing it in plaintext is not ideal)
  smtp_auth_secret: '*********'            # API key issued by SendGrid
route:
  receiver: "mail"
  group_by: ['alertname', 'instance', 'severity']   # group alerts that share alert name, instance, and severity
  group_wait: 30s       # wait 30 seconds before the first notification so alerts arriving in that window are batched together
  group_interval: 10m   # wait 10 minutes before notifying about new alerts added to an already-notified group
  repeat_interval: 1h   # re-send an alert that has already been notified after 1 hour
  # receiver: "slack-notifications"
  # group_by: ['alertname', 'instance']
receivers:
  - name: 'mail'
    email_configs:
      - to: '*****@********'   # alert recipient address
      # A comma-separated list (*****@********,####@######) was tried first but could not be made to work.
      # To notify additional recipients, add further "- to:" entries in the same way.
inhibit_rules:
  - source_match:
      severity: 'critical'   # while an alert with severity critical is firing,
    target_match:            # do not notify for warning alerts
      severity: 'warning'
    equal: ['alertname']     # that have the same alert name
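The note about multiple recipients can be sketched as follows. This is a minimal example, assuming two hypothetical addresses (ops-team@example.org and oncall@example.org are placeholders, not part of the original setup): each recipient gets its own `to` entry under email_configs. Quoting the values also keeps YAML from reading a leading `*` as an alias.

```yaml
receivers:
  - name: 'mail'
    email_configs:
      # One list entry per recipient; both addresses are placeholders.
      - to: 'ops-team@example.org'
      - to: 'oncall@example.org'
```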
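7. Rule configuration
Decide which rules you need for your own environment. In the example file, THRESHOLD_CPU, THRESHOLD_MEM, and THRESHOLD_FS are placeholders; replace them with numeric thresholds suited to your servers.
<Prometheus server>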
# cat /etc/prometheus/alert.rules
ALERT instance_down
  IF up == 0
  FOR 2m
  LABELS { severity = "critical" }
  ANNOTATIONS {
    summary = "Instance {{ $labels.instance }} down",
    description = "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 2 minutes.",
  }
ALERT cpu_threshold_exceeded
  IF (100 * (1 - avg by(instance)(irate(node_cpu{job='node',mode='idle'}[5m])))) > THRESHOLD_CPU
  ANNOTATIONS {
    summary = "Instance {{ $labels.instance }} CPU usage is dangerously high",
    description = "This device's cpu usage has exceeded the threshold with a value of {{ $value }}.",
  }
ALERT mem_threshold_exceeded
  IF (node_memory_MemFree{job='node'} + node_memory_Cached{job='node'} + node_memory_Buffers{job='node'})/1000000 < THRESHOLD_MEM
  ANNOTATIONS {
    summary = "Instance {{ $labels.instance }} memory usage is dangerously high",
    description = "This device's memory usage has exceeded the threshold with a value of {{ $value }}.",
  }
ALERT filesystem_threshold_exceeded
  IF node_filesystem_avail{job='node',mountpoint='/'} / node_filesystem_size{job='node'} * 100 < THRESHOLD_FS
  ANNOTATIONS {
    summary = "Instance {{ $labels.instance }} filesystem usage is dangerously high",
    description = "This device's filesystem usage has exceeded the threshold with a value of {{ $value }}.",
  }
ALERT node_high_loadaverage
  IF node_load1 > 2
  FOR 10s
  LABELS { severity = "warning" }
  ANNOTATIONS {
    summary = "High load average on {{$labels.instance}}",
    description = "{{$labels.instance}} has had a high load average for more than 10s (current value: {{$value}})"
  }
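The rule file above uses the rule syntax of Prometheus 1.x. Prometheus 2.0 and later read rules from YAML files instead, so on a newer server the first rule would look roughly like the sketch below. The group name node-alerts is a hypothetical choice; the expression, duration, label, and annotations are carried over unchanged.

```yaml
groups:
  - name: node-alerts   # hypothetical group name, not from the original setup
    rules:
      - alert: instance_down
        expr: up == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} down"
          description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 2 minutes."
```

On Prometheus 2.x, a file in this format can be validated with `promtool check rules <file>`.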
8. Integrate with Prometheus
Connect Prometheus to Alertmanager.
Append the following to the end of /etc/prometheus/prometheus.yml.
alerting:
  alertmanagers:
    - scheme: http
      static_configs:
        - targets: ['<hostname>.japaneast.cloudapp.azure.com:9093']
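Prometheus only evaluates alert.rules if the file is listed under rule_files in prometheus.yml. The original steps do not show that entry, so the sketch below assumes it has not been added yet and combines it with the alerting block. If a rule_files key already exists, add the path to the existing list rather than declaring the key twice. `<hostname>` is a placeholder for the Azure host name.

```yaml
# Assumption: rule_files is not yet present in prometheus.yml.
rule_files:
  - /etc/prometheus/alert.rules

alerting:
  alertmanagers:
    - scheme: http
      static_configs:
        - targets: ['<hostname>.japaneast.cloudapp.azure.com:9093']
```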
9. Finally
Check that the configuration files are written correctly.
<Prometheus server>
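# promtool check-config /etc/prometheus/prometheus.yml
# promtool check-rules /etc/prometheus/alert.rules
(promtool validates Prometheus configuration and rule files only; it cannot check alertmanager.yml. Confirm that file by making sure alertmanager starts without errors after the restart.)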
Restart alertmanager and Prometheus to complete the setup.
<Prometheus server>
# systemctl restart alertmanager
# systemctl restart prometheus
10. Verify operation
Stop one of the monitored servers.
Once the instance_down rule has been true for its FOR period (2 minutes), an alert email should arrive.

Reference sites
Tech-Sketch
Prometheus environment setup procedure
For sending mail from Azure, use SendGrid
Reference
Source article (Prometheus Alert Configuration): https://qiita.com/miwato/items/b26d82fdf5324936ed8b
The text may be freely shared or copied, but keep this URL as the reference URL.