Monitoring a Kubernetes cluster with Prometheus

I want to monitor a Kubernetes cluster with Prometheus!

This is my first post.

For Kubernetes cluster monitoring there are options such as Prometheus, EFK, and, on the SaaS side, Datadog. This time I will use Prometheus.

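Setting up a Prometheus environment for Kubernetes

https://github.com/giantswarm/kuber
I build the Prometheus environment from the repository above. Its last update was in September and its Deployment is still beta, but I will use it anyway this time.

Running the command from the Quickstart automatically deploys prometheus, grafana, node-exporter, and others into a namespace called monitoring.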

kubectl apply \
  --filename https://raw.githubusercontent.com/giantswarm/kubernetes-prometheus/master/manifests-all.yaml

After the setup, you can see a number of pods running in the monitoring namespace.

# kubectl get pods --namespace=monitoring
NAME                                  READY     STATUS    RESTARTS   AGE
alertmanager-56f6fdd9f6-z4vl8         1/1       Running   0          2h
grafana-core-867b94888d-td7b4         1/1       Running   0          5h
kube-state-metrics-694fdcf55f-797th   1/1       Running   0          5h
kube-state-metrics-694fdcf55f-tsvh5   1/1       Running   0          5h
node-directory-size-metrics-8rjvx     2/2       Running   0          5h
node-directory-size-metrics-z86cs     2/2       Running   0          5h
prometheus-core-5cf65c7b68-2dg5r      1/1       Running   0          2h
prometheus-node-exporter-8dccv        1/1       Running   0          5h
prometheus-node-exporter-rmlwk        1/1       Running   0          5h

It also starts alertmanager, kube-state-metrics, and others.

NodePort services are created as well, so let's check them via URL.

# kubectl get svc --namespace=monitoring
NAME                       TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
alertmanager               NodePort    100.64.36.59     <none>        9093:32651/TCP   5h
grafana                    NodePort    100.70.155.49    <none>        3000:31676/TCP   5h
kube-state-metrics         ClusterIP   100.66.157.125   <none>        8080/TCP         5h
prometheus                 NodePort    100.71.101.61    <none>        9090:32296/TCP   5h
prometheus-node-exporter   ClusterIP   None             <none>        9100/TCP         5h

Both prometheus and grafana are up.

However, dashboards such as kubernetes-pod-resources show N/A, so the values cannot be checked properly.
This is because the Prometheus configuration is not scraping the cAdvisor metrics.
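
One way to confirm that the cAdvisor metrics are missing is to ask Prometheus directly through its HTTP API. The sketch below is only an illustration: the helper name prom_query_url and the node IP 192.0.2.10 are made up, 32296 is the NodePort shown for prometheus in the service listing above, and container_cpu_usage_seconds_total is used as a representative cAdvisor metric. The curl call is left commented out because it needs a reachable cluster.

```shell
# Hypothetical helper: build the Prometheus query endpoint
# from a node IP ($1) and the NodePort mapped to 9090 ($2).
prom_query_url() {
  printf 'http://%s:%s/api/v1/query\n' "$1" "$2"
}

# Example call (assumed node IP; requires a reachable cluster):
# curl -sG "$(prom_query_url 192.0.2.10 32296)" \
#   --data-urlencode 'query=count(container_cpu_usage_seconds_total)'
```

If the result list in the response is empty, Prometheus has no cAdvisor series yet, which would match the N/A panels.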

Changing the Prometheus configuration

Edit the Prometheus ConfigMap so that the cAdvisor metrics are scraped.

kubectl edit configmap prometheus-core --namespace=monitoring

Add the following job_name under scrape_configs.
(It is copied from the official example: https://github.com/prometheus/prometheus/blob/master/documentation/examples/prometheus-kubernetes.yml)

- job_name: 'kubernetes-cadvisor'

  # Default to scraping over https. If required, just disable this or change to
  # `http`.
  scheme: https

  # This TLS & bearer token file config is used to connect to the actual scrape
  # endpoints for cluster components. This is separate to discovery auth
  # configuration because discovery & scraping are two separate concerns in
  # Prometheus. The discovery auth config is automatic if Prometheus runs inside
  # the cluster. Otherwise, more config options have to be provided within the
  # <kubernetes_sd_config>.
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

  kubernetes_sd_configs:
  - role: node

  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  - target_label: __address__
    replacement: kubernetes.default.svc:443
  - source_labels: [__meta_kubernetes_node_name]
    regex: (.+)
    target_label: __metrics_path__
    replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
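
To see what the last relabel rule does, the following sketch mimics it with sed for a single node. It is not how Prometheus itself is implemented; the function name cadvisor_metrics_path and the node name node-1 are hypothetical. Prometheus anchors relabel regexes at both ends, which the ^ and $ here imitate.

```shell
# Hypothetical helper: mimic the __metrics_path__ relabel rule for one node name.
# The captured node name (\1) plays the role of ${1} in the Prometheus config.
cadvisor_metrics_path() {
  printf '%s\n' "$1" | sed -E 's#^(.+)$#/api/v1/nodes/\1/proxy/metrics/cadvisor#'
}

cadvisor_metrics_path node-1
# prints /api/v1/nodes/node-1/proxy/metrics/cadvisor
```

Together with the __address__ rule, this means every node is scraped through the API server at kubernetes.default.svc:443 rather than contacted directly.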

After saving the change, delete the pod once so that the updated ConfigMap is picked up.

kubectl delete pods prometheus-core-XXX --namespace=monitoring

Once the pod has restarted and is Running, you are done.
If it fails to start, the likely cause is a mistake in the YAML edit (I made one myself).
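
If the pod does not reach Running, the pod events and the Prometheus log usually show where the configuration failed to load. A minimal sketch, assuming the pod name is passed in by hand; the function name prometheus_pod_diagnostics is made up for illustration.

```shell
# Hypothetical helper: print events and logs for a pod.
# $1 = pod name, $2 = namespace (defaults to monitoring).
prometheus_pod_diagnostics() {
  pod="$1"
  ns="${2:-monitoring}"
  # Events (scheduling, restarts, crash loops) appear at the end of describe.
  kubectl describe pod "$pod" --namespace="$ns"
  # Configuration parse errors are reported in the container log.
  kubectl logs "$pod" --namespace="$ns"
}
```

It would be called as prometheus_pod_diagnostics prometheus-core-XXX, with XXX replaced by the suffix shown by kubectl get pods.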

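After the restart, grafana shows CPU and memory for the whole node, per-pod resources, and so on.

Now the state of the Nodes can be monitored graphically!

Conclusion

With Alertmanager you can also do things such as sending alerts to Slack depending on the state of a Node.

With datadog, simply starting the dd-agent daemonset lets you collect metrics and also configure alerts, so for a small setup, or if you have the budget, datadog seems like a good choice.

Below is the datadog Kubernetes cluster monitoring screen. It is easy to read.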

Reference

Original article (Monitoring a Kubernetes cluster with Prometheus): https://qiita.com/miyacomaru/items/6fa4121e7ba765e9efd6
