Elasticsearch + Kibana + Embulk on VirtualBox

This is a record of preparing two virtual machines on VirtualBox, running Elasticsearch and Kibana with Docker on the first one, installing Embulk directly on the second one, and loading logs into Elasticsearch.



CentOS Linux release 7.4.1708 (Core)

Virtual machine 01: Elasticsearch, Kibana

Virtual machine 02: Embulk

※ Your OS environment may not be exactly the same as ours, so install any missing packages with yum or a similar tool.

1. Virtual machine 01: Elasticsearch and Kibana

Elasticsearch Kibana

Elasticsearch and Kibana run as containers, so install Docker and Docker Compose first.

# yum install -y docker
# systemctl enable docker
# systemctl start docker
# curl -L https://github.com/docker/compose/releases/download/1.9.0/docker-compose-`uname -s`-`uname -m` > /usr/local/bin/docker-compose
# chmod +x /usr/local/bin/docker-compose

Create a docker-compose.yml for Elasticsearch and Kibana.

# vim docker-compose.yml
elasticsearch:
  image: elasticsearch
  container_name: elasticsearch
  ports:
    - "9200:9200"
  environment:
    ES_JAVA_OPTS: '-Xms2048m -Xmx2048m'
kibana:
  image: kibana
  container_name: kibana
  links:
    - elasticsearch:elasticsearch
  ports:
    - "5601:5601"

Start the Elasticsearch and Kibana containers.
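This step comes down to a docker-compose call. The following is a minimal sketch, assuming docker-compose.yml is in the current directory and the commands are run as root on virtual machine 01. The `docker-compose ps` and `curl` checks are additions for verification and are not part of the original steps.

```shell
# Start both containers in the background (detached mode).
docker-compose up -d

# List the containers; elasticsearch and kibana should both be shown as Up.
docker-compose ps

# Elasticsearch answers on the published port with a short JSON document.
curl -s http://localhost:9200/
```

Elasticsearch can take some time to start, so the curl check may fail at first and succeed when repeated a little later.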

2. Virtual machine 02: Embulk


Install Embulk.

# yum install -y java
# curl --create-dirs -o ~/.embulk/bin/embulk -L "http://dl.embulk.org/embulk-latest.jar"
# chmod +x ~/.embulk/bin/embulk
# echo 'export PATH="$HOME/.embulk/bin:$PATH"' >> ~/.bashrc
# source ~/.bashrc
# embulk
Embulk v0.8.39
Usage: embulk [-vm-options] <command> [--options]
   mkbundle   <directory>                             # create a new plugin bundle environment.
   bundle     [directory]                             # update a plugin bundle environment.
   run        <config.yml>                            # run a bulk load transaction.
   cleanup    <config.yml>                            # cleanup resume state.
   preview    <config.yml>                            # dry-run the bulk load without output and show preview.
   guess      <partial-config.yml> -o <output.yml>    # guess missing parameters to create a complete configuration file.
   gem        <install | list | help>                 # install a plugin or show installed plugins.
   new        <category> <name>                       # generates new plugin template
   migrate    <path>                                  # modify plugin code to use the latest Embulk plugin API
   example    [path]                                  # creates an example config file and csv file to try embulk.
   selfupdate [version]                               # upgrades embulk to the latest released version or to the specified version.

VM options:
   -E...                            Run an external script to configure environment variables in JVM
                                    (Operations not just setting envs are not recommended nor guaranteed.
                                     Expect side effects by running your external script at your own risk.)
   -J-O                             Disable JVM optimizations to speed up startup time (enabled by default if command is 'run')
   -J+O                             Enable JVM optimizations to speed up throughput
   -J...                            Set JVM options (use -J-help to see available options)
   -R...                            Set JRuby options (use -R--help to see available options)

Use `<command> --help` to see description of the commands.

Install the Elasticsearch plugin.

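# embulk gem install embulk-output-elasticsearch_ruby

Create the mapping file, import_log.json. If the contents of the columns in the log file are unknown, specify type string and index not_analyzed for every column so that the file can at least be loaded. The log used here has 31 columns. (This is a somewhat heavy-handed approach.)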

# vim import_log.json
{
    "mappings": {
        "<log file name>": {
            "properties": {
                "1": {"type": "string", "index": "not_analyzed"},
                "2": {"type": "string", "index": "not_analyzed"},
                "3": {"type": "string", "index": "not_analyzed"},
                "4": {"type": "string", "index": "not_analyzed"},
                "5": {"type": "string", "index": "not_analyzed"},
                "6": {"type": "string", "index": "not_analyzed"},
                "7": {"type": "string", "index": "not_analyzed"},
                "8": {"type": "string", "index": "not_analyzed"},
                "9": {"type": "string", "index": "not_analyzed"},
                "10": {"type": "string", "index": "not_analyzed"},
                "11": {"type": "string", "index": "not_analyzed"},
                "12": {"type": "string", "index": "not_analyzed"},
                "13": {"type": "string", "index": "not_analyzed"},
                "14": {"type": "string", "index": "not_analyzed"},
                "15": {"type": "string", "index": "not_analyzed"},
                "16": {"type": "string", "index": "not_analyzed"},
                "17": {"type": "string", "index": "not_analyzed"},
                "18": {"type": "string", "index": "not_analyzed"},
                "19": {"type": "string", "index": "not_analyzed"},
                "20": {"type": "string", "index": "not_analyzed"},
                "21": {"type": "string", "index": "not_analyzed"},
                "22": {"type": "string", "index": "not_analyzed"},
                "23": {"type": "string", "index": "not_analyzed"},
                "24": {"type": "string", "index": "not_analyzed"},
                "25": {"type": "string", "index": "not_analyzed"},
                "26": {"type": "string", "index": "not_analyzed"},
                "27": {"type": "string", "index": "not_analyzed"},
                "28": {"type": "string", "index": "not_analyzed"},
                "29": {"type": "string", "index": "not_analyzed"},
                "30": {"type": "string", "index": "not_analyzed"},
                "31": {"type": "string", "index": "not_analyzed"}
            }
        }
    }
}
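Since every column gets the same definition, the mapping file can also be generated instead of typed by hand. The following is a minimal sketch in POSIX shell. COLS and TYPE_NAME are assumptions: COLS is the column count of the log, and TYPE_NAME is a hypothetical mapping type name chosen here to match index_type in import.yml.

```shell
#!/bin/sh
# Generate import_log.json with one string/not_analyzed field per column.
# COLS and TYPE_NAME are assumptions; adjust them to the log being loaded.
COLS=31
TYPE_NAME="import"

{
  printf '{\n  "mappings": {\n    "%s": {\n      "properties": {\n' "$TYPE_NAME"
  i=1
  while [ "$i" -le "$COLS" ]; do
    # Every entry except the last is followed by a comma to keep the JSON valid.
    if [ "$i" -lt "$COLS" ]; then sep=","; else sep=""; fi
    printf '        "%s": {"type": "string", "index": "not_analyzed"}%s\n' "$i" "$sep"
    i=$((i + 1))
  done
  printf '      }\n    }\n  }\n}\n'
} > import_log.json
```

Generating the file keeps the number of fields in one place, so a log with a different column count only needs COLS changed.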

Create a directory for the logs to be loaded and copy the log file into it with WinSCP or a similar tool.
Then create import.yml.

# mkdir /var/log/import
# vim import.yml
in:
  type: file
  path_prefix: /var/log/import/import.log
  parser:
    type: csv_guessable
    schema_file: /var/log/import/import.log
    columns:
      - {name: 1, type: string}
      - {name: 2, type: string}
      - {name: 3, type: string}
      - {name: 4, type: string}
      - {name: 5, type: string}
      - {name: 6, type: string}
      - {name: 7, type: string}
      - {name: 8, type: string}
      - {name: 9, type: string}
      - {name: 10, type: string}
      - {name: 11, type: string}
      - {name: 12, type: string}
      - {name: 13, type: string}
      - {name: 14, type: string}
      - {name: 15, type: string}
      - {name: 16, type: string}
      - {name: 17, type: string}
      - {name: 18, type: string}
      - {name: 19, type: string}
      - {name: 20, type: string}
      - {name: 21, type: string}
      - {name: 22, type: string}
      - {name: 23, type: string}
      - {name: 24, type: string}
      - {name: 25, type: string}
      - {name: 26, type: string}
      - {name: 27, type: string}
      - {name: 28, type: string}
      - {name: 29, type: string}
      - {name: 30, type: string}
      - {name: 31, type: string}
      - {name: 32, type: string}
exec: {}
out:
  type: elasticsearch_ruby
  nodes:
    - {host:, port: 9200}
  index: import
  index_type: import
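The column entries in import.yml follow a fixed pattern, so they can be generated rather than typed. The following is a minimal sketch in POSIX shell. The output file name columns.yml is hypothetical, and COLS=32 simply mirrors the number of entries listed in import.yml; set it to the actual column count of the log.

```shell
#!/bin/sh
# Print one column entry per field, indented to match the entries in import.yml.
# COLS and the output file name columns.yml are assumptions for illustration.
COLS=32
i=1
while [ "$i" -le "$COLS" ]; do
  printf '      - {name: %s, type: string}\n' "$i"
  i=$((i + 1))
done > columns.yml
```

The generated lines can be pasted into the parser section of import.yml. Before a real load, `embulk preview import.yml` (listed in the usage output earlier) gives a dry run without writing any output. The csv_guessable parser is a separate plugin, so if it is not installed yet, `embulk gem install embulk-parser-csv_guessable` should add it.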

Register the mapping with curl. access_log is the name of the index in which the mapping is registered.

# curl -XPUT '' -d @import_log.json

Run Embulk.

# embulk run import.yml
OpenJDK 64-Bit Server VM warning: If the number of processors is expected to increase from one, then you should configure the number of parallel GC threads appropriately using -XX:ParallelGCThreads=N
2017-12-21 17:23:18.984 +0900: Embulk v0.8.39

********************************** INFORMATION **********************************
  Join us! Embulk-announce mailing list is up for IMPORTANT annoucement such as
    compatibility-breaking changes and key feature updates.

2017-12-21 17:23:30.984 +0900 [INFO] (0001:transaction): Loaded plugin embulk-output-elasticsearch_ruby (0.1.6)
2017-12-21 17:23:31.130 +0900 [INFO] (0001:transaction): Loaded plugin embulk-parser-csv_guessable (0.1.5)
2017-12-21 17:23:31.202 +0900 [INFO] (0001:transaction): Listing local files at directory '/var/log/import' filtering filename by prefix 'import.log'
2017-12-21 17:23:31.203 +0900 [INFO] (0001:transaction): "follow_symlinks" is set false. Note that symbolic links to directories are skipped.
2017-12-21 17:23:31.206 +0900 [INFO] (0001:transaction): Loading files [/var/log/import/import.log]
2017-12-21 17:23:31.375 +0900 [INFO] (0001:transaction): Using local thread executor with max_threads=2 / tasks=1
2017-12-21 17:23:31.405 +0900 [INFO] (0001:transaction): mode => normal
2017-12-21 17:23:31.436 +0900 [INFO] (0001:transaction): nodes => [{"host"=>"", "port"=>9200}]
2017-12-21 17:23:31.436 +0900 [INFO] (0001:transaction): index => import
2017-12-21 17:23:31.436 +0900 [INFO] (0001:transaction): index_type => import
2017-12-21 17:23:31.437 +0900 [INFO] (0001:transaction): alias =>
2017-12-21 17:23:31.619 +0900 [INFO] (0001:transaction): {done:  0 / 1, running: 0}
2017-12-21 17:23:34.129 +0900 [INFO] (0014:task-0000): bulk: 287 success.
2017-12-21 17:23:34.130 +0900 [INFO] (0001:transaction): {done:  1 / 1, running: 0}
2017-12-21 17:23:34.139 +0900 [INFO] (main): Committed.
2017-12-21 17:23:34.139 +0900 [INFO] (main): Next config diff: {"in":{"last_path":"/var/log/import/import.log"},"out":{}}
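One way to confirm the load from the command line is to ask Elasticsearch for the document count of the index. The following is a minimal sketch; ES_HOST is a hypothetical variable standing in for the address of virtual machine 01, and the index name import is taken from import.yml.

```shell
# ES_HOST is a placeholder (an assumption); the script stops if it is not set.
ES_HOST="${ES_HOST:?set ES_HOST to the Elasticsearch host}"

# Ask Elasticsearch how many documents the import index currently holds.
curl -s "http://${ES_HOST}:9200/import/_count?pretty"
```

If the index was empty before the run, the reported count should match the number in the "bulk: 287 success." log line.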

Open a browser and check the result in Kibana.

Specify embulk_access_log as the Index pattern and click the Create button.

Kibana can visualize and analyze the loaded logs, but that takes some study of its own.

3. Reference book

  • Introduction to Building a Data Analysis Platform: Log Collection and Visualization with Fluentd, Elasticsearch, and Kibana
