Elasticsearch + Kibana + Embulk on VirtualBox

This is a record of setting up two virtual machines on VirtualBox: the first running Elasticsearch and Kibana in Docker containers, and the second with Embulk installed directly on the VM to load logs into Elasticsearch.


Item                                          Value
Hypervisor                                    VirtualBox
OS                                            CentOS Linux release 7.4.1708 (Core)
Virtual machine 01 (Elasticsearch, Kibana)    192.168.56.29
Virtual machine 02 (Embulk)                   192.168.56.28


※ The OS used here may not exactly match your environment; if anything is missing, install it with yum or the like.

1. Virtual machine 01: Elasticsearch and Kibana



To run Elasticsearch and Kibana as containers, install Docker and Docker Compose.

Command
# yum install -y docker
# systemctl enable docker
# systemctl start docker
# curl -L https://github.com/docker/compose/releases/download/1.9.0/docker-compose-`uname -s`-`uname -m` > /usr/local/bin/docker-compose
# chmod +x /usr/local/bin/docker-compose
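
To make sure the tools are in place before moving on, you can check the installed versions (a quick optional check):

Command
# docker --version
# docker-compose --version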

Create a docker-compose.yml for Elasticsearch and Kibana.

Command
# vim docker-compose.yml
----------------------------------------------------
elasticsearch:
  image: elasticsearch
  container_name: elasticsearch
  ports:
    - "9200:9200"
  environment:
      ES_JAVA_OPTS: '-Xms2048m -Xmx2048m'
kibana:
  image: kibana
  container_name: kibana
  links:
    - elasticsearch:elasticsearch
  ports:
    - "5601:5601"
----------------------------------------------------
:wq
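
If you want to validate the file before starting anything, docker-compose can parse it and echo back the resolved configuration (optional check):

Command
# docker-compose config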

Start the Elasticsearch and Kibana containers.

Command
# docker-compose up -d
Pulling elasticsearch (elasticsearch:latest)...
Trying to pull repository docker.io/library/elasticsearch ...
latest: Pulling from docker.io/library/elasticsearch
723254a2c089: Pull complete
abe15a44e12f: Pull complete
409a28e3cc3d: Pull complete
a9511c68044a: Pull complete
9d1b16e30bc8: Pull complete
0fc5a09c9242: Pull complete
d34976006493: Pull complete
3b70003f0c10: Pull complete
c85e66a46c7c: Pull complete
c1d6383769d6: Pull complete
da8d73630b44: Pull complete
5f0e52287884: Pull complete
770995441948: Pull complete
a5b2e358a5e0: Pull complete
7ab1d4a5e3eb: Pull complete
Digest: sha256:04f7cfc825b2951f928be7eb74defa5ac8687c990ba70319dae1d6119488ae9e
Pulling kibana (kibana:latest)...
Trying to pull repository docker.io/library/kibana ...
latest: Pulling from docker.io/library/kibana
f49cf87b52c1: Pull complete
9e8acb2289dd: Pull complete
d495c79e5bf4: Pull complete
81c8b3679622: Pull complete
2a4eff393768: Pull complete
5fa4e981b17d: Pull complete
e23852241c5b: Pull complete
411a85463ec1: Pull complete
8206f115bd3e: Pull complete
Digest: sha256:fe3ffbd866108f9c98a76fdf51db2c6c9cc937fb8ba153d4474acff72265d86a
Creating elasticsearch
Creating kibana
# docker ps
CONTAINER ID        IMAGE               COMMAND                  CREATED              STATUS              PORTS                              NAMES
3fd18e23c9da        kibana              "/docker-entrypoint.s"   About a minute ago   Up About a minute   0.0.0.0:5601->5601/tcp             kibana
88665a0b7aa5        elasticsearch       "/docker-entrypoint.s"   About a minute ago   Up About a minute   0.0.0.0:9200->9200/tcp, 9300/tcp   elasticsearch
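
Once both containers are up, a quick way to confirm Elasticsearch is answering is to hit its HTTP port; it returns cluster and version information as JSON, and _cat/health shows the cluster status:

Command
# curl http://localhost:9200
# curl http://localhost:9200/_cat/health?v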

2. Virtual machine 02: Embulk



Install Embulk.

Command
# yum install -y java
# curl --create-dirs -o ~/.embulk/bin/embulk -L "http://dl.embulk.org/embulk-latest.jar"
# chmod +x ~/.embulk/bin/embulk
# echo 'export PATH="$HOME/.embulk/bin:$PATH"' >> ~/.bashrc
# source ~/.bashrc
# embulk
Embulk v0.8.39
Usage: embulk [-vm-options] <command> [--options]
Commands:
   mkbundle   <directory>                             # create a new plugin bundle environment.
   bundle     [directory]                             # update a plugin bundle environment.
   run        <config.yml>                            # run a bulk load transaction.
   cleanup    <config.yml>                            # cleanup resume state.
   preview    <config.yml>                            # dry-run the bulk load without output and show preview.
   guess      <partial-config.yml> -o <output.yml>    # guess missing parameters to create a complete configuration file.
   gem        <install | list | help>                 # install a plugin or show installed plugins.
   new        <category> <name>                       # generates new plugin template
   migrate    <path>                                  # modify plugin code to use the latest Embulk plugin API
   example    [path]                                  # creates an example config file and csv file to try embulk.
   selfupdate [version]                               # upgrades embulk to the latest released version or to the specified version.

VM options:
   -E...                            Run an external script to configure environment variables in JVM
                                    (Operations not just setting envs are not recommended nor guaranteed.
                                     Expect side effects by running your external script at your own risk.)
   -J-O                             Disable JVM optimizations to speed up startup time (enabled by default if command is 'run')
   -J+O                             Enable JVM optimizations to speed up throughput
   -J...                            Set JVM options (use -J-help to see available options)
   -R...                            Set JRuby options (use -R--help to see available options)

Use `<command> --help` to see description of the commands.

Install the Elasticsearch output plugin.

Command
# embulk gem install embulk-output-elasticsearch_ruby
2017-12-19 20:03:16.257 +0900: Embulk v0.8.39

********************************** INFORMATION **********************************
  Join us! Embulk-announce mailing list is up for IMPORTANT annoucement such as
    compatibility-breaking changes and key feature updates.
  https://groups.google.com/forum/#!forum/embulk-announce
*********************************************************************************


Gem plugin path is: /root/.embulk/jruby/2.3.0

Fetching: multi_json-1.12.2.gem (100%)
Successfully installed multi_json-1.12.2
Fetching: multipart-post-2.0.0.gem (100%)
Successfully installed multipart-post-2.0.0
Fetching: faraday-0.13.1.gem (100%)
Successfully installed faraday-0.13.1
Fetching: elasticsearch-transport-6.0.0.gem (100%)
Successfully installed elasticsearch-transport-6.0.0
Fetching: elasticsearch-api-6.0.0.gem (100%)
Successfully installed elasticsearch-api-6.0.0
Fetching: elasticsearch-6.0.0.gem (100%)
Successfully installed elasticsearch-6.0.0
Fetching: excon-0.60.0.gem (100%)
Successfully installed excon-0.60.0
Fetching: embulk-output-elasticsearch_ruby-0.1.6.gem (100%)
Successfully installed embulk-output-elasticsearch_ruby-0.1.6
8 gems installed

Also install a plugin that builds the columns dynamically.

Command
# embulk gem install embulk-parser-csv_guessable

2017-12-20 22:52:10.568 +0900: Embulk v0.8.34

Gem plugin path is: /root/.embulk/jruby/2.3.0

Fetching: embulk-parser-csv_guessable-0.1.5.gem (100%)
Successfully installed embulk-parser-csv_guessable-0.1.5
1 gem installed
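
You can list the installed plugins to confirm that both are present (gem list is one of the subcommands shown in the help output above):

Command
# embulk gem list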

Create the mapping file (import_log.json). If you do not know what each column of the log file contains, declare as many columns as the file has with type set to string and index set to not_analyzed, so that the data can at least be loaded. The log used here has 31 columns. (This is the "dynamic columns" part mentioned above; ログファイル名, literally "log file name", is a placeholder for the mapping type name.)

Command
# vim import_log.json
-------------------------------------------------------
{
    "mappings": {
        "ログファイル名": {
            "properties": {
                "1": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "2": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "3": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "4": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "5": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "6": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "7": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "8": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "9": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "10": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "11": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "12": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "13": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "14": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "15": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "16": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "17": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "18": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "19": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "20": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "21": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "22": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "23": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "24": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "25": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "26": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "27": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "28": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "29": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "30": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "31": {
                    "type": "string",
                    "index": "not_analyzed"
                }
            }
        }
    }
}
-------------------------------------------------------
:wq
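
Since the mapping is the same three-line block repeated 31 times, you could also generate it with a small shell loop instead of typing it by hand (a minimal sketch using the same type name and column count as above):

Command
# {
    echo '{ "mappings": { "ログファイル名": { "properties": {'
    # emit one string / not_analyzed entry per column, comma after all but the last
    for i in $(seq 1 31); do
      sep=$([ "$i" -lt 31 ] && echo "," || echo "")
      echo "  \"$i\": { \"type\": \"string\", \"index\": \"not_analyzed\" }$sep"
    done
    echo '} } } }'
  } > import_log.json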

Create a directory for the logs to be loaded and copy the logs into it with WinSCP or similar.
Then create import.yml.

Command
# mkdir /var/log/import
# vim import.yml
-------------------------------------------------------
in:
  type: file
  path_prefix: /var/log/import/import.log
  parser:
    type: csv_guessable
    schema_file: /var/log/import/import.log
    columns:
      - {name: 1, type: string}
      - {name: 2, type: string}
      - {name: 3, type: string}
      - {name: 4, type: string}
      - {name: 5, type: string}
      - {name: 6, type: string}
      - {name: 7, type: string}
      - {name: 8, type: string}
      - {name: 9, type: string}
      - {name: 10, type: string}
      - {name: 11, type: string}
      - {name: 12, type: string}
      - {name: 13, type: string}
      - {name: 14, type: string}
      - {name: 15, type: string}
      - {name: 16, type: string}
      - {name: 17, type: string}
      - {name: 18, type: string}
      - {name: 19, type: string}
      - {name: 20, type: string}
      - {name: 21, type: string}
      - {name: 22, type: string}
      - {name: 23, type: string}
      - {name: 24, type: string}
      - {name: 25, type: string}
      - {name: 26, type: string}
      - {name: 27, type: string}
      - {name: 28, type: string}
      - {name: 29, type: string}
      - {name: 30, type: string}
      - {name: 31, type: string}
      - {name: 32, type: string}
exec: {}
out:
    type: elasticsearch_ruby
    nodes:
    - {host: 192.168.56.29, port: 9200}
    index: import
    index_type: import
-------------------------------------------------------
:wq
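
Before actually loading anything, Embulk's preview subcommand (listed in the help output earlier) can dry-run the configuration and show how the columns will be parsed:

Command
# embulk preview import.yml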

Register the mapping with curl. Here embulk_access_log is the name of the index the mapping is registered under.

Command
# curl -XPUT '192.168.56.29:9200/embulk_access_log' -d @import_log.json
{"acknowledged":true,"shards_acknowledged":true,"index":"embulk_access_log"}

Run Embulk.

Command
# embulk run import.yml
OpenJDK 64-Bit Server VM warning: If the number of processors is expected to increase from one, then you should configure the number of parallel GC threads appropriately using -XX:ParallelGCThreads=N
2017-12-21 17:23:18.984 +0900: Embulk v0.8.39

********************************** INFORMATION **********************************
  Join us! Embulk-announce mailing list is up for IMPORTANT annoucement such as
    compatibility-breaking changes and key feature updates.
  https://groups.google.com/forum/#!forum/embulk-announce
*********************************************************************************

2017-12-21 17:23:30.984 +0900 [INFO] (0001:transaction): Loaded plugin embulk-output-elasticsearch_ruby (0.1.6)
2017-12-21 17:23:31.130 +0900 [INFO] (0001:transaction): Loaded plugin embulk-parser-csv_guessable (0.1.5)
2017-12-21 17:23:31.202 +0900 [INFO] (0001:transaction): Listing local files at directory '/var/log/import' filtering filename by prefix 'import.log'
2017-12-21 17:23:31.203 +0900 [INFO] (0001:transaction): "follow_symlinks" is set false. Note that symbolic links to directories are skipped.
2017-12-21 17:23:31.206 +0900 [INFO] (0001:transaction): Loading files [/var/log/import/import.log]
2017-12-21 17:23:31.375 +0900 [INFO] (0001:transaction): Using local thread executor with max_threads=2 / tasks=1
2017-12-21 17:23:31.405 +0900 [INFO] (0001:transaction): mode => normal
2017-12-21 17:23:31.436 +0900 [INFO] (0001:transaction): nodes => [{"host"=>"192.168.56.29", "port"=>9200}]
2017-12-21 17:23:31.436 +0900 [INFO] (0001:transaction): index => import
2017-12-21 17:23:31.436 +0900 [INFO] (0001:transaction): index_type => import
2017-12-21 17:23:31.437 +0900 [INFO] (0001:transaction): alias =>
2017-12-21 17:23:31.619 +0900 [INFO] (0001:transaction): {done:  0 / 1, running: 0}
2017-12-21 17:23:34.129 +0900 [INFO] (0014:task-0000): bulk: 287 success.
2017-12-21 17:23:34.130 +0900 [INFO] (0001:transaction): {done:  1 / 1, running: 0}
2017-12-21 17:23:34.139 +0900 [INFO] (main): Committed.
2017-12-21 17:23:34.139 +0900 [INFO] (main): Next config diff: {"in":{"last_path":"/var/log/import/import.log"},"out":{}}
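
The run log reports 287 documents loaded. You can cross-check the document count against the index Embulk wrote to (import, as set in import.yml):

Command
# curl -XGET '192.168.56.29:9200/import/_count?pretty'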

Open a browser, access Kibana (http://192.168.56.29:5601), and check the result.



Enter embulk_access_log in Index pattern and click the Create button.
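
If the index pattern does not appear, listing the indices that actually exist on the Elasticsearch side is a quick way to see which name to use:

Command
# curl '192.168.56.29:9200/_cat/indices?v'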





With Kibana you can visualize and analyze the loaded logs, but that is a topic that deserves its own study.

3. References


  • Introduction to Building a Data Analysis Platform: Log Collection and Visualization with Fluentd, Elasticsearch, and Kibana