Common Elasticsearch Operations: Mappings

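A mapping is essentially about field types in ES: either ES detects them automatically or we specify them ourselves. Accordingly, mappings are divided into dynamic mappings and static mappings.

1 Dynamic mapping

1.1 Mapping rules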

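| JSON data | Automatically inferred field type |
|---|---|
| null | No field is added |
| true or false | boolean |
| Floating-point number | float |
| Integer | long |
| JSON object | object |
| Array | Determined by the first non-null value in the array |
| string | date (if date detection is enabled), double or long (if numeric detection is enabled), or text with a keyword sub-field |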

1.2 Date detection

Date detection is enabled by default (ES 5.4). A test case:

PUT myblog

GET myblog/_mapping

PUT myblog/article/1
{
  "id":1,
  "postdate":"2018-10-27"
}

GET myblog/_mapping
{
  "myblog": {
    "mappings": {
      "article": {
        "properties": {
          "id": {
            "type": "long"
          },
          "postdate": {
            "type": "date"
          }
        }
      }
    }
  }
}

With date detection turned off, the same value is no longer detected as a date:

PUT myblog
{
  "mappings": {
    "article": {
      "date_detection": false
    }
  }
}

GET myblog/_mapping

PUT myblog/article/1
{
  "id":1,
  "postdate":"2018-10-27"
}

GET myblog/_mapping
{
  "myblog": {
    "mappings": {
      "article": {
        "date_detection": false,
        "properties": {
          "id": {
            "type": "long"
          },
          "postdate": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      }
    }
  }
}
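
The inference rules from 1.1 and the date_detection switch from 1.2 can be modelled roughly in Python. This is a minimal sketch, not how ES is implemented: infer_type is a hypothetical helper, its date check only recognises plain yyyy-MM-dd strings (real date detection uses the configured dynamic date formats), and numeric detection is left out.

```python
import re
from typing import Any, Optional

# Assumption: a simplified stand-in for date detection that only
# recognises plain yyyy-MM-dd strings.
_DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def infer_type(value: Any, date_detection: bool = True) -> Optional[str]:
    """Return the field type a dynamic mapping would pick, or None if no field is added."""
    if value is None:
        return None  # null: no field is added
    if isinstance(value, bool):  # must be checked before int (bool is an int in Python)
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        # An array takes the type of its first non-null value.
        for item in value:
            inferred = infer_type(item, date_detection)
            if inferred is not None:
                return inferred
        return None
    if isinstance(value, str):
        if date_detection and _DATE_RE.match(value):
            return "date"
        return "text"  # in ES this comes with a keyword sub-field
    raise TypeError(f"unsupported JSON value: {value!r}")


if __name__ == "__main__":
    print(infer_type("2018-10-27"))
    print(infer_type("2018-10-27", date_detection=False))
```

Under these assumptions the two calls mirror the two mapping responses above: the first treats postdate as a date, the second as text.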

2 Static mapping

2.1 Basic example

PUT myblog
{
  "mappings": {
    "article": {
      "properties": {
        "id":{"type": "long"},
        "title":{"type": "text"},
        "postdate":{"type": "date"}
      }
    }
  }
}

GET myblog/_mapping

PUT myblog/article/1
{
  "id":1,
  "title":"elasticsearch is wonderful!",
  "postdate":"2018-10-27"
}

GET myblog/_mapping
{
  "myblog": {
    "mappings": {
      "article": {
        "properties": {
          "id": {
            "type": "long"
          },
          "postdate": {
            "type": "date"
          },
          "title": {
            "type": "text"
          }
        }
      }
    }
  }
}

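2.2 The dynamic property

By default, when a document containing a new field is added, ES adds that field to the mapping as well. This behaviour can be controlled through the dynamic setting.

| dynamic value | Description |
|---|---|
| true | The default; new fields are added automatically |
| false | New fields are ignored |
| strict | Strict mode; an exception is thrown when a new field is found |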

PUT myblog
{
  "mappings": {
    "article": {
      "dynamic":"strict",
      "properties": {
        "id":{"type": "long"},
        "title":{"type": "text"},
        "postdate":{"type": "date"}
      }
    }
  }
}

GET myblog/_mapping

PUT myblog/article/1
{
  "id":1,
  "title":"elasticsearch is wonderful!",
  "content":"a long text",
  "postdate":"2018-10-27"
}

{
  "error": {
    "root_cause": [
      {
        "type": "strict_dynamic_mapping_exception",
        "reason": "mapping set to strict, dynamic introduction of [content] within [article] is not allowed"
      }
    ],
    "type": "strict_dynamic_mapping_exception",
    "reason": "mapping set to strict, dynamic introduction of [content] within [article] is not allowed"
  },
  "status": 400
}
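
The three dynamic values can be summarised with a small Python model. This is only a sketch under assumptions: apply_dynamic is a hypothetical helper that tracks mapped field names in a set, and the error text merely imitates the response above.

```python
from typing import Any, Dict, Set


def apply_dynamic(mapped_fields: Set[str], doc: Dict[str, Any], dynamic: str = "true") -> Set[str]:
    """Return the set of mapped field names after indexing doc."""
    if dynamic not in ("true", "false", "strict"):
        raise ValueError(f"unknown dynamic value: {dynamic}")
    new_fields = [name for name in doc if name not in mapped_fields]
    if dynamic == "strict" and new_fields:
        # Strict mode rejects the whole document.
        raise ValueError(
            f"mapping set to strict, dynamic introduction of [{new_fields[0]}] is not allowed"
        )
    if dynamic == "true":
        # New fields are added to the mapping automatically.
        return set(mapped_fields) | set(new_fields)
    # dynamic == "false": new fields are ignored by the mapping.
    return set(mapped_fields)


if __name__ == "__main__":
    mapped = {"id", "title", "postdate"}
    doc = {"id": 1, "title": "elasticsearch is wonderful!", "content": "a long text"}
    print(sorted(apply_dynamic(mapped, doc, "true")))
    print(sorted(apply_dynamic(mapped, doc, "false")))
```

With "strict", the same document raises an error naming the content field, as in the response above.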

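3 Field types

3.1 Common field types

| Category | Subcategory | Types |
|---|---|---|
| Core types | String | string, text, keyword |
| | Numeric | long, integer, short, byte, double, float, half_float, scaled_float |
| | Date | date |
| | Boolean | boolean |
| | Binary | binary |
| | Range | range |
| Complex types | Array | array |
| | Object | object |
| | Nested | nested |
| Geo types | Geo-point | geo_point |
| | Geo-shape | geo_shape |
| Specialised types | IP | ip |
| | Completion | completion |
| | Token count | token_count |
| | Attachment | attachment |
| | Percolator | percolator |

Below I only cover the types I use most often in my own work; for the details, see the official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/5.6/mapping.html.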

3.1.1 string

The string type is deprecated as of ES 5.x; use text or keyword instead.

3.1.2 text

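Used for fields that need full-text search. The field's content is processed by an analyzer: before the inverted index is built, the string is split into individual terms.
In practice, text is mostly used for long-text fields such as the content of an article. Sorting or aggregating on such fields is clearly of little use.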

3.1.3 keyword

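Unlike text, a keyword field can only be searched by its exact value.
Because the indexed term is the entire field content itself, keyword fields are used in practice for comparison, sorting, and aggregation.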
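
The difference between text and keyword can be illustrated by the terms each one would index. This is a rough sketch under assumptions: text_terms is a hypothetical stand-in for an analyzer that lowercases and splits on non-word characters, which is much simpler than the analyzers ES ships.

```python
import re
from typing import List


def text_terms(value: str) -> List[str]:
    """Assumed analyzer: lowercase, then split into word tokens."""
    return re.findall(r"\w+", value.lower())


def keyword_terms(value: str) -> List[str]:
    """A keyword field indexes the whole value as a single term."""
    return [value]


def term_matches(terms: List[str], query: str) -> bool:
    """A term query matches only when the exact term is in the index."""
    return query in terms


if __name__ == "__main__":
    title = "elasticsearch is wonderful!"
    print(text_terms(title))
    print(keyword_terms(title))
    print(term_matches(text_terms(title), "wonderful"))
    print(term_matches(keyword_terms(title), "wonderful"))
```

A single word finds the text version of the title, while the keyword version only matches the complete original string.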

3.1.4 Numeric types

For the finer points, consult the official documentation; ordinary usage covers most needs.

3.1.5 date

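JSON has no date type, so ES accepts dates in the following forms:

1. A formatted date string, "yyyy-MM-dd" or "yyyy-MM-ddTHH:mm:ssZ"

That is, a 'yyyy-MM-dd HH:mm:ss' value must be written in a form such as '2018-10-22T23:12:22Z', which in effect adds the time zone.

2. A long integer representing a timestamp in milliseconds

3. An integer representing a timestamp in seconds (the default format reads a bare number as milliseconds, so seconds need the epoch_second format)

Internally, ES stores a date as a long integer of milliseconds since the epoch.
The forms above are only the defaults; when defining the field you can set a custom date format: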

PUT myblog
{
  "mappings": {
    "article": {
      "properties": {
        "postdate":{
          "type": "date",
          "format": "yyyy-MM-dd HH:mm:ss"
        }
      }
    }
  }
}

format can also list several date formats, separated by "||":

"format": "yyyy-MM-dd HH:mm:ss||yyyy/MM/dd HH:mm:ss"

Data in the defined format can then be written:

PUT myblog/article/1
{
  "postdate":"2017-09-23 23:12:22"
}

In my own work, when a time needs to be stored, I usually convert it to a millisecond timestamp before saving it to ES, and convert it back to a time string for display when reading it out.
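
That workflow might look like the following in Python. It is a minimal sketch, assuming the time strings are in UTC and use the "yyyy-MM-dd HH:mm:ss" layout from the mapping above; to_epoch_millis and from_epoch_millis are hypothetical helper names.

```python
from datetime import datetime, timedelta, timezone

# Python equivalent of the "yyyy-MM-dd HH:mm:ss" pattern used above.
FMT = "%Y-%m-%d %H:%M:%S"
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(text: str) -> int:
    """Parse a time string (assumed to be UTC) into milliseconds since the epoch."""
    parsed = datetime.strptime(text, FMT).replace(tzinfo=timezone.utc)
    # Integer division of timedeltas avoids floating-point rounding.
    return (parsed - EPOCH) // timedelta(milliseconds=1)


def from_epoch_millis(millis: int) -> str:
    """Format milliseconds since the epoch back into a UTC time string."""
    return (EPOCH + timedelta(milliseconds=millis)).strftime(FMT)


if __name__ == "__main__":
    millis = to_epoch_millis("2017-09-23 23:12:22")
    print(millis)                     # value to store in ES
    print(from_epoch_millis(millis))  # string to display
```

If the source times are in a local time zone rather than UTC, the conversion would need that zone instead of timezone.utc.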

3.1.6 boolean

When a field's type is set to boolean, the accepted values are true, false, "true", and "false".

3.1.7 binary

The binary type accepts a Base64-encoded string.
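
Preparing such a value on the client side could look like this. It is a small sketch using Python's standard base64 module; to_binary_field and from_binary_field are hypothetical helper names.

```python
import base64


def to_binary_field(raw: bytes) -> str:
    """Encode raw bytes as a Base64 string (no embedded newlines)."""
    return base64.b64encode(raw).decode("ascii")


def from_binary_field(encoded: str) -> bytes:
    """Decode the Base64 string read back from _source."""
    return base64.b64decode(encoded)


if __name__ == "__main__":
    encoded = to_binary_field(b"Some binary blob")
    print(encoded)
    print(from_binary_field(encoded))
```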

3.1.8 array

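ES has no dedicated array type. By default, any field can hold one or more values, but all values in an array must be of the same type. When data is added dynamically, the type of the first value in the array determines the type of the whole array; mixed-type arrays are not supported. An array may contain null values, and an empty array [] is treated as a missing field. Using an array in a document needs no prior configuration; it is supported out of the box.
For example, add the following document with an array field: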

DELETE my_index

PUT my_index/my_type/1
{
  "lists":[
    {
      "name":"xpleaf",
      "job":"es"
    }
  ]
}

The lists field is dynamically mapped as an object whose name and job sub-fields are text, each with a keyword sub-field:

GET my_index/my_type/_mapping

{
  "my_index": {
    "mappings": {
      "my_type": {
        "properties": {
          "lists": {
            "properties": {
              "job": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              },
              "name": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

It can also be searched directly:

GET my_index/my_type/_search
{
  "query": {
    "term": {
      "lists.name": {
        "value": "xpleaf"
      }
    }
  }
}

Result:

{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "1",
        "_score": 0.2876821,
        "_source": {
          "lists": [
            {
              "name": "xpleaf",
              "job": "es"
            }
          ]
        }
      }
    ]
  }
}

3.1.9 object

A JSON object can be written to ES directly, as follows:

DELETE my_index

PUT my_index/my_type/1
{
  "object":{
    "name":"xpleaf",
    "job":"es"
  }
}

The object field is likewise mapped dynamically, with its name and job sub-fields as text:

{
  "my_index": {
    "mappings": {
      "my_type": {
        "properties": {
          "object": {
            "properties": {
              "job": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              },
              "name": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

It can also be searched directly:

GET my_index/my_type/_search
{
  "query": {
    "term": {
      "object.name": {
        "value": "xpleaf"
      }
    }
  }
}

Result:

{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "1",
        "_score": 0.2876821,
        "_source": {
          "object": {
            "name": "xpleaf",
            "job": "es"
          }
        }
      }
    ]
  }
}

Inside ES, an object is actually flattened. The document above, for example, is stored internally as:
{"object.name":"xpleaf", "object.job":"es"}
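
The flattening can be reproduced with a short recursive function. This is only an illustrative sketch of the idea; flatten is a hypothetical helper and handles nested dictionaries only.

```python
from typing import Any, Dict


def flatten(doc: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Turn nested objects into a flat dict keyed by dotted field paths."""
    flat: Dict[str, Any] = {}
    for key, value in doc.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into inner objects, extending the dotted path.
            flat.update(flatten(value, prefix=f"{path}."))
        else:
            flat[path] = value
    return flat


if __name__ == "__main__":
    print(flatten({"object": {"name": "xpleaf", "job": "es"}}))
```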

3.1.10 nested

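The nested type is a special case of the object type that allows arrays of objects to be indexed and queried independently of one another. Lucene has no concept of inner objects, so ES flattens the object hierarchy into a simple list of field names and values.
Although it is a special case of object, the field's type is fixed, namely nested; this is the biggest difference from object.
So why use nested when object would seem to do? Here is the example from the official documentation (https://www.elastic.co/guide/en/elasticsearch/reference/5.6/nested.html):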
Arrays of inner object fields do not work the way you may expect. Lucene has no concept of inner objects, so Elasticsearch flattens object hierarchies into a simple list of field names and values. For instance, the following document:

PUT my_index/my_type/1
{
  "group" : "fans",
  "user" : [ 
    {
      "first" : "John",
      "last" :  "Smith"
    },
    {
      "first" : "Alice",
      "last" :  "White"
    }
  ]
}

would be transformed internally into a document that looks more like this:

{
  "group" :        "fans",
  "user.first" : [ "alice", "john" ],
  "user.last" :  [ "smith", "white" ]
}

The user.first and user.last fields are flattened into multi-value fields, and the association between alice and white is lost. This document would incorrectly match a query for alice AND smith :

GET my_index/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "user.first": "Alice" }},
        { "match": { "user.last":  "Smith" }}
      ]
    }
  }
}

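The above is the problem caused by using object directly: the document should not match this search, yet it does. The nested type preserves the independence of each object in the array. It indexes each object in the array as a separate hidden document, which means every nested object can be queried independently.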
If you need to index arrays of objects and to maintain the independence of each object in the array, you should use the nested datatype instead of the object datatype. Internally, nested objects index each object in the array as a separate hidden document, meaning that each nested object can be queried independently of the others, with the nested query:

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "user": {
          "type": "nested" 
        }
      }
    }
  }
}

PUT my_index/my_type/1
{
  "group" : "fans",
  "user" : [
    {
      "first" : "John",
      "last" :  "Smith"
    },
    {
      "first" : "Alice",
      "last" :  "White"
    }
  ]
}

GET my_index/_search
{
  "query": {
    "nested": {
      "path": "user",
      "query": {
        "bool": {
          "must": [
            { "match": { "user.first": "Alice" }},
            { "match": { "user.last":  "Smith" }} 
          ]
        }
      }
    }
  }
}

GET my_index/_search
{
  "query": {
    "nested": {
      "path": "user",
      "query": {
        "bool": {
          "must": [
            { "match": { "user.first": "Alice" }},
            { "match": { "user.last":  "White" }} 
          ]
        }
      },
      "inner_hits": { 
        "highlight": {
          "fields": {
            "user.first": {}
          }
        }
      }
    }
  }
}

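Indexing a document with 100 nested fields actually indexes 101 documents, because each nested document is indexed as a separate document. To guard against ill-defined mappings, the number of nested fields that can be defined per index is limited to 50.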
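
The difference between the two behaviours can be modelled in a few lines of Python. This is a simplified sketch, not ES code: matches_as_object mimics the flattened multi-value fields, matches_as_nested checks each object separately, and all function names are hypothetical.

```python
from typing import Dict, List

User = Dict[str, str]


def matches_as_object(users: List[User], first: str, last: str) -> bool:
    """Flattened view: first and last names are checked in separate value lists."""
    firsts = {u["first"].lower() for u in users}
    lasts = {u["last"].lower() for u in users}
    return first.lower() in firsts and last.lower() in lasts


def matches_as_nested(users: List[User], first: str, last: str) -> bool:
    """Nested view: both conditions must hold within the same object."""
    return any(
        u["first"].lower() == first.lower() and u["last"].lower() == last.lower()
        for u in users
    )


def indexed_document_count(nested_objects: int) -> int:
    """One hidden document per nested object, plus the parent document."""
    return nested_objects + 1


if __name__ == "__main__":
    users = [
        {"first": "John", "last": "Smith"},
        {"first": "Alice", "last": "White"},
    ]
    print(matches_as_object(users, "Alice", "Smith"))  # association is lost
    print(matches_as_nested(users, "Alice", "Smith"))
    print(matches_as_nested(users, "Alice", "White"))
```

The first call reproduces the incorrect match for Alice AND Smith, while the nested check only accepts pairs that belong to the same user.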

3.1.11 range

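The range types and their value ranges are as follows:

| Type | Range |
|---|---|
| integer_range | -2^31 ~ 2^31-1 |
| float_range | 32-bit IEEE 754 |
| long_range | -2^63 ~ 2^63-1 |
| double_range | 64-bit IEEE 754 |
| date_range | 64-bit integer (milliseconds) |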

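3.2 Meta-fields

Meta-fields are fields that describe the document itself. Their categories and purposes are as follows:

| Category | Field | Purpose |
|---|---|---|
| Document identity meta-fields | _index | The index the document belongs to |
| | _uid | Composite field made of _type and _id, in the form {type}#{id} |
| | _type | The document's type |
| | _id | The document's id |
| Document source meta-fields | _source | The original JSON of the document |
| | _size | Size of the _source field |
| Indexing meta-fields | _all | Catch-all field that indexes the values of all other fields |
| | _field_names | All fields in the document that contain non-null values |
| Routing meta-fields | _parent | Specifies the parent-child relationship between documents |
| | _routing | Custom routing value that routes a document to a particular shard |
| Custom meta-fields | _meta | User-defined metadata |

For details on each field, see: https://www.elastic.co/guide/en/elasticsearch/reference/5.6/mapping-fields.html.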

4 Mapping parameters

See: https://www.elastic.co/guide/en/elasticsearch/reference/5.6/mapping-params.html.
