Handling Data Relationships in Elasticsearch

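Normalization in relational databases: the main goal of normalization is to avoid redundant updates, but a fully normalized design suffers from slow queries (the more normalized the database, the more tables each query has to join).
Denormalization: the data is flattened. Instead of using relationships, redundant copies of the data are stored inside each document.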

Advantages: no join is needed, so read performance is good (Elasticsearch compresses the _source field, which reduces the disk cost of the redundant copies).

Disadvantages: not suitable for scenarios where the data is modified frequently.
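The trade-off can be made concrete with a small sketch. The Python below is not Elasticsearch code; it models the two designs with in-memory dictionaries (all names and sample data are hypothetical) to show that the denormalized layout reads without a lookup but must rewrite every copy when an author is renamed.

```python
# Normalized: authors live in one place and articles reference them by ID.
authors = {1001: {"username": "liu"}}
articles_normalized = [
    {"id": 1, "content": "Elasticsearch Helloworld", "author_id": 1001},
    {"id": 2, "content": "Nested objects", "author_id": 1001},
]

# Denormalized: each article carries its own copy of the author data.
articles_denormalized = [
    {"id": 1, "content": "Elasticsearch Helloworld",
     "author": {"userid": 1001, "username": "liu"}},
    {"id": 2, "content": "Nested objects",
     "author": {"userid": 1001, "username": "liu"}},
]


def read_normalized(article):
    """Reading needs an extra lookup (the 'join')."""
    return authors[article["author_id"]]["username"]


def read_denormalized(article):
    """Reading needs no lookup: the data is already in the document."""
    return article["author"]["username"]


def rename_normalized(userid, new_name):
    """One write, regardless of how many articles exist."""
    authors[userid]["username"] = new_name
    return 1


def rename_denormalized(userid, new_name):
    """Every document holding a copy must be rewritten."""
    touched = 0
    for article in articles_denormalized:
        if article["author"]["userid"] == userid:
            article["author"]["username"] = new_name
            touched += 1
    return touched
```

Each rename function returns the number of records written, which is the cost that grows with the number of copies in the denormalized case.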

With a relational database you usually normalize the data; with Elasticsearch you usually denormalize it (the benefits of denormalizing: fast reads, no table joins, and no locks).
Elasticsearch does not handle relationships well. They are generally handled in one of the following four ways:

Object type

Nested object

Parent/child relationship

Application-side join
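The application-side join is the only one of the four approaches without an example in this article, so here is a minimal sketch of the idea. It assumes authors and articles are kept in two separate indices and that the application runs two queries in sequence. The in-memory lists and every function name are hypothetical stand-ins, not Elasticsearch client calls.

```python
# In-memory stand-ins for two separate indices (hypothetical data).
ARTICLES = [
    {"id": "article1", "content": "Elasticsearch Helloworld", "author_ids": [1001, 1002]},
    {"id": "article2", "content": "Lucene internals", "author_ids": [1002]},
]
AUTHORS = [
    {"userid": 1001, "username": "liu"},
    {"userid": 1002, "username": "jia"},
]


def search_authors(username):
    """First query: find the matching authors."""
    return [a for a in AUTHORS if a["username"] == username]


def search_articles_by_author_ids(author_ids):
    """Second query: find articles that reference any of those IDs
    (comparable to a terms query on the author_ids field)."""
    wanted = set(author_ids)
    return [a for a in ARTICLES if wanted & set(a["author_ids"])]


def articles_written_by(username):
    """The 'join' happens here, in application code."""
    ids = [a["userid"] for a in search_authors(username)]
    return [a["id"] for a in search_articles_by_author_ids(ids)]
```

The cost of this design is one extra round trip per join, and the list of IDs passed to the second query can grow large.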

Comparison

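Nested Object
Advantage: the documents are stored together, so read performance is high.
Disadvantage: updating a nested sub-document requires updating the whole document.

Parent/Child
Advantage: parent and child documents can be updated independently.
Disadvantage: extra memory is needed to maintain the relationship, and read performance is relatively worse.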

Object type

Case 1: article and author information (1:1 relationship)

DELETE articles
# Create the articles index with its mappings
PUT /articles  
{  
  "mappings": {  
    "properties": {  
      "content": {  
        "type": "text"  
      },  
      "time": {  
        "type": "date"  
      },  
      "author": {  
        "properties": {
          "userid": {  
            "type": "long"  
          },  
          "username": {  
            "type": "keyword"  
          }  
        }  
      }  
    }  
  }  
} 
# Index an article with a single author
PUT articles/_doc/1  
{  
  "content":"Elasticsearch Helloworld！",  
  "time":"2020-01-01T00:00:00",  
  "author":{  
    "userid":1001,  
    "username":"liu"
  }  
} 
# Search by article content and author name
POST articles/_search  
{  
  "query": {  
    "bool": {  
      "must": [  
        {"match": {  
          "content": "Elasticsearch"  
        }},  
        {"match": {  
          "author.username": "liu"  
        }}  
      ]  
    }  
  }  
}

Case 2: article and author information (1:n relationship) (this one has a problem!)

DELETE articles
# Create the articles index with its mappings (author is a plain object)
PUT /articles  
{  
  "mappings": {  
    "properties": {  
      "content": {  
        "type": "text"  
      },  
      "time": {  
        "type": "date"  
      },  
      "author": {  
        "properties": {
          "userid": {  
            "type": "long"  
          },  
          "username": {  
            "type": "keyword"  
          }  
        }  
      }  
    }  
  }  
} 
POST articles/_search
# Index an article whose author field is an array of two objects
PUT articles/_doc/1  
{  
  "content":"Elasticsearch Helloworld！",  
  "time":"2020-01-01T00:00:00",  
  "author":[{  
    "userid":1001,  
    "username":"liu"
  },{
    "userid":1002,  
    "username":"jia"
  }]
} 
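# Search for userid 1001 AND username jia (the article is returned, although no single author has both. Why?)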
POST articles/_search  
{  
  "query": {  
    "bool": {  
      "must": [  
        {"match": {  
          "author.userid": "1001"  
        }},  
        {"match": {  
          "author.username": "jia"  
        }}  
      ]  
    }  
  }  
}

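When an array of objects is stored with the object type, a query can return results we did not ask for. Why?
At index time the boundaries between the inner objects are not preserved. The JSON is flattened into key-value pairs, so a query on several fields can match values that came from different objects. The document above is effectively indexed as: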

"content":"Elasticsearch Helloworld！"
"time":"2020-01-01T00:00:00"
"author.userid":["1001","1002"]
"author.username":["liu","jia"]
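The flattening described above can be reproduced with a few lines of Python. This is only a sketch of the idea, not how Lucene stores fields. The helper names `flatten` and `matches_all` are made up for illustration, and `matches_all` uses plain equality in place of a real match query.

```python
def flatten(doc, prefix=""):
    """Flatten nested objects and arrays of objects into dotted keys,
    roughly the way the object type indexes them."""
    flat = {}
    for key, value in doc.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            for k, v in flatten(value, path + ".").items():
                flat.setdefault(k, []).extend(v)
        elif isinstance(value, list) and all(isinstance(v, dict) for v in value):
            # Array of objects: values from every object end up in the same lists.
            for item in value:
                for k, v in flatten(item, path + ".").items():
                    flat.setdefault(k, []).extend(v)
        else:
            values = value if isinstance(value, list) else [value]
            flat.setdefault(path, []).extend(values)
    return flat


def matches_all(flat, conditions):
    """A bool/must of exact-value conditions against the flattened fields."""
    return all(value in flat.get(field, []) for field, value in conditions.items())


doc = {
    "content": "Elasticsearch Helloworld",
    "author": [
        {"userid": 1001, "username": "liu"},
        {"userid": 1002, "username": "jia"},
    ],
}
flat = flatten(doc)
print(flat["author.userid"])    # [1001, 1002]
print(flat["author.username"])  # ['liu', 'jia']
# userid 1001 belongs to liu, yet the combination (1001, jia) still matches:
print(matches_all(flat, {"author.userid": 1001, "author.username": "jia"}))  # True
```

Once the two lists are separated, nothing records which userid went with which username, which is exactly why the search in Case 2 returns the article.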

Using nested objects solves this problem.

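Nested objects

The nested type allows each object in an array of objects to be indexed independently. By combining the nested type with the properties keyword, every author is indexed as a separate document. Internally, the nested objects are stored as Lucene documents separate from the root document, and they are joined at query time.
Case 1: article and author information (1:n relationship)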

DELETE articles
# Create the articles index with its mappings (author is now of type nested)
PUT /articles  
{  
  "mappings": {  
    "properties": {  
      "content": {  
        "type": "text"  
      },  
      "time": {  
        "type": "date"  
      },  
      "author": {  
        "type": "nested", 
        "properties": {
          "userid": {  
            "type": "long"  
          },  
          "username": {  
            "type": "keyword"  
          }  
        }  
      }  
    }  
  }  
} 
POST articles/_search
# Index an article whose author field is an array of two objects
PUT articles/_doc/1  
{  
  "content":"Elasticsearch Helloworld！",  
  "time":"2020-01-01T00:00:00",  
  "author":[{  
    "userid":1001,  
    "username":"liu"
  },{
    "userid":1002,  
    "username":"jia"
  }]
} 
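# Same conditions wrapped in a nested query (no hits this time, because no single author has both userid 1001 and username jia)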
POST articles/_search  
{  
  "query": {  
    "bool": {  
      "must": [  
        {"nested": {
          "path": "author",
          "query": {  
            "bool": {  
              "must": [  
                {"match": {  
                  "author.userid": "1001"  
                }},  
                {"match": {  
                  "author.username": "jia"  
                }}  
              ]  
            }  
          }
        }}
      ]  
    }  
  }  
}
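For comparison with the flattened object type, the matching rule of a nested query can be sketched in Python as "every condition must hold within one and the same inner object". The function name `nested_matches` and the equality-based matching are illustrative assumptions, not Elasticsearch internals.

```python
def nested_matches(doc, path, conditions):
    """True only if a single object under `path` satisfies every condition,
    which is the behavior a nested query gives for a nested field."""
    return any(
        all(obj.get(field) == value for field, value in conditions.items())
        for obj in doc.get(path, [])
    )


doc = {
    "content": "Elasticsearch Helloworld",
    "author": [
        {"userid": 1001, "username": "liu"},
        {"userid": 1002, "username": "jia"},
    ],
}

# Conditions taken from two different authors: no match.
print(nested_matches(doc, "author", {"userid": 1001, "username": "jia"}))  # False
# Conditions that one author really satisfies: match.
print(nested_matches(doc, "author", {"userid": 1002, "username": "jia"}))  # True
```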

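Parent/child relationship

Both the object type and the nested type share a limitation: every update requires reindexing the whole document. Elasticsearch also provides a join implementation similar to the one in relational databases. By maintaining a Parent/Child relationship, the two kinds of object can be separated. The parent document and its child documents are independent documents, so updating the parent does not require reindexing the children, and adding, updating, or deleting a child does not affect the parent or the other children.
Case: article and author information (1:n relationship)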

DELETE articles
# Create the articles index with a join field (article is the parent, author is the child)
PUT /articles  
{  
  "mappings": {  
    "properties": {  
      "article_author_relation": {  
        "type": "join",  
        "relations": {  
          "article": "author"  
        }
      },
      "content": {  
        "type": "text"  
      },  
      "time": {  
        "type": "date"  
      }
    }  
  }  
} 
# Index the parent document (the article)
PUT articles/_doc/article1
{  
  "article_author_relation":{
    "name":"article"
  },
  "content":"Elasticsearch Helloworld！",  
  "time":"2020-01-01T00:00:00"
} 
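# Index the child documents (the authors); routing is set to the parent ID so that children are stored on the same shard as their parent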
PUT articles/_doc/author1?routing=article1
{  
  "article_author_relation":{
    "name":"author",
    "parent":"article1"
  },
  "userid":"1001",  
  "username":"jia"
} 
PUT articles/_doc/author2?routing=article1
{  
  "article_author_relation":{
    "name":"author",
    "parent":"article1"
  },
  "userid":"1002",  
  "username":"liu"
} 
GET articles/_doc/article1
POST articles/_search
# parent_id query: find the child documents of a given parent ID
POST articles/_search  
{  
  "query": {
    "parent_id":{
      "type":"author",
      "id":"article1"
    }
  }  
}
# has_child query: return parent documents that have a matching child
POST articles/_search  
{  
  "query": {
    "has_child":{
      "type":"author",
      "query": {
        "match": {
          "username": "liu"
        }
      }
    }
  }  
}
# has_parent query: return child documents whose parent matches
POST articles/_search  
{  
  "query": {
    "has_parent":{
      "parent_type":"article",
      "query": {
        "match": {
          "content": "elasticsearch"
        }
      }
    }
  }  
}
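What the three queries above return, and why the child documents were indexed with routing=article1, can be sketched with an in-memory model. Everything here is a simplified assumption: `shard_for` uses crc32 only as a stand-in for Elasticsearch's own routing hash, `NUM_SHARDS` is a made-up value, the join field is shortened to `relation`, and the predicates are plain Python functions rather than real queries.

```python
import zlib

NUM_SHARDS = 3  # hypothetical number of primary shards


def shard_for(routing):
    """Stand-in for shard selection: hash the routing value, modulo shard count.
    Elasticsearch uses its own hash function; crc32 is only for illustration."""
    return zlib.crc32(routing.encode("utf-8")) % NUM_SHARDS


DOCS = [
    {"_id": "article1", "_routing": "article1", "relation": {"name": "article"},
     "content": "Elasticsearch Helloworld"},
    {"_id": "author1", "_routing": "article1",
     "relation": {"name": "author", "parent": "article1"}, "username": "jia"},
    {"_id": "author2", "_routing": "article1",
     "relation": {"name": "author", "parent": "article1"}, "username": "liu"},
]


def parent_id(child_type, parent):
    """Children of one specific parent."""
    return [d["_id"] for d in DOCS
            if d["relation"]["name"] == child_type
            and d["relation"].get("parent") == parent]


def has_child(child_type, predicate):
    """Parents that have at least one matching child."""
    parents = {d["relation"]["parent"] for d in DOCS
               if d["relation"]["name"] == child_type and predicate(d)}
    return [d["_id"] for d in DOCS if d["_id"] in parents]


def has_parent(parent_type, predicate):
    """Children whose parent matches."""
    parents = {d["_id"] for d in DOCS
               if d["relation"]["name"] == parent_type and predicate(d)}
    return [d["_id"] for d in DOCS if d["relation"].get("parent") in parents]
```

Because all three documents share the routing value article1, `shard_for` places them on the same shard, which is what lets the join be resolved locally. Note the direction of each query: has_child returns parents, while has_parent and parent_id return children.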

The text may be freely shared or copied, but please keep the URL of this document as the reference URL.

Licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.