Optimizing Nested Loop joins on YugabyteDB with jOOQ

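In a previous post I described a workaround for performance issue #4903: latency was high during Nested Loop joins when the outer table had many rows. The workaround is:

Run a first query to get the list of join column values.

Push this list down in a WHERE IN() clause on the inner table.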

This can be cumbersome in SQL. I did it with a variable set through gset in psql, and noted in passing that the same can be done in any language that uses dynamic SQL.
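Without jOOQ, the second step amounts to generating one bind placeholder per value in the list. Here is a minimal Java sketch using only the standard library. The class name, the method name, the JDBC-style `?` placeholders and the `1 = 0` fallback for an empty list are assumptions made for illustration, not something taken from the previous post.

```java
import java.util.Collections;
import java.util.List;

class InListSql {
    // Builds "column in (?, ?, ...)" with one placeholder per value.
    // The values are meant to be bound separately, not concatenated into the SQL text.
    static String inPredicate(String column, List<?> values) {
        if (values.isEmpty()) {
            // An empty IN () list is not valid SQL, so return a predicate that matches nothing.
            return "1 = 0";
        }
        String placeholders = String.join(", ", Collections.nCopies(values.size(), "?"));
        return column + " in (" + placeholders + ")";
    }

    public static void main(String[] args) {
        // Sample product_id values, as returned by a first query.
        List<Integer> productIds = List.of(29, 20, 51, 9, 18, 38, 59);
        System.out.println(inPredicate("d.product_id", productIds));
    }
}
```

The generated text is then appended to the WHERE clause of the second query, and the values from the first query are bound to the placeholders in order.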

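With jOOQ, everything about SQL is easier to code, and the barrier between dynamic SQL and static SQL gets smaller.

Here is an example based on the previous post. I changed the query to do a simple join and created an index on order_details(product_id).

Here is the relevant part, the jOOQ query: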

   Result<Record> result = ctx
    .select()
    .from(p)
    .join(d).on(p.PRODUCT_ID.eq(d.PRODUCT_ID))
    .where(p.UNIT_PRICE.gt(50.f))
    .fetch();

This joins products with order_details to get all order details for the products whose unit_price is greater than 50.

I logged the SQL that was executed and ran it with EXPLAIN ANALYZE:


prepare query0 as

select "p"."product_id", "p"."product_name", "p"."supplier_id", "p"."category_id", "p"."quantity_per_unit", "p"."unit_price", "p"."units_in_stock", "p"."units_on_order", "p"."reorder_level", "p"."discontinued", "d"."order_id", "d"."product_id", "d"."unit_price", "d"."quantity", "d"."discount" from "public"."products" as "p" join "public"."order_details" as "d" on "p"."product_id" = "d"."product_id" where "p"."unit_price" > $1
;

explain (costs off, analyze) execute query0(50);

                                           QUERY PLAN
-------------------------------------------------------------------------------------------------
 Nested Loop (actual time=6.346..1083.373 rows=197 loops=1)
   ->  Seq Scan on order_details d (actual time=5.293..7.478 rows=2155 loops=1)
   ->  Index Scan using products_pkey on products p (actual time=0.486..0.486 rows=0 loops=2155)
         Index Cond: (product_id = d.product_id)
         Filter: (unit_price > '50'::real)
         Rows Removed by Filter: 1
 Planning Time: 0.209 ms
 Execution Time: 1083.488 ms
 Peak Memory Usage: 24 kB
(9 rows)

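It takes about one second to return 197 rows. The reason, explained in the previous post and in the GitHub issue, is simple: the plan reads all 2155 rows of order_details and then reads products once per row (loops=2155).

Until this is optimized in the database, we can improve it by first getting the list of products and then using this list in the WHERE clause when reading from order_details.

With jOOQ this is really simple. I added a nested query with the same WHERE clause: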

   Result<Record> result = ctx
    .select()
    .from(p)
    .join(d).on(p.PRODUCT_ID.eq(d.PRODUCT_ID))
    .where(p.UNIT_PRICE.gt(50.f))
    // workaround: add a subquery returning a list of PRODUCT_ID:
    .and(d.PRODUCT_ID.in(
        ctx.select(p.PRODUCT_ID)
           .from(p)
           .where(p.UNIT_PRICE.gt(50.f))
           .fetch(p.PRODUCT_ID)))
    // this workaround will not be needed when https://github.com/yugabyte/yugabyte-db/issues/4903 is solved
    .fetch();

This generates two SQL statements, which I run with EXPLAIN ANALYZE to look at their performance. The first one gets the list:

prepare query1 as

select "p"."product_id" from "public"."products" as "p" where "p"."unit_price" > $1
; 

explain (costs off, analyze) execute query1(50);

                            QUERY PLAN
------------------------------------------------------------------
 Seq Scan on products p (actual time=0.886..2.605 rows=7 loops=1)
   Filter: (unit_price > '50'::real)
   Rows Removed by Filter: 70
 Planning Time: 0.074 ms
 Execution Time: 2.648 ms
 Peak Memory Usage: 8 kB
(6 rows)

yb_demo_northwind=# execute query1(50);
 product_id
------------
         29
         20
         51
          9
         18
         38
         59
(7 rows)

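Getting the list takes only a few milliseconds (2.6 ms of execution time in the plan above), and no formatting is needed here because jOOQ uses the list directly in the second query: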

prepare query2 as

select "p"."product_id", "p"."product_name", "p"."supplier_id", "p"."category_id", "p"."quantity_per_unit", "p"."unit_price", "p"."units_in_stock", "p"."units_on_order", "p"."reorder_level", "p"."discontinued", "d"."order_id", "d"."product_id", "d"."unit_price", "d"."quantity", "d"."discount" from "public"."products" as "p" join "public"."order_details" as "d" on "p"."product_id" = "d"."product_id" where ("p"."unit_price" > $1 and "d"."product_id" in ($2, $3, $4, $5, $6, $7, $8))
;

explain (costs off, analyze) execute query2(50
, '29','20','51','9','18','38','59'
);

                                                   QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 Nested Loop (actual time=3.458..110.218 rows=197 loops=1)
   ->  Index Scan using order_details_product_id on order_details d (actual time=2.784..9.069 rows=197 loops=1)
         Index Cond: (product_id = ANY ('{29,20,51,9,18,38,59}'::smallint[]))
   ->  Index Scan using products_pkey on products p (actual time=0.498..0.498 rows=1 loops=197)
         Index Cond: (product_id = d.product_id)
         Filter: (unit_price > '50'::real)
 Planning Time: 37.526 ms
 Execution Time: 110.361 ms
 Peak Memory Usage: 1088 kB
(9 rows)

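Thanks to the additional predicate, the filter on products is now applied on both sides of the join, so the number of loops drops from 2155 to 197. The response time drops accordingly, from 1083 ms to 110 ms.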
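As a rough check on the two plans, the time spent on the inner side of a Nested Loop is approximately the loop count multiplied by the average time per loop. The sketch below applies that to the numbers reported by EXPLAIN ANALYZE in this post. The class and method names are only illustrative.

```java
class LoopCost {
    // Rough estimate: inner-side time = number of loops x average time per loop (ms).
    static double innerMillis(int loops, double millisPerLoop) {
        return loops * millisPerLoop;
    }

    public static void main(String[] args) {
        double before = innerMillis(2155, 0.486); // plan without the IN list
        double after = innerMillis(197, 0.498);   // plan with the IN list
        System.out.printf("before: %.0f ms, after: %.0f ms%n", before, after);
    }
}
```

This gives roughly 1047 ms for 2155 loops and 98 ms for 197 loops, which accounts for most of the 1083 ms and 110 ms execution times. The time per loop is almost unchanged; only the number of loops is reduced.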

Because the database is small here, the Nested Loop was still used, but with this query the plan can switch efficiently to a Hash Join:

                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Hash Join (actual time=12.914..12.955 rows=197 loops=1)
   Hash Cond: (p.product_id = d.product_id)
   ->  Seq Scan on products p (actual time=1.961..1.968 rows=7 loops=1)
         Remote Filter: (unit_price > '50'::real)
   ->  Hash (actual time=10.934..10.934 rows=197 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 18kB
         ->  Index Scan using order_details_product_id on order_details d (actual time=2.628..10.891 rows=197 loops=1)
               Index Cond: (product_id = ANY ('{29,20,51,9,18,38,59}'::smallint[]))
 Planning Time: 0.267 ms
 Execution Time: 13.031 ms
 Peak Memory Usage: 73 kB
(11 rows)

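Thanks to the predicate pushdown, each table access is efficient and pre-filtered during the scan. Using a single Hash Join instead of a Nested Loop then reduces the remote calls to the storage nodes to a minimum.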
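To illustrate the difference, here is a toy in-memory model in Java. It is not how YugabyteDB executes joins internally. It only contrasts one lookup per outer row (Nested Loop) with one pass over each pre-filtered input (Hash Join). The class name, the method names and the sample data are made up, and counting one "remote call" per lookup is a simplifying assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

class JoinSketch {
    // Nested loop: one lookup into products per order_details row ({orderId, productId}).
    // Every lookup stands in for a remote call and is counted in `lookups`.
    static List<int[]> nestedLoop(List<int[]> details, Map<Integer, Float> priceByProductId,
                                  float minPrice, AtomicInteger lookups) {
        List<int[]> out = new ArrayList<>();
        for (int[] d : details) {
            lookups.incrementAndGet();
            Float price = priceByProductId.get(d[1]);
            if (price != null && price > minPrice) {
                out.add(d);
            }
        }
        return out;
    }

    // Hash join: build a hash table on the pre-filtered order_details rows,
    // then probe it once per pre-filtered product. No per-row lookups are needed.
    static List<int[]> hashJoin(List<Integer> filteredProductIds, List<int[]> filteredDetails) {
        Map<Integer, List<int[]>> byProductId = new HashMap<>();
        for (int[] d : filteredDetails) {
            byProductId.computeIfAbsent(d[1], k -> new ArrayList<>()).add(d);
        }
        List<int[]> out = new ArrayList<>();
        for (Integer productId : filteredProductIds) {
            out.addAll(byProductId.getOrDefault(productId, List.of()));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<Integer, Float> prices = Map.of(9, 97.0f, 11, 21.0f); // made-up sample data
        List<int[]> details = List.of(new int[]{1, 9}, new int[]{2, 11}, new int[]{3, 9});
        AtomicInteger lookups = new AtomicInteger();
        int nl = nestedLoop(details, prices, 50f, lookups).size();
        int hj = hashJoin(List.of(9), List.of(details.get(0), details.get(2))).size();
        System.out.println(nl + " rows with " + lookups.get() + " lookups; hash join: " + hj + " rows");
    }
}
```

Both methods return the same joined rows. The nested loop pays one lookup per order_details row, while the hash join only needs the two filtered inputs, which is why filtering both sides before the join matters.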

This is a workaround, and you need to know, and test, how many values are in the list that gets pushed down.
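One way to keep the list size under control is to check it before adding the extra predicate. The sketch below is a hypothetical guard, not part of jOOQ or YugabyteDB. The class name, the method name and the limit passed by the caller are assumptions, and the right limit has to come from testing on your own data.

```java
import java.util.List;

class PushdownGuard {
    // Returns true when the list is small enough to be pushed down as a WHERE IN () list.
    // maxValues is a caller-supplied limit, to be determined by testing.
    static boolean shouldPushDown(List<?> joinValues, int maxValues) {
        return joinValues.size() <= maxValues;
    }

    public static void main(String[] args) {
        List<Integer> productIds = List.of(29, 20, 51, 9, 18, 38, 59);
        // 1000 is an arbitrary example limit, not a recommendation.
        System.out.println(shouldPushDown(productIds, 1000));
    }
}
```

When the check fails, the application can run the original join without the additional IN predicate.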

Reference

For more on this topic (Optimizing Nested Loop joins on YugabyteDB with jOOQ), see https://dev.to/yugabyte/optimizing-nested-loop-joins-on-yugabytedb-with-jooq-keg
