Hello,

I'm trying to figure out how to optimise a three-table join (a many-to-many relation) with a predicate, an ORDER BY, and a LIMIT, where one of the tables returns at most one row. This is the query I have right now:

    SELECT entity.id
    FROM (
        SELECT entity_tag.entity_id
        FROM tag
        JOIN entity_tag ON tag.id = entity_tag.tag_id
        WHERE tag.key = 'status'
          AND tag.value = 'SUCCEEDED'
    ) matched
    JOIN entity ON matched.entity_id = entity.id
    WHERE entity.type = 'execution'
    ORDER BY entity.id DESC
    LIMIT 10;

It runs very slowly when many rows in entity are matched (about 90K rows in my case), even though tag returns at most one row:

    Limit  (cost=723.39..723.40 rows=1 width=4) (actual time=6189.015..6189.242 rows=10 loops=1)
      Output: entity.id
      Buffers: shared hit=411886 read=31282
      ->  Sort  (cost=723.39..723.40 rows=1 width=4) (actual time=6188.999..6189.059 rows=10 loops=1)
            Output: entity.id
            Sort Key: entity.id DESC
            Sort Method: top-N heapsort  Memory: 25kB
            Buffers: shared hit=411886 read=31282
            ->  Nested Loop  (cost=1.28..723.38 rows=1 width=4) (actual time=0.153..5590.717 rows=89222 loops=1)
                  Output: entity.id
                  Buffers: shared hit=411886 read=31282
                  ->  Nested Loop  (cost=0.86..721.98 rows=3 width=8) (actual time=0.108..1851.707 rows=89222 loops=1)
                        Output: entity_tag.entity_id
                        Buffers: shared hit=65146 read=20646
                        ->  Index Scan using tag_key_value_key on public.tag  (cost=0.43..8.45 rows=1 width=4) (actual time=0.043..0.061 rows=1 loops=1)
                              Output: tag.id, tag.key, tag.value, tag.created_at
                              Index Cond: (((tag.key)::text = 'status'::text) AND ((tag.value)::text = 'SUCCEEDED'::text))
                              Buffers: shared hit=1 read=3
                        ->  Index Only Scan using entity_tag_tag_id_entity_id_idx on public.entity_tag  (cost=0.43..711.53 rows=201 width=16) (actual time=0.035..756.829 rows=89222 loops=1)
                              Output: entity_tag.tag_id, entity_tag.entity_id
                              Index Cond: (entity_tag.tag_id = tag.id)
                              Heap Fetches: 89222
                              Buffers: shared hit=65145 read=20643
                  ->  Index Scan using entity_pkey on public.entity  (cost=0.42..0.46 rows=1 width=4) (actual time=0.010..0.017 rows=1 loops=89222)
                        Output: entity.id, entity.entity_id, entity.type, entity.created_at
                        Index Cond: (entity.id = entity_tag.entity_id)
                        Filter: ((entity.type)::text = 'execution'::text)
                        Buffers: shared hit=346740 read=10636
    Planning time: 0.817 ms
    Execution time: 6189.419 ms

Both tag_key_value_key and entity_tag_tag_id_entity_id_idx are UNIQUE constraints, on tag(key, value) and entity_tag(tag_id, entity_id) respectively.

It seems to me that PostgreSQL runs the nested loop over all 90K matching rows because it wants to sort the result before applying the limit. It doesn't take the UNIQUE constraint into account: it assumes the join inside the subquery may change the ordering of the entity_id values the subquery returns, which forces the sort. Given how the index is sorted, I believe it should be able to simply scan the index backwards, because at most one tag_id will be returned.

When I change the predicate to filter by id instead, the query is blazing fast:

    -- This runs very fast
    SELECT entity.id
    FROM (
        SELECT entity_tag.entity_id
        FROM tag
        JOIN entity_tag ON tag.id = entity_tag.tag_id
        WHERE tag.id = 24
    ) matched
    JOIN entity ON matched.entity_id = entity.id
    WHERE entity.type = 'execution'
    ORDER BY entity.id DESC
    LIMIT 10;
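(Here 24 stands for the id of the 'status' = 'SUCCEEDED' tag; as an illustration, it can be fetched beforehand with a separate lookup along these lines, which is the two-query workaround I would like to avoid:)

```sql
-- Resolve the tag id once, then the main query can filter
-- entity_tag by that constant directly.
SELECT id
FROM tag
WHERE key = 'status'
  AND value = 'SUCCEEDED';
```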
This time PostgreSQL seems to know that the join inside the subquery won't change the ordering of the entity_id values returned by the subquery, as the following plan shows:

    Limit  (cost=1.28..1025.56 rows=10 width=4) (actual time=0.144..0.276 rows=1 loops=1)
      Output: entity.id
      Buffers: shared hit=12
      ->  Nested Loop  (cost=1.28..1537.70 rows=15 width=4) (actual time=0.125..0.238 rows=1 loops=1)
            Output: entity.id
            Buffers: shared hit=12
            ->  Nested Loop  (cost=0.86..1529.06 rows=15 width=12) (actual time=0.057..0.116 rows=1 loops=1)
                  Output: entity_tag.tag_id, entity.id
                  Buffers: shared hit=8
                  ->  Index Only Scan Backward using entity_tag_tag_id_entity_id_idx on public.entity_tag  (cost=0.43..454.82 rows=128 width=16) (actual time=0.018..0.038 rows=1 loops=1)
                        Output: entity_tag.tag_id, entity_tag.entity_id
                        Index Cond: (entity_tag.tag_id = 24)
                        Heap Fetches: 1
                        Buffers: shared hit=4
                  ->  Index Scan using entity_pkey on public.entity  (cost=0.42..8.38 rows=1 width=4) (actual time=0.011..0.030 rows=1 loops=1)
                        Output: entity.id, entity.entity_id, entity.type, entity.created_at
                        Index Cond: (entity.id = entity_tag.entity_id)
                        Filter: ((entity.type)::text = 'execution'::text)
                        Buffers: shared hit=4
            ->  Materialize  (cost=0.43..8.45 rows=1 width=4) (actual time=0.040..0.078 rows=1 loops=1)
                  Output: tag.id
                  Buffers: shared hit=4
                  ->  Index Only Scan using tag_pkey on public.tag  (cost=0.43..8.45 rows=1 width=4) (actual time=0.021..0.040 rows=1 loops=1)
                        Output: tag.id
                        Index Cond: (tag.id = 24)
                        Heap Fetches: 1
                        Buffers: shared hit=4
    Planning time: 0.362 ms
    Execution time: 0.458 ms

This confirms that it can indeed just scan the index backwards. I can't figure out why PostgreSQL doesn't take the UNIQUE index into account in the first case. Is there any way to do this with one query? Or is the database design itself flawed?

Environment:
- PostgreSQL version: 9.6
- OS: Linux (Amazon RDS) / MacOSX / Docker

Installation:
- MacOSX: PostgreSQL is installed using brew.
- Docker: PostgreSQL image from https://hub.docker.com/_/postgres.

I use the default configuration in all of these environments.

Thanks!

--
Best regards,
Aufar Gilbran