Hello,
I'm assuming this topic has probably been bludgeoned to a pulp, but my google-fu can't seem to find a solution.
I have two relatively largish tables that I'm trying to join that result in a slow query.
Hardware:
2014 iMac w/ SSD & i5 processor
Tables:
contacts: 1.14 million rows
permissions: 2.49 million rows
contacts have many permissions
Goal: get first page of contacts (limit 40) a user has access to, sorted by creation date.
For simplicity's sake, I've taken out some of the complexity of how I retrieve the permissions & just got all permissions with id less than 2,100,000 to simulate access of ~151k contacts.
stage4=# EXPLAIN ANALYZE WITH perms AS (
stage4(# SELECT DISTINCT(contact_id) from permissions where id < 2100000
stage4(# ) SELECT
stage4-# contacts.id,
stage4-# contacts.first_name
stage4-# FROM contacts
stage4-# INNER JOIN perms ON perms.contact_id = contacts.id
stage4-# ORDER BY contacts.updated_at desc NULLS LAST LIMIT 40 OFFSET 0;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=226777.45..226777.55 rows=40 width=20) (actual time=1556.133..1556.143 rows=40 loops=1)
CTE perms
-> HashAggregate (cost=34891.61..35404.25 rows=51264 width=4) (actual time=107.107..151.455 rows=151920 loops=1)
Group Key: permissions.contact_id
-> Bitmap Heap Scan on permissions (cost=3561.50..34416.43 rows=190074 width=4) (actual time=14.839..50.445 rows=192372 loops=1)
Recheck Cond: (id < 2100000)
Heap Blocks: exact=2435
-> Bitmap Index Scan on permissions_pkey (cost=0.00..3513.98 rows=190074 width=0) (actual time=14.496..14.496 rows=192372 loops=1)
Index Cond: (id < 2100000)
-> Sort (cost=191373.19..191501.35 rows=51264 width=20) (actual time=1556.132..1556.137 rows=40 loops=1)
Sort Key: contacts.updated_at DESC NULLS LAST
Sort Method: top-N heapsort Memory: 27kB
-> Hash Join (cost=180911.60..189752.76 rows=51264 width=20) (actual time=1124.969..1532.269 rows=124152 loops=1)
Hash Cond: (perms.contact_id = contacts.id)
-> CTE Scan on perms (cost=0.00..1025.28 rows=51264 width=4) (actual time=107.110..203.330 rows=151920 loops=1)
-> Hash (cost=159891.71..159891.71 rows=1144871 width=20) (actual time=1017.354..1017.354 rows=1145174 loops=1)
Buckets: 65536 Batches: 32 Memory Usage: 2521kB
-> Seq Scan on contacts (cost=0.00..159891.71 rows=1144871 width=20) (actual time=0.035..684.361 rows=1145174 loops=1)
Planning time: 0.222 ms
Execution time: 1561.693 ms
(20 rows)
It is to my understanding that the query requires the entire 150k matched contacts to be joined in order for the Sort to run its operation. I don't see a way around this, however, I can't also see how this can be a unique case where it's acceptable to have a 1.5 second query time? There has to be lots of other companies out there that manage much more data that needs to be sorted for the presentation layer.
Thanks in advance,
-Aldo