> Reading around, seems many people are still using this 2005 snippet to obtain the
> row count estimate from Explain:
I recommend using FORMAT JSON and extracting the top row count from that. It is
simpler and less error-prone.
> Is this still the current best practice? Any tips to increase precision?
> Currently it can estimate the actual number of rows for over or under a million,
> as seen on the sample queries (1,955,297 instead of 1,001,200; 162,080 instead
> of 1,292,010).
Looking at the samples you provided, I get the impression that the statistics for
the table are quite outdated. That will affect the estimates. Try running ANALYZE
and see if that improves the estimates.
effective_cache_size = 6GB # 0.5-0.75 RAM (free -h: free + cache + shared_buffers)
work_mem = 128MB # RAM * 0.25 / max_connections.
maintenance_work_mem = 512MB
default_statistics_target = 500 # def 100, higher to make planner use indexes in big warehouse tables.
random_page_cost = 1.1 # Random reads in SSD cost almost as little as sequential ones
"items": scanned 995023 of 995023 pages, containing 7865043 live rows and 0 dead rows; 1500000 rows in sample, 7865043 estimated total rows
but same deal:
-- After config pimp 1,959,657 instead of 1,001,200 45,341.654 ms
Gather (cost=247031.70..1479393.82 rows=1959657 width=824) (actual time=8464.691..45257.435 rows=1001200 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Hash Join (cost=246031.70..1282428.12 rows=816524 width=824) (actual time=8413.057..44614.153 rows=333733 loops=3)
Hash Cond: (pricescope_items.tender_transaction_id = pricescope_tenders.transaction_id)
-> Parallel Seq Scan on pricescope_items (cost=0.00..1027794.01 rows=3277101 width=522) (actual time=0.753..41654.507 rows=2621681 loops=3)
-> Parallel Hash (cost=241080.20..241080.20 rows=396120 width=302) (actual time=995.247..995.250 rows=333733 loops=3)
Buckets: 1048576 Batches: 1 Memory Usage: 219904kB
-> Parallel Bitmap Heap Scan on pricescope_tenders (cost=17516.10..241080.20 rows=396120 width=302) (actual time=162.898..321.472 rows=333733 loops=3)
Recheck Cond: ((country = 'Colombia'::text) AND (date >= '2023-01-01'::date) AND (date < '2024-01-01'::date))
Heap Blocks: exact=34722
-> Bitmap Index Scan on pricescope_tenders_country_and_date_index (cost=0.00..17278.43 rows=950688 width=0) (actual time=186.536..186.537 rows=1001200 loops=1)
Index Cond: ((country = 'Colombia'::text) AND (date >= '2023-01-01'::date) AND (date < '2024-01-01'::date))
Planning Time: 11.310 ms
JIT:
Functions: 33
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 8.608 ms, Inlining 213.375 ms, Optimization 557.351 ms, Emission 417.568 ms, Total 1196.902 ms
Execution Time: 45341.654 ms
-- Subselect country to hide constant from planner, so it doesn't use statistics
Explain Analyze
Select * from pricescope_tenders inner join pricescope_items on transaction_id = tender_transaction_id
where country = (select 'Colombia')
and "date" >= '2023-01-01' and "date" < '2024-01-01'
;
-- Colombia in subselect 428,623 instead of 1,001,200 6674.860 ms
Gather (cost=1001.00..570980.73 rows=428623 width=824) (actual time=166.785..6600.673 rows=1001200 loops=1)
Workers Planned: 2
Params Evaluated: $0
Workers Launched: 2
InitPlan 1 (returns $0)
-> Result (cost=0.00..0.01 rows=1 width=32) (actual time=166.031..166.033 rows=1 loops=1)
-> Nested Loop (cost=0.99..527118.42 rows=178593 width=824) (actual time=200.511..5921.585 rows=333733 loops=3)
-> Parallel Index Scan using pricescope_tenders_country_and_date_index on pricescope_tenders (cost=0.43..104391.64 rows=86641 width=302) (actual time=200.388..400.882 rows=333733 loops=3)
Index Cond: ((country = $0) AND (date >= '2023-01-01'::date) AND (date < '2024-01-01'::date))
-> Index Scan using pricescope_items_tender_transaction_id_index on pricescope_items (cost=0.56..4.83 rows=5 width=522) (actual time=0.016..0.016 rows=1 loops=1001200)
Index Cond: (tender_transaction_id = pricescope_tenders.transaction_id)
Planning Time: 7.372 ms
JIT:
Functions: 31
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 6.981 ms, Inlining 209.470 ms, Optimization 308.123 ms, Emission 248.176 ms, Total 772.750 ms
Execution Time: 6674.860 ms