Hi all,

I'm trying to come up with an efficient query, ideally without having to coerce the planner into doing what I want, but I'm running up against a problem with row estimates, and I'm curious whether there's a better way to address the problem than what I've come up with.

Here's a straightforward version of the query:

SELECT v.patient_class, count( DISTINCT v.id ) AS num_visits
FROM patient_visits AS v
JOIN patients AS p ON p.id = v.patient_id
JOIN members AS m ON m.patient_id = p.id
JOIN user_populations AS up ON up.population_id = m.population_id
JOIN users AS u ON u.id = up.user_id
JOIN sponsor_populations AS sp ON sp.population_id = m.population_id
                              AND sp.sponsor_id = u.sponsor_id
WHERE u.id = 3962
  AND v.start_on < ( m.last_activity_on + sp.authz_expiration_interval )
  AND v.end_on IS NULL
GROUP BY v.patient_class;

The plan looks like this:

-----------------------------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=5666.32..5666.42 rows=5 width=12) (actual time=22239.478..22244.339 rows=5 loops=1)
   Group Key: v.patient_class
   Buffers: shared hit=20466857
   ->  Sort  (cost=5666.32..5666.34 rows=7 width=12) (actual time=22236.049..22239.030 rows=50988 loops=1)
         Sort Key: v.patient_class, v.id
         Sort Method: quicksort  Memory: 3528kB
         Buffers: shared hit=20466857
         ->  Nested Loop  (cost=2.94..5666.22 rows=7 width=12) (actual time=1.759..22203.441 rows=50988 loops=1)
               Join Filter: (v.patient_id = p.id)
               Buffers: shared hit=20466857
               ->  Nested Loop  (cost=2.37..5659.62 rows=11 width=28) (actual time=1.743..21892.354 rows=50988 loops=1)
                     Join Filter: (v.start_on < (m.last_activity_on + sp.authz_expiration_interval))
                     Rows Removed by Join Filter: 19368
                     Buffers: shared hit=20204287
                     ->  Nested Loop  (cost=1.80..3616.14 rows=2285 width=28) (actual time=0.065..3322.440 rows=2968729 loops=1)
                           Join Filter: (m.population_id = up.population_id)
                           Buffers: shared hit=2053968
                           ->  Nested Loop  (cost=1.11..11.73 rows=1 width=73) (actual time=0.041..0.286 rows=9 loops=1)
                                 Join Filter: (u.sponsor_id = sp.sponsor_id)
                                 Buffers: shared hit=67
                                 ->  Nested Loop  (cost=0.70..9.09 rows=1 width=95) (actual time=0.028..0.164 rows=9 loops=1)
                                       Buffers: shared hit=31
                                       ->  Index Only Scan using index_user_populations_on_user_id_and_population_id on user_populations up  (cost=0.42..1.57 rows=3 width=36) (actual time=0.015..0.025 rows=9 loops=1)
                                             Index Cond: (user_id = 3962)
                                             Heap Fetches: 0
                                             Buffers: shared hit=4
                                       ->  Index Scan using index_sponsor_populations_on_population_id_and_sponsor_id on sponsor_populations sp  (cost=0.28..2.50 rows=1 width=59) (actual time=0.011..0.012 rows=1 loops=9)
                                             Index Cond: (population_id = up.population_id)
                                             Buffers: shared hit=27
                                 ->  Index Scan using users_pkey on users u  (cost=0.41..2.63 rows=1 width=24) (actual time=0.011..0.011 rows=1 loops=9)
                                       Index Cond: (id = 3962)
                                       Buffers: shared hit=36
                           ->  Index Only Scan using index_members_on_population_id on members m  (cost=0.70..3173.12 rows=34503 width=40) (actual time=0.023..324.991 rows=329859 loops=9)
                                 Index Cond: (population_id = sp.population_id)
                                 Heap Fetches: 2729516
                                 Buffers: shared hit=2053901
                     ->  Index Scan using idx_new_patient_visits_unique on patient_visits v  (cost=0.57..0.88 rows=1 width=24) (actual time=0.006..0.006 rows=0 loops=2968729)
                           Index Cond: (patient_id = m.patient_id)
                           Filter: (end_on IS NULL)
                           Rows Removed by Filter: 4
                           Buffers: shared hit=18150319
               ->  Index Only Scan using patients_pkey on patients p  (cost=0.57..0.59 rows=1 width=8) (actual time=0.006..0.006 rows=1 loops=50988)
                     Index Cond: (id = m.patient_id)
                     Heap Fetches: 12947
                     Buffers: shared hit=262570
 Settings: effective_cache_size = '372GB', max_parallel_workers = '12', max_parallel_workers_per_gather = '4', work_mem = '164MB', random_page_cost = '1.1', effective_io_concurrency = '200'
 Planning:
   Buffers: shared hit=76
 Planning Time: 68.881 ms
 Execution Time: 22244.426 ms
(50 rows)

=====

Now, this plan isn't bad, despite the order-of-magnitude underestimate of how many rows it will need to look at in index_members_on_population_id. But I'm still hoping to get a faster query.

There are, broadly speaking, two constraints on which patient_visits rows will be counted:

- The end_on must be NULL, indicating that the visit is still in progress.
- The visit must be accessible to the user, meaning:
  - The visit's patient must have a member in at least one of the user's populations, such that the visit starts before the user's access to that member expires.

Unfortunately, neither of these conditions is independently very selective. There are a lot of visits (many of them inaccessible to this user) where end_on is NULL (1,212,480), and even more visits (many of them already ended) that the user can access (4,167,864). However, only 25,334 rows meet both conditions, and I don't see any way of taking advantage of that fact without denormalizing some of the authorization-related data into the patient_visits table.

So the approach I'm considering is to add a column, patient_visits.cached_population_ids, holding an array of the population_ids of the visit's patient's members. For a user to be able to access a patient_visits row, it is necessary (though not sufficient) that this array have a non-empty intersection with the array of the user's own population_ids (drawn from the user_populations table). The array column has a partial GIN index:

"index_new_patient_visits_on_open_cached_population_ids" gin (cached_population_ids) WHERE end_on IS NULL

So the new query looks like this:

with user_member_expirations as (
  select u.id user_id, m.id member_id, m.patient_id,
         (m.last_activity_on + sp.authz_expiration_interval) expires_on
  from members m
  join sponsor_populations sp on sp.population_id = m.population_id
  join user_populations up on up.population_id = m.population_id
  join users u on u.id = up.user_id
  where sp.sponsor_id = u.sponsor_id
)
select v.patient_class, count(distinct v.id)
from new_patient_visits v
join user_member_expirations u on u.patient_id = v.patient_id
where u.user_id = 3962
  and v.end_on is null
  and v.cached_population_ids && ARRAY['foo-population','bar-population','baz-population','quux-population']
  and v.start_on < u.expires_on
group by v.patient_class;

[Note: I've replaced the actual population ids with fake ones, since they are textual and name actual facilities.]
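(For context on where cached_population_ids comes from: it's derived entirely from the members table. The statement below is a simplified sketch of the backfill; the real code would also have to keep the array in sync as members rows change, which I'm omitting here.)

-- Simplified backfill sketch: each visit caches the population_ids of
-- its patient's members. Keeping this current as members change is a
-- separate (omitted) problem.
update new_patient_visits v
set cached_population_ids = (
  select array_agg(distinct m.population_id)
  from members m
  where m.patient_id = v.patient_id
);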
The plan for the new query, however, is disappointing:

--------------------------------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=5220.41..5220.43 rows=1 width=12) (actual time=21341.629..21346.488 rows=5 loops=1)
   Group Key: v.patient_class
   Buffers: shared hit=17585790
   ->  Sort  (cost=5220.41..5220.41 rows=1 width=12) (actual time=21338.148..21341.128 rows=50988 loops=1)
         Sort Key: v.patient_class, v.id
         Sort Method: quicksort  Memory: 3528kB
         Buffers: shared hit=17585790
         ->  Nested Loop  (cost=2.37..5220.40 rows=1 width=12) (actual time=1.385..21299.416 rows=50988 loops=1)
               Join Filter: (v.start_on < (m.last_activity_on + sp.authz_expiration_interval))
               Rows Removed by Join Filter: 19368
               Buffers: shared hit=17585790
               ->  Nested Loop  (cost=1.80..3529.88 rows=395 width=28) (actual time=0.048..2876.926 rows=2968729 loops=1)
                     Buffers: shared hit=2053968
                     ->  Nested Loop  (cost=1.11..11.73 rows=1 width=73) (actual time=0.028..0.426 rows=9 loops=1)
                           Join Filter: (u.sponsor_id = sp.sponsor_id)
                           Buffers: shared hit=67
                           ->  Nested Loop  (cost=0.70..9.09 rows=1 width=95) (actual time=0.019..0.262 rows=9 loops=1)
                                 Buffers: shared hit=31
                                 ->  Index Only Scan using index_user_populations_on_user_id_and_population_id on user_populations up  (cost=0.42..1.57 rows=3 width=36) (actual time=0.010..0.047 rows=9 loops=1)
                                       Index Cond: (user_id = 3962)
                                       Heap Fetches: 0
                                       Buffers: shared hit=4
                                 ->  Index Scan using index_sponsor_populations_on_population_id_and_sponsor_id on sponsor_populations sp  (cost=0.28..2.50 rows=1 width=59) (actual time=0.011..0.016 rows=1 loops=9)
                                       Index Cond: (population_id = up.population_id)
                                       Buffers: shared hit=27
                           ->  Index Scan using users_pkey on users u  (cost=0.41..2.63 rows=1 width=24) (actual time=0.013..0.013 rows=1 loops=9)
                                 Index Cond: (id = 3962)
                                 Buffers: shared hit=36
                     ->  Index Only Scan using index_members_on_population_id on members m  (cost=0.70..3173.12 rows=34503 width=40) (actual time=0.021..288.933 rows=329859 loops=9)
                           Index Cond: (population_id = sp.population_id)
                           Heap Fetches: 2729516
                           Buffers: shared hit=2053901
               ->  Index Scan using index_new_patient_visits_on_patient_id on new_patient_visits v  (cost=0.57..4.26 rows=1 width=24) (actual time=0.006..0.006 rows=0 loops=2968729)
                     Index Cond: (patient_id = m.patient_id)
                     Filter: ((end_on IS NULL) AND (cached_population_ids && '{foo-population,bar-population,baz-population,quux-population}'::text[]))
                     Rows Removed by Filter: 4
                     Buffers: shared hit=15531822
 Settings: effective_cache_size = '372GB', max_parallel_workers = '12', max_parallel_workers_per_gather = '4', work_mem = '164MB', random_page_cost = '1.1', effective_io_concurrency = '200'
 Planning:
   Buffers: shared hit=44
 Planning Time: 68.692 ms
 Execution Time: 21346.567 ms
(42 rows)

===

There are only 29,751 patient_visits rows where end_on is null and the array intersection is non-empty, whereas (as mentioned previously) over 4M rows meet the full authz condition. The query planner is nevertheless starting from the larger set, and I think that's because it misestimates how many members rows are involved. (BTW, the tables are all freshly analyzed.)
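For a concrete look at the skew, a quick aggregate over members shows the distribution of population sizes (population ids faked as above):

-- Distribution of population sizes:
select population_id, count(*) as num_members
from members
group by population_id
order by num_members desc;

The plans above expect rows=34503 per population from index_members_on_population_id, but this user's nine populations actually average ~330K members each (rows=329859, loops=9).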
If I explicitly materialize the relevant visits, I get a much better plan:

db=# explain (analyze, buffers, settings)
with user_member_expirations as (
  select u.id user_id, m.id member_id, m.patient_id,
         (m.last_activity_on + sp.authz_expiration_interval) expires_on
  from members m
  join sponsor_populations sp on sp.population_id = m.population_id
  join user_populations up on up.population_id = m.population_id
  join users u on u.id = up.user_id
  where sp.sponsor_id = u.sponsor_id
), visits as materialized (
  select *
  from new_patient_visits v
  where end_on is null
    and v.cached_population_ids && ARRAY['foo-population','bar-population','baz-population','quux-population']
)
select v.patient_class, count(distinct v.id)
from visits v
join user_member_expirations u on u.patient_id = v.patient_id
where u.user_id = 3962
  and v.start_on < u.expires_on
group by v.patient_class;

                                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=58665.16..58665.18 rows=1 width=12) (actual time=4310.702..4315.069 rows=5 loops=1)
   Group Key: v.patient_class
   Buffers: shared hit=2599817
   CTE visits
     ->  Bitmap Heap Scan on new_patient_visits v_1  (cost=55.66..54698.37 rows=49407 width=722) (actual time=14.364..72.664 rows=29751 loops=1)
           Recheck Cond: ((cached_population_ids && '{foo-population,bar-population,baz-population,quux-population}'::text[]) AND (end_on IS NULL))
           Heap Blocks: exact=28442
           Buffers: shared hit=28504
           ->  Bitmap Index Scan on index_new_patient_visits_on_open_cached_population_ids  (cost=0.00..43.31 rows=49407 width=0) (actual time=9.569..9.570 rows=29751 loops=1)
                 Index Cond: (cached_population_ids && '{foo-population,bar-population,baz-population,quux-population}'::text[])
                 Buffers: shared hit=62
   ->  Sort  (cost=3966.79..3966.80 rows=1 width=12) (actual time=4307.634..4309.925 rows=50988 loops=1)
         Sort Key: v.patient_class, v.id
         Sort Method: quicksort  Memory: 3528kB
         Buffers: shared hit=2599817
         ->  Nested Loop  (cost=1.11..3966.78 rows=1 width=12) (actual time=14.550..4280.320 rows=50988 loops=1)
               Join Filter: ((m.population_id = sp.population_id) AND (v.start_on < (m.last_activity_on + sp.authz_expiration_interval)))
               Rows Removed by Join Filter: 2124510
               Buffers: shared hit=2599817
               ->  Nested Loop  (cost=1.11..1493.94 rows=558 width=97) (actual time=14.405..250.756 rows=267759 loops=1)
                     Buffers: shared hit=28571
                     ->  Nested Loop  (cost=1.11..11.73 rows=1 width=73) (actual time=0.035..0.327 rows=9 loops=1)
                           Join Filter: (u.sponsor_id = sp.sponsor_id)
                           Buffers: shared hit=67
                           ->  Nested Loop  (cost=0.70..9.09 rows=1 width=95) (actual time=0.023..0.211 rows=9 loops=1)
                                 Buffers: shared hit=31
                                 ->  Index Only Scan using index_user_populations_on_user_id_and_population_id on user_populations up  (cost=0.42..1.57 rows=3 width=36) (actual time=0.013..0.036 rows=9 loops=1)
                                       Index Cond: (user_id = 3962)
                                       Heap Fetches: 0
                                       Buffers: shared hit=4
                                 ->  Index Scan using index_sponsor_populations_on_population_id_and_sponsor_id on sponsor_populations sp  (cost=0.28..2.50 rows=1 width=59) (actual time=0.014..0.016 rows=1 loops=9)
                                       Index Cond: (population_id = up.population_id)
                                       Buffers: shared hit=27
                           ->  Index Scan using users_pkey on users u  (cost=0.41..2.63 rows=1 width=24) (actual time=0.011..0.011 rows=1 loops=9)
                                 Index Cond: (id = 3962)
                                 Buffers: shared hit=36
                     ->  CTE Scan on visits v  (cost=0.00..988.14 rows=49407 width=24) (actual time=1.597..21.285 rows=29751 loops=9)
                           Buffers: shared hit=28504
               ->  Index Scan using index_members_on_patient_id on members m  (cost=0.00..4.38 rows=3 width=40) (actual time=0.003..0.014 rows=8 loops=267759)
                     Index Cond: (patient_id = v.patient_id)
                     Rows Removed by Index Recheck: 0
                     Buffers: shared hit=2571246
 Settings: effective_cache_size = '372GB', max_parallel_workers = '12', max_parallel_workers_per_gather = '4', work_mem = '164MB', random_page_cost = '1.1', effective_io_concurrency = '200'
 Planning:
   Buffers: shared hit=32
 Planning Time: 67.924 ms
 Execution Time: 4320.252 ms
(47 rows)

===

Of course, I'd prefer not to have to materialize this relation explicitly. This particular query, for this particular user, benefits from it, but similar queries, or the same query for different users, may not. I think the root of the problem is that population size (i.e., the number of members in a given population) has high variance, and the planner is basing its estimates on the average population size (and maybe also on the average number of populations to which a user has access?), which is not especially useful here.

Is there anything I can do about this? Would any extended statistics be useful here?

Thanks,
Jon
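P.S. To make that last question concrete, the kind of thing I've been wondering about is whether simply raising the per-column statistics target on members.population_id would sharpen the estimate, e.g.

ALTER TABLE members ALTER COLUMN population_id SET STATISTICS 5000;
ANALYZE members;

though that's just a guess on my part; I haven't verified that it changes the join estimate at all.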