-----Original Message----- From: Justin Pryzby <pryzby@xxxxxxxxxxxxx> Sent: 08 January 2019 09:15 To: Abadie Lana <Lana.Abadie@xxxxxxxx> Cc: David Rowley <david.rowley@xxxxxxxxxxxxxxx>; pgsql-performance@xxxxxxxxxxxxxxxxxxxx Subject: Re: select query does not pick up the right index On Mon, Jan 07, 2019 at 04:09:50PM +0000, Abadie Lana wrote: > "channel_pkey" PRIMARY KEY, btree (channel_id) > "unique_chname" UNIQUE CONSTRAINT, btree (name) > "channel_name_channel_id_idx" btree (name, channel_id) Note, the third index is more or less redundant. > I would say that when you have a partitioned table, running analyse on the parent table (which includes the children) does not give the same result as running analyse on each individual child table. I don't know if it is an expected behaviour? Right, for relkind='r' inheritence, ANALYZE parent gathers 1) stats for the parent ONLY (stored with pg_stats inherited='f'); and, 2) stats for the parent and its children (stored in pg_stats with inherited='t'). It *doesn't* update statistics for each of the children themselves. Note however that for partitions of relkind='p' tables (available since postgres 10) ANALYZE parent *ALSO* updates stats for the children. > But not the other query...still time-consuming because still using the > wrong index in case of sample_buil_year (but curiously not the BRIN > index) explain (analyze, buffers) select 'BUIL-B36-VA-RT-RT1:CL0001-2-ABW',c.smpl_time,c.nanosecs,c.float_val,c.num_val,c.str_val,c.datatype,c.array_val from sample c WHERE c.channel_id = (SELECT channel_id FROM channel WHERE name='BUIL-B36-VA-RT-RT1:CL0001-2-ABW') order by c.smpl_time desc limit 5; Limit (cost=13.40..30.54 rows=5 width=112) (actual time=63411.725..63411.744 rows=3 loops=1) > Buffers: shared hit=38 read=193865 > InitPlan 1 (returns $0) > -> Index Scan using unique_chname on channel (cost=0.41..8.43 rows=1 width=8) (actual time=0.039..0.040 rows=1 loops =1) > Index Cond: ((name)::text = 'BUIL-B36-VA-RT-RT1:CL0001-2-ABW'::text) > Buffers: shared hit=4 > -> Result (cost=4.96..5294364.58 rows=1544048 width=112) (actual time=63411.723..63411.740 rows=3 loops=1) > Buffers: shared hit=38 read=193865 > -> Merge Append (cost=4.96..5278924.10 rows=1544048 width=80) (actual time=63411.719..63411.735 rows=3 loops=1) > Sort Key: c.smpl_time DESC > Buffers: shared hit=38 read=193865 > -> Index Scan Backward using sample_time_all_idx on sample c (cost=0.12..8.14 rows=1 width=326) (actual time=0.048..0.048 rows=0 loops=1) > Index Cond: (channel_id = $0) > Buffers: shared hit=5 > -> Index Scan Backward using sample_time_b_idx on sample_buil c_1 (cost=0.42..7775.26 rows=2096 width=320) (actual time=0.008..0.009 rows=0 loops=1) > Index Cond: (channel_id = $0) > Buffers: shared hit=3 > -> Index Scan Backward using sample_time_c_idx on sample_ctrl c_2 (cost=0.42..77785.57 rows=22441 width=320) (actual time=0.006..0.006 rows=0 loops=1) > Index Cond: (channel_id = $0) > Buffers: shared hit=3 > -> Index Scan Backward using sample_time_u_idx on sample_util c_3 (cost=0.43..14922.72 rows=3830 width=320) (actual time=0.008..0.008 rows=0 loops=1) > Index Cond: (channel_id = $0) > Buffers: shared hit=3 > -> Index Scan Backward using sample_time_bm_idx on sample_buil_month c_4 (cost=0.56..2967.10 rows=740 width=74) (actual time=0.011..0.025 rows=3 loops=1) > Index Cond: (channel_id = $0) > Buffers: shared hit=8 > -> Index Scan Backward using sample_time_yb1_idx on sample_buil_year c_5 (cost=0.56..2186210.68 rows=665761 width=75) (actual time=63411.573..63411.574 rows=0 loops=1) > Index Cond: (channel_id = $0) > Buffers: shared read=193865 I think I see the issue.. Note, this is different than before. Initially the query was slow due to reading the indices for the entire heirarchy, then sorting them, then joining: | -> Index Scan Backward using smpl_time_bx2_idx on sample_buil_year c_5(cost=0.56..1897430.89 rows=50597832 width=328) (actual time=0.068..139840.439 rows=50597834 loops=1) | -> Index Scan Backward using smpl_time_cmx1_idx on sample_ctrl_month c_6 (cost=0.44..55253292.21 rows=18277124 width=85) (actual time=0.061..14610.389 rows=18277123 loops=1) | -> Index Scan Backward using smpl_time_cmx2_idx on sample_ctrl_year c_7 (cost=0.57..2987358.31 rows=79579072 width=76) (actual time=0.067..286316.865 rows=79579075 loops=1) | -> Index Scan Backward using smpl_time_ux1_idx on sample_util_month c_8 (cost=0.57..98830163.45 rows=70980976 width=82) (actual time=0.071..60766.643 rows=70980980 loops=1) | -> Index Scan Backward using smpl_time_ux2_idx on sample_util_year |c_9 (cost=0.57..3070642.94 rows=80637888 width=83) (actual |time=0.069..307091.673 rows=80637891 loops=1) Then you ANALYZEd parent tables and added indices and constraints and started getting bitmap scans, with new query using David's INTERVAL '0 sec': | ... | -> Index Scan Backward using smpl_time_bx2_idx on sample_buil_year c_5 (cost=0.56..2023925.30 rows=3162364 width=320) (actual time=15167.330..15167.330 rows=0 loops=1) | Filter: (channel_id = $0) | Rows Removed by Filter: 50597834 | Buffers: shared hit=25913147 read=713221 | -> Index Scan Backward using sample_time_cm_idx on sample_ctrl_month c_6 (cost=0.56..1862587.12 rows=537562 width=77) (actual time=0.048..0.048 rows=0 loops=1) | Index Cond: (channel_id = $0) | Buffers: shared read=4 | -> Index Scan Backward using smpl_time_cmx2_idx on sample_ctrl_year c_7 (cost=0.57..3186305.67 rows=2094186 width=68) (actual time=25847.549..25847.549 rows=0 loops=1) | Filter: (channel_id = $0) | Rows Removed by Filter: 79579075 | Buffers: shared hit=49868991 read=1121715 | ... I didn't notice this at first, but compare the two slow scans with the fast one. The slow scans have no index condition: they're reading the entire index and FILTERING on channel_id rather than searching the index for it. Now for the "bad" query you're getting: | ... | -> Index Scan Backward using sample_time_bm_idx on sample_buil_month c_4 (cost=0.56..2967.10 rows=740 width=74) (actual time=0.011..0.025 rows=3 loops=1) | Index Cond: (channel_id = $0) | Buffers: shared hit=8 | -> Index Scan Backward using sample_time_yb1_idx on sample_buil_year c_5 (cost=0.56..2186210.68 rows=665761 width=75) (actual time=63411.573..63411.574 rows=0 loops=1) | Index Cond: (channel_id = $0) | Buffers: shared read=193865 | ... This time, the bad scan *is* kind-of searching on channel_id, but reading the entire 1500MB index to do it ... because channel_id is not a leading column: | "sample_time_yb1_idx" btree (smpl_time, channel_id) And I think the explanation is here: > css_archive_3_0_0=# SELECT (SELECT sum(x) FROM unnest(most_common_freqs) x) frac_MCV, tablename, attname, inherited, null_frac, n_distinct, array_length(most_common_vals,1) n_mcv, array_length(histogram_bounds,1) n_hist, correlation FROM pg_stats WHERE attname='channel_id' AND tablename like 'sample%' ORDER BY 1 DESC; > frac_mcv | tablename | attname | inherited | null_frac | n_distinct | n_mcv | n_hist | correlation > ----------+-------------------+------------+-----------+-----------+------------+-------+--------+------------- > 5e-06 | sample_buil_year | smpl_time | f | 0 | -0.999918 | 7 | 10001 | 0.999978 ... > 0.99987 | sample_buil_year | channel_id | f | 0 | 76 | 16 | 60 | 0.207932 The table is highly correlated in its physical storage WRT correlation, and poorly correlated WRT channel_id. Thats matter since it thinks the index will be almost entirely cached, but not the table: | sample_buil_year | sample_time_yb1_idx | 4492 MB | 1522 MB So the planner thinks that reading up to 1500MB index from cache will pay off in ability to read the table sequentially. If it searches the index on channel_id, it would have to read 665761 tuples across a wide fraction of the table (a pages here and a page there), defeating readahead, rather than reading pages clustered/clumped together. On Thu, Jan 03, 2019 at 12:57:27PM +0000, Abadie Lana wrote: > Main parameters : effective_cache_size : 4GB, shared_buffers 4GB, > work_mem 4MB The issue here may just be that you have effective_cache_size=4GB, so planner thinks that sample_time_yb1_idx is likely to be cached. Try decreasing that alot, since it's clearly not cached ? Also, effective_cache_size==shared_buffers is only accurate if you've allocated the server's entire RAM to shared_buffers, which is unreasonable. (Or perhaps if the OS cache is 10x busier with other processes than postgres). I'm not sure why your query plan changed with a brin indx...it wasn't actually used, preferring to scan the original index on channel_id, as you hoped. | -> Bitmap Heap Scan on sample_buil_year c_5 (cost=15416.21..627094.50 rows=665761 width=83) (actual time=0.008..0.008 rows=0 loops=1) | Recheck Cond: (channel_id = t.channel_id) | -> Bitmap Index Scan on sample_time_by_idx (cost=0.00..15249.77 rows=665761 width=0) (actual time=0.007..0.007 rows=0 loops=1) | Index Cond: (channel_id = t.channel_id) Justin Hi, First I'm using postgresql 10.5, so it means that running analyse on sample table was also triggering a analyse on sample children. However as I said, it is not what I observed. Analyze sample is not the same as analyse children tables. Maybe because in that case it is two-level partitioning, i.e. children of children I run the tests once more with all your inputs...but this time I change the postgres settings - but no real success Effective_cache_size=512MB (was 6GB) Shared_buffers=2GB (was 6GB) Work=512MB (was 4MB) original query still expensive (no filter + wrong index) : still I can see some swap activities even though I have plenty of memory.... explain analyze select t.name, c.smpl_time,c.nanosecs,c.float_val,c.num_val,c.str_val,c.datatype,c.array_val from sample c, channel t where t.channel_id=c.channel_id and t.name='BUIL-B36-VA-RT-RT1:CL0001-2-ABW' order by c.smpl_time desc limit 5; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------- ------------------------------------------------------- Limit (cost=5.13..140077.00 rows=5 width=114) (actual time=159467.334..1486064.079 rows=3 loops=1) -> Nested Loop (cost=5.13..269946514.15 rows=9636 width=114) (actual time=159467.332..1486064.073 rows=3 loops=1) Join Filter: (c.channel_id = t.channel_id) Rows Removed by Join Filter: 322099471 -> Merge Append (cost=4.71..265115013.75 rows=322099464 width=89) (actual time=170.874..1205525.136 rows=322099474 loops=1) Sort Key: c.smpl_time DESC -> Index Scan Backward using smpl_time_a_idx on sample c (cost=0.12..8.14 rows=1 width=334) (actual time=0.004..0.004 ro ws=0 loops=1) -> Index Scan Backward using smpl_time_b_idx on sample_buil c_1 (cost=0.42..4059177.39 rows=1033169 width=328) (actual t ime=14.487..13290.596 rows=1033169 loops=1) -> Index Scan Backward using smpl_time_c_idx on sample_ctrl c_2 (cost=0.42..3321314.50 rows=942520 width=328) (actual ti me=12.598..11553.956 rows=942520 loops=1) -> Index Scan Backward using smpl_time_u_idx on sample_util c_3 (cost=0.43..13064997.74 rows=5282177 width=328) (actual time=17.136..33692.383 rows=5282177 loops=1) -> Index Scan Backward using smpl_time_bm_idx on sample_buil_month c_4 (cost=0.43..56507719.34 rows=14768705 width=82) ( actual time=12.616..69994.281 rows=14768705 loops=1) -> Index Scan Backward using smpl_time_by_idx on sample_buil_year c_5 (cost=0.56..1897685.68 rows=50597832 width=84) (ac tual time=33.374..221346.806 rows=50597834 loops=1) -> Index Scan Backward using smpl_time_cm_idx on sample_ctrl_month c_6 (cost=0.44..63167512.05 rows=18277124 width=84) ( actual time=17.823..80242.045 rows=18277123 loops=1) -> Index Scan Backward using smpl_time_cy_idx on sample_ctrl_year c_7 (cost=0.57..2988555.40 rows=79579072 width=84) (ac tual time=18.082..195370.352 rows=79579075 loops=1) -> Index Scan Backward using smpl_time_um_idx on sample_util_month c_8 (cost=0.57..110877026.27 rows=70980976 width=83) (actual time=26.942..184412.358 rows=70980980 loops=1) -> Index Scan Backward using smpl_time_uy_idx on sample_util_year c_9 (cost=0.57..3075812.13 rows=80637888 width=83) (ac tual time=17.794..275571.960 rows=80637891 loops=1) -> Materialize (cost=0.41..8.44 rows=1 width=41) (actual time=0.000..0.000 rows=1 loops=322099474) -> Index Only Scan using channel_name_channel_id_idx on channel t (cost=0.41..8.43 rows=1 width=41) (actual time=15.385. .15.388 rows=1 loops=1) Index Cond: (name = 'BUIL-B36-VA-RT-RT1:CL0001-2-ABW'::text) Heap Fetches: 1 Planning time: 1.677 ms Execution time: 1486064.165 ms (22 rows) The other query suggested by D.Rowley has the same issue : still swap activity is higher. explain analyze select 'BUIL-B36-VA-RT-RT1:CL0001-2-ABW', c.smpl_time,c.nanosecs,c.float_val,c.num_val,c.str_val,c.datatype,c.array_val from sample c where c.channel_id in (select channel_id from channel where name ='BUIL-B36-VA-RT-RT1:CL0001-2-ABW') order by c.smpl_time desc limit 5; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------------------- --------------------------------- Limit (cost=5.13..140077.00 rows=5 width=113) (actual time=38582.017..1549136.681 rows=3 loops=1) -> Nested Loop (cost=5.13..269946514.15 rows=9636 width=113) (actual time=38582.014..1549136.674 rows=3 loops=1) Join Filter: (c.channel_id = channel.channel_id) Rows Removed by Join Filter: 322099471 -> Merge Append (cost=4.71..265115013.75 rows=322099464 width=89) (actual time=0.437..1269913.701 rows=322099474 loops=1) Sort Key: c.smpl_time DESC -> Index Scan Backward using smpl_time_a_idx on sample c (cost=0.12..8.14 rows=1 width=334) (actual time=0.006..0.006 rows=0 loops=1) -> Index Scan Backward using smpl_time_b_idx on sample_buil c_1 (cost=0.42..4059177.39 rows=1033169 width=328) (actual time=0.055..702.253 ro ws=1033169 loops=1) -> Index Scan Backward using smpl_time_c_idx on sample_ctrl c_2 (cost=0.42..3321314.50 rows=942520 width=328) (actual time=0.039..684.282 row s=942520 loops=1) -> Index Scan Backward using smpl_time_u_idx on sample_util c_3 (cost=0.43..13064997.74 rows=5282177 width=328) (actual time=0.045..3624.667 rows=5282177 loops=1) -> Index Scan Backward using smpl_time_bm_idx on sample_buil_month c_4 (cost=0.43..56507719.34 rows=14768705 width=82) (actual time=0.039..65 099.797 rows=14768705 loops=1) -> Index Scan Backward using smpl_time_by_idx on sample_buil_year c_5 (cost=0.56..1897685.68 rows=50597832 width=84) (actual time=0.053..1173 26.709 rows=50597834 loops=1) -> Index Scan Backward using smpl_time_cm_idx on sample_ctrl_month c_6 (cost=0.44..63167512.05 rows=18277124 width=84) (actual time=0.037..76 905.550 rows=18277123 loops=1) -> Index Scan Backward using smpl_time_cy_idx on sample_ctrl_year c_7 (cost=0.57..2988555.40 rows=79579072 width=84) (actual time=0.052..4150 67.696 rows=79579075 loops=1) -> Index Scan Backward using smpl_time_um_idx on sample_util_month c_8 (cost=0.57..110877026.27 rows=70980976 width=83) (actual time=0.053..1 41602.620 rows=70980980 loops=1) -> Index Scan Backward using smpl_time_uy_idx on sample_util_year c_9 (cost=0.57..3075812.13 rows=80637888 width=83) (actual time=0.050..3298 99.409 rows=80637891 loops=1) -> Materialize (cost=0.41..8.44 rows=1 width=8) (actual time=0.000..0.000 rows=1 loops=322099474) -> Index Only Scan using channel_name_channel_id_idx on channel (cost=0.41..8.43 rows=1 width=8) (actual time=0.102..0.103 rows=1 loops=1) Index Cond: (name = 'BUIL-B36-VA-RT-RT1:CL0001-2-ABW'::text) Heap Fetches: 1 Planning time: 11.566 ms Execution time: 1549156.273 ms The query which works the best so far - no swap explain analyze select 'BUIL-B36-VA-RT-RT1:CL0001-2-ABW', c.smpl_time,c.nanosecs,c.float_val,c.num_val,c.str_val,c.datatype,c.array_val from sample c where c.channel_id in (select channel_id from channel where name ='BUIL-B36-VA-RT-RT1:CL0001-2-ABW') order by c.smpl_time + interval '0 sec' desc limit 5; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=1865809.94..1865809.95 rows=5 width=121) (actual time=220.854..220.856 rows=3 loops=1) -> Sort (cost=1865809.94..1865834.03 rows=9636 width=121) (actual time=220.852..220.853 rows=3 loops=1) Sort Key: ((c.smpl_time + '00:00:00'::interval)) DESC Sort Method: quicksort Memory: 25kB -> Nested Loop (cost=0.00..1865649.88 rows=9636 width=121) (actual time=133.087..220.823 rows=3 loops=1) -> Seq Scan on channel (cost=0.00..915.83 rows=1 width=8) (actual time=19.561..21.602 rows=1 loops=1) Filter: ((name)::text = 'BUIL-B36-VA-RT-RT1:CL0001-2-ABW'::text) Rows Removed by Filter: 33425 -> Append (cost=0.00..1849451.58 rows=1525839 width=89) (actual time=113.510..199.202 rows=3 loops=1) -> Seq Scan on sample c (cost=0.00..0.00 rows=1 width=334) (actual time=0.010..0.010 rows=0 loops=1) Filter: (channel.channel_id = channel_id) -> Bitmap Heap Scan on sample_buil c_1 (cost=52.67..5440.29 rows=2096 width=328) (actual time=12.217..12.217 rows=0 loops=1) Recheck Cond: (channel_id = channel.channel_id) -> Bitmap Index Scan on sample_time_b_idx (cost=0.00..52.14 rows=2096 width=0) (actual time=12.207..12.208 rows=0 loops=1) Index Cond: (channel_id = channel.channel_id) -> Bitmap Heap Scan on sample_ctrl c_2 (cost=522.34..11512.86 rows=22441 width=328) (actual time=23.037..23.037 rows=0 loops=1) Recheck Cond: (channel_id = channel.channel_id) -> Bitmap Index Scan on sample_time_c_idx (cost=0.00..516.73 rows=22441 width=0) (actual time=23.032..23.033 rows=0 loops=1) Index Cond: (channel_id = channel.channel_id) -> Bitmap Heap Scan on sample_util c_3 (cost=89.99..12171.59 rows=3814 width=328) (actual time=52.641..52.642 rows=0 loops=1) Recheck Cond: (channel_id = channel.channel_id) -> Bitmap Index Scan on sample_time_u_idx (cost=0.00..89.04 rows=3814 width=0) (actual time=52.636..52.636 rows=0 loops=1) Index Cond: (channel_id = channel.channel_id) -> Bitmap Heap Scan on sample_buil_month c_4 (cost=18.28..2828.85 rows=738 width=82) (actual time=25.584..25.617 rows=3 loops=1) Recheck Cond: (channel_id = channel.channel_id) Heap Blocks: exact=3 -> Bitmap Index Scan on sample_time_bm_idx (cost=0.00..18.09 rows=738 width=0) (actual time=22.164..22.164 rows=3 loops=1) Index Cond: (channel_id = channel.channel_id) -> Bitmap Heap Scan on sample_buil_year c_5 (cost=15217.21..626249.52 rows=657115 width=84) (actual time=15.325..15.325 rows=0 loops=1) Recheck Cond: (channel_id = channel.channel_id) -> Bitmap Index Scan on sample_by_time_idx (cost=0.00..15052.93 rows=657115 width=0) (actual time=15.319..15.319 rows=0 loops=1) Index Cond: (channel_id = channel.channel_id) -> Bitmap Heap Scan on sample_ctrl_month c_6 (cost=4923.63..222921.32 rows=212525 width=84) (actual time=16.785..16.786 rows=0 loops=1) Recheck Cond: (channel_id = channel.channel_id) -> Bitmap Index Scan on sample_time_cm_idx (cost=0.00..4870.50 rows=212525 width=0) (actual time=16.779..16.779 rows=0 loops=1) Index Cond: (channel_id = channel.channel_id) -> Bitmap Heap Scan on sample_ctrl_year c_7 (cost=13853.69..868668.59 rows=598339 width=84) (actual time=21.316..21.316 rows=0 loops=1) Recheck Cond: (channel_id = channel.channel_id) -> Bitmap Index Scan on sample_cy_time_idx (cost=0.00..13704.11 rows=598339 width=0) (actual time=21.312..21.312 rows=0 loops=1) Index Cond: (channel_id = channel.channel_id) -> Bitmap Heap Scan on sample_util_month c_8 (cost=288.92..45214.06 rows=12433 width=83) (actual time=17.240..17.240 rows=0 loops=1) Recheck Cond: (channel_id = channel.channel_id) -> Bitmap Index Scan on sample_time_um_idx (cost=0.00..285.81 rows=12433 width=0) (actual time=17.235..17.235 rows=0 loops=1) Index Cond: (channel_id = channel.channel_id) -> Index Scan using sample_time_uy_idx on sample_util_year c_9 (cost=0.57..54444.50 rows=16337 width=83) (actual time=14.964..14.964 rows=0 loops=1) Index Cond: (channel_id = channel.channel_id) Planning time: 1.976 ms Execution time: 221.009 ms So it seems that the possible solutions (without a schema change on tables) are either to drop the index and use a composite index or to use the trick mentioned by D.Rowley... Thanks Justin and David for your help and time, I learnt quite a lot with your feedback. Lana