-----Original Message----- From: Justin Pryzby <pryzby@xxxxxxxxxxxxx> Sent: 05 January 2019 05:24 To: Abadie Lana <Lana.Abadie@xxxxxxxx> Cc: David Rowley <david.rowley@xxxxxxxxxxxxxxx>; pgsql-performance@xxxxxxxxxxxxxxxxxxxx Subject: Re: select query does not pick up the right index On Fri, Jan 04, 2019 at 08:58:57AM +0000, Abadie Lana wrote: > SELECT (SELECT sum(x) FROM unnest(most_common_freqs) x) frac_MCV, > tablename, attname, null_frac, n_distinct, > array_length(most_common_vals,1) n_mcv, > array_length(histogram_bounds,1) n_hist FROM pg_stats WHERE > attname='...' AND tablename='...' ORDER BY 1 DESC; > > Hmm. Is it normal that the couple (tablename,attname ) is not unique? > I'm surprised to see sample_{ctrl,util,buil} quoted twice One of the rows is for "inherited stats" (including child tables) stats and one is "noninherited stats". The unique index on the table behind that view is: "pg_statistic_relid_att_inh_index" UNIQUE, btree (starelid, staattnum, stainherit) On the wiki, I added inherited and correlation columns. Would you rerun that query ? https://wiki.postgresql.org/wiki/Slow_Query_Questions#Statistics:_n_distinct.2C_MCV.2C_histogram /*********************REPLY**********************************************************/ css_archive_3_0_0=# SELECT (SELECT sum(x) FROM unnest(most_common_freqs) x) frac_MCV, tablename, attname, inherited, null_frac, n_distinct, array_length(most_common_vals,1) n_mcv, array_length(histogram_bounds,1) n_hist, correlation FROM pg_stats WHERE attname='smpl_time' AND tablename like 'sample%' ORDER BY 1 DESC; frac_mcv | tablename | attname | inherited | null_frac | n_distinct | n_mcv | n_hist | correlation -------------+-------------------+-----------+-----------+-----------+------------+-------+--------+------------- 0.124457 | sample_buil | smpl_time | f | 0 | -0.752503 | 10000 | 10001 | 0.0802559 0.100454 | sample_util | smpl_time | f | 0 | -0.323349 | 10000 | 10001 | 0.614187 0.0393624 | sample_buil_month | smpl_time | f | 0 | -0.617567 | 10000 | 10001 | 0.181361 0.0305711 | sample_util_month | smpl_time | f | 0 | -0.169437 | 10000 | 10001 | 0.781718 0.0194441 | sample_util_year | smpl_time | f | 0 | -0.428909 | 10000 | 10001 | 0.999893 0.0172493 | sample_util | smpl_time | t | 0 | -0.179957 | 10000 | 10001 | -0.563603 0.0117653 | sample | smpl_time | t | 0 | -0.235397 | 10000 | 10001 | 0.0880253 0.0116284 | sample_buil | smpl_time | t | 0 | -0.743071 | 10000 | 10001 | -0.100979 2.66667e-05 | sample_ctrl_month | smpl_time | f | 0 | -0.999848 | 32 | 10001 | -0.356626 8.48788e-06 | sample_ctrl | smpl_time | f | 0 | -0.999996 | 4 | 10001 | 0.331492 6.33333e-06 | sample_ctrl_year | smpl_time | f | 0 | -0.999835 | 9 | 10001 | 0.999971 5.33333e-06 | sample_ctrl | smpl_time | t | 0 | -0.999827 | 8 | 10001 | 0.0492292 5e-06 | sample_buil_year | smpl_time | f | 0 | -0.999918 | 7 | 10001 | 0.999978 (13 rows) css_archive_3_0_0=# SELECT (SELECT sum(x) FROM unnest(most_common_freqs) x) frac_MCV, tablename, attname, inherited, null_frac, n_distinct, array_length(most_common_vals,1) n_mcv, array_length(histogram_bounds,1) n_hist, correlation FROM pg_stats WHERE attname='channel_id' AND tablename like 'sample%' ORDER BY 1 DESC; frac_mcv | tablename | attname | inherited | null_frac | n_distinct | n_mcv | n_hist | correlation ----------+-------------------+------------+-----------+-----------+------------+-------+--------+------------- 0.99987 | sample_buil_year | channel_id | f | 0 | 76 | 16 | 60 | 0.207932 0.999632 | sample_ctrl_year | channel_id | f | 0 | 132 | 31 | 101 | 0.201352 0.999628 | sample_ctrl_month | channel_id | f | 0 | 84 | 23 | 61 | 0.104656 0.999627 | sample_ctrl | channel_id | t | 0 | 132 | 31 | 101 | 0.143691 0.999599 | sample_ctrl | channel_id | f | 0 | 42 | 22 | 20 | 0.0874279 0.998074 | sample_buil | channel_id | f | 0 | 493 | 122 | 371 | 0.0206452 0.997693 | sample_util | channel_id | f | 0 | 1379 | 509 | 870 | 0.079591 0.991841 | sample_buil | channel_id | t | 0 | 9867 | 107 | 9740 | 0.00540782 0.991567 | sample_util_month | channel_id | f | 0 | 5716 | 504 | 5209 | 0.216868 0.990369 | sample_util_year | channel_id | f | 0 | 4946 | 255 | 4689 | 0.547934 0.990062 | sample_util | channel_id | t | 0 | 5804 | 641 | 5160 | -0.31778 0.972386 | sample_buil_month | channel_id | f | 0 | 19946 | 148 | 10001 | 0.0932767 0.967391 | sample | channel_id | t | 0 | 7597 | 409 | 7178 | 0.501865 (13 rows) css_archive_3_0_0= /**********************ENDREPLY************************************************/ I'm also interested to see \d and channel_id statistics for the channel table. /***********************REPLY***********************************************/ \d channel Table "public.channel" Column | Type | Collation | Nullable | Default --------------+------------------------+-----------+----------+----------------------------------- channel_id | bigint | | not null | nextval('channel_chid'::regclass) name | character varying(100) | | not null | descr | character varying(100) | | | grp_id | bigint | | | smpl_mode_id | bigint | | | smpl_val | double precision | | | smpl_per | double precision | | | retent_id | bigint | | | 1 retent_val | double precision | | | Indexes: "channel_pkey" PRIMARY KEY, btree (channel_id) "unique_chname" UNIQUE CONSTRAINT, btree (name) "channel_name_channel_id_idx" btree (name, channel_id) SELECT (SELECT sum(x) FROM unnest(most_common_freqs) x) frac_MCV, tablename, attname, inherited, null_frac, n_distinct, array_length(most_common_vals,1) n_mcv, array_length(histogram_bounds,1) n_hist, correlation FROM pg_stats WHERE attname in ('name','channel_id') AND tablename ='channel' ORDER BY 1 DESC; frac_mcv | tablename | attname | inherited | null_frac | n_distinct | n_mcv | n_hist | correlation ----------+-----------+------------+-----------+-----------+------------+-------+--------+------------- | channel | channel_id | f | 0 | -1 | | 10001 | 0.0200338 | channel | name | f | 0 | -1 | | 10001 | -0.257645 /*********************ENDREPLY****************************************************************/ > explain (analyze, buffers) select > 'BUIL-B36-VA-RT-RT1:CL0001-2-ABW',c.smpl_time,c.nanosecs,c.float_val,c > .num_val,c.str_val,c.datatype,c.array_val from sample c WHERE > c.channel_id = (SELECT channel_id FROM channel WHERE > name='BUIL-B36-VA-RT-RT1:CL0001-2-ABW') order by c.smpl_time desc > limit 5; You originally wrote this as a implicit comma join. Does the original query still have an issue ? The =(subselect query) doesn't allow the planner to optimize for the given channel, which seems to be a fundamental problem. /****************************REPLY***************************************************/ Yes the original query still picks up the wrong index. This query actually was suggested by David Rowley and actually with this one the planner is taking the wring index for only sample_ctrl_year and sample_buil_year tables. With some proper analyse, now only sample_ctrl_year. /*****************************ENDREPLY**************************************************/ On Fri, Jan 04, 2019 at 08:58:57AM +0000, Abadie Lana wrote: > Based on your feedback...i rerun analyse directly on the two table > sample_ctrl_year and sample_buil_year [...] Now when running the query again, only for sample_buil_year table the wrong index is picked up... It looks like statistics on your tables were completely wrong; not just sample_ctrl_year and sample_buil_year. Right ? /*****************************REPLY*******************************************************/ I would say that when you have a partitioned table, running analyse on the parent table (which includes the children) does not give the same result as running analyse on each individual child table. I don't know if it is an expected behaviour? /********************************ENDREPLY****************************************************/ Autoanalyze would normally handle this on nonempty tables (children or otherwise) and you should manually run ANALZYE on the parents (both levels of them) whenever statistics change, like after running a big DELETE or DROP or after a significant interval of time has passed relative to the range of time in the table's timestamp columns. Do you know why autoanalze didn't handle the nonempty tables on its own ? /******************************REPLY***************************************************************/ This database has been loaded via a dump. After there was no change in the actual tables'content apart from creating/droping. indexes. So I guess that's why autoanalyze didn't run (also I didn't change the default configuration for this part in postgresql.conf) /*******************************ENDREPLY**********************************************************/ > Now, the channel name I gave has no entries in sample_buil_year...(and when I run the query directly against sample_buil_year the right index is picked up).... So maybe something related with the partitioning? > -> Index Scan Backward using smpl_time_bx2_idx on sample_buil_year c_5 (cost=0.56..2023054.76 rows=665761 width=75) (actual time=13216.589..13216.589 rows=0 loops=1) > Filter: (channel_id = $0) > Rows Removed by Filter: 50597834 > Buffers: shared hit=26626368 So it scanned the entire index expecting to find 5 matching channel IDs "pretty soon", based on the generic distribution of channel IDs, without the benefit of knowing that this channel ID doesn't exist at all (due to =(subquery)). /*********************************REPLY******************************************************/ Exactly it took hearethe wrong index smpl_time_bx2_idx instead of sample_time_by_idx. /*********************************ENDREPLY**************************************************/ 26e6 buffers is 200GB, apparently accessing some pages many times (even if cached). /**********************************REPLY********************************************************/ Yes this is what I observed when running iotop...more than 17GB was read from disk. I'm surprised as I would expect that the max. would be the index size...~7GB. We also get an swap alert...because it uses swap... /********************************ENDREPLY**************************************************/ table_name | index_name | table_size | index_size sample_buil_year | smpl_time_bx2_idx | 4492 MB | 1084 MB General comments: On Wed, Jan 02, 2019 at 04:28:41PM +0000, Abadie Lana wrote: > "sample_time_bm_idx" btree (channel_id, smpl_time) > "sample_time_mb1_idx" btree (smpl_time, channel_id) > "smpl_time_bx1_idx" btree (smpl_time) The smpl_time index is loosely redundant with index on (smpl_time,channel_id). You might consider dropping it, or otherwise dropping the smpl_time,channel_id index and making two separate indices on smpl_time and channel. That would allow bitmap ANDing them together. /******************************REPLY***********************************************************/ Yes I know. The thing is I had to find a quick solution to fix as my application was taking ages - two types of queries (one which requires channeld_id=XX + order by time and another one by time range (all channels between T1 and T2). As the smpl_time_bx1_idx was slowing down the first query, I created sample_time_mb1_idx and drop smpl_time_bx1_idx. Now it has been recreated as I wanted to understand why the planner picked up the wrong indexes. /*****************************ENDREPLY**********************************************************/ Or possibly (depending on detail of your data loading) leaving the composite index and changing smpl_time to a BRIN index - it's nice to be able to CLUSTER on the btree index to maximize the efficiency of the brin index. >Check constraints: > "sample_buil_month_smpl_time_check" CHECK (smpl_time >= (now() - >'32 days'::interval)::timestamp without time zone AND smpl_time <= >now()) I'm surprised that works, and not really sure what it's doing..but in any case it's maybe not doing what you wanted(??). I'm guessing you never get constraint exclusion (which is irrelevant for this query but still). /*********************************REPLY************************************************/ I know that the partitioning is not exclusive in this one. In fact the insert is done at sample_{util/buil/ctrl} table. The data is in this table. Then there are some scripts which moves data from sample -> sample_month and then sample_month-> sample_year. I'm not the owner of this schema...so cannot comment why it has been done like that... And same for indexes. I cannot change them. I did it in that case, because I did a copy of the database and launched the apps on this one (part of annual maintenance activities). I created the BRIN index on smpl_time and now the original query runs fine because it uses the right index, the one on (channel_id,smpl_time) explain analyze select t.name, c.smpl_time,c.nanosecs,c.float_val,c.num_val,c.str_val,c.datatype,c.array_val from sample c, channel t where t.channel_id=c.channel_id and t.name='BUIL-B36-VA-RT-RT1:CL0001-2-ABW' order by c.smpl_time desc limit 5; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------- ----------------------------------------------- Limit (cost=1869725.53..1869725.54 rows=5 width=113) (actual time=3.898..3.900 rows=3 loops=1) -> Sort (cost=1869725.53..1869749.62 rows=9636 width=113) (actual time=3.896..3.897 rows=3 loops=1) Sort Key: c.smpl_time DESC Sort Method: quicksort Memory: 25kB -> Nested Loop (cost=0.00..1869565.48 rows=9636 width=113) (actual time=2.270..3.878 rows=3 loops=1) -> Seq Scan on channel t (cost=0.00..915.83 rows=1 width=41) (actual time=2.212..3.773 rows=1 loops=1) Filter: ((name)::text = 'BUIL-B36-VA-RT-RT1:CL0001-2-ABW'::text) Rows Removed by Filter: 33425 -> Append (cost=0.00..1853209.17 rows=1544048 width=88) (actual time=0.053..0.099 rows=3 loops=1) -> Seq Scan on sample c (cost=0.00..0.00 rows=1 width=334) (actual time=0.002..0.002 rows=0 loops=1) Filter: (t.channel_id = channel_id) -> Bitmap Heap Scan on sample_buil c_1 (cost=52.67..5440.29 rows=2096 width=328) (actual time=0.016. .0.016 rows=0 loops=1) Recheck Cond: (channel_id = t.channel_id) -> Bitmap Index Scan on sample_time_b_idx (cost=0.00..52.14 rows=2096 width=0) (actual time=0. 008..0.008 rows=0 loops=1) Index Cond: (channel_id = t.channel_id) -> Bitmap Heap Scan on sample_ctrl c_2 (cost=522.34..11512.86 rows=22441 width=328) (actual time=0.0 05..0.006 rows=0 loops=1) Recheck Cond: (channel_id = t.channel_id) -> Bitmap Index Scan on sample_time_c_idx (cost=0.00..516.73 rows=22441 width=0) (actual time= 0.005..0.005 rows=0 loops=1) Index Cond: (channel_id = t.channel_id) -> Bitmap Heap Scan on sample_util c_3 (cost=90.11..12215.14 rows=3830 width=328) (actual time=0.009 ..0.009 rows=0 loops=1) Recheck Cond: (channel_id = t.channel_id) -> Bitmap Index Scan on sample_time_u_idx (cost=0.00..89.16 rows=3830 width=0) (actual time=0. 006..0.006 rows=0 loops=1) Index Cond: (channel_id = t.channel_id) -> Bitmap Heap Scan on sample_buil_month c_4 (cost=18.29..2836.29 rows=740 width=82) (actual time=0. 017..0.021 rows=3 loops=1) Recheck Cond: (channel_id = t.channel_id) Heap Blocks: exact=3 -> Bitmap Index Scan on sample_time_bm_idx (cost=0.00..18.11 rows=740 width=0) (actual time=0. 012..0.012 rows=3 loops=1) Index Cond: (channel_id = t.channel_id) -> Bitmap Heap Scan on sample_buil_year c_5 (cost=15416.21..627094.50 rows=665761 width=83) (actual time=0.008..0.008 rows=0 loops=1) Recheck Cond: (channel_id = t.channel_id) -> Bitmap Index Scan on sample_time_by_idx (cost=0.00..15249.77 rows=665761 width=0) (actual t ime=0.007..0.007 rows=0 loops=1) Index Cond: (channel_id = t.channel_id) -> Bitmap Heap Scan on sample_ctrl_month c_6 (cost=5038.85..223721.75 rows=217585 width=83) (actual time=0.006..0.007 rows=0 loops=1) Recheck Cond: (channel_id = t.channel_id) -> Bitmap Index Scan on sample_time_cm_idx (cost=0.00..4984.45 rows=217585 width=0) (actual ti me=0.006..0.006 rows=0 loops=1) Index Cond: (channel_id = t.channel_id) -> Bitmap Heap Scan on sample_ctrl_year c_7 (cost=13960.83..870933.00 rows=602872 width=84) (actual time=0.006..0.006 rows=0 loops=1) Recheck Cond: (channel_id = t.channel_id) -> Bitmap Index Scan on sample_time_cy_idx (cost=0.00..13810.11 rows=602872 width=0) (actual t ime=0.005..0.015 rows=0 loops=1) Index Cond: (channel_id = t.channel_id) -> Bitmap Heap Scan on sample_util_month c_8 (cost=288.81..45162.12 rows=12418 width=83) (actual tim e=0.008..0.008 rows=0 loops=1) Recheck Cond: (channel_id = t.channel_id) -> Bitmap Index Scan on sample_time_um_idx (cost=0.00..285.70 rows=12418 width=0) (actual time =0.007..0.007 rows=0 loops=1) Index Cond: (channel_id = t.channel_id) -> Index Scan using sample_time_uy_idx on sample_util_year c_9 (cost=0.57..54293.22 rows=16304 width =82) (actual time=0.010..0.010 rows=0 loops=1) Index Cond: (channel_id = t.channel_id) Planning time: 1.752 ms Execution time: 4.004 But not the other query...still time-consuming because still using the wrong index in case of sample_buil_year (but curiously not the BRIN index) explain (analyze, buffers) select 'BUIL-B36-VA-RT-RT1:CL0001-2-ABW',c.smpl_time,c.nanosecs,c.float_val,c.num_val,c.str_val,c.datatype,c.array_val from sample c WHERE c.channel_id = (SELECT channel_id FROM channel WHERE name='BUIL-B36-VA-RT-RT1:CL0001-2-ABW') order by c.smpl_time desc limit 5; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------- -------------------------------------------------------------- Limit (cost=13.40..30.54 rows=5 width=112) (actual time=63411.725..63411.744 rows=3 loops=1) Buffers: shared hit=38 read=193865 InitPlan 1 (returns $0) -> Index Scan using unique_chname on channel (cost=0.41..8.43 rows=1 width=8) (actual time=0.039..0.040 rows=1 loops =1) Index Cond: ((name)::text = 'BUIL-B36-VA-RT-RT1:CL0001-2-ABW'::text) Buffers: shared hit=4 -> Result (cost=4.96..5294364.58 rows=1544048 width=112) (actual time=63411.723..63411.740 rows=3 loops=1) Buffers: shared hit=38 read=193865 -> Merge Append (cost=4.96..5278924.10 rows=1544048 width=80) (actual time=63411.719..63411.735 rows=3 loops=1) Sort Key: c.smpl_time DESC Buffers: shared hit=38 read=193865 -> Index Scan Backward using sample_time_all_idx on sample c (cost=0.12..8.14 rows=1 width=326) (actual ti me=0.048..0.048 rows=0 loops=1) Index Cond: (channel_id = $0) Buffers: shared hit=5 -> Index Scan Backward using sample_time_b_idx on sample_buil c_1 (cost=0.42..7775.26 rows=2096 width=320) (actual time=0.008..0.009 rows=0 loops=1) Index Cond: (channel_id = $0) Buffers: shared hit=3 -> Index Scan Backward using sample_time_c_idx on sample_ctrl c_2 (cost=0.42..77785.57 rows=22441 width=32 0) (actual time=0.006..0.006 rows=0 loops=1) Index Cond: (channel_id = $0) Buffers: shared hit=3 -> Index Scan Backward using sample_time_u_idx on sample_util c_3 (cost=0.43..14922.72 rows=3830 width=320 ) (actual time=0.008..0.008 rows=0 loops=1) Index Cond: (channel_id = $0) Buffers: shared hit=3 -> Index Scan Backward using sample_time_bm_idx on sample_buil_month c_4 (cost=0.56..2967.10 rows=740 widt h=74) (actual time=0.011..0.025 rows=3 loops=1) Index Cond: (channel_id = $0) Buffers: shared hit=8 -> Index Scan Backward using sample_time_yb1_idx on sample_buil_year c_5 (cost=0.56..2186210.68 rows=66576 1 width=75) (actual time=63411.573..63411.574 rows=0 loops=1) Index Cond: (channel_id = $0) Buffers: shared read=193865 -> Index Scan Backward using sample_time_cm_idx on sample_ctrl_month c_6 (cost=0.56..759241.36 rows=217585 width=75) (actual time=0.030..0.030 rows=0 loops=1) Index Cond: (channel_id = $0) Buffers: shared hit=4 -> Index Scan Backward using sample_time_cy_idx on sample_ctrl_year c_7 (cost=0.57..2097812.02 rows=602872 width=76) (actual time=0.009..0.009 rows=0 loops=1) Index Cond: (channel_id = $0) Buffers: shared hit=4 -> Index Scan Backward using sample_time_um_idx on sample_util_month c_8 (cost=0.57..48401.65 rows=12418 w idth=75) (actual time=0.009..0.009 rows=0 loops=1) Index Cond: (channel_id = $0) Buffers: shared hit=4 -> Index Scan Backward using sample_time_uy_idx on sample_util_year c_9 (cost=0.57..54293.22 rows=16304 wi dth=74) (actual time=0.009..0.009 rows=0 loops=1) Index Cond: (channel_id = $0) Buffers: shared hit=4 Planning time: 0.727 ms Execution time: 63411.858 ms (43 rows) \d sample_buil_year Table "public.sample_buil_year" Column | Type | Collation | Nullable | Default -------------+-----------------------------+-----------+----------+------------- channel_id | bigint | | not null | smpl_time | timestamp without time zone | | not null | nanosecs | bigint | | not null | severity_id | bigint | | not null | status_id | bigint | | not null | num_val | integer | | | float_val | double precision | | | str_val | character varying(120) | | | datatype | character(1) | | | ' '::bpchar array_val | bytea | | | Indexes: "sample_time_by_idx" btree (channel_id, smpl_time) "sample_time_yb1_idx" btree (smpl_time, channel_id) "smpl__by_brin_idx" brin (smpl_time) WITH (pages_per_range='128') Check constraints: "sample_buil_year_smpl_time_check" CHECK (smpl_time >= (now() - '1 year 1 mon'::interval)::timestamp without time zone AND smpl_time <= now()) Inherits: sample_buil It works when I dropped the other index sample_time_yb1_idx The BRIN works well with the other query. Thanks for the tip I will look into more details on this BRIN. Thanks for your help /********************************ENDREPLY*********************************************/ Justin