----- Original Message ----- > From: Glyn Astill <glynastill@xxxxxxxxxxx> > To: Pgsql-performance <pgsql-performance@xxxxxxxxxxxxxx> > Sent: Thursday, 26 November 2015, 16:11 > Subject: Index scan cost calculation > > Hi All, > > Using pg 9.4.5 I'm looking at a table set up by a 3rd party application and > trying to figure out why a particular index is being chosen over another for > updates/deletes. > > From what I can see the reason is that plans using either index have the same > exactly the same cost. So rather I'm asking if there's something > glaringly obvious I'm missing, or is there anything I can to to get better > estimates. > > The table is as follows and has ~ 50M rows, ~ 4.5GB in size: > > CREATE TABLE tickets.seats > ( > recnum serial NOT NULL, > show numeric(8,0) NOT NULL, > type numeric(4,0) NOT NULL, > block character varying(8) NOT NULL, > "row" numeric(14,0) NOT NULL, > seat numeric(8,0) NOT NULL, > flag character varying(15) NOT NULL, > transno numeric(8,0) NOT NULL, > best numeric(4,0) NOT NULL, > "user" character varying(15) NOT NULL, > "time" numeric(10,0) NOT NULL, > date date NOT NULL, > date_reserved timestamp NOT NULL > ); > > Indexes: > "seats_index01" PRIMARY KEY, btree (show, type, best, block, > "row", seat) // (1094 MB) > "seats_index00" UNIQUE, btree (recnum) > // (2423 MB) > "seats_index02" UNIQUE, btree (show, type, best, block, flag, > "row", seat, recnum) // (2908 MB) > ^^ If those first two sizes look wrong, it's because they are; they should be the other way around. > default_statistics target is 100, and the following columns are non-default: > > attname | attstattarget > --------+--------------- > show | 1000 > type | 1000 > block | 2000 > row | 1000 > seat | 1000 > flag | 1000 > best | 1000 > > Increasing these further appears to make no noticeable difference. (pg_stats > here for these columns here: http://pastebin.com/2WQQec7N) > > An example query below shows that in some cases the seats_index02 index is being > chosen: > > # analyze verbose seats; > INFO: analyzing "tickets.seats" > INFO: "seats": scanned 593409 of 593409 pages, containing 50926456 > live rows and 349030 dead rows; 600000 rows in sample, 50926456 estimated total > rows > > # begin; > BEGIN > # explain analyze delete from seats where ("show" = 58919 AND > "type" = 1 AND "best" = 10 AND "block" = > 'GMA' AND "row" =26 AND "seat" = 15); > QUERY PLAN > ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > Delete on seats (cost=0.56..4.59 rows=1 width=6) (actual time=0.480..0.480 > rows=0 loops=1) > -> Index Scan using seats_index02 on seats (cost=0.56..4.59 rows=1 width=6) > (actual time=0.452..0.453 rows=1 loops=1) > Index Cond: ((show = 58919::numeric) AND (type = 1::numeric) AND (best = > 10::numeric) AND ((block)::text = 'GMA'::text) AND ("row" = > 26::numeric) AND (seat = 15::numeric)) > Planning time: 2.172 ms > Execution time: 0.531 ms > (5 rows) > > But from my naive standpoint, seats_index01 is a better candidate: > > # abort; begin; > ROLLBACK > BEGIN > > # update pg_index set indisvalid = false where indexrelid = > 'seats_index02'::regclass; > # explain analyze delete from seats where ("show" = 58919 AND > "type" = 1 AND "best" = 10 AND "block" = > 'GMA' AND "row" =26 AND "seat" = 15); > QUERY PLAN > ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > Delete on seats (cost=0.56..4.59 rows=1 width=6) (actual time=0.103..0.103 > rows=0 loops=1) > -> Index Scan using seats_index01 on seats (cost=0.56..4.59 rows=1 width=6) > (actual time=0.078..0.080 rows=1 loops=1) > Index Cond: ((show = 58919::numeric) AND (type = 1::numeric) AND (best = > 10::numeric) AND ((block)::text = 'GMA'::text) AND ("row" = > 26::numeric) AND (seat = 15::numeric)) > Planning time: 0.535 ms > Execution time: 0.146 ms > (5 rows) > > > In this instance, the time difference is not huge, however in some seemingly > random cases where there are a lot of rows with only the "seat" column > differing the choice of seats_index02 is much larger ~ 70ms vs 0.something ms > with seats_index01 > > I suspect some of the seemingly random cases could be where there's been an > update, followed by a delete since the last analyze, despite auto analyze > running fairly frequently. > > Any suggestions appreciated. > > Thanks > Glyn > > > -- > Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance > -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance