Index scan cost calculation

Glyn Astill <glynastill@xxxxxxxxxxx> · Thu, 26 Nov 2015 16:11:41 +0000 (UTC)

Hi All,

Using pg 9.4.5 I'm looking at a table set up by a 3rd party application and trying to figure out why a particular index is being chosen over another for updates/deletes.

>From what I can see the reason is that plans using either index have the same exactly the same cost.  So rather I'm asking if there's something glaringly obvious I'm missing, or is there anything I can to to get better estimates.

The table is as follows and has  ~ 50M rows, ~ 4.5GB in size:

CREATE TABLE tickets.seats
(
  recnum serial NOT NULL,
  show numeric(8,0) NOT NULL,
  type numeric(4,0) NOT NULL,
  block character varying(8) NOT NULL,
  "row" numeric(14,0) NOT NULL,
  seat numeric(8,0) NOT NULL,
  flag character varying(15) NOT NULL,
  transno numeric(8,0) NOT NULL,
  best numeric(4,0) NOT NULL,
  "user" character varying(15) NOT NULL,
  "time" numeric(10,0) NOT NULL,
  date date NOT NULL,
  date_reserved timestamp NOT NULL
);

Indexes:
  "seats_index01" PRIMARY KEY, btree (show, type, best, block, "row", seat)              // (1094 MB)
  "seats_index00" UNIQUE, btree (recnum)                                                  // (2423 MB)
  "seats_index02" UNIQUE, btree (show, type, best, block, flag, "row", seat, recnum)      // (2908 MB)

default_statistics target is 100, and the following columns are non-default:

attname | attstattarget
--------+---------------
show      |          1000
type       |          1000
block    |          2000
row        |          1000
seat       |          1000
flag       |          1000
best       |          1000

Increasing these further appears to make no noticeable difference. (pg_stats here for these columns here: http://pastebin.com/2WQQec7N)

An example query below shows that in some cases the seats_index02 index is being chosen:

# analyze verbose seats;
INFO:  analyzing "tickets.seats"
INFO:  "seats": scanned 593409 of 593409 pages, containing 50926456 live rows and 349030 dead rows; 600000 rows in sample, 50926456 estimated total rows

# begin;
BEGIN
# explain analyze delete from seats where ("show" = 58919 AND "type" = 1 AND "best" = 10 AND "block" = 'GMA' AND "row" =26 AND "seat" = 15);
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Delete on seats  (cost=0.56..4.59 rows=1 width=6) (actual time=0.480..0.480 rows=0 loops=1)
->  Index Scan using seats_index02 on seats  (cost=0.56..4.59 rows=1 width=6) (actual time=0.452..0.453 rows=1 loops=1)
Index Cond: ((show = 58919::numeric) AND (type = 1::numeric) AND (best = 10::numeric) AND ((block)::text = 'GMA'::text) AND ("row" = 26::numeric) AND (seat = 15::numeric))
Planning time: 2.172 ms
Execution time: 0.531 ms
(5 rows)

But from my naive standpoint, seats_index01 is a better candidate:

# abort; begin;
ROLLBACK
BEGIN

# update pg_index set indisvalid = false where indexrelid = 'seats_index02'::regclass;
# explain analyze delete from seats where ("show" = 58919 AND "type" = 1 AND "best" = 10 AND "block" = 'GMA' AND "row" =26 AND "seat" = 15);
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Delete on seats  (cost=0.56..4.59 rows=1 width=6) (actual time=0.103..0.103 rows=0 loops=1)
->  Index Scan using seats_index01 on seats  (cost=0.56..4.59 rows=1 width=6) (actual time=0.078..0.080 rows=1 loops=1)
Index Cond: ((show = 58919::numeric) AND (type = 1::numeric) AND (best = 10::numeric) AND ((block)::text = 'GMA'::text) AND ("row" = 26::numeric) AND (seat = 15::numeric))
Planning time: 0.535 ms
Execution time: 0.146 ms
(5 rows)

In this instance, the time difference is not huge, however in some seemingly random cases where there are a lot of rows with only the "seat" column differing the choice of seats_index02 is much larger ~ 70ms vs 0.something ms with seats_index01

I suspect some of the seemingly random cases could be where there's been an update, followed by a delete since the last analyze, despite auto analyze running fairly frequently.

Any suggestions appreciated.

Thanks
Glyn

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance