Re: Index scan cost calculation

Glyn Astill <glynastill@xxxxxxxxxxx> · Thu, 26 Nov 2015 16:34:35 +0000 (UTC)

----- Original Message -----

> From: Glyn Astill <glynastill@xxxxxxxxxxx>
> To: Pgsql-performance <pgsql-performance@xxxxxxxxxxxxxx>
> Sent: Thursday, 26 November 2015, 16:11
> Subject:  Index scan cost calculation
> 
> Hi All,
> 
> Using pg 9.4.5 I'm looking at a table set up by a 3rd party application and 
> trying to figure out why a particular index is being chosen over another for 
> updates/deletes.
> 
> From what I can see the reason is that plans using either index have the same 
> exactly the same cost.  So rather I'm asking if there's something 
> glaringly obvious I'm missing, or is there anything I can to to get better 
> estimates.
> 
> The table is as follows and has  ~ 50M rows, ~ 4.5GB in size:
> 
> CREATE TABLE tickets.seats
> (
>   recnum serial NOT NULL,
>   show numeric(8,0) NOT NULL,
>   type numeric(4,0) NOT NULL,
>   block character varying(8) NOT NULL,
>   "row" numeric(14,0) NOT NULL,
>   seat numeric(8,0) NOT NULL,
>   flag character varying(15) NOT NULL,
>   transno numeric(8,0) NOT NULL,
>   best numeric(4,0) NOT NULL,
>   "user" character varying(15) NOT NULL,
>   "time" numeric(10,0) NOT NULL,
>   date date NOT NULL,
>   date_reserved timestamp NOT NULL
> );
> 
> Indexes:
>   "seats_index01" PRIMARY KEY, btree (show, type, best, block, 
> "row", seat)              // (1094 MB)
>   "seats_index00" UNIQUE, btree (recnum)                              
>                     // (2423 MB)
>   "seats_index02" UNIQUE, btree (show, type, best, block, flag, 
> "row", seat, recnum)      // (2908 MB)

> 

^^ If those first two sizes look wrong, it's because they are; they should be the other way around.

> default_statistics target is 100, and the following columns are non-default:
> 
> attname | attstattarget
> --------+---------------
> show      |          1000
> type       |          1000
> block    |          2000
> row        |          1000
> seat       |          1000
> flag       |          1000
> best       |          1000
> 
> Increasing these further appears to make no noticeable difference. (pg_stats 
> here for these columns here: http://pastebin.com/2WQQec7N)
> 
> An example query below shows that in some cases the seats_index02 index is being 
> chosen:
> 
> # analyze verbose seats;
> INFO:  analyzing "tickets.seats"
> INFO:  "seats": scanned 593409 of 593409 pages, containing 50926456 
> live rows and 349030 dead rows; 600000 rows in sample, 50926456 estimated total 
> rows
> 
> # begin;
> BEGIN
> # explain analyze delete from seats where ("show" = 58919 AND 
> "type" = 1 AND "best" = 10 AND "block" = 
> 'GMA' AND "row" =26 AND "seat" = 15);
> QUERY PLAN
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> Delete on seats  (cost=0.56..4.59 rows=1 width=6) (actual time=0.480..0.480 
> rows=0 loops=1)
> ->  Index Scan using seats_index02 on seats  (cost=0.56..4.59 rows=1 width=6) 
> (actual time=0.452..0.453 rows=1 loops=1)
> Index Cond: ((show = 58919::numeric) AND (type = 1::numeric) AND (best = 
> 10::numeric) AND ((block)::text = 'GMA'::text) AND ("row" = 
> 26::numeric) AND (seat = 15::numeric))
> Planning time: 2.172 ms
> Execution time: 0.531 ms
> (5 rows)
> 
> But from my naive standpoint, seats_index01 is a better candidate:
> 
> # abort; begin;
> ROLLBACK
> BEGIN
> 
> # update pg_index set indisvalid = false where indexrelid = 
> 'seats_index02'::regclass;
> # explain analyze delete from seats where ("show" = 58919 AND 
> "type" = 1 AND "best" = 10 AND "block" = 
> 'GMA' AND "row" =26 AND "seat" = 15);
> QUERY PLAN
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> Delete on seats  (cost=0.56..4.59 rows=1 width=6) (actual time=0.103..0.103 
> rows=0 loops=1)
> ->  Index Scan using seats_index01 on seats  (cost=0.56..4.59 rows=1 width=6) 
> (actual time=0.078..0.080 rows=1 loops=1)
> Index Cond: ((show = 58919::numeric) AND (type = 1::numeric) AND (best = 
> 10::numeric) AND ((block)::text = 'GMA'::text) AND ("row" = 
> 26::numeric) AND (seat = 15::numeric))
> Planning time: 0.535 ms
> Execution time: 0.146 ms
> (5 rows)
> 
> 
> In this instance, the time difference is not huge, however in some seemingly 
> random cases where there are a lot of rows with only the "seat" column 
> differing the choice of seats_index02 is much larger ~ 70ms vs 0.something ms 
> with seats_index01
> 
> I suspect some of the seemingly random cases could be where there's been an 
> update, followed by a delete since the last analyze, despite auto analyze 
> running fairly frequently.
> 
> Any suggestions appreciated.
> 
> Thanks
> Glyn
> 
> 
> -- 
> Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
> 

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance