Re: Performance on large, append-only tables

David Yeu <david.yeu@xxxxxxxxx> · Fri, 10 Feb 2012 16:19:57 +0000

Yeah, Reply-All...

Begin forwarded message:

> From: David Yeu <david.yeu@xxxxxxxxx>
> Subject: Re:  Performance on large, append-only tables
> Date: February 10, 2012 10:59:04 AM EST
> To: Merlin Moncure <mmoncure@xxxxxxxxx>
> 
> On Feb 10, 2012, at 10:19 AM, Merlin Moncure wrote:
> 
>> You can probably significantly optimize this.  But first, can we see
>> some explain analyze for the affected queries?
> 
> Sorry, we should have included these in the original post. Here's the EXPLAIN output for a "id < ?" query:
> 
> 
> => EXPLAIN ANALYZE SELECT  "lines".* FROM "lines" WHERE (lines.deleted_at IS NULL) AND ("lines".group_id = ?) AND (id < ?) ORDER BY id DESC LIMIT 20 OFFSET 0;
>                                                                       QUERY PLAN                                                                        
> ---------------------------------------------------------------------------------------------------------------------------------------------------------
> Limit  (cost=9267.44..9267.45 rows=20 width=1321) (actual time=348.844..348.877 rows=20 loops=1)
>   ->  Sort  (cost=9267.44..9269.76 rows=4643 width=1321) (actual time=348.840..348.852 rows=20 loops=1)
>         Sort Key: id
>         Sort Method:  top-N heapsort  Memory: 29kB
>         ->  Index Scan using index_lines_on_group_id on lines  (cost=0.00..9242.73 rows=4643 width=1321) (actual time=6.131..319.835 rows=23038 loops=1)
>               Index Cond: (group_id = ?)
>               Filter: ((deleted_at IS NULL) AND (id < ?))
> Total runtime: 348.987 ms
> 
> 
> A quick suggestion from Heroku yesterday was a new index on (group_id, id). After adding it to a database fork, we ended up with:
> 
> 
> => EXPLAIN ANALYZE SELECT  "lines".* FROM "lines" WHERE (lines.deleted_at IS NULL) AND ("lines".group_id = ?) AND (id < ?) ORDER BY id DESC LIMIT 20 OFFSET 0;
>                                                                            QUERY PLAN                                                                            
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------
> Limit  (cost=0.00..28.88 rows=20 width=1321) (actual time=17.216..109.905 rows=20 loops=1)
>   ->  Index Scan Backward using index_lines_on_group_id_and_id on lines  (cost=0.00..6416.04 rows=4443 width=1321) (actual time=17.207..109.867 rows=20 loops=1)
>         Index Cond: ((group_id = ?) AND (id < ?))
>         Filter: (deleted_at IS NULL)
> Total runtime: 110.039 ms
> 
> 
> The result has been pretty dramatic for the "id <> ?" queries, which make up the bulk of the queries. Running a whole bunch of EXPLAIN ANAYLZE queries also showed that some queries were actually choosing to use the index on `id' instead of `group_id', and that performed about as poorly as expected. Thankfully, the new index on (group_id, id) seems to be preferable nearly always.
> 
> And for reference, here's the EXPLAIN for the LIMIT, OFFSET query:
> 
> 
> => EXPLAIN ANALYZE SELECT  "lines".* FROM "lines" WHERE (lines.deleted_at IS NULL) AND ("lines".group_id = ?) ORDER BY id DESC LIMIT 20 OFFSET 60;
>                                                                      QUERY PLAN                                                                       
> -------------------------------------------------------------------------------------------------------------------------------------------------------
> Limit  (cost=9274.45..9274.46 rows=20 width=1321) (actual time=109.674..109.708 rows=20 loops=1)
>   ->  Sort  (cost=9274.42..9276.75 rows=4646 width=1321) (actual time=109.606..109.657 rows=80 loops=1)
>         Sort Key: id
>         Sort Method:  top-N heapsort  Memory: 43kB
>         ->  Index Scan using index_lines_on_group_id on lines  (cost=0.00..9240.40 rows=4646 width=1321) (actual time=0.117..98.905 rows=7999 loops=1)
>               Index Cond: (group_id = ?)
>               Filter: (deleted_at IS NULL)
> Total runtime: 109.753 ms
> 
> 
> - Dave
> 

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance