Yeah, Reply-All... Begin forwarded message: > From: David Yeu <david.yeu@xxxxxxxxx> > Subject: Re: Performance on large, append-only tables > Date: February 10, 2012 10:59:04 AM EST > To: Merlin Moncure <mmoncure@xxxxxxxxx> > > On Feb 10, 2012, at 10:19 AM, Merlin Moncure wrote: > >> You can probably significantly optimize this. But first, can we see >> some explain analyze for the affected queries? > > Sorry, we should have included these in the original post. Here's the EXPLAIN output for a "id < ?" query: > > > => EXPLAIN ANALYZE SELECT "lines".* FROM "lines" WHERE (lines.deleted_at IS NULL) AND ("lines".group_id = ?) AND (id < ?) ORDER BY id DESC LIMIT 20 OFFSET 0; > QUERY PLAN > --------------------------------------------------------------------------------------------------------------------------------------------------------- > Limit (cost=9267.44..9267.45 rows=20 width=1321) (actual time=348.844..348.877 rows=20 loops=1) > -> Sort (cost=9267.44..9269.76 rows=4643 width=1321) (actual time=348.840..348.852 rows=20 loops=1) > Sort Key: id > Sort Method: top-N heapsort Memory: 29kB > -> Index Scan using index_lines_on_group_id on lines (cost=0.00..9242.73 rows=4643 width=1321) (actual time=6.131..319.835 rows=23038 loops=1) > Index Cond: (group_id = ?) > Filter: ((deleted_at IS NULL) AND (id < ?)) > Total runtime: 348.987 ms > > > A quick suggestion from Heroku yesterday was a new index on (group_id, id). After adding it to a database fork, we ended up with: > > > => EXPLAIN ANALYZE SELECT "lines".* FROM "lines" WHERE (lines.deleted_at IS NULL) AND ("lines".group_id = ?) AND (id < ?) ORDER BY id DESC LIMIT 20 OFFSET 0; > QUERY PLAN > ------------------------------------------------------------------------------------------------------------------------------------------------------------------ > Limit (cost=0.00..28.88 rows=20 width=1321) (actual time=17.216..109.905 rows=20 loops=1) > -> Index Scan Backward using index_lines_on_group_id_and_id on lines (cost=0.00..6416.04 rows=4443 width=1321) (actual time=17.207..109.867 rows=20 loops=1) > Index Cond: ((group_id = ?) AND (id < ?)) > Filter: (deleted_at IS NULL) > Total runtime: 110.039 ms > > > The result has been pretty dramatic for the "id <> ?" queries, which make up the bulk of the queries. Running a whole bunch of EXPLAIN ANAYLZE queries also showed that some queries were actually choosing to use the index on `id' instead of `group_id', and that performed about as poorly as expected. Thankfully, the new index on (group_id, id) seems to be preferable nearly always. > > And for reference, here's the EXPLAIN for the LIMIT, OFFSET query: > > > => EXPLAIN ANALYZE SELECT "lines".* FROM "lines" WHERE (lines.deleted_at IS NULL) AND ("lines".group_id = ?) ORDER BY id DESC LIMIT 20 OFFSET 60; > QUERY PLAN > ------------------------------------------------------------------------------------------------------------------------------------------------------- > Limit (cost=9274.45..9274.46 rows=20 width=1321) (actual time=109.674..109.708 rows=20 loops=1) > -> Sort (cost=9274.42..9276.75 rows=4646 width=1321) (actual time=109.606..109.657 rows=80 loops=1) > Sort Key: id > Sort Method: top-N heapsort Memory: 43kB > -> Index Scan using index_lines_on_group_id on lines (cost=0.00..9240.40 rows=4646 width=1321) (actual time=0.117..98.905 rows=7999 loops=1) > Index Cond: (group_id = ?) > Filter: (deleted_at IS NULL) > Total runtime: 109.753 ms > > > - Dave > -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance