Re: Changing ORDER BY column slows query dramatically

brick pglists <brickpglists@xxxxxxxxx> · Fri, 12 Apr 2013 13:38:00 -0700

Hi Shaun,

On Fri, Apr 12, 2013 at 12:59 PM, Shaun Thomas <sthomas@xxxxxxxxxxxxxxxx> wrote:
> On 04/12/2013 11:51 AM, brick pglists wrote:
>
> Since it doesn't show up in your GUC list, you should probably increase your
> default_statistics_target to 400 or more, analyze, and try again. The
> heuristics for the dates aren't complete enough, so it thinks there are few
> matches. If that doesn't work and you want a quick, but ugly fix for this,
> you can create the following index:
>
> CREATE INDEX event_20130406_id_desc_tstamp_utc_idx
>     ON event_20130406 (id DESC, tstamp_utc);

Thanks for your suggestions. Bumping up the default_statistics_target
several times all the way to 4000 (ANALYZEd each time) did not help,
however, adding the index you suggested helped with that query. It is
still over a magnitude slower than the version that sorts by
tstamp_utc, but it's a start. I created a similar index (CREATE INDEX
event_20130406_id_desc_tstamp_utc_desc_idx ON event_20130406 (id DESC,
tstamp_utc DESC)) where both columns were sorted DESCm and given the
choice between those two, it chose the latter.
Setting enable_mergejoin to false results in a plan much closer to the
original fast one, and further changing cpu_tuple_cost up to 1 results
in a query about 3x slower than the original fast one.

The ORDER BY e.id query, with the new index, enable_mergejoin
disabled, and cpu_tuple_cost bumped up to 1:

 Limit  (cost=125386.16..126640.02 rows=100 width=42) (actual
time=220.807..221.864 rows=100 loops=1)
   Buffers: shared hit=49171 read=6770
   I/O Timings: read=44.980
   ->  Nested Loop  (cost=0.00..7734858.92 rows=616883 width=42)
(actual time=110.718..213.923 rows=10100 loops=1)
         Buffers: shared hit=49171 read=6770
         I/O Timings: read=44.980
         ->  Index Scan using
event_20130406_id_desc_tstamp_utc_desc_idx on event_20130406 e
(cost=0.00..2503426.81 rows=1851068 width=34) (actual
time=110.690..139.001 rows=10100 loops=1)
               Index Cond: ((tstamp_utc >= '2013-04-06
10:00:00'::timestamp without time zone) AND (tstamp_utc <= '2013-04-06
18:00:00'::timestamp without time zone))
               Filter: ((date_utc = '2013-04-06'::date) AND (org_id = 216471))
               Rows Removed by Filter: 1554
               Buffers: shared hit=8647 read=6770
               I/O Timings: read=44.980
         ->  Index Scan using
notification_counts_by_status_20130406_event_id_org_id_pk on
notification_counts_by_status_20130406 ncbs  (cost=0.00..1.83 rows=1
width=16) (actual time=0.003..0.004 rows=1 loops=10100)
               Index Cond: (event_id = e.id)
               Filter: ((event_creation_tstamp_utc >= '2013-04-06
10:00:00'::timestamp without time zone) AND (event_creation_tstamp_utc
<= '2013-04-06 18:00:00'::timestamp without time zone) AND (status =
'DELIVERED'::text))
               Buffers: shared hit=40524
 Total runtime: 222.127 ms
(17 rows)

Still not at the ~90ms from the "ORDER BY e.tstamp_utc DESC" version,
but not too bad.  Now I need to figure out how I can get the best plan
choice without monkeying around with enable_mergejoin and changing
cpu_tuple_cost too much.

If any more suggestions are forthcoming, I am all ears!

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance