Re: less than 2 sec for response - possible?

trafdev <trafdev@xxxxxxx> · Fri, 1 Jul 2016 19:48:06 -0700

Thanks Tom.

I've created index on aid, date:

create index aaa on stats.feed_sub(aid,date);

and simplified a query (dropped gran as it's equal for all rows anyway):

SELECT
  sum(stats.feed_sub.c_filt_click_cap_ip) AS clicks_from_ip,
  sum(stats.feed_sub.c_filt_doubleclick)  AS clicks_on_target,
  sum(stats.feed_sub.c_filt_delay_clicks) AS ip_click_period,
  sum(stats.feed_sub.c_filt_fast_click)   AS fast_click,
  sum(stats.feed_sub.c_filt_ip_mismatch)  AS ip_mismatch,
  sum(stats.feed_sub.c_filt_lng_mismatch) AS lng_mismatch,
  sum(stats.feed_sub.c_filt_ref_mismatch) AS ref_mismatch,
  sum(stats.feed_sub.c_filt_ua_mismatch)  AS ua_mismatch,
  sum(stats.feed_sub.c_filt_url_expired)  AS url_expired,
  stats.feed_sub.subid                    AS stats_feed_sub_subid,
  stats.feed_sub.sid                      AS stats_feed_sub_sid
FROM stats.feed_sub
WHERE stats.feed_sub.date >= '2016-06-01' :: TIMESTAMP AND
      stats.feed_sub.date <= '2016-06-30' :: TIMESTAMP AND
      stats.feed_sub.aid = 3
GROUP BY
  stats.feed_sub.subid, stats.feed_sub.sid;

All data is in the cache and it still takes almost 5 seconds to complete:

QUERY PLAN
HashAggregate  (cost=792450.42..803727.24 rows=346979 width=86) (actual 
time=4742.145..4882.468 rows=126533 loops=1)
"  Group Key: subid, sid"
  Buffers: shared hit=1350371
  ->  Index Scan using aaa on feed_sub  (cost=0.43..697031.39 
rows=3469783 width=86) (actual time=0.026..1655.394 rows=3588376 loops=1)
        Index Cond: ((aid = 3) AND (date >= '2016-06-01 
00:00:00'::timestamp without time zone) AND (date <= '2016-06-30 
00:00:00'::timestamp without time zone))
        Buffers: shared hit=1350371
Planning time: 0.159 ms
Execution time: 4899.934 ms

It's better, but still is far from "<2 secs" goal.

Any thoughts?

On 07/01/16 18:23, Tom Lane wrote:
trafdev <trafdev@xxxxxxx> writes:
CREATE INDEX ix_feed_sub_date
   ON stats.feed_sub
   USING brin
   (date);

CREATE UNIQUE INDEX ixu_feed_sub
   ON stats.feed_sub
   USING btree
   (date, gran, aid, pid, sid, fid, subid COLLATE pg_catalog."default");

HashAggregate  (cost=901171.72..912354.97 rows=344100 width=86) (actual
time=7207.825..7335.473 rows=126044 loops=1)
"  Group Key: subid, sid"
   Buffers: shared hit=3635804
   ->  Index Scan using ixu_feed_sub on feed_sub  (cost=0.56..806544.38
rows=3440994 width=86) (actual time=0.020..3650.208 rows=3578344 loops=1)
         Index Cond: ((date >= '2016-06-01 00:00:00'::timestamp without
time zone) AND (date <= '2016-06-30 00:00:00'::timestamp without time
zone) AND (gran = '1 day'::interval) AND (aid = 3))
         Buffers: shared hit=3635804
Planning time: 0.150 ms
Execution time: 7352.009 ms

Neither of those indexes is terribly well designed for this query.
A btree index on (aid, gran, date) or (gran, aid, date) would work
much better.  See

https://www.postgresql.org/docs/9.5/static/indexes-multicolumn.html

You could rearrange the column order in that giant unique index
and get some of the benefit.  But if you're desperate to optimize
this particular query, an index not bearing so many irrelevant columns
would probably be better for it.

An alternative way of thinking would be to create an index with those
three leading columns and then all of the other columns used by this
query as later columns.  That would be an even larger index, but it would
allow an index-only scan, which might be quite a lot faster.  The fact
that you seem to be hitting about one page for each row retrieved says
that the data you need is pretty badly scattered, so constructing an index
that concentrates everything you need into one range of the index might
be the ticket.

Either of these additional-index ideas is going to penalize table
insertions/updates, so keep an eye on that end of the performance
question too.

			regards, tom lane

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance