Thanks Tom.
I've created index on aid, date:
create index aaa on stats.feed_sub(aid,date);
and simplified a query (dropped gran as it's equal for all rows anyway):
SELECT
sum(stats.feed_sub.c_filt_click_cap_ip) AS clicks_from_ip,
sum(stats.feed_sub.c_filt_doubleclick) AS clicks_on_target,
sum(stats.feed_sub.c_filt_delay_clicks) AS ip_click_period,
sum(stats.feed_sub.c_filt_fast_click) AS fast_click,
sum(stats.feed_sub.c_filt_ip_mismatch) AS ip_mismatch,
sum(stats.feed_sub.c_filt_lng_mismatch) AS lng_mismatch,
sum(stats.feed_sub.c_filt_ref_mismatch) AS ref_mismatch,
sum(stats.feed_sub.c_filt_ua_mismatch) AS ua_mismatch,
sum(stats.feed_sub.c_filt_url_expired) AS url_expired,
stats.feed_sub.subid AS stats_feed_sub_subid,
stats.feed_sub.sid AS stats_feed_sub_sid
FROM stats.feed_sub
WHERE stats.feed_sub.date >= '2016-06-01' :: TIMESTAMP AND
stats.feed_sub.date <= '2016-06-30' :: TIMESTAMP AND
stats.feed_sub.aid = 3
GROUP BY
stats.feed_sub.subid, stats.feed_sub.sid;
All data is in the cache and it still takes almost 5 seconds to complete:
QUERY PLAN
HashAggregate (cost=792450.42..803727.24 rows=346979 width=86) (actual
time=4742.145..4882.468 rows=126533 loops=1)
" Group Key: subid, sid"
Buffers: shared hit=1350371
-> Index Scan using aaa on feed_sub (cost=0.43..697031.39
rows=3469783 width=86) (actual time=0.026..1655.394 rows=3588376 loops=1)
Index Cond: ((aid = 3) AND (date >= '2016-06-01
00:00:00'::timestamp without time zone) AND (date <= '2016-06-30
00:00:00'::timestamp without time zone))
Buffers: shared hit=1350371
Planning time: 0.159 ms
Execution time: 4899.934 ms
It's better, but still is far from "<2 secs" goal.
Any thoughts?
On 07/01/16 18:23, Tom Lane wrote:
trafdev <trafdev@xxxxxxx> writes:
CREATE INDEX ix_feed_sub_date
ON stats.feed_sub
USING brin
(date);
CREATE UNIQUE INDEX ixu_feed_sub
ON stats.feed_sub
USING btree
(date, gran, aid, pid, sid, fid, subid COLLATE pg_catalog."default");
HashAggregate (cost=901171.72..912354.97 rows=344100 width=86) (actual
time=7207.825..7335.473 rows=126044 loops=1)
" Group Key: subid, sid"
Buffers: shared hit=3635804
-> Index Scan using ixu_feed_sub on feed_sub (cost=0.56..806544.38
rows=3440994 width=86) (actual time=0.020..3650.208 rows=3578344 loops=1)
Index Cond: ((date >= '2016-06-01 00:00:00'::timestamp without
time zone) AND (date <= '2016-06-30 00:00:00'::timestamp without time
zone) AND (gran = '1 day'::interval) AND (aid = 3))
Buffers: shared hit=3635804
Planning time: 0.150 ms
Execution time: 7352.009 ms
Neither of those indexes is terribly well designed for this query.
A btree index on (aid, gran, date) or (gran, aid, date) would work
much better. See
https://www.postgresql.org/docs/9.5/static/indexes-multicolumn.html
You could rearrange the column order in that giant unique index
and get some of the benefit. But if you're desperate to optimize
this particular query, an index not bearing so many irrelevant columns
would probably be better for it.
An alternative way of thinking would be to create an index with those
three leading columns and then all of the other columns used by this
query as later columns. That would be an even larger index, but it would
allow an index-only scan, which might be quite a lot faster. The fact
that you seem to be hitting about one page for each row retrieved says
that the data you need is pretty badly scattered, so constructing an index
that concentrates everything you need into one range of the index might
be the ticket.
Either of these additional-index ideas is going to penalize table
insertions/updates, so keep an eye on that end of the performance
question too.
regards, tom lane
--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance