Thanks for all the suggestions, Stephen.

> That explain analyze shows a whole ton of heap fetches. When was the
> last time a VACUUM was run on this table, to build the visibility map?
> Without the visibility map being current, an Index-Only Scan, as is
> happening here, can really suck.

I'm using the default 9.6 config; I thought that autovacuum was on by
default?

On 20 April 2017 at 00:48, Stephen Frost <sfrost@xxxxxxxxxxx> wrote:
> Greetings,
>
> * Samuel Williams (space.ship.traveller@xxxxxxxxx) wrote:
>> We want the following kinds of query to be fast:
>
> "kinds of query" isn't helpful, you should be reviewing exactly the
> queries you care about because statistics and your exact data set and
> what the exact query you're running is will all have an impact.
>
>> SELECT DISTINCT "user_event"."user_id" FROM "user_event" WHERE
>> "user_event"."what" IN ('poll', 'location_change',
>> 'midnight_location') AND ("user_event"."created_at" >= '2016-04-19
>> 01:23:55') AND (latitude > -37.03079375089291 AND latitude <
>> -36.67086424910709 AND longitude > 174.6307139779924 AND longitude <
>> 175.0805140220076);
>
> Ugh. You should really look at and consider PostGIS whenever you're
> working with geospatial data.
>
>> We have a btree index and it appears to be working. However, it's
>> still pretty slow.
>
> You're finding some 17M records and then reducing that with DISTINCT to
> only 114k. Hard to say if it's faster to just brute-force your way
> through that with a HashAgg (as your explain analyze shows), or if a
> loose index scan would work better (check out how to do one in PG here:
> https://wiki.postgresql.org/wiki/Loose_indexscan).
>
>> EXPLAIN ANALYZE gives the following:
>> https://gist.github.com/ioquatix/bddda36d9e4ffaceb7a62d7b62259121#gistcomment-2065314
>
> That explain analyze shows a whole ton of heap fetches. When was the
> last time a VACUUM was run on this table, to build the visibility map?
> Without the visibility map being current, an Index-Only Scan, as is
> happening here, can really suck.
>
>> CLUSTER user_event ON index_user_event_for_visits_3 followed by
>> analyze... Our data is mostly time series but sometimes we get some
>> dumps with historical records.
>>
>> Perhaps add a BRIN index on created_at
>
> BRIN would be helpful if you wanted a smaller index. That doesn't seem
> to be the case here.
>
>> I'm wondering if... we can use an index to cache, all user_ids seen on
>> a given day. If we change our index to be more discrete, e.g.
>> created_at::date, would this help? The set union of user_ids for 365
>> days should be pretty fast?
>
> Materialized views are often useful, particularly when the results are
> (relatively) slow moving.
>
>> I'm open to any ideas or suggestions, ideally we can keep
>> optimisations within the database, rather than adding a layer of
>> caching on top.
>
> If you need to go the materialized view route, I'd definitely recommend
> doing that in the database rather than outside it.
>
> Thanks!
>
> Stephen
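
[Editor's note] Autovacuum is indeed on by default in 9.6, but it may simply
not have kept the visibility map current for this table. A minimal sketch of
how to check, using only standard catalog views and the table name from the
thread:

    -- Confirm autovacuum and see when the table was last (auto)vacuumed/analyzed.
    SHOW autovacuum;

    SELECT relname, last_vacuum, last_autovacuum, last_analyze, last_autoanalyze
    FROM pg_stat_user_tables
    WHERE relname = 'user_event';

    -- How much of the table the visibility map marks all-visible; index-only
    -- scans do a heap fetch for every row on a page that is not all-visible.
    SELECT relname, relpages, relallvisible,
           round(100.0 * relallvisible / greatest(relpages, 1), 1) AS pct_all_visible
    FROM pg_class
    WHERE relname = 'user_event';

    -- A manual VACUUM rebuilds the visibility map without blocking normal reads/writes.
    VACUUM (VERBOSE) user_event;

If relallvisible is near zero, the large "Heap Fetches" count in the plan is
expected, and a VACUUM should make the index-only scan noticeably cheaper.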
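
The loose index scan from the wiki page Stephen linked would look roughly like
the sketch below. It assumes an index whose leading column is user_id (nothing
in the thread says such an index exists), and it only wins when distinct users
are few relative to matching rows, since it probes one user at a time:

    -- Each recursive step jumps to the next user_id that has at least one
    -- matching row, instead of scanning all ~17M matches and de-duplicating.
    WITH RECURSIVE distinct_users AS (
        (SELECT user_id
         FROM user_event
         WHERE what IN ('poll', 'location_change', 'midnight_location')
           AND created_at >= '2016-04-19 01:23:55'
           AND latitude > -37.03079375089291 AND latitude < -36.67086424910709
           AND longitude > 174.6307139779924 AND longitude < 175.0805140220076
         ORDER BY user_id
         LIMIT 1)
        UNION ALL
        SELECT (SELECT user_id
                FROM user_event
                WHERE user_id > d.user_id
                  AND what IN ('poll', 'location_change', 'midnight_location')
                  AND created_at >= '2016-04-19 01:23:55'
                  AND latitude > -37.03079375089291 AND latitude < -36.67086424910709
                  AND longitude > 174.6307139779924 AND longitude < 175.0805140220076
                ORDER BY user_id
                LIMIT 1)
        FROM distinct_users d
        WHERE d.user_id IS NOT NULL
    )
    SELECT user_id FROM distinct_users WHERE user_id IS NOT NULL;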
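
Stephen's PostGIS suggestion, very roughly: store the point once, index it
with GiST, and express the bounding box as a single indexable condition. The
column and index names below are illustrative, and the UPDATE rewrites the
whole table, so treat this as a sketch rather than a migration plan:

    CREATE EXTENSION IF NOT EXISTS postgis;

    ALTER TABLE user_event
        ADD COLUMN geom geometry(Point, 4326);

    UPDATE user_event
    SET geom = ST_SetSRID(ST_MakePoint(longitude, latitude), 4326);

    CREATE INDEX index_user_event_on_geom ON user_event USING gist (geom);

    -- Bounding-box search via the && overlap operator, which the GiST index supports.
    SELECT DISTINCT user_id
    FROM user_event
    WHERE what IN ('poll', 'location_change', 'midnight_location')
      AND created_at >= '2016-04-19 01:23:55'
      AND geom && ST_MakeEnvelope(174.6307139779924, -37.03079375089291,
                                  175.0805140220076, -36.67086424910709, 4326);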
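
Finally, a sketch of the materialized view route, folding in the "user_ids
seen on a given day" idea from the original mail. The view name is
illustrative, and the geographic filter is not captured here, so this only
serves the date-and-event-type part of the query:

    -- Distinct users per day for the event types of interest.
    CREATE MATERIALIZED VIEW daily_active_users AS
    SELECT DISTINCT created_at::date AS day, user_id
    FROM user_event
    WHERE what IN ('poll', 'location_change', 'midnight_location');

    -- A unique index is required for REFRESH ... CONCURRENTLY.
    CREATE UNIQUE INDEX ON daily_active_users (day, user_id);

    -- Re-run on whatever cadence the data can tolerate being stale.
    REFRESH MATERIALIZED VIEW CONCURRENTLY daily_active_users;

    -- Distinct users over the last year, now against a much smaller relation.
    SELECT DISTINCT user_id
    FROM daily_active_users
    WHERE day >= current_date - interval '1 year';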