Greetings, * Samuel Williams (space.ship.traveller@xxxxxxxxx) wrote: > We want the following kinds of query to be fast: "kinds of query" isn't helpful, you should be reviewing exactly the queries you care about because statistics and your exact data set and what the exact query you're running is will all have an impact. > SELECT DISTINCT "user_event"."user_id" FROM "user_event" WHERE > "user_event"."what" IN ('poll', 'location_change', > 'midnight_location') AND ("user_event"."created_at" >= '2016-04-19 > 01:23:55') AND (latitude > -37.03079375089291 AND latitude < > -36.67086424910709 AND longitude > 174.6307139779924 AND longitude < > 175.0805140220076); Ugh. You should really look at and consider PostGIS whenever you're working with geospatial data. > We have a btree index and it appears to be working. However, it's > still pretty slow. You're finding some 17M records and then reducing that with DISTINCT to only 114k. Hard to say if it's faster to just brute-force your way through that with a HashAgg (as your explain analyze shows), or if a loose index scan would work better (check out how to do one in PG here: https://wiki.postgresql.org/wiki/Loose_indexscan). > EXPLAIN ANALYZE gives the following: > https://gist.github.com/ioquatix/bddda36d9e4ffaceb7a62d7b62259121#gistcomment-2065314 That explain analyze shows a whole ton of heap fetches. When was the last time a VACUUM was run on this table, to build the visibility map? Without the visibility map being current, an Index-Only Scan, as is happening here, can really suck. > CLUSTER user_event ON index_user_event_for_visits_3 followed by > analyze... Our data is mostly time series but sometimes we get some > dumps with historical records. > > Perhaps add a BRIN index on created_at BRIN would be helpful if you wanted a smaller index. That doesn't seem to be the case here. > I'm wondering if... we can use an index to cache, all user_ids seen on > a given day. If we change our index to be more discrete, e.g. > created_at::date, would this help? The set union of user_ids for 365 > days should be pretty fast? Materialized views are often useful, particularly when the results are (relatively) slow moving. > I'm open to any ideas or suggestions, ideally we can keep > optimisations within the database, rather than adding a layer of > caching on top. If you need to go the materialized view route, I'd definitely recommend doing that in the database rather than outside it. Thanks! Stephen
Attachment:
signature.asc
Description: Digital signature