Oh, I've also tried earthdistance and ll_to_earth in a GiST index... it was slower than the BTREE index on a small subset of data in my tests.

On 19 April 2017 at 16:01, Samuel Williams <space.ship.traveller@xxxxxxxxx> wrote:
> Hi.
>
> We have 400,000,000 records in a table (soon to be 800,000,000); here
> is the schema (\d+):
>
> https://gist.github.com/ioquatix/bddda36d9e4ffaceb7a62d7b62259121
>
> We want the following kinds of query to be fast:
>
> SELECT DISTINCT "user_event"."user_id" FROM "user_event" WHERE
> "user_event"."what" IN ('poll', 'location_change',
> 'midnight_location') AND ("user_event"."created_at" >= '2016-04-19
> 01:23:55') AND (latitude > -37.03079375089291 AND latitude <
> -36.67086424910709 AND longitude > 174.6307139779924 AND longitude <
> 175.0805140220076);
>
> We have a btree index and it appears to be working. However, it's
> still pretty slow.
>
> EXPLAIN ANALYZE gives the following:
> https://gist.github.com/ioquatix/bddda36d9e4ffaceb7a62d7b62259121#gistcomment-2065314
>
> I'm thinking that I need to do the following to help:
>
> CLUSTER user_event ON index_user_event_for_visits_3, followed by
> ANALYZE... Our data is mostly time series, but sometimes we get
> dumps with historical records.
>
> Perhaps add a BRIN index on created_at.
>
> I'm wondering if we can use an index to cache all user_ids seen on
> a given day. If we change our index to be more discrete, e.g.
> created_at::date, would this help? The set union of user_ids for 365
> days should be pretty fast?
>
> I'm open to any ideas or suggestions; ideally we can keep
> optimisations within the database rather than adding a layer of
> caching on top.
>
> Kind regards,
> Samuel
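
For reference, this is roughly the shape of the GiST approach I was benchmarking, plus the BRIN/CLUSTER steps you mention. Names and coordinates are illustrative, and the GiST part assumes the cube and earthdistance extensions are installed; treat it as a sketch rather than a recommendation.

CREATE EXTENSION IF NOT EXISTS cube;
CREATE EXTENSION IF NOT EXISTS earthdistance;

-- Expression index over the lat/long columns.
CREATE INDEX user_event_ll_to_earth_idx
    ON user_event USING gist (ll_to_earth(latitude, longitude));

-- Find users within ~20 km of a point. earth_box() is a bounding-box
-- test, so an exact radius would still need an earth_distance() recheck.
SELECT DISTINCT user_id
  FROM user_event
 WHERE earth_box(ll_to_earth(-36.85, 174.85), 20000)
       @> ll_to_earth(latitude, longitude)
   AND created_at >= '2016-04-19 01:23:55';

And, for the BRIN and CLUSTER ideas from your message: BRIN needs PostgreSQL 9.5+, and CLUSTER takes an exclusive lock and rewrites the whole table, so on a table this size it needs a maintenance window.

-- Tiny index; it only pays off when the heap is physically ordered by created_at.
CREATE INDEX user_event_created_at_brin
    ON user_event USING brin (created_at);

-- Physically reorder the table by an existing index, then refresh statistics.
CLUSTER user_event USING index_user_event_for_visits_3;
ANALYZE user_event;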