Re: Large data and slow queries

vinny <vinny@xxxxxxxxx> · Wed, 19 Apr 2017 09:31:41 +0200

On 2017-04-19 07:04, Samuel Williams wrote:
Thanks John. Yes, you are absolutely right, you want the index to be
bottom heavy so you can cull as much as possible at the top. I'm
familiar with that, once implementing a brute-force sudoku solver, it
has the same principle.

I've been working on this all afternoon. By reducing the longitude,
latitude columns to float4, in my test cases, I found about 50%
improvement in performance. It may also use less space. So part of the
problem was my choice of data type. We've computed that float4 has a
worst case precision of about 1.6m which we are okay with for
analytics data.

Another option we may consider is using a (signed) integer - e.g.
longitude = 180*(v/2^31) and latitude = 180*(v/2^31) as this has a
uniform error across all points, but it's a bit more cumbersome to
handle. Is there a rational datatype in postgres which works like
this?

On 19 April 2017 at 16:42, John R Pierce <pierce@xxxxxxxxxxxx> wrote:
On 4/18/2017 9:01 PM, Samuel Williams wrote:

We want the following kinds of query to be fast:

SELECT ... AND (latitude > -37.03079375089291 AND latitude <
-36.67086424910709 AND longitude > 174.6307139779924 AND longitude <
175.0805140220076);

I wonder if GIST would work better if you use the native POINT type, 
and
compared it like

mypoint <@ BOX
'((174.6307139779924,-37.03079375089291),(175.0805140220076,-36.67086424910709
))'

with a gist index on mypoint...

but, it all hinges on which clauses in your query are most selective, 
thats
where you want an index.

--
john r pierce, recycling bits in santa cruz

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Did that 50% performance gain come from just the datatype, or that fact 
that the index became smaller?

Given the number of records, my first thought was either partitioning or 
partial-indexes.
The fewer rows are in the index, the quicker it will be to check,
and it's not a lot of work to create separate indexes for lat/long 
ranges or dates.

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general