Search Postgresql Archives

Re: query not scaling

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 10/31/2017 03:12 AM, Laurenz Albe wrote:
Rob Sargent wrote:
I think your biggest problem is the join condition
     on m.basepos between s.startbase and s.endbase

That forces a nested loop join, which cannot be performed efficiently.
Agree! 800,000 * 4,000 = 3,200,000,000.  It's just that I thought I had
corralled that problem which indexing but apparently not.  I was hoping
some kind soul might point out a missing index or similar correction. I
have completely reworked the process, but not sure yet if it's correct.
(The slow answer is correct, once it comes in.)
You can create indexes that are useful for this query:

    ON sui.segment(chrom, markerset_id)
    ON sui.probandset(people_id)

But that probably won't make a big difference, because the sequential
scans take only a small fraction of your query time.

A little less than half of the query time is spent in the nested loop
join, and a little more than half of the time is spent doing the
GROUP BY.

Perhaps the biggest improvement you can easily make would be to
get rid of "numeric" for the computation.  I suspect that this is
where a lot of time is spent, since the hash aggregate is over
less than 15000 rows.

Unless you really need the precision of "numeric", try

CREATE OR REPLACE FUNCTION pv(l bigint, e bigint, g bigint, o int)
    RETURNS double precision LANGUAGE sql AS
$$SELECT (g+e+o)::double precision / (l+e+g+o)::double precision$$;

Yours,
Laurenz Albe
In practice markersets are always aligned with one chromosome so I would not expect this to have an effect. There's no constraint on this however, and there can be more than one markerset per chromosome. I have played with indexing on segment.markerset_id.

In all the data sets used in the examples (runtimes, explains etc) there has been a in single people_id across the existing segment data. Down the road this of course will not be the case and I can see the value of an index on probandset.people_id eventually. I can certainly add it now for a test. I'm currently writing a probandset loader hoping to get a test case for the problem with gin indexing mentioned up-thread.

I think I'm most surprise at the notion that the arithmetic is the problem and will happily test your suggestion to force floating point values. The value can get small (10^-12 on a good day!) but we don't need many digits of precision.

Thanks



--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general



[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]

  Powered by Linux