On 10/26/2017 09:01 AM, Tom Lane wrote:
Laurenz Albe <laurenz.albe@xxxxxxxxxxx> writes:
Also, to have PostgreSQL inline the function, which would be good
for performance, it should be declared IMMUTABLE.
Actually, if you hope to have a SQL function be inlined, it's better
not to decorate it at all --- not with IMMUTABLE, and not with STRICT
either. Both of those restrict the parser's ability to inline unless
it can prove the contained expression is equally immutable/strict.
With the default attributes of volatile/not strict, there's nothing
to prove.
(In any case, it's usually easy enough to tell from EXPLAIN output
whether inlining has happened.)
regards, tom lane
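For reference, the behaviour Tom describes can be checked like this (a minimal sketch; the function name and body are made up for illustration, only the sui.segment columns are from my schema):

```sql
-- A simple SQL function left undecorated (no IMMUTABLE, no STRICT),
-- which gives the planner the best chance to inline it.
CREATE FUNCTION event_ratio(greater bigint, equal bigint, less bigint)
RETURNS numeric AS $$
    SELECT 1.0 * (greater + equal) / (less + equal + greater)
$$ LANGUAGE sql;

-- If inlining happened, EXPLAIN VERBOSE shows the expanded arithmetic
-- expression in the Output line rather than a call to event_ratio().
EXPLAIN VERBOSE
SELECT event_ratio(s.events_greater, s.events_equal, s.events_less)
FROM sui.segment s;
```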
In another instance of the same schema, in the same database as the
original slow execution, I've loaded 823591 segments (though in this
case all of them are on one chromosome, one markerset) and 65K proband
sets, using the same marker table as the slow(est) query. In the
fastest run, there are only 46K segments for the given markerset.
                                                                                     QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=82122076.59..82122225.35 rows=14876 width=48) (actual time=208203.091..208210.348 rows=14645 loops=1)
   Output: m.id, min(((1.0 * (((s.events_greater + s.events_equal) + 0))::numeric) / ((((s.events_less + s.events_equal) + s.events_greater) + 0))::numeric))
   Group Key: m.id
   Buffers: shared hit=43209090
   ->  Nested Loop  (cost=3799.40..44686205.23 rows=1361304413 width=40) (actual time=55.443..89684.451 rows=75577302 loops=1)
         Output: m.id, s.events_greater, s.events_equal, s.events_less
         Buffers: shared hit=43209090
         ->  Hash Join  (cost=3798.98..43611.56 rows=823591 width=32) (actual time=55.393..1397.509 rows=823591 loops=1)
               Output: s.events_greater, s.events_equal, s.events_less, s.startbase, s.endbase
               Inner Unique: true
               Hash Cond: (s.probandset_id = p.id)
               Buffers: shared hit=19222
               ->  Seq Scan on sui.segment s  (cost=0.00..29414.86 rows=823591 width=48) (actual time=0.017..669.915 rows=823591 loops=1)
                     Output: s.id, s.chrom, s.markerset_id, s.probandset_id, s.startbase, s.endbase, s.firstmarker, s.lastmarker, s.events_less, s.events_equal, s.events_greater
                     Filter: ((s.chrom = 22) AND (s.markerset_id = 'edf95066-24d2-4ca1-bad6-aa850cc82fef'::uuid))
                     Buffers: shared hit=17061
               ->  Hash  (cost=2979.99..2979.99 rows=65519 width=16) (actual time=55.272..55.272 rows=65519 loops=1)
                     Output: p.id
                     Buckets: 65536  Batches: 1  Memory Usage: 3584kB
                     Buffers: shared hit=2161
                     ->  Seq Scan on sui.probandset p  (cost=0.00..2979.99 rows=65519 width=16) (actual time=0.007..33.582 rows=65519 loops=1)
                           Output: p.id
                           Filter: (p.people_id = '9b2308b1-9659-4a2c-91ae-8f95cd0a90b3'::uuid)
                           Buffers: shared hit=2161
         ->  Index Scan using marker_chrom_basepos_idx on base.marker m  (cost=0.42..37.67 rows=1653 width=20) (actual time=0.010..0.075 rows=92 loops=823591)
               Output: m.id, m.name, m.chrom, m.basepos, m.alleles
               Index Cond: ((m.chrom = 22) AND (m.basepos >= s.startbase) AND (m.basepos <= s.endbase))
               Buffers: shared hit=43189868
 Planning time: 0.764 ms
 Execution time: 208214.816 ms
(30 rows)