Search Postgresql Archives

query not scaling

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



I have a query I cannot tame and I'm wondering if there's an alternative
to  the "between"  clause I'm  using.  Perhaps  a custom  type could  do
better?  I've  tried the "<@" orperator and that changes the  query plan
significantly but the execution cost/time is not improved.

Any suggestion or pointers much appreciated.

Environment: Using  a virtual  CentOS Linux  release 7.4.1708  (Core), 4
cores (2.3GHz),  8G RAM and  postgres 10.0(beta3) shared_buffers  = 1GB,
work_mem = 2GB

Domain: (TL/DR)  A "segment" is  defined by  a particular set  of people
(probandset.id)  plus a  subset of  markers (markerset.id,  startmarker,
endmarker).   I need  the minimum  p-value for  each marker  in the  set
across all segments matching the set and a specific set of poeple.  So a
given segment says  "I cover all the markers from  startbase to endbase"
and each marker has a specific base position (relative to a chromosome).
I'm  after the  smallest  p-value  for each  marker  across  the set  of
segments which include that marker (from the 'between' clause).

Context:  I have  the query  in a  function so  the ids  of the  all the
players are available to the following sql:

    select m.id as mkrid
           , min(pv(s.events_less, s.events_equal, s.events_greater, 0)) as optval                                                             
    from marker m join segment s on m.basepos between s.startbase and s.endbase
         and m.chrom = 1
         and s.chrom = 1
         and s.markerset_id = suppliedMarkersetId
    join probandset r on s.probandset_id = r.id
         and r.people_id =  suppliedPeopleId
    group by m.id

where the pv function is

    create or replace function pv(l bigint, e bigint, g bigint, o int) 
    returns numeric
    as 
    $$
    select 1.0*(g+e+o)/(l+e+g+o);
    $$
    language sql
    ;


I have the identical schema in  two databases (same pg instance) and the
tables definitions  involved are  below.  In one  schema there  are 1.7M
records  in segment  and in  the other  there is  40M rows.   The marker
tables are much more similar with 600K and 900K respectively.  The third
table, probandset, has 60 and 600 respectively. On average 0.8M and 1.8M
segments per markerset_id.

The explains: (fast  (12sec), then slow(hours)).  The  part which sticks
out  to  me  is  where  the "between"  gets  used.   (I'm  betting  that
probandset is  too small to  matter.)  The  slower explain plan  is very
similar to  what I  saw originally in  the now "fast"  data set  and the
current indexing stategy comes largely from that performance work.

It looks like I'm getting a Cartesian between the number of markers in a
set and the number of segments found: ten zeros at least.
                                                                  QUERY PLAN                                                                  
----------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=291472.27..292040.58 rows=56831 width=48)
   Group Key: m.id
   ->  Nested Loop  (cost=3752.33..167295.52 rows=4515518 width=40)
         ->  Nested Loop  (cost=3751.90..17906.25 rows=715 width=32)
               ->  Seq Scan on probandset r  (cost=0.00..2.77 rows=4 width=16)
                     Filter: (people_id = '4e3b9829-43a8-4f84-9df6-f120dc5b1a7e'::uuid)
               ->  Bitmap Heap Scan on segment s  (cost=3751.90..4473.96 rows=191 width=48)
                     Recheck Cond: ((probandset_id = r.id) AND (chrom = 1) AND (markerset_id = '61a7e5cb-b81d-42e4-9e07-6bd9c2fbe6d1'::uuid))
                     ->  BitmapAnd  (cost=3751.90..3751.90 rows=191 width=0)
                           ->  Bitmap Index Scan on useg  (cost=0.00..72.61 rows=2418 width=0)
                                 Index Cond: ((probandset_id = r.id) AND (chrom = 1))
                           ->  Bitmap Index Scan on segment_markerset_id_idx  (cost=0.00..3676.23 rows=140240 width=0)
                                 Index Cond: (markerset_id = '61a7e5cb-b81d-42e4-9e07-6bd9c2fbe6d1'::uuid)
         ->  Index Scan using marker_chrom_basepos_idx on marker m  (cost=0.42..145.79 rows=6315 width=20)
               Index Cond: ((chrom = 1) AND (basepos >= s.startbase) AND (basepos <= s.endbase))
(15 rows)


                                                         QUERY PLAN                                                          
-----------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=83131331.81..83132151.44 rows=81963 width=48)
   Group Key: m.id
   ->  Nested Loop  (cost=1907.38..70802659.35 rows=448315362 width=40)
         Join Filter: ((m.basepos >= s.startbase) AND (m.basepos <= s.endbase))
         ->  Bitmap Heap Scan on marker m  (cost=1883.64..11009.18 rows=81963 width=20)
               Recheck Cond: (chrom = 1)
               ->  Bitmap Index Scan on marker_chrom_basepos_idx  (cost=0.00..1863.15 rows=81963 width=0)
                     Index Cond: (chrom = 1)
         ->  Materialize  (cost=23.74..181468.38 rows=49228 width=32)
               ->  Hash Join  (cost=23.74..181222.24 rows=49228 width=32)
                     Hash Cond: (s.probandset_id = r.id)
                     ->  Index Scan using segment_markerset_id_idx on segment s  (cost=0.56..178022.70 rows=251881 width=48)
                           Index Cond: (markerset_id = 'afad387e-a8e7-4c3b-9adb-3f00e70a13b3'::uuid)
                           Filter: (chrom = 1)
                     ->  Hash  (cost=21.68..21.68 rows=120 width=16)
                           ->  Seq Scan on probandset r  (cost=0.00..21.68 rows=120 width=16)
                                 Filter: (people_id = 'b124acb2-e400-4e63-8010-5a5f25a78f1c'::uuid)
(17 rows)


             Table "base.marker"
     Column  |      Type      | Modifiers 
    ---------+----------------+-----------
     id      | uuid           | not null
     name    | text           | not null
     chrom   | integer        | not null
     basepos | integer        | not null
     alleles | character(1)[] | 
    Indexes:
        "marker_pkey" PRIMARY KEY, btree (id)
        "marker_name_key" UNIQUE CONSTRAINT, btree (name)
        "marker_basepos_idx" btree (basepos)
        "marker_chrom_basepos_idx" btree (chrom, basepos)
    Referenced by:
        TABLE "markerset_member" CONSTRAINT "markerset_member_member_id_fkey" FOREIGN KEY (member_id) REFERENCES marker(id)

                  Table "aut.segment"
         Column     |  Type   |     Modifiers      
    ----------------+---------+--------------------
     id             | uuid    | not null
     chrom          | integer | not null
     markerset_id   | uuid    | not null
     probandset_id  | uuid    | not null
     startbase      | integer | not null
     endbase        | integer | not null
     firstmarker    | integer | not null
     lastmarker     | integer | not null
     events_less    | bigint  | not null default 0
     events_equal   | bigint  | not null default 0
     events_greater | bigint  | not null default 0
    Indexes:
        "segment_pkey" PRIMARY KEY, btree (id)
        "useg" UNIQUE CONSTRAINT, btree (probandset_id, chrom, startbase)
        "segment_markerset_id_chrom_firstmarker_idx" btree (markerset_id, chrom, firstmarker)
        "segment_markerset_id_idx" btree (markerset_id)
        "segment_startbase_idx" btree (startbase)
    Foreign-key constraints:
        "segment_markerset_id_fkey" FOREIGN KEY (markerset_id) REFERENCES markerset(id)
        "segment_probandset_id_fkey" FOREIGN KEY (probandset_id) REFERENCES probandset(id)
    Referenced by:
        TABLE "segmentset_member" CONSTRAINT "segmentset_member_segment_id_fkey" FOREIGN KEY (segment_id) REFERENCES segment(id)

               Table "aut.probandset"
       Column    |       Type       | Modifiers 
    -------------+------------------+-----------
     id          | uuid             | not null
     name        | text             | 
     probands    | uuid[]           | not null
     meioses     | integer          | 
     min_kincoef | double precision | 
     max_kincoef | double precision | 
     people_id   | uuid             | not null
    Indexes:
        "probandset_pkey" PRIMARY KEY, btree (id)
        "probandsetunique" gin (probands)
    Check constraints:
        "sortedset" CHECK (issorteduuids(probands))
    Foreign-key constraints:
        "probandset_people_id_fkey" FOREIGN KEY (people_id) REFERENCES people(id)
    Referenced by:
        TABLE "probandset_group_member" CONSTRAINT "probandset_group_member_member_id_fkey" FOREIGN KEY (member_id) REFERENCES probandset(id)
        TABLE "segment" CONSTRAINT "segment_probandset_id_fkey" FOREIGN KEY (probandset_id) REFERENCES probandset(id)

    explain select m.id as mkrid
           , min(pv(s.events_less, s.events_equal, s.events_greater, 0)) as optval                                                             
    from marker m join segment s on m.basepos between s.startbase and s.endbase
         and m.chrom = 1
         and s.chrom = 1
         and s.markerset_id = 'afad387e-a8e7-4c3b-9adb-3f00e70a13b3' 
    join probandset r on s.probandset_id = r.id
         and r.people_id =  'b124acb2-e400-4e63-8010-5a5f25a78f1c'
    group by m.id



[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]

  Powered by Linux