I have three very similar functions, two of which are
fast and one is slow and I cannot explain the slowness of
the third. All of which, with the correct arguments should return
the same "optimal set". I present one of the two fast
versions and the slow one, in whole and in part. I'm
using postgres 9.6 on ubuntu 16.4 desktop (16G memory).
I'm confused because the bulk of the difference in the sql is in
the "with" clause and those are equally quick and return the same
dataset, and to my eye the processing after that is the same as
sql but clearly not in the server. I'm missing something obvious
I'm sure but whatever it is is causing a nasty Cartesian then
sorting that out.
(And I get that mm.markerset_id is not needed in the where clause)
Any insights appreciated.
Thanks,
rjs
I have not included table defs, thinking that since they're the
same for both queries ...
Fast function
create or replace function
seg.optimal_pvalue_set(people_name text, markers_name text, chr
int)
returns table (segment_id uuid, optval numeric, firstbase
int) as
$$
declare
mkset uuid;
rcount int;
begin
select id into mkset from seg.markerset where name =
markers_name and chrom = chr;
--
create temp table optmarkers on commit drop as
with segset as (
select s.id
, s.chrom
, s.markerset_id
, s.startbase
, s.endbase
, ((s.events_equal + s.events_greater)/(1.0
* (s.events_less + s.events_equal + s.events_greater))) as pval
from seg.segment s
join seg.probandset i on s.probandset_id =
i.id
join (select people_id, array_agg(person_id)
as persons
from seg.people_member
group by people_id) as pa on i.probands
<@ pa.persons
join seg.people o on pa.people_id = o.id
where
s.markerset_id = mkset
and o.name = people_name
)
select m.id as mkrid
, min(ss.pval) as optval
from segset ss
join seg.markerset_member mm on ss.markerset_id =
mm.markerset_id
join seg.marker m on mm.member_id = m.id
where
m.basepos between ss.startbase and ss.endbase
and m.chrom = ss.chrom
and mm.markerset_id = mkset --
'b474655c-80d2-47e7-bcb5-c65245195888'
group by m.id;
--
get diagnostics rcount = ROW_COUNT;
raise notice '% segments to optimize', rcount;
--
return query
select s.id as segment_id, o.optval, min(m.basepos) as
firstbase
from optmarkers o
join seg.marker m on o.mkrid = m.id
join seg.markerset_member mm on m.id =
mm.member_id
join seg.segment s on mm.markerset_id =
s.markerset_id
where mm.markerset_id = mkset
and m.basepos between s.startbase and s.endbase
and ((s.events_equal + s.events_greater)/(1.0 *
(s.events_less + s.events_equal + s.events_greater))) = o.optval
group by s.id, o.optval
order by firstbase;
end;
$$ language plpgsql;
-- timing the CTE base
--with, the first
select count(s.id)
-- s.id
-- , s.chrom
-- , s.markerset_id
-- , s.startbase
-- , s.endbase
-- , ((s.events_equal + s.events_greater)/(1.0 *
(s.events_less + s.events_equal + s.events_greater))) as pval
from seg.segment s
join seg.probandset i on s.probandset_id = i.id
join (select people_id, array_agg(person_id) as
persons
from seg.people_member
group by people_id) as pa on i.probands <@
pa.persons
join seg.people o on pa.people_id = o.id
where
s.markerset_id =
'ed3b4817-1739-4727-9fac-35d1d63071ea' --mkset
and o.name = '1151704'; -- people_name
count
-------
30762
(1 row)
--explain analyze of Full CTE portion (that which constructs the
temp table in prep for final query)
--manually supplying the needed ids and such.
QUERY
PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
GroupAggregate (cost=15412.11..15413.24 rows=65 width=48)
(actual time=5462.143..6042.901 rows=52196 loops=1)
Group Key: m.id
CTE segset
-> Nested Loop (cost=4820.07..6024.99 rows=310
width=76) (actual time=11.854..449.869 rows=30762 loops=1)
-> Seq Scan on probandset i (cost=0.00..21.27
rows=1 width=16) (actual time=0.007..0.286 rows=57 loops=1)
Filter: ((probands <@
'{1bfa3037-bf65-483b-9ac5-f7a57f9e48a6,69f339e3-8ef4-440d-82ee-76627caf8e8b,7a2150d9-cd9c-454a-92d4-4b5cda85258d,7b42b108-8c17-4261-a4d1-3ed1f5da892d,b56886d3-981e-4872-847f-ce019fe70a87,c7f46c7d-0223-4b89-8093-771b5d6964d8}'::uuid[])
AND (people_id = '10c8d51d-c413-4943-90f7-9e0d2dd8c8c4'::uuid))
Rows Removed by Filter: 494
-> Bitmap Heap Scan on segment s
(cost=4820.07..5995.20 rows=310 width=84) (actual
time=7.438..7.493 rows=540 loops=57)
Recheck Cond: ((probandset_id = i.id) AND
(markerset_id = 'ed3b4817-1739-4727-9fac-35d1d63071ea'::uuid))
Heap Blocks: exact=646
-> BitmapAnd (cost=4820.07..4820.07
rows=310 width=0) (actual time=7.434..7.434 rows=0 loops=57)
-> Bitmap Index Scan on useg
(cost=0.00..283.82 rows=5235 width=0) (actual time=0.357..0.357
rows=6470 loops=57)
Index Cond: (probandset_id = i.id)
-> Bitmap Index Scan on
segment_markerset_id_idx (cost=0.00..4493.24 rows=170774
width=0) (actual time=7.064..7.064 rows=150550 loops=57)
Index Cond: (markerset_id =
'ed3b4817-1739-4727-9fac-35d1d63071ea'::uuid)
-> Sort (cost=9387.11..9387.28 rows=65 width=48) (actual
time=5462.122..5818.158 rows=1780618 loops=1)
Sort Key: m.id
Sort Method: external merge Disk: 64168kB
-> Nested Loop (cost=105.76..9385.16 rows=65
width=48) (actual time=11.884..4648.098 rows=1780618 loops=1)
-> Nested Loop (cost=105.34..9013.49
rows=706 width=64) (actual time=11.875..944.708 rows=1796463
loops=1)
-> CTE Scan on segset ss
(cost=0.00..6.97 rows=2 width=60) (actual time=11.857..461.254
rows=30762 loops=1)
Filter: (markerset_id =
'ed3b4817-1739-4727-9fac-35d1d63071ea'::uuid)
-> Bitmap Heap Scan on marker m
(cost=105.34..4471.17 rows=3209 width=24) (actual
time=0.007..0.011 rows=58 loops=30762)
Recheck Cond: ((chrom = ss.chrom) AND
(basepos >= ss.startbase) AND (basepos <= ss.endbase))
Heap Blocks: exact=54458
-> Bitmap Index Scan on
marker_chrom_basepos_idx (cost=0.00..104.54 rows=3209 width=0)
(actual time=0.005..0.005 rows=58 loops=30762)
Index Cond: ((chrom = ss.chrom)
AND (basepos >= ss.startbase) AND (basepos <= ss.endbase))
-> Index Scan using
markerset_member_member_id_idx on markerset_member mm
(cost=0.42..0.52 rows=1 width=32) (actual time=0.002..0.002
rows=1 loops=1796463)
Index Cond: (member_id = m.id)
Filter: (markerset_id =
'ed3b4817-1739-4727-9fac-35d1d63071ea'::uuid)
Rows Removed by Filter: 0
Planning time: 0.478 ms
Execution time: 6052.463 ms
(33 rows)
Time: 6053.983 ms
Slow Function
-- For an (potentially) arbitrary of grouping of
probandsets
create or replace function
seg.optimal_pvalue_set_group(pbgroup_id uuid, markers_name text,
people_name text, chr int)
returns table (segment_id uuid, optval numeric, firstbase int)
as
$$
declare
mkset uuid;
rcount int;
peop_id uuid;
begin
select id into mkset from seg.markerset where name =
markers_name and chrom = chr;
select id into peop_id from seg.people where name =
people_name;
raise notice 'marker set id is %; people id is % (%)', mkset,
peop_id, clock_timestamp();
--
create temp table optmarkers on commit drop as
with segset as (
select s.id
, s.chrom
, s.markerset_id
, s.startbase
, s.endbase
, ((s.events_equal + s.events_greater)/(1.0 *
(s.events_less + s.events_equal + s.events_greater))) as pval
from seg.segment s
join seg.probandset i on s.probandset_id = i.id
join seg.probandset_group_member pgm on
s.probandset_id = pgm.member_id
where
s.markerset_id = mkset
and i.people_id = peop_id
and pgm.group_id = pbgroup_id
)
select m.id as mkrid
, min(ss.pval) as optval
from segset ss
join seg.markerset_member mm on ss.markerset_id =
mm.markerset_id
join seg.marker m on mm.member_id = m.id
where
m.basepos between ss.startbase and ss.endbase
and m.chrom = ss.chrom
and mm.markerset_id = mkset
group by m.id;
get diagnostics rcount = ROW_COUNT;
raise notice 'found % segments to optimize by probandset group
id % (%)', rcount, pbgroup_id, clock_timestamp();
--
return query
select s.id as segment_id, o.optval, min(m.basepos) as
firstbase
from optmarkers o
join seg.marker m on o.mkrid = m.id
join seg.markerset_member mm on m.id = mm.member_id
join seg.segment s on mm.markerset_id = s.markerset_id
where mm.markerset_id = mkset
and m.basepos between s.startbase and s.endbase
and ((s.events_equal + s.events_greater)/(1.0 *
(s.events_less + s.events_equal + s.events_greater))) = o.optval
group by s.id, o.optval
order by firstbase;
end;
$$ language plpgsql;
--timing the CTE base
--with, the third
select count(s.id)
-- s.id
-- , s.chrom
-- , s.markerset_id
-- , s.startbase
-- , s.endbase
-- , ((s.events_equal + s.events_greater)/(1.0 *
(s.events_less + s.events_equal + s.events_greater))) as pval
from seg.segment s
join seg.probandset i on s.probandset_id = i.id
join seg.probandset_group_member pgm on
s.probandset_id = pgm.member_id
where
s.markerset_id =
'ed3b4817-1739-4727-9fac-35d1d63071ea' -- mkset
and i.people_id =
'10c8d51d-c413-4943-90f7-9e0d2dd8c8c4' -- peop_id
and pgm.group_id =
'e7c01f89-8301-4221-9980-343b119711c5' -- pbgroup_id
;
count
-------
30392
(1 row)
Time: 146.692 ms
--explain analyze of Full CTE
QUERY
PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
GroupAggregate (cost=39420.04..39425.18 rows=294 width=48)
(actual time=154302.816..154860.213 rows=52196 loops=1)
Group Key: m.id
CTE segset
-> Nested Loop (cost=282.86..23235.89 rows=1795
width=76) (actual time=0.568..82.964 rows=30392 loops=1)
-> Hash Join (cost=11.11..31.27 rows=6 width=32)
(actual time=0.035..0.147 rows=56 loops=1)
Hash Cond: (i.id = pgm.member_id)
-> Seq Scan on probandset i
(cost=0.00..19.89 rows=57 width=16) (actual time=0.005..0.092
rows=57 loops=1)
Filter: (people_id =
'10c8d51d-c413-4943-90f7-9e0d2dd8c8c4'::uuid)
Rows Removed by Filter: 494
-> Hash (cost=10.41..10.41 rows=56
width=16) (actual time=0.027..0.027 rows=56 loops=1)
Buckets: 1024 Batches: 1 Memory Usage:
11kB
-> Bitmap Heap Scan on
probandset_group_member pgm (cost=4.71..10.41 rows=56 width=16)
(actual time=0.013..0.021 rows=56 loops=1)
Recheck Cond: (group_id =
'e7c01f89-8301-4221-9980-343b119711c5'::uuid)
Heap Blocks: exact=1
-> Bitmap Index Scan on
psgm_pkey (cost=0.00..4.70 rows=56 width=0) (actual
time=0.008..0.008 rows=56 loops=1)
Index Cond: (group_id =
'e7c01f89-8301-4221-9980-343b119711c5'::uuid)
-> Bitmap Heap Scan on segment s
(cost=271.75..3859.10 rows=310 width=84) (actual
time=0.446..1.196 rows=543 loops=56)
Recheck Cond: (probandset_id = i.id)
Filter: (markerset_id =
'ed3b4817-1739-4727-9fac-35d1d63071ea'::uuid)
Rows Removed by Filter: 5974
Heap Blocks: exact=8223
-> Bitmap Index Scan on useg
(cost=0.00..271.68 rows=5235 width=0) (actual time=0.434..0.434
rows=6517 loops=56)
Index Cond: (probandset_id = i.id)
-> Sort (cost=16184.15..16184.88 rows=294 width=48)
(actual time=154302.795..154633.935 rows=1767743 loops=1)
Sort Key: m.id
Sort Method: external merge Disk: 63712kB
-> Nested Loop (cost=40.92..16172.09 rows=294
width=48) (actual time=96.084..153723.675 rows=1767743 loops=1)
-> Hash Join (cost=40.50..14499.61 rows=3177
width=64) (actual time=96.074..151048.722 rows=1783466 loops=1)
Hash Cond: (m.chrom = ss.chrom)
Join Filter: ((m.basepos >=
ss.startbase) AND (m.basepos <= ss.endbase))
Rows Removed by Join Filter: 1598476902
-> Seq Scan on marker m
(cost=0.00..11647.63 rows=635363 width=24) (actual
time=0.007..39.685 rows=635412 loops=1)
-> Hash (cost=40.39..40.39 rows=9
width=60) (actual time=96.027..96.027 rows=30392 loops=1)
Buckets: 32768 (originally 1024)
Batches: 1 (originally 1) Memory Usage: 2432kB
-> CTE Scan on segset ss
(cost=0.00..40.39 rows=9 width=60) (actual time=0.571..92.319
rows=30392 loops=1)
Filter: (markerset_id =
'ed3b4817-1739-4727-9fac-35d1d63071ea'::uuid)
-> Index Scan using
markerset_member_member_id_idx on markerset_member mm
(cost=0.42..0.52 rows=1 width=32) (actual time=0.001..0.001
rows=1 loops=1783466)
Index Cond: (member_id = m.id)
Filter: (markerset_id =
'ed3b4817-1739-4727-9fac-35d1d63071ea'::uuid)
Rows Removed by Filter: 0
Planning time: 0.867 ms
Execution time: 154868.420 ms
(42 rows)
|