Postgres sometimes stalling on 'percentile_cont'

Tom van Tilburg <tom.van.tilburg@xxxxxxxxx> · Wed, 4 Jul 2018 15:34:42 +0200

Hi,

I have a set of relatively complex queries producing tables (including postgis and pgpointcloud functions) that I run consecutively, and sometimes (depending on the base-data) my 5th query stalls (CPU 100%, runs forever) seemingly on the percentile_cont function. When I replace percentile_cont with just a value it passes.

The setup roughly looks like this:
query1 - creates 77 records with polygons
query2 - creates 89 records with polygons
query3 - creates ~350k records with points (inside above polygons)
query4 - creates ~220k records with points clustered by height (from result query3)
query5 - creates ~102k records with point clustered by normal (from result query3)

The odd thing is, when I run query5 directly after query4, it will stall on some datasets (always same sets). Though when I cancel the query and run it again, it will pass in about 2 seconds. 
When I run  query5 from a different shell directly after query4, it also passes.

Does anyone have a clue on how to research this further? How can I look into the process itself to see what is cycling the CPU? How to break this problem down to smaller chunks? I am a bit out of options myself.

For what it's worth:

SELECT version();
"PostgreSQL 10.1 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 6.3.0-12ubuntu2) 6.3.0 20170406, 64-bit"

SELECT postgis_full_version();
"POSTGIS="2.5.0beta1dev r16609" [EXTENSION] PGSQL="100" GEOS="3.6.2-CAPI-1.10.2 4d2925d6" SFCGAL="1.3.1" PROJ="Rel. 5.1.0, June 1st, 2018" GDAL="GDAL 2.3.1, released 2018/06/22" LIBXML="2.9.4" LIBJSON="0.12.1" LIBPROTOBUF="1.2.1" RASTER"

The part of query5 where I replace percentile_cont with just a value:
SELECT 
    c1.id,c1.cid, c1.blockid, 
    CASE 
        WHEN aspectmedian > 360 THEN aspectmedian - 360
        ELSE aspectmedian 
    END as aspect,
    pt geom
FROM clustered c1
JOIN (
    SELECT cid, blockid, aspectclass,
        --find median aspect and median height within bucket 
        --percentile_cont(0.5) within group (order by aspect) as aspectmedian
        0 as aspectmedian --replaced by value 0
    FROM clustered
    GROUP BY blockid,cid, aspectclass
) c2 ON (c1.cid = c2.cid AND c1.blockid = c2.blockid AND c1.aspectclass = c2.aspectclass)
WHERE c1.cid Is Not Null;

Best,
 Tom