Re: Hmmm... why does pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?

Merlin Moncure <mmoncure@xxxxxxxxx> · Mon, 6 Jul 2015 11:40:27 -0500

On Fri, Jul 3, 2015 at 9:48 AM, Graeme B. Bell <graeme.bell@xxxxxxxx> wrote:
> Hi everyone,
>
> I've written a new open source tool for easily parallelising SQL scripts in postgres.   [obligatory plug:   https://github.com/gbb/par_psql   ]
>
> Using it, I'm seeing a problem that I've also seen in other postgres projects involving high degrees of parallelisation in the last 12 months.
>
> Basically:
>
> - I have machines here with up to 16 CPU cores and 128GB memory, very fast SSDs and controller etc, carefully configured kernel/postgresql.conf for high performance.
>
> - Ordinary queries parallelise nearly perfectly (e.g. SELECT some_stuff ...), e.g. almost up to 16x performance improvement.
>
> - Non-DB stuff like GDAL, python etc. parallelise nearly perfectly.
>
> - HOWEVER calls to CPU-intensive user-defined pl/pgsql functions (e.g. SELECT myfunction(some_stuff)) do not parallelise well, even when they are independently defined functions, or accessing tables in a read-only way. They hit a limit of 2.5x performance improvement relative to single-CPU performance (pg9.4) and merely 2x performance (pg9.3) regardless of how many CPU cores I throw at them. This is about 6 times slower than I'm expecting.
>
> I can't see what would be locking. It seems like it's the pl/pgsql environment itself that is somehow locking or incurring some huge frictional costs. Whether I use independently defined functions, independent source tables, independent output tables, makes no difference whatsoever, so it doesn't feel 'lock-related'. It also doesn't seem to be WAL/synchronisation related, as the machines I'm using can hit absurdly high pgbench rates, and I'm using unlogged tables for output.
>
> Take a quick peek here: https://github.com/gbb/par_psql/blob/master/BENCHMARKS.md
>
> I'm wondering what I'm missing here. Any ideas?

I'm not necessarily seeing your results.   via pgbench,

mmoncure@mernix2 11:34 AM ~$ ~/pgdev/bin/pgbench -n -T 60 -f b.sql
transaction type: Custom query
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 658833
latency average: 0.091 ms
tps = 10980.538470 (including connections establishing)
tps = 10980.994547 (excluding connections establishing)
mmoncure@mernix2 11:35 AM ~$ ~/pgdev/bin/pgbench -n -T 60 -c4 -j4 -f b.sql
transaction type: Custom query
scaling factor: 1
query mode: simple
number of clients: 4
number of threads: 4
duration: 60 s
number of transactions actually processed: 2847631
latency average: 0.084 ms
tps = 47460.430447 (including connections establishing)
tps = 47463.702074 (excluding connections establishing)

b.sql:
select f();

f():
create or replace function f() returns int as $$ begin return 1; end;
$$ language plpgsql;

the results are pretty volatile even with a 60s run, but I'm clearly
not capped at 2.5x parallelization (my box is 4 core).  It would help
if you disclosed the function body you're benchmarking.   If the
problem is indeed on the sever, the next step I think is to profile
the code and look for locking issues.

merlin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance