Re: Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Jul 2, 2015 at 9:15 AM, Graeme B. Bell <graeme.bell@xxxxxxxx> wrote:
Hi everyone,

I've written a new open source tool for easily parallelising SQL scripts in postgres.   [obligatory plug:   https://github.com/gbb/par_psql   ]

Using it, I'm seeing a problem I've seen in other postgres projects involving parallelisation in the last 12 months.

Basically:

- I have machines here with up to 16 CPUs and 128GB memory, very fast SSDs and controller etc, carefully configured kernel/postgresql.conf for high performance.

- Ordinary queries parallelise nearly perfectly (e.g. SELECT some_stuff ...), e.g. almost up to 16x performance improvement.

- Calls to CPU-intensive user-defined pl/pgsql functions (e.g. SELECT myfunction(some_stuff)) do not parallelise well, even when they are independent or accessing tables in a read-only way. They hit a limit at 2.5x performance improvement relative to single-CPU performance (pg9.4) and 2x performance (pg9.3). This is about 6 times slower than I'm expecting.

- Can't see what would be locking. It seems like it's the pl/pgsql environment itself that is somehow locking or incurring some huge frictional costs. Whether I use independently defined functions, independent source tables, independent output tables, makes no difference whatsoever, so it doesn't feel 'locky'. It also doesn't seem to be WAL/synchronisation related, as the machines I'm using can hit absurdly high pgbench rates, and I'm using unlogged tables.

Curious? Take a quick peek here: https://github.com/gbb/par_psql/blob/master/BENCHMARKS.md

Wondering what I'm missing here. Any ideas?

No ideas, but I ran into the same thing. I have a set of C/C++ functions that put some chemistry calculations into Postgres as extensions (things like, "calculate the molecular weight of this molecule"). As SQL functions, the whole thing bogged down, and we never got the scalability we needed. On our 8-CPU setup, we couldn't get more than 2 CPUs busy at the same time, even with dozens of clients.

When I moved these same functions into an Apache fast-CGI HTTP service (exact same code, same network overhead), I could easily scale up and use the full 100% of all eight CPUs.

I have no idea why, and never investigated further. The convenience of having the functions in SQL wasn't that important.

Craig
 

Graeme.

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance



--
---------------------------------
Craig A. James
Chief Technology Officer
eMolecules, Inc.
---------------------------------

[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux