Re: Postgres refusing to use >1 core

Craig Ringer <craig@xxxxxxxxxxxxxxxxxxxxx> · Tue, 10 May 2011 12:01:47 +0800

On 10/05/11 10:40, Aren Cambre wrote:
>     how are you reading through the table? if you are using OFFSET, you
>     owe me a steak dinner.
> 
> 
> Nope. :-)
> 
> Below is my exact code for the main thread. The C# PLINQ statement is
> highlighted. Let me know if I can help to explain this.

Looking at that code, I can't help but wonder why you're not doing it
server side in batches. In general, it's really inefficient to use this
pattern:

rows = runquery("select * from something");
for (item in rows) {
  // do something with item
}

Adding implicit parallelism within the loop won't help you much if
client-side CPU use isn't your limitation. If each computation done on
"item" is very expensive in client-side CPU this pattern makes sense,
but otherwise should be avoided in favour of grabbing big chunks of rows
and processing them all at once in batch SQL statements that let the
database plan I/O efficiently.

Even if you're going to rely on client-side looping - say, because of
complex or library-based computation that must be done for each record -
you must ensure that EACH THREAD HAS ITS OWN CONNECTION, whether that's
a new connection established manually or one grabbed from an appropriate
pool. Your code below shows no evidence of that at all; you're probably
sharing one connection between all the threads, achieving no real
parallelism whatsoever.

Try limiting your parallel invocation to 4 threads (since that's number
of cores you have) and making sure each has its own connection. In your
case, that probably means having a new Geocoder instance grab a
connection from a pool that contains at least 5 connections (one per
Geocoder, plus the main connection).

It also looks - though I don't know C# and npgsql so I can't be sure -
like you're passing some kind of query result object to the Geocoder.
Avoid that, because they might be using the connection to progressively
read data behind the scenes in which case you might land up having
locking issues, accidentally serializing your parallel work on the
single main connection, etc. Instead, retrieve the contents of the
IDataRecord (whatever that is) and pass that to the new Geocoder
instance, so the new Geocoder has *absolutely* *no* *link* to the
arrestsConnection and cannot possibly depend on it accidentally.

Even better, use a server-side work queue implementation like pgq, and
have each worker use its private connection to ask the server for the
next record to process when it's done with the previous one, so you
don't need a co-ordinating queue thread in your client side at all. You
can also optionally make your client workers independent processes
rather than threads that way, which simplifies debugging and resource
management.

--
Craig Ringer

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance