Re: Scalability in postgres

On Thu, 4 Jun 2009, Mark Mielke wrote:

Kevin Grittner wrote:
James Mansion <james@xxxxxxxxxxxxxxxxxxxxxx> wrote:

I know that if you do use a large number of threads, you have to be
pretty adaptive.  In our Java app that pulls data from 72 sources and
replicates it to eight, plus feeding it to filters which determine
what publishers for interfaces might be interested, the Sun JVM does
very poorly, but the IBM JVM handles it nicely.  It seems they use
very different techniques for the monitors on objects which
synchronize the activity of the threads, and the IBM technique does
well when no one monitor is dealing with a very large number of
blocking threads.  They got complaints from people who had thousands
of threads blocking on one monitor, so they now keep a count and
switch techniques for an individual monitor if the count gets too
high.
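The switch-on-waiter-count technique described above might look something like the following sketch. This is purely illustrative (not the IBM JVM's actual implementation, and the threshold value is made up): the lock spins while contention is low, and once the waiter count passes a cutover point it parks threads in a FIFO queue instead of letting them all burn CPU on one hot monitor.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.LockSupport;

/** Illustrative adaptive monitor (simplified; not production-grade). */
class AdaptiveMonitor {
    private static final int SPIN_THRESHOLD = 4; // hypothetical cutover point
    private final AtomicBoolean locked = new AtomicBoolean(false);
    private final AtomicInteger waiters = new AtomicInteger(0);
    private final ConcurrentLinkedQueue<Thread> parked = new ConcurrentLinkedQueue<>();

    public void lock() {
        if (locked.compareAndSet(false, true)) return; // uncontended fast path
        waiters.incrementAndGet();
        try {
            while (!locked.compareAndSet(false, true)) {
                if (waiters.get() > SPIN_THRESHOLD) {
                    // heavy contention: enqueue and sleep instead of spinning
                    parked.add(Thread.currentThread());
                    // re-check after enqueueing to avoid a missed wakeup
                    if (locked.compareAndSet(false, true)) {
                        parked.remove(Thread.currentThread());
                        return;
                    }
                    LockSupport.park(); // unpark permit makes the race benign
                } else {
                    Thread.onSpinWait(); // light contention: just spin
                }
            }
        } finally {
            waiters.decrementAndGet();
        }
    }

    public void unlock() {
        locked.set(false);
        Thread t = parked.poll(); // hand off to one queued waiter, if any
        if (t != null) LockSupport.unpark(t);
    }
}
```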

Could be, and if so then the Sun JVM should really address the problem. However, having thousands of threads waiting on one monitor probably isn't a scalable solution, regardless of whether the JVM is able to optimize around your usage pattern or not. Why have thousands of threads waiting on one monitor? That's a bit insane. :-)

You should really only have 1X or 2X as many threads as there are CPUs waiting on one monitor. Beyond that is waste. The idle threads can be pooled away, and only activated (with individual monitors which can be far more easily and effectively optimized) when the other threads become busy.
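The bounded-pool idea above can be sketched with a standard fixed-size executor: roughly 2X CPUs worth of worker threads actually contend for the shared monitor, while the remaining work waits in the pool's queue rather than blocking on the monitor itself. (Names and sizes here are illustrative choices, not prescriptions.)

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedWorkers {
    /** Runs n small jobs through a pool sized ~2x CPUs; returns completed count. */
    static int runJobs(int n) throws InterruptedException {
        int cpus = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(2 * cpus);
        Object sharedMonitor = new Object();
        AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < n; i++) {
            pool.submit(() -> {
                // at most 2*cpus threads can ever block here;
                // the other n jobs wait in the executor's queue instead
                synchronized (sharedMonitor) {
                    done.incrementAndGet();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return done.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runJobs(10_000));
    }
}
```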

sometimes the decrease in complexity in the client makes it worthwhile to 'brute force' things.

this actually works well for the vast majority of services (including many databases)

the question is how much complexity (if any) it adds to postgres to handle this condition better, and what those changes are.

Perhaps something like that (or some other new approach) might
mitigate the effects of tens of thousands of processes competing for
a few resources, but it fundamentally seems unwise to turn those
loose to compete if requests can be queued in some way.


An alternative approach might be: 1) Idle processes not currently running a transaction do not need to be consulted for their snapshot (and other related expenses). If they are idle for a period of time, they "unregister" from the actively-used process list; if they become active again, they "register" in the actively-used process list,

how expensive is this register/unregister process? if it's cheap enough, do it all the time and avoid the complexity of having another config option to tweak.
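The register/unregister idea above can be illustrated with a conceptual sketch (this is not PostgreSQL code, and the class and method names are invented for illustration): sessions appear in the active list only while running a transaction, so computing a snapshot scans only active sessions rather than every connected-but-idle backend.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Conceptual sketch of an actively-used session list (not PostgreSQL internals). */
class ActiveSessionList {
    private final Set<Integer> active = ConcurrentHashMap.newKeySet();

    /** Called when a session starts a transaction or wakes from idle. */
    void register(int sessionId) { active.add(sessionId); }

    /** Called when a session has been idle for some period of time. */
    void unregister(int sessionId) { active.remove(sessionId); }

    /** Snapshot cost is O(active sessions), independent of idle connections. */
    Set<Integer> snapshot() { return new HashSet<>(active); }
}
```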

and 2) Processes could be reusable across different connections - they could stick around for a period after disconnect, and make themselves available again to serve the next connection.

depending on what criteria you have for the re-use, this could be a significant win (if you manage to re-use the per-process cache much), but this is far more complex.

Still heavy-weight in terms of memory utilization, but cheap in terms of other impacts, and without the cost of connection "pooling" in the sense of requests always being indirected through a proxy of some sort.

it would seem to me that the cost of making the extra hop through the external pooler would be significantly more than the overhead of idle processes marking themselves as such so that they don't get consulted for MVCC decisions

David Lang

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
