Re: Scalability in postgres

On Thu, 4 Jun 2009, Mark Mielke wrote:

Kevin Grittner wrote:
James Mansion <james@xxxxxxxxxxxxxxxxxxxxxx> wrote:

I know that if you do use a large number of threads, you have to be
pretty adaptive.  In our Java app that pulls data from 72 sources and
replicates it to eight, plus feeding it to filters which determine
what publishers for interfaces might be interested, the Sun JVM does
very poorly, but the IBM JVM handles it nicely.  It seems they use
very different techniques for the monitors on objects which
synchronize the activity of the threads, and the IBM technique does
well when no one monitor is dealing with a very large number of
blocking threads.  They got complaints from people who had thousands
of threads blocking on one monitor, so they now keep a count and
switch techniques for an individual monitor if the count gets too
high.
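The switch-on-waiter-count technique described above might look something like the following sketch. This is purely illustrative (not the IBM JVM's actual implementation, and the threshold value is made up): the lock spins while contention is low, and once the waiter count passes a cutover point it parks threads in a FIFO queue instead of letting them all burn CPU on one hot monitor.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.LockSupport;

/** Illustrative adaptive monitor (simplified; not production-grade). */
class AdaptiveMonitor {
    private static final int SPIN_THRESHOLD = 4; // hypothetical cutover point
    private final AtomicBoolean locked = new AtomicBoolean(false);
    private final AtomicInteger waiters = new AtomicInteger(0);
    private final ConcurrentLinkedQueue<Thread> parked = new ConcurrentLinkedQueue<>();

    public void lock() {
        if (locked.compareAndSet(false, true)) return; // uncontended fast path
        waiters.incrementAndGet();
        try {
            while (!locked.compareAndSet(false, true)) {
                if (waiters.get() > SPIN_THRESHOLD) {
                    // heavy contention: enqueue and sleep instead of spinning
                    parked.add(Thread.currentThread());
                    // re-check after enqueueing to avoid a missed wakeup
                    if (locked.compareAndSet(false, true)) {
                        parked.remove(Thread.currentThread());
                        return;
                    }
                    LockSupport.park(); // unpark permit makes the race benign
                } else {
                    Thread.onSpinWait(); // light contention: just spin
                }
            }
        } finally {
            waiters.decrementAndGet();
        }
    }

    public void unlock() {
        locked.set(false);
        Thread t = parked.poll(); // hand off to one queued waiter, if any
        if (t != null) LockSupport.unpark(t);
    }
}
```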

Could be, and if so then the Sun JVM should really address the problem. However, having thousands of threads waiting on one monitor probably isn't a scalable solution, regardless of whether the JVM is able to optimize around your usage pattern or not. Why have thousands of threads waiting on one monitor? That's a bit insane. :-)

You should really only have 1X or 2X as many threads as there are CPUs waiting on one monitor. Beyond that is waste. The idle threads can be pooled away, and only activated (with individual monitors which can be far more easily and effectively optimized) when the other threads become busy.
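The bounded-pool idea above can be sketched with a standard fixed-size executor: roughly 2X CPUs worth of worker threads actually contend for the shared monitor, while the remaining work waits in the pool's queue rather than blocking on the monitor itself. (Names and sizes here are illustrative choices, not prescriptions.)

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedWorkers {
    /** Runs n small jobs through a pool sized ~2x CPUs; returns completed count. */
    static int runJobs(int n) throws InterruptedException {
        int cpus = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(2 * cpus);
        Object sharedMonitor = new Object();
        AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < n; i++) {
            pool.submit(() -> {
                // at most 2*cpus threads can ever block here;
                // the other n jobs wait in the executor's queue instead
                synchronized (sharedMonitor) {
                    done.incrementAndGet();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return done.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runJobs(10_000));
    }
}
```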

sometimes the decrease in complexity in the client makes it worthwhile to 'brute force' things.

this actually works well for the vast majority of services (including many databases)

the question is how much complexity (if any) it adds to postgres to handle this condition better, and what those changes are.

Perhaps something like that (or some other new approach) might
mitigate the effects of tens of thousands of processes competing for
a few resources, but it fundamentally seems unwise to turn those
loose to compete if requests can be queued in some way.


An alternative approach might be: 1) Idle processes not currently running a transaction do not need to be consulted for their snapshot (and other related expenses). If they are idle for a period of time, they "unregister" from the actively-used process list; if they become active again, they "register" in the actively-used process list,

how expensive is this register/unregister process? if it's cheap enough, do it all the time and avoid the complexity of having another config option to tweak.
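The register/unregister idea above can be illustrated with a conceptual sketch (this is not PostgreSQL code, and the class and method names are invented for illustration): sessions appear in the active list only while running a transaction, so computing a snapshot scans only active sessions rather than every connected-but-idle backend.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Conceptual sketch of an actively-used session list (not PostgreSQL internals). */
class ActiveSessionList {
    private final Set<Integer> active = ConcurrentHashMap.newKeySet();

    /** Called when a session starts a transaction or wakes from idle. */
    void register(int sessionId) { active.add(sessionId); }

    /** Called when a session has been idle for some period of time. */
    void unregister(int sessionId) { active.remove(sessionId); }

    /** Snapshot cost is O(active sessions), independent of idle connections. */
    Set<Integer> snapshot() { return new HashSet<>(active); }
}
```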

and 2) Processes could be reusable across different connections - they could stick around for a period after disconnect, and make themselves available again to serve the next connection.

depending on what criteria you have for the re-use, this could be a significant win (if you manage to re-use the per-process cache much), but this is far more complex.

Still heavy-weight in terms of memory utilization, but cheap in terms of other impacts, and without the cost of connection "pooling" in the sense of requests always being indirected through a proxy of some sort.

it would seem to me that the cost of making the extra hop through the external pooler would be significantly more than the overhead of idle processes marking themselves as such so that they don't get consulted for MVCC decisions

David Lang

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
