On Thu, 12 Mar 2009, Scott Carey wrote:
Furthermore, if the problem was due to too much concurrency in the database with active connections, it's hard to see how changing the lock code would change the result the way it did.
What I wonder about is if the locking mechanism is accidentally turning into a CPU resource scheduling problem on this benchmark. If the connections were pooled instead, control over that scheduling would be more explicit, because connections would more directly map onto physical CPUs. What if the fall-off is because the sum of the working code set here is simply exceeding the sum of the CPU caching available once the number of active connections gets big enough? The real problem could be that the connections waiting on ProcArray are just falling out of cache, such that when they do wake up they take a while to page back in and keep going.
I wouldn't actually bet anything on that theory though, or any of the others offered here. I find wandering into performance bottleneck analysis presuming you know what's going on to be dangerous. The bigger issue here is that Jignesh is using a configuration known to be problematic (lots of connections), which introduces some uncertainty about the true root cause here. Whether it's well founded or not, it still hurts his case.
And to step back for a second, after reading up on it again I see that Sun's internal iGen-OLTP benchmark "stresses lock management and connectivity"[1], which makes me wonder even more than I did before about how specific this fix is to this workload.
[1] http://blogs.sun.com/bmseer/entry/t2000_adds_database_leadership_to
First just run a test with a tiny delay (5ms? 0?) and fewer users to compare. If your theory that a connection pooler would help is right, that test would provide higher throughput with low user count and not be lock limited.
If the symptoms stay the same but are just scaled to a much lower connection count, that might help rule out some types of context switching and caching problems from the list of most likely suspects. Might as well make it 0ms to minimize the number of connections.
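As a rough illustration of why shrinking the delay lets far fewer users generate the same load, here is a back-of-envelope sketch based on the interactive response time law, N = X * (R + Z), where N is the number of users, X the throughput, R the response time and Z the think time (the delay). The numbers are made up for illustration and are not taken from Jignesh's runs, and `users_needed` is just a hypothetical helper name.

```python
def users_needed(throughput_tps, response_time_s, think_time_s):
    """Interactive response time law: N = X * (R + Z).

    Returns how many closed-loop users are needed to sustain a given
    throughput, assuming each user waits think_time_s between requests.
    """
    return throughput_tps * (response_time_s + think_time_s)


# Hypothetical figures for illustration only, not measurements.
target_tps = 1000.0      # assumed throughput to sustain
response_time = 0.005    # assumed 5 ms average response time

for think_time in (0.2, 0.005, 0.0):
    n = users_needed(target_tps, response_time, think_time)
    print(f"think time {think_time * 1000:6.1f} ms -> about {n:.0f} connections")
```

Under those assumptions the same throughput target needs far fewer connections once the delay drops toward zero, which is what makes the low-user-count run a useful comparison. It says nothing about whether the real system would actually hold that response time; that is what the test is for.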
--
* Greg Smith gsmith@xxxxxxxxxxxxx http://www.gregsmith.com Baltimore, MD