Glyn Astill <glynastill@xxxxxxxxxxx> wrote:

> The issue I'm seeing is that 8 real cores outperform 16 real
> cores, which outperform 32 real cores under high concurrency.

With every benchmark I've done of PostgreSQL, the "knee" in the performance graph comes right around ((2 * cores) + effective_spindle_count). With the database fully cached (as I believe you mentioned), effective_spindle_count is zero. If you don't use a connection pool to limit active transactions to the number from that formula, performance drops off. The more CPUs you have, the sharper the drop after the knee.

I think it's nearly inevitable that PostgreSQL will eventually add some sort of admission policy or scheduler so that the user doesn't see this effect. With an admission policy, PostgreSQL would effectively throttle the startup of new transactions so that things remained almost flat after the knee. A well-designed scheduler might even be able to sneak marginal improvements past the current knee. As things currently stand, it is up to you to do this with a carefully designed connection pool.

> 32 cores is much faster than 8 when I have relatively few clients,
> but as the number of clients is scaled up 8 cores wins outright.

Right. If you were hitting disk heavily with random access, the sweet spot would increase by the number of spindles you were hitting.

> I was hoping someone had seen this sort of behaviour before, and
> could offer some sort of explanation or advice.

When you have multiple resources, adding active processes increases overall throughput until roughly the point where you can keep them all busy. Once you hit that point, adding more processes to contend for the resources just adds overhead and blocking. HT (hyperthreading) is so bad here because it tends to cause context switch storms, but context switching becomes an issue even without it.

The other main issue is lock contention. Beyond a certain point, processes start to contend for lightweight locks, so you might context switch to a process only to find that it's still blocked, and you have to switch again to try the next process, until you finally find one which can make progress. To acquire a lightweight lock you first need to acquire a spinlock, so as things get busier, processes start eating lots of CPU in the spinlock loops just trying to get to the point of being able to check whether the LW locks are available.

You clearly got the best performance with all 32 cores and 16 to 32 processes active. I don't know why you were hitting the knee sooner than I've seen in my benchmarks, but the principle is the same. Use a connection pool which limits how many transactions are active, such that you don't exceed 32 processes busy at the same time, and make sure that it queues transaction requests beyond that, so that a new transaction can be started promptly when you are at your limit and a transaction completes.

-Kevin
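
For anyone who wants to see the throttling idea made concrete, below is a minimal sketch in Python. It is only an illustration of the advice above, not anything from this thread or from any particular pooler: the transaction_limit function, the TransactionGate class, and the do_one_transaction callable are made-up names, and the semaphore is just one way to get "queue requests beyond the limit" behaviour. In practice the same effect normally comes from whatever connection pool sits in front of the database.

    import os
    import threading

    # Heuristic discussed above: limit active transactions to
    # ((2 * cores) + effective_spindle_count).  With the database
    # fully cached, effective_spindle_count is zero.
    def transaction_limit(cores, effective_spindle_count=0):
        return 2 * cores + effective_spindle_count

    class TransactionGate:
        """Admit at most `limit` concurrent transactions; callers beyond
        the limit block (queue) until a running transaction finishes, so
        a new transaction can start promptly when a slot frees up."""

        def __init__(self, limit):
            self._slots = threading.BoundedSemaphore(limit)

        def run(self, transaction):
            self._slots.acquire()        # queue here once `limit` transactions are active
            try:
                return transaction()     # caller-supplied callable that runs one transaction
            finally:
                self._slots.release()    # wake one queued caller, if any

    if __name__ == "__main__":
        cores = os.cpu_count() or 1
        gate = TransactionGate(transaction_limit(cores))
        # gate.run(do_one_transaction)   # hypothetical callable doing one DB transaction
        print("admission limit:", transaction_limit(cores))

The point of the sketch is only that requests past the limit wait in a queue instead of becoming additional active backends contending for CPU and locks.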