Glyn Astill <glynastill@xxxxxxxxxxx> wrote:

> The issue I'm seeing is that 8 real cores outperform 16 real
> cores, which outperform 32 real cores under high concurrency.

With every benchmark I've done of PostgreSQL, the "knee" in the performance graph comes right around ((2 * cores) + effective_spindle_count). With the database fully cached (as I believe you mentioned), effective_spindle_count is zero. If you don't use a connection pool to limit active transactions to the number from that formula, performance drops off. The more CPUs you have, the sharper the drop after the knee.

I think it's nearly inevitable that PostgreSQL will eventually add some sort of admission policy or scheduler so that the user doesn't see this effect. With an admission policy, PostgreSQL would effectively throttle the startup of new transactions so that things remained almost flat after the knee. A well-designed scheduler might even be able to sneak marginal improvements past the current knee. As things currently stand, it is up to you to do this with a carefully designed connection pool.

> 32 cores is much faster than 8 when I have relatively few clients,
> but as the number of clients is scaled up 8 cores wins outright.

Right. If you were hitting disk heavily with random access, the sweet spot would increase by the number of spindles you were hitting.

> I was hoping someone had seen this sort of behaviour before, and
> could offer some sort of explanation or advice.

When you have multiple resources, adding active processes increases overall throughput until roughly the point where you can keep them all busy. Once you hit that point, adding more processes to contend for the resources just adds overhead and blocking. HT (hyperthreading) is so bad here because it tends to cause context switch storms, but context switching becomes an issue even without it.

The other main issue is lock contention. Beyond a certain point, processes start to contend for lightweight locks, so you might context switch to a process only to find that it's still blocked, and you have to switch again to try the next process, until you finally find one which can make progress. To acquire a lightweight lock you first need to acquire a spinlock, so as things get busier, processes start eating lots of CPU in the spinlock loops just trying to get to the point of being able to check whether the LW locks are available.

You clearly got the best performance with all 32 cores and 16 to 32 processes active. I don't know why you were hitting the knee sooner than I've seen in my benchmarks, but the principle is the same. Use a connection pool which limits how many transactions are active, such that you don't exceed 32 processes busy at the same time, and make sure that it queues transaction requests beyond that, so that a new transaction can be started promptly when you are at your limit and a transaction completes.

-Kevin
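
For anyone who wants to see the throttling idea made concrete, below is a minimal sketch in Python. It is only an illustration of the advice above, not anything from this thread or from any particular pooler: the transaction_limit function, the TransactionGate class, and the do_one_transaction callable are made-up names, and the semaphore is just one way to get "queue requests beyond the limit" behaviour. In practice the same effect normally comes from whatever connection pool sits in front of the database.

    import os
    import threading

    # Heuristic discussed above: limit active transactions to
    # ((2 * cores) + effective_spindle_count).  With the database
    # fully cached, effective_spindle_count is zero.
    def transaction_limit(cores, effective_spindle_count=0):
        return 2 * cores + effective_spindle_count

    class TransactionGate:
        """Admit at most `limit` concurrent transactions; callers beyond
        the limit block (queue) until a running transaction finishes, so
        a new transaction can start promptly when a slot frees up."""

        def __init__(self, limit):
            self._slots = threading.BoundedSemaphore(limit)

        def run(self, transaction):
            self._slots.acquire()        # queue here once `limit` transactions are active
            try:
                return transaction()     # caller-supplied callable that runs one transaction
            finally:
                self._slots.release()    # wake one queued caller, if any

    if __name__ == "__main__":
        cores = os.cpu_count() or 1
        gate = TransactionGate(transaction_limit(cores))
        # gate.run(do_one_transaction)   # hypothetical callable doing one DB transaction
        print("admission limit:", transaction_limit(cores))

The point of the sketch is only that requests past the limit wait in a queue instead of becoming additional active backends contending for CPU and locks.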