Increasing the shared_buffers size improved performance by 15%. The trend remains the same though: a steep drop in performance after a certain number of clients.
My deployment is "NUMA-aware": I allocate cores that reside on the same socket, and once that socket is full I start allocating cores from a neighbouring socket.
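For illustration, the pinning boils down to something like the C sketch below. This is not my actual setup (I just use taskset); the assumption that cores 0-7 sit on socket 0 and 8-15 on socket 1 is only for the example, the real mapping comes from /sys/devices/system/cpu/cpuN/topology/physical_package_id.

/* Hypothetical sketch of socket-first pinning; taskset -c does the same from the shell. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>

#define CORES_PER_SOCKET 8      /* assumption: 2 sockets x 8 cores, ids 0-15 */
#define NUM_SOCKETS 2

int
main(int argc, char **argv)
{
    int         nworkers = (argc > 1) ? atoi(argv[1]) : 1;
    int         max_cores = CORES_PER_SOCKET * NUM_SOCKETS;
    cpu_set_t   mask;
    int         c;

    if (nworkers > max_cores)
        nworkers = max_cores;

    CPU_ZERO(&mask);

    /* Fill socket 0 first (cores 0-7), then spill over to socket 1 (8-15). */
    for (c = 0; c < nworkers; c++)
        CPU_SET(c, &mask);

    /* Pin the calling process (pid 0 = self). */
    if (sched_setaffinity(0, sizeof(mask), &mask) != 0)
    {
        perror("sched_setaffinity");
        return 1;
    }

    printf("pinned to the first %d core(s)\n", nworkers);
    return 0;
}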
I'll try to print the value of spins_per_delay for each experiment... just in case it shows something interesting.
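Concretely, I'm thinking of something like the sketch below: an elog() dropped into update_spins_per_delay() in src/backend/storage/lmgr/s_lock.c, where each backend folds its local estimate back into the shared one. The placement and the surrounding body are from memory, so treat it as an untested sketch rather than a patch.

/*
 * Sketch: log each backend's local estimate at the point where it is
 * averaged back into the shared value.  The elog() call is the only
 * addition; the rest is the existing update_spins_per_delay(), whose
 * exact body may differ across versions.
 */
int
update_spins_per_delay(int shared_spins_per_delay)
{
    elog(LOG, "local spins_per_delay = %d, shared estimate = %d",
         spins_per_delay, shared_spins_per_delay);

    /* existing exponential moving average of the shared estimate */
    return (shared_spins_per_delay * 15 + spins_per_delay) / 16;
}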
On Fri, May 23, 2014 at 7:57 PM, Jeff Janes <jeff.janes@xxxxxxxxx> wrote:
On Fri, May 23, 2014 at 10:25 AM, Dimitris Karampinas <dkarampin@xxxxxxxxx> wrote:
> I want to bypass any disk bottleneck so I store all the data in ramfs (the purpose of the project is to profile pg so I don't care for data loss if anything goes wrong).
> Since my data are memory resident, I thought the size of the shared buffers wouldn't play much role, yet I have to admit that I saw a difference in performance when modifying the shared_buffers parameter.

In which direction? If making shared_buffers larger improves things, that suggests that you have contention on the BufFreelistLock. Increasing shared_buffers reduces buffer churn (assuming you increase it by enough) and so decreases that contention.

> I use taskset to control the number of cores that PostgreSQL is deployed on.

It can be important which bits you set. For example, if you have 4 sockets, each one with a quad-core, you would probably maximize the consequences of spinlock contention by putting one process on each socket, rather than putting them all on the same socket.

> Is there any parameter/variable in the system that is set dynamically and depends on the number of cores?

The number of spins a spinlock goes through before sleeping, spins_per_delay, is determined dynamically based on how often a tight loop "pays off". But I don't think this is very sensitive to the exact number of processors, just the difference between 1 and more than 1.