On Sun, Mar 15, 2015 at 8:07 AM, Robert Kaye <rob@xxxxxxxxxxxxxxx> wrote:
>> what does free -m show on your db server?
>
>              total       used       free     shared    buffers     cached
> Mem:         48295      31673      16622          0          5      12670
> -/+ buffers/cache:      18997      29298
> Swap:        22852       2382      20470
Hmm that's definitely odd that it's swapping since it has plenty of free memory at the moment. Is it still under heavy load right now? Has the output of free consistently looked like that during your trouble times?
>> If the load problem really is being caused by swapping when things really shouldn't be swapping, it could be a matter of adjusting your swappiness - what does cat /proc/sys/vm/swappiness show on your server?
>
> 0
>
> We adjusted that too, but no effect. (I've updated the blog post with these two comments)
Had that been updated a while ago, or just now?
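For anyone following the thread later, a minimal sketch of checking and changing swappiness at runtime (assuming root access; the value 1 below is only an illustration, not a recommendation for this particular box):

  # current value
  cat /proc/sys/vm/swappiness

  # change it for the running kernel
  sysctl -w vm.swappiness=1

  # make it stick across reboots
  echo "vm.swappiness = 1" >> /etc/sysctl.conf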
>> There are other linux memory management things that can cause postgres and the server running it to throw fits, like THP and zone reclaim. I don't have enough info about your system to say they are the cause either, but check out the many postings here and elsewhere on the detrimental effect that those settings *can* have. That would at least give you another angle to investigate.
>
> If there are specific things you'd like to know, I'd be happy to be a human proxy. :)
If zone reclaim is enabled (I think linux usually decides whether or not to enable it at boot time depending on the NUMA architecture) it sometimes avoids using memory on remote NUMA nodes if it thinks that memory access is too expensive. This can lead to way too much disk access (not sure if it would actually make linux swap or not...) and lots of RAM sitting around doing nothing instead of being used for fs cache like it should be. Check whether zone reclaim is enabled with this command: cat /proc/sys/vm/zone_reclaim_mode. If your server is a NUMA machine, you can install the numactl utility and look at the NUMA layout with: numactl --hardware
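Roughly, the sequence I'd run is below (the sysctl lines are only needed if zone reclaim turns out to be enabled; treat them as a sketch, not a prescription):

  # 0 means zone reclaim is off; anything non-zero means it's on
  cat /proc/sys/vm/zone_reclaim_mode

  # show the NUMA nodes and how much memory is free on each
  numactl --hardware

  # if it's enabled, turn it off for the running kernel
  sysctl -w vm.zone_reclaim_mode=0

  # and persist it across reboots
  echo "vm.zone_reclaim_mode = 0" >> /etc/sysctl.conf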
I'm not sure how THP would cause lots of swapping, but it's worth checking in general: cat /sys/kernel/mm/transparent_hugepage/enabled. If the kernel is spending too much time trying to compact memory pages, it can cause stalls in your processes. To get the THP metrics, run: egrep 'trans|thp' /proc/vmstat
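Put together, the THP checks would look something like this (the echo lines assume you decide to turn THP off entirely, which is just one option; the exact sysfs paths can vary a bit between distros and kernel versions):

  # the bracketed value is the current setting, e.g. [always] madvise never
  cat /sys/kernel/mm/transparent_hugepage/enabled

  # THP-related counters, e.g. thp_fault_alloc, compact_stall
  egrep 'trans|thp' /proc/vmstat

  # disable THP and its defrag pass until the next reboot
  echo never > /sys/kernel/mm/transparent_hugepage/enabled
  echo never > /sys/kernel/mm/transparent_hugepage/defrag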