On 12/22/05, John Sidney-Woollett <johnsw@xxxxxxxxxxxxx> wrote:
> In trying to investigate a possible memory issue that affects only one
> of our servers, I have been logging the process list for postgres
> related items 4 times a day for the past few days.
>
> This server uses postgres 7.4.6 + slon 1.1.0 on Debian i686 (Linux
> server2 2.6.8.1-4-686-smp) and is a slon slave in a two server
> replicated cluster. Our master DB (similar setup) does not exhibit this
> problem at all - only the subscriber node...
>
> The load average starts to go mental once the machine has to start
> swapping (ie starts running out of physical RAM). The solution so far is
> to stop and restart both slon and postgres and things return to normal
> for another 2 weeks.
>
> I know other people have reported similar things but there doesn't seem
> to be an explanation or solution (other than stopping and starting the
> two processes).
>
> Can anyone suggest what else to look at on the server to see what might
> be going on?
>
> Appreciate any help or advice anyone can offer. I'm not a C programmer
> nor a unix sysadmin, so any advice needs to be simple to understand.
>

The memory usage growth is caused by the buffers in the slave slon
daemon growing when long rows go through them. The buffers never
shrink while the slon daemon is running. How big are the largest rows
that slon replicates?

One suggestion I have seen is to recompile slon to use fewer buffers.
Another is to set a ulimit on memory size so that the slon daemons are
killed automatically when they get too big. The watchdog will then
restart them. Alternatively, your strategy of restarting the slon
daemons each week will work (you don't need to restart postgres).

I came up with a patch which shrinks the buffers when they go above a
certain size. This doesn't fix the problem of lots of big rows
happening at once, but it fixes the gradual growth.

- Ian
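
To make the shrink-when-oversized idea concrete, here is a minimal sketch in C. It is not the actual patch and not slon's real code: the type `line_buf`, the functions, and both size constants are hypothetical names and values chosen for illustration. It only shows the general technique of growing a buffer for a long row and handing the memory back once the row has been processed.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical names and sizes; not taken from the slon source. */
#define LINE_BUF_INITIAL   8192   /* assumed normal buffer size in bytes */
#define LINE_BUF_SHRINK_AT 65536  /* assumed size above which we shrink  */

typedef struct {
    char  *data;
    size_t capacity;
} line_buf;

/* Allocate a buffer at the initial size. Returns 0 on success, -1 on failure. */
int line_buf_init(line_buf *b)
{
    b->data = malloc(LINE_BUF_INITIAL);
    if (b->data == NULL) {
        b->capacity = 0;
        return -1;
    }
    b->capacity = LINE_BUF_INITIAL;
    return 0;
}

/* Copy a row into the buffer, doubling the capacity until the row fits. */
int line_buf_store(line_buf *b, const char *row, size_t len)
{
    if (len + 1 > b->capacity) {
        size_t newcap = b->capacity ? b->capacity : LINE_BUF_INITIAL;
        char  *p;

        while (newcap < len + 1)
            newcap *= 2;
        p = realloc(b->data, newcap);
        if (p == NULL)
            return -1;          /* old buffer is still valid */
        b->data = p;
        b->capacity = newcap;
    }
    memcpy(b->data, row, len);
    b->data[len] = '\0';
    return 0;
}

/* Call after a row has been processed: if a long row inflated the
 * buffer past the threshold, shrink it back to the initial size. */
void line_buf_release(line_buf *b)
{
    if (b->capacity > LINE_BUF_SHRINK_AT) {
        char *p = realloc(b->data, LINE_BUF_INITIAL);
        if (p != NULL) {
            b->data = p;
            b->capacity = LINE_BUF_INITIAL;
        }
    }
}

/* Free the buffer entirely. */
void line_buf_free(line_buf *b)
{
    free(b->data);
    b->data = NULL;
    b->capacity = 0;
}
```

With this shape, one occasional long row no longer leaves the buffer permanently large. As noted above, it does nothing for the case where many big rows are in flight at the same time, since every buffer is then legitimately large at once.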