Dan,
can you do 'echo 3 > /proc/sys/vm/drop_caches' and see if the usage
comes back to normal?

avati

2008/12/22 Dan Parsons <dparsons@xxxxxxxx>:
> OK, I just had this problem again in a big way.
>
> root 26231 9.3 90.5 12676632 11141304 ? Ssl Dec17 659:31 [glusterfs]
>
> That's 90.5% of 12GB RAM. cache-size is set to 2048mb. Miraculously this
> node is still running; about 28 of my 33 nodes died over the weekend because
> of this issue. We wanted to run some big jobs over the holiday break but
> this crash is getting in the way.
>
> Is there *anything* that can be done?
>
> Dan Parsons
>
>
> On Dec 17, 2008, at 3:28 PM, Anand Avati wrote:
>
>> Dan,
>> I have a vague memory about giving a custom patch for io-cache. Was that
>> you? Can you mail me the diff and I can answer your question..
>>
>> Avati
>>
>> On Dec 17, 2008 2:34 PM, "Dan Parsons" <dparsons@xxxxxxxx> wrote:
>>
>> I'd love to use 1.4rc4 but are there any issues in it that would affect
>> me?
>> I have 4 glusterfs servers, each with 2gbit ethernet (bonded), providing
>> sustained 8gbit/s to 33 client nodes. Below is my entire config file. If
>> you see anything in there using a system that is either buggy or
>> non-optimal in 1.4rc4, or would be difficult to upgrade, please let me
>> know. If not, I can possibly upgrade.
>>
>> Below is my current config file. The one I was using when gluster was
>> using all memory is identical except that 'cache-size' was changed to
>> 4096MB and 'page-size' was changed to 512KB.
>>
>> -----------
>> ### Add client feature and attach to remote subvolume of server1
>> volume distfs01
>>   type protocol/client
>>   option transport-type tcp/client   # for TCP/IP transport
>>   option remote-host 10.8.101.51     # IP address of the remote brick
>>   option remote-subvolume brick      # name of the remote volume
>> end-volume
>>
>> ### Add client feature and attach to remote subvolume of server2
>> volume distfs02
>>   type protocol/client
>>   option transport-type tcp/client   # for TCP/IP transport
>>   option remote-host 10.8.101.52     # IP address of the remote brick
>>   option remote-subvolume brick      # name of the remote volume
>> end-volume
>>
>> volume distfs03
>>   type protocol/client
>>   option transport-type tcp/client
>>   option remote-host 10.8.101.53
>>   option remote-subvolume brick
>> end-volume
>>
>> volume distfs04
>>   type protocol/client
>>   option transport-type tcp/client
>>   option remote-host 10.8.101.54
>>   option remote-subvolume brick
>> end-volume
>>
>> volume stripe0
>>   type cluster/stripe
>>   option block-size *.gff:1KB,*.nt:1KB,*.best:1KB,*.txt3:1KB,*.nbest.info:1
>> KB*:1MB
>>   option scheduler alu
>>   option alu.order read-usage:write-usage:disk-usage
>>   option alu.read-usage.entry-threshold 20%
>>   option alu.read-usage.exit-threshold 4%
>>   option alu.write-usage.entry-threshold 20%
>>   option alu.write-usage.exit-threshold 4%
>>   option alu.disk-usage.entry-threshold 2GB
>>   option alu.disk-usage.exit-threshold 100MB
>>   subvolumes distfs01 distfs02 distfs03 distfs04
>> end-volume
>>
>> volume ioc
>>   type performance/io-cache
>>   subvolumes stripe0   # In this example it is 'client...
>> volume fixed
>>   type features/fixed-id
>>   option fixed-uid 0
>>   option fixed-gid 900
>>   subvolumes ioc
>> end-volume
>>
>> Dan Parsons
>>
>> On Dec 17, 2008, at 2:09 PM, Anand Avati wrote:
>> > Dan,
>> > Is it feasible for you to try 1.4.0pre4...
>
>
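
To put the quoted ps line in perspective, here is a rough Python sketch that compares the glusterfs resident size with the configured cache-size. It assumes the line uses the standard `ps aux` column order (USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND) with VSZ and RSS in KiB; the helper name `parse_ps_aux` is made up for illustration and is not part of any glusterfs tooling.

```python
# Sketch only: assumes standard `ps aux` columns, with VSZ and RSS in KiB.
PS_LINE = "root 26231 9.3 90.5 12676632 11141304 ? Ssl Dec17 659:31 [glusterfs]"
CACHE_SIZE_MIB = 2048  # 'cache-size' as reported in the message (2048mb)


def parse_ps_aux(line):
    """Split one `ps aux` row into the fields used below (hypothetical helper)."""
    # The command is the 11th field and may itself contain spaces.
    fields = line.split(None, 10)
    return {
        "pid": int(fields[1]),
        "mem_percent": float(fields[3]),
        "vsz_kib": int(fields[4]),
        "rss_kib": int(fields[5]),
        "command": fields[10],
    }


proc = parse_ps_aux(PS_LINE)

# Resident memory in MiB, and how many times larger it is than the cache limit.
rss_mib = proc["rss_kib"] / 1024
ratio = rss_mib / CACHE_SIZE_MIB

# Total RAM implied by RSS and %MEM, as a sanity check against "12GB RAM".
implied_total_gib = proc["rss_kib"] / (proc["mem_percent"] / 100) / 1024 ** 2

print(f"{proc['command']} (pid {proc['pid']})")
print(f"RSS: {rss_mib:.0f} MiB, cache-size: {CACHE_SIZE_MIB} MiB, ratio: {ratio:.1f}x")
print(f"Implied total RAM: {implied_total_gib:.1f} GiB")
```

By this arithmetic the process is holding roughly five times the configured cache-size in resident memory. Running the same comparison before and after the drop_caches test would show whether the resident size of the process itself changes.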