OK, I did a little testing with this new setting and have not been
able to reproduce the problem. However, this has happened before: I
had this problem, changed a setting, the problem seemed fixed, and
then it came back. So it might not be totally fixed, but time will
tell.
Thanks again for your help. I'll write back with more info if it
happens again.
Dan Parsons
On Dec 22, 2008, at 8:47 AM, Anand Avati wrote:
Dan,
can you do 'echo 3 > /proc/sys/vm/drop_caches' and see if the usage
comes back to normal?
avati
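One way to see what the drop_caches write actually releases is to compare /proc/meminfo before and after it, alongside the glusterfs RSS from ps. The following is only a minimal sketch, assuming the usual `Name: value kB` layout of /proc/meminfo; the function names and the two sample snapshots are made up for illustration and are not measurements from the nodes in this thread.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: value in kB}."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def change_kb(before, after, fields=("MemFree", "Cached")):
    """Return after - before for each requested field, in kB."""
    return {f: after[f] - before[f] for f in fields}


# Hypothetical snapshots (assumption), standing in for two reads of
# /proc/meminfo taken before and after the echo.
BEFORE = """MemTotal:       12310832 kB
MemFree:          204800 kB
Cached:           819200 kB
"""
AFTER = """MemTotal:       12310832 kB
MemFree:          921600 kB
Cached:           102400 kB
"""

if __name__ == "__main__":
    delta = change_kb(parse_meminfo(BEFORE), parse_meminfo(AFTER))
    for field, change in delta.items():
        print(f"{field}: {change:+d} kB")
```

On a real node the two snapshots would come from reading /proc/meminfo directly, once before and once after the echo, with the glusterfs RSS noted at the same two points.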
2008/12/22 Dan Parsons <dparsons@xxxxxxxx>:
OK, I just had this problem again in a big way.
root 26231 9.3 90.5 12676632 11141304 ? Ssl Dec17 659:31 [glusterfs]
That's 90.5% of 12GB RAM. cache-size is set to 2048mb. Miraculously
this node is still running, about 28 of my 33 nodes died over the
weekend because of this issue. We wanted to run some big jobs over
the holiday break but this crash is getting in the way.
Is there *anything* that can be done?
Dan Parsons
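The numbers in the ps line above can be put next to the configured cache-size with a little arithmetic. This is only a sketch: it assumes the standard `ps aux` column order (USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND) with VSZ and RSS reported in KiB, and the helper names are invented for the example. It shows how far the resident size exceeds the 2048MB cache-size; it cannot say where the extra memory is going.

```python
PS_LINE = "root 26231 9.3 90.5 12676632 11141304 ? Ssl Dec17 659:31 [glusterfs]"
CACHE_SIZE_MB = 2048  # cache-size quoted in the message above

KIB_PER_GIB = 1024 * 1024


def parse_ps_aux(line):
    """Pick the fields of interest out of one `ps aux` row.

    Assumes the default column order; VSZ and RSS are in KiB.
    """
    f = line.split(None, 10)
    return {
        "pid": int(f[1]),
        "mem_pct": float(f[3]),
        "vsz_kib": int(f[4]),
        "rss_kib": int(f[5]),
        "command": f[10],
    }


def summarize(line, cache_size_mb):
    """Convert the row to GiB and compare RSS with the configured cache."""
    row = parse_ps_aux(line)
    rss_gib = row["rss_kib"] / KIB_PER_GIB
    cache_gib = cache_size_mb / 1024
    return {
        "rss_gib": rss_gib,
        "vsz_gib": row["vsz_kib"] / KIB_PER_GIB,
        "rss_over_cache": rss_gib / cache_gib,
        # Total RAM implied by RSS and %MEM, as a sanity check on "12GB".
        "implied_total_gib": row["rss_kib"] / (row["mem_pct"] / 100) / KIB_PER_GIB,
    }


if __name__ == "__main__":
    for key, value in summarize(PS_LINE, CACHE_SIZE_MB).items():
        print(f"{key}: {value:.2f}")
```

By this arithmetic the process holds roughly 10.6 GiB resident, a bit more than five times the configured cache-size, and the implied total of about 11.7 GiB is consistent with a 12GB node.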
On Dec 17, 2008, at 3:28 PM, Anand Avati wrote:
Dan,
I have a vague memory about giving a custom patch for io-cache. Was
that you? Can you mail me the diff so I can answer your question?
Avati
On Dec 17, 2008 2:34 PM, "Dan Parsons" <dparsons@xxxxxxxx> wrote:
I'd love to use 1.4rc4, but are there any issues in it that would
affect me?
I have 4 glusterfs servers, each with 2gbit ethernet (bonded),
providing sustained 8gbit/s to 33 client nodes. Below is my entire
config file. If you see anything in there using a system that is
either buggy or non-optimal in 1.4rc4, or would be difficult to
upgrade, please let me know. If not, I can possibly upgrade.
Below is my current config file. The one I was using when gluster was
using all memory is identical except that 'cache-size' was set to
4096MB and 'page-size' was set to 512KB.
-----------
### Add client feature and attach to remote subvolume of server1
volume distfs01
type protocol/client
option transport-type tcp/client # for TCP/IP transport
option remote-host 10.8.101.51 # IP address of the remote brick
option remote-subvolume brick # name of the remote volume
end-volume
### Add client feature and attach to remote subvolume of server2
volume distfs02
type protocol/client
option transport-type tcp/client # for TCP/IP transport
option remote-host 10.8.101.52 # IP address of the remote brick
option remote-subvolume brick # name of the remote volume
end-volume
volume distfs03
type protocol/client
option transport-type tcp/client
option remote-host 10.8.101.53
option remote-subvolume brick
end-volume
volume distfs04
type protocol/client
option transport-type tcp/client
option remote-host 10.8.101.54
option remote-subvolume brick
end-volume
volume stripe0
type cluster/stripe
option block-size *.gff:1KB,*.nt:1KB,*.best:1KB,*.txt3:1KB,*.nbest.info:1KB,*:1MB
option scheduler alu
option alu.order read-usage:write-usage:disk-usage
option alu.read-usage.entry-threshold 20%
option alu.read-usage.exit-threshold 4%
option alu.write-usage.entry-threshold 20%
option alu.write-usage.exit-threshold 4%
option alu.disk-usage.entry-threshold 2GB
option alu.disk-usage.exit-threshold 100MB
subvolumes distfs01 distfs02 distfs03 distfs04
end-volume
volume ioc
type performance/io-cache
subvolumes stripe0 # In this example it is 'client...
volume fixed
type features/fixed-id
option fixed-uid 0
option fixed-gid 900
subvolumes ioc
end-volume
Dan Parsons
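To make the layering in the config above easier to see, the volume blocks can be read into a small graph and printed from the top volume down. This is a rough sketch, not GlusterFS's own parser: it assumes one directive per line and `#` comments, the function names are invented, and the sample text is an abbreviated copy of the config (two bricks instead of four, and the truncated ioc volume closed with an assumed end-volume and no options).

```python
def parse_volfile(text):
    """Parse volume blocks into {name: {"type", "options", "subvolumes"}}."""
    volumes = {}
    current = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        words = line.split()
        key = words[0]
        if key == "volume" and len(words) > 1:
            current = {"type": None, "options": {}, "subvolumes": []}
            volumes[words[1]] = current
        elif key == "end-volume":
            current = None
        elif current is None:
            continue  # directive outside any volume block
        elif key == "type" and len(words) > 1:
            current["type"] = words[1]
        elif key == "option" and len(words) > 2:
            current["options"][words[1]] = " ".join(words[2:])
        elif key == "subvolumes":
            current["subvolumes"] = words[1:]
    return volumes


def stack(volumes, top, depth=0):
    """Return indented lines describing the graph below `top`."""
    lines = ["  " * depth + f"{top} ({volumes[top]['type']})"]
    for child in volumes[top]["subvolumes"]:
        lines.extend(stack(volumes, child, depth + 1))
    return lines


# Abbreviated sample (assumption): see the lead-in for what was changed.
SAMPLE = """
volume distfs01
type protocol/client
option transport-type tcp/client # for TCP/IP transport
option remote-host 10.8.101.51
option remote-subvolume brick
end-volume
volume distfs02
type protocol/client
option transport-type tcp/client
option remote-host 10.8.101.52
option remote-subvolume brick
end-volume
volume stripe0
type cluster/stripe
subvolumes distfs01 distfs02
end-volume
volume ioc
type performance/io-cache
subvolumes stripe0
end-volume
volume fixed
type features/fixed-id
option fixed-uid 0
option fixed-gid 900
subvolumes ioc
end-volume
"""

if __name__ == "__main__":
    print("\n".join(stack(parse_volfile(SAMPLE), "fixed")))
```

Read this way, the client stack is fixed-id on top of io-cache on top of stripe, with the protocol/client volumes at the bottom, so every read passes through io-cache before it reaches the striped bricks.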
On Dec 17, 2008, at 2:09 PM, Anand Avati wrote:
Dan,
Is it feasible for you to try 1.4.0pre4...