OK, I just had this problem again in a big way.
root 26231 9.3 90.5 12676632 11141304 ? Ssl Dec17 659:31 [glusterfs]
That's 90.5% of 12GB of RAM, even though cache-size is set to 2048MB.
Miraculously, this node is still running; about 28 of my 33 nodes died
over the weekend because of this issue. We wanted to run some big jobs
over the holiday break, but this crash is getting in the way.
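To put the ps figures in perspective: on Linux, `ps aux` reports RSS in KiB, so the glusterfs client is holding roughly 10.6 GiB, about five times the configured 2048MB cache-size. The short Python sketch below just redoes that arithmetic from the quoted line. It assumes the standard procps `ps aux` column layout (USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND); the helper name `parse_ps_aux` is made up for illustration.

```python
# Figures copied from the ps aux line quoted above (VSZ and RSS are in KiB on Linux procps).
PS_LINE = "root 26231 9.3 90.5 12676632 11141304 ? Ssl Dec17 659:31 [glusterfs]"

# cache-size as configured on the io-cache translator (2048MB), expressed in KiB.
CACHE_SIZE_KIB = 2048 * 1024


def parse_ps_aux(line: str) -> dict:
    """Hypothetical helper: split a `ps aux` line into the columns used here."""
    fields = line.split(None, 10)  # COMMAND is the 11th column and may contain spaces
    return {
        "pid": int(fields[1]),
        "pmem": float(fields[3]),     # %MEM: RSS as a percentage of physical RAM
        "vsz_kib": int(fields[4]),
        "rss_kib": int(fields[5]),
        "command": fields[10],
    }


proc = parse_ps_aux(PS_LINE)
rss_gib = proc["rss_kib"] / 1024**2
# Back out the machine's physical RAM from RSS and %MEM.
ram_gib = proc["rss_kib"] / (proc["pmem"] / 100) / 1024**2
# How many times larger the resident set is than the configured cache-size.
ratio = proc["rss_kib"] / CACHE_SIZE_KIB

print(f"RSS: {rss_gib:.2f} GiB of ~{ram_gib:.1f} GiB RAM")
print(f"RSS / cache-size: {ratio:.1f}x")
```

Run as-is, this prints an RSS of 10.63 GiB out of roughly 11.7 GiB of RAM, and a ratio of about 5.3x, so the growth is clearly not explained by the cache-size setting alone.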
Is there *anything* that can be done?
Dan Parsons
On Dec 17, 2008, at 3:28 PM, Anand Avati wrote:
Dan,
I have a vague memory of giving someone a custom patch for io-cache. Was
that you? Can you mail me the diff so I can answer your question?
Avati
On Dec 17, 2008 2:34 PM, "Dan Parsons" <dparsons@xxxxxxxx> wrote:
I'd love to use 1.4rc4, but are there any issues in it that would
affect me?
I have 4 glusterfs servers, each with 2gbit Ethernet (bonded), providing
a sustained 8gbit/s to 33 client nodes. Below is my entire config file.
If you see anything in there that relies on something buggy or
non-optimal in 1.4rc4, or that would be difficult to upgrade, please let
me know. If not, I can possibly upgrade.
The config I was using when gluster consumed all the memory was identical
except that 'cache-size' was set to 4096MB and 'page-size' to 512KB.
-----------
### Add client feature and attach to remote subvolume of server1
volume distfs01
type protocol/client
option transport-type tcp/client # for TCP/IP transport
option remote-host 10.8.101.51 # IP address of the remote brick
option remote-subvolume brick # name of the remote volume
end-volume
### Add client feature and attach to remote subvolume of server2
volume distfs02
type protocol/client
option transport-type tcp/client # for TCP/IP transport
option remote-host 10.8.101.52 # IP address of the remote brick
option remote-subvolume brick # name of the remote volume
end-volume
volume distfs03
type protocol/client
option transport-type tcp/client
option remote-host 10.8.101.53
option remote-subvolume brick
end-volume
volume distfs04
type protocol/client
option transport-type tcp/client
option remote-host 10.8.101.54
option remote-subvolume brick
end-volume
volume stripe0
type cluster/stripe
option block-size *.gff:1KB,*.nt:1KB,*.best:1KB,*.txt3:1KB,*.nbest.info:1KB,*:1MB
option scheduler alu
option alu.order read-usage:write-usage:disk-usage
option alu.read-usage.entry-threshold 20%
option alu.read-usage.exit-threshold 4%
option alu.write-usage.entry-threshold 20%
option alu.write-usage.exit-threshold 4%
option alu.disk-usage.entry-threshold 2GB
option alu.disk-usage.exit-threshold 100MB
subvolumes distfs01 distfs02 distfs03 distfs04
end-volume
volume ioc
type performance/io-cache
subvolumes stripe0 # In this example it is 'client...
volume fixed
type features/fixed-id
option fixed-uid 0
option fixed-gid 900
subvolumes ioc
end-volume
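For reference, here is a rough Python model of how a pattern:size list like stripe0's block-size option could be resolved per file. It assumes first-match glob semantics (the first pattern that matches the file path picks the stripe block size), which is how the list reads but is not confirmed anywhere in this thread. It also assumes the wrapped `*.nbest.info:1` / `KB*:1MB` entry is meant to read `*.nbest.info:1KB,*:1MB`. `parse_block_sizes` and `block_size_for` are hypothetical helpers, not GlusterFS code.

```python
from fnmatch import fnmatchcase

# Assumed reading of stripe0's block-size option (comma-separated pattern:size pairs).
BLOCK_SIZE_OPTION = "*.gff:1KB,*.nt:1KB,*.best:1KB,*.txt3:1KB,*.nbest.info:1KB,*:1MB"

UNITS = {"KB": 1024, "MB": 1024**2, "GB": 1024**3}


def parse_size(text):
    """Convert '1KB' / '1MB' style strings to bytes; bare numbers are bytes."""
    for suffix, factor in UNITS.items():
        if text.upper().endswith(suffix):
            return int(text[: -len(suffix)]) * factor
    return int(text)


def parse_block_sizes(option):
    """Hypothetical parser: turn 'pat:size,pat:size' into [(pattern, bytes), ...]."""
    rules = []
    for entry in option.split(","):
        pattern, _, size = entry.rpartition(":")
        rules.append((pattern, parse_size(size)))
    return rules


def block_size_for(path, rules, default=None):
    """Assumption: the first matching glob wins; None means no rule matched."""
    for pattern, size in rules:
        if fnmatchcase(path, pattern):
            return size
    return default


rules = parse_block_sizes(BLOCK_SIZE_OPTION)
for name in ["/data/chr1.gff", "/data/run7.nbest.info", "/data/reads.fastq"]:
    print(name, block_size_for(name, rules))
```

Under these assumptions the .gff and .nbest.info files get 1KB (1024-byte) blocks and everything else falls through to the `*:1MB` catch-all. Because the catch-all sits last, ordering matters: with first-match semantics, putting `*:1MB` first would shadow every other entry.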
Dan Parsons
On Dec 17, 2008, at 2:09 PM, Anand Avati wrote:
> Dan,
> Is it feasible for you to try 1.4.0pre4...