I'm experiencing a glusterfs client crash (signal 11) under the io-cache
xlator. This is on our bioinformatics cluster; the crash happened on 2
out of 33 machines. I've verified the hardware stability of the
machines.
Running v1.3.8, built May 5th, 2008 from the latest downloadable version.
Here is the crash message:
[0xffffe420]
/usr/local/lib/glusterfs/1.3.8/xlator/performance/io-cache.so(ioc_page_wakeup+0x67)[0xb76c5f67]
/usr/local/lib/glusterfs/1.3.8/xlator/performance/io-cache.so(ioc_inode_wakeup+0xb2)[0xb76c6902]
/usr/local/lib/glusterfs/1.3.8/xlator/performance/io-cache.so(ioc_cache_validate_cbk+0xae)[0xb76c1e5e]
/usr/local/lib/glusterfs/1.3.8/xlator/cluster/stripe.so(stripe_stack_unwind_buf_cbk+0x98)[0xb76cd038]
/usr/local/lib/glusterfs/1.3.8/xlator/protocol/client.so(client_fstat_cbk+0xcc)[0xb76dd13c]
/usr/local/lib/glusterfs/1.3.8/xlator/protocol/client.so(notify+0xa97)[0xb76db117]
/usr/local/lib/libglusterfs.so.0(transport_notify+0x38)[0xb7efe978]
/usr/local/lib/libglusterfs.so.0(sys_epoll_iteration+0xd6)[0xb7eff906]
/usr/local/lib/libglusterfs.so.0(poll_iteration+0x98)[0xb7efeb28]
[glusterfs](main+0x85e)[0x804a14e]
/lib/libc.so.6(__libc_start_main+0xdc)[0x7b1dec]
[glusterfs][0x8049391]
And here is my config file. The only thing I can think of is that my
cache-size may be too big. I want a lot of cache: we have big files, and
the boxes have the RAM. Anyway, the config is below. If you see any
problems with it, please let me know. There are no errors on the
glusterfsd servers, except for an EOF from the machines where the
glusterfs client segfaulted.
volume fuse
type mount/fuse
option direct-io-mode 1
option entry-timeout 1
option attr-timeout 1
option mount-point /glusterfs
subvolumes ioc
end-volume
volume ioc
type performance/io-cache
option priority *.psiblast:3,*.seq:2,*:1
option force-revalidate-timeout 5
option cache-size 1200MB
option page-size 128KB
subvolumes stripe0
end-volume
volume stripe0
type cluster/stripe
option alu.disk-usage.exit-threshold 100MB
option alu.disk-usage.entry-threshold 2GB
option alu.write-usage.exit-threshold 4%
option alu.write-usage.entry-threshold 20%
option alu.read-usage.exit-threshold 4%
option alu.read-usage.entry-threshold 20%
option alu.order read-usage:write-usage:disk-usage
option scheduler alu
option block-size *:1MB
subvolumes distfs01 distfs02 distfs03 distfs04
end-volume
volume distfs04
type protocol/client
option remote-subvolume brick
option remote-host 10.8.101.54
option transport-type tcp/client
end-volume
volume distfs03
type protocol/client
option remote-subvolume brick
option remote-host 10.8.101.53
option transport-type tcp/client
end-volume
volume distfs02
type protocol/client
option remote-subvolume brick
option remote-host 10.8.101.52
option transport-type tcp/client
end-volume
volume distfs01
type protocol/client
option remote-subvolume brick
option remote-host 10.8.101.51
option transport-type tcp/client
end-volume
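For reference, here is the rough arithmetic behind my cache-size worry, using the io-cache settings from the config above. This is only a sketch under two assumptions that I have not confirmed: that glusterfs treats MB and KB as binary units, and that the client runs as a 32-bit process with roughly 3 GB of user address space (the addresses in the backtrace look 32-bit, but that is a guess).

```python
# Back-of-the-envelope numbers for the io-cache settings in the config above.
# Assumption: MB and KB are binary units (1 MB = 1024 KB).
KB = 1024
MB = 1024 * KB

cache_size = 1200 * MB   # option cache-size 1200MB
page_size = 128 * KB     # option page-size 128KB

# Number of cache pages if the cache fills completely.
pages = cache_size // page_size

# Assumption: 32-bit client process with about 3 GB of user address space.
user_space = 3 * 1024 * MB
fraction = cache_size / user_space

print(f"pages in a full cache: {pages}")
print(f"share of a 3 GB address space: {fraction:.0%}")
```

If those assumptions hold, a full cache is 9600 pages and takes a bit over a third of the address space available to the client, which is why I suspect the cache-size setting before anything else.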
Dan Parsons