It would appear the latest tla is stable in this configuration. I was
running a version updated on Monday, so between then and now, something
good was fixed :)
Erik Osterman wrote:
Every time we start our rendering applications on our gluster volumes,
the load starts climbing. At first we thought it was our application,
but apparently our application is locked up (more precisely, blocked
waiting on something). Top shows no active processes (i.e. the load
should be close to 0). After killing the application, the load continues
to climb until we terminate and restart the glusterfs process. Glusterfs
itself is not busy at all; strace shows it sitting in epoll_wait. Top
shows no processes using any CPU, so the problem seems to be in the
kernel.
load average: 14.99, 14.93, 14.20
Before we had this problem, we were getting consistent kernel panics.
Applying the patch from
http://www.nabble.com/-fuse-devel--Kernel-oops-in-fuse_send_readpages()-t1374092.html
fixed those. We're stuck using the 2.6.16 kernel on Amazon's
EC2. Fuse is version 2.6.3. We've disabled all performance
optimizations out of desperation to get something working.
Anything I can look for to track this down?
Thanks,
Erik Osterman
# Server config
volume brick0
type storage/posix
option directory /mnt/glusterfs/brick0
end-volume
volume server
type protocol/server
subvolumes brick0
option transport-type tcp/server
option bind-address 0.0.0.0
option listen-port 6996
option client-volume-filename /etc/glusterfs/client.vol
option auth.ip.brick0.allow *
end-volume
# Client config
volume ip0
type protocol/client
option transport-type tcp/client
option remote-host 10.253.59.65
option remote-port 6996
option remote-subvolume brick0
end-volume
volume ip1
type protocol/client
option transport-type tcp/client
option remote-host 10.253.58.240
option remote-port 6996
option remote-subvolume brick0
end-volume
volume ip2
type protocol/client
option transport-type tcp/client
option remote-host 10.253.58.239
option remote-port 6996
option remote-subvolume brick0
end-volume
volume afr
type cluster/afr
subvolumes ip0 ip1 ip2
option replicate *:2
end-volume
volume ip
type cluster/unify
subvolumes afr
option scheduler rr
option rr.limits.min-free-disk 2GB
end-volume
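For reference, re-enabling one of the performance translators later would
just mean layering it over the unify volume "ip" above. A minimal sketch,
assuming the write-behind translator and an illustrative aggregate-size
value (the option name and value may differ in your build):
# hypothetical example: write-behind layered over the unify volume "ip"
volume writebehind
type performance/write-behind
# assumed/illustrative size; adjust as needed
option aggregate-size 128KB
subvolumes ip
end-volume
If the topmost volume is what your mount picks up, the client would then
use "writebehind" rather than "ip".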
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxx
http://lists.nongnu.org/mailman/listinfo/gluster-devel