Mysterious Escalating Load

Every time we start our rendering applications on our gluster volumes, the load starts climbing. At first we suspected our application, but it turns out to be locked up (apparently blocked waiting on something). Top shows no active processes, so the load should be close to 0. Even after we kill the application, the load keeps climbing until we terminate and restart the glusterfs process. Glusterfs itself is not busy at all: an strace shows it sitting in epoll_wait. Since top shows no process using any CPU, the problem seems to be in the kernel.

load average: 14.99, 14.93, 14.20
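
On Linux the load average counts tasks in uninterruptible sleep (state D) as well as runnable ones, so a climbing load with idle CPUs usually means tasks are piling up blocked inside the kernel. A minimal sketch for listing them, assuming /proc is mounted and that /proc/<pid>/wchan is readable on this kernel; the function names are made up for illustration, and only thread-group leaders are scanned, not individual threads:

```python
import os

def parse_stat(text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    lpar = text.index("(")
    rpar = text.rindex(")")
    pid = int(text[:lpar].strip())
    comm = text[lpar + 1:rpar]
    state = text[rpar + 1:].split()[0]
    return pid, comm, state

def blocked_tasks(proc_root="/proc"):
    """List (pid, comm, wchan) for tasks in uninterruptible sleep (state D)."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # the task exited (or the file was unreadable) mid-scan
        if state != "D":
            continue
        try:
            # wchan names the kernel function the task is sleeping in,
            # when the kernel exposes it.
            with open(os.path.join(proc_root, entry, "wchan")) as f:
                wchan = f.read().strip()
        except OSError:
            wchan = "?"
        found.append((pid, comm, wchan))
    return found

if __name__ == "__main__":
    for pid, comm, wchan in blocked_tasks():
        print(pid, comm, wchan)
```

If the rendering processes show up here, the wchan column hints at where in the kernel they are stuck, which would help narrow down whether the wait is in FUSE or elsewhere.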

Before we had this problem, we were getting consistent kernel panics. Applying the fix from http://www.nabble.com/-fuse-devel--Kernel-oops-in-fuse_send_readpages()-t1374092.html resolved those. We're stuck using the 2.6.16 kernel on Amazon's EC2. FUSE is version 2.6.3. We've disabled all performance optimizations out of desperation to get something working.


Anything I can look for to track this down?

Thanks,

Erik Osterman


# Server config
volume brick0
 type storage/posix
 option directory /mnt/glusterfs/brick0
end-volume

volume server
 type protocol/server
 subvolumes brick0
 option transport-type tcp/server
 option bind-address 0.0.0.0
 option listen-port 6996
 option client-volume-filename /etc/glusterfs/client.vol
 option auth.ip.brick0.allow *
end-volume



# Client config

volume ip0
 type protocol/client
 option transport-type tcp/client
 option remote-host 10.253.59.65
 option remote-port 6996
 option remote-subvolume brick0
end-volume

volume ip1
 type protocol/client
 option transport-type tcp/client
 option remote-host 10.253.58.240
 option remote-port 6996
 option remote-subvolume brick0
end-volume

volume ip2
 type protocol/client
 option transport-type tcp/client
 option remote-host 10.253.58.239
 option remote-port 6996
 option remote-subvolume brick0
end-volume

volume afr
 type cluster/afr
 subvolumes ip0 ip1 ip2
 option replicate *:2
end-volume

volume ip
 type cluster/unify
 subvolumes afr
 option scheduler rr
 option rr.limits.min-free-disk 2GB
end-volume




