On Tue, Oct 21, 2008 at 5:54 PM, Gordan Bobic <gordan@xxxxxxxxxx> wrote:
> I'm starting to see lock-ups when using a single-file client/server setup.
>
> machine1 (x86):
> =================================
> volume home2
>   type protocol/client
>   option transport-type tcp/client
>   option remote-host 192.168.3.1
>   option remote-subvolume home2
> end-volume
>
> volume home-store
>   type storage/posix
>   option directory /gluster/home
> end-volume
>
> volume home1
>   type features/posix-locks
>   subvolumes home-store
> end-volume
>
> volume server
>   type protocol/server
>   option transport-type tcp/server
>   subvolumes home1
>   option auth.ip.home1.allow 127.0.0.1,192.168.*
> end-volume
>
> volume home
>   type cluster/afr
>   subvolumes home1 home2
>   option read-subvolume home1
> end-volume
>
> machine2 (x86-64):
> =================================
> volume home1
>   type protocol/client
>   option transport-type tcp/client
>   option remote-host 192.168.0.1
>   option remote-subvolume home1
> end-volume
>
> volume home-store
>   type storage/posix
>   option directory /gluster/home
> end-volume
>
> volume home2
>   type features/posix-locks
>   subvolumes home-store
> end-volume
>
> volume server
>   type protocol/server
>   option transport-type tcp/server
>   subvolumes home2
>   option auth.ip.home2.allow 127.0.0.1,192.168.*
> end-volume
>
> volume home
>   type cluster/afr
>   subvolumes home1 home2
>   option read-subvolume home2
> end-volume
>
> ==================
>
> Do those configs look sane?
>
> When one machine is running on its own, it's fine. Other client-only
> machines can connect to it without any problems. However, as soon as the
> second client/server comes up, typically the first ls access on the
> directory will lock the whole thing up solid.
>
> Interestingly, on the x86 machine, the glusterfs process can always be
> killed. Not so on the x86-64 machine (the 2nd machine that comes up). kill
> -9 doesn't kill it. The only way to clear the lock-up is to reboot.
>
> Using the 1.3.12 release compiled into an RPM on both machines (CentOS 5.2).
>
> One thing worthy of note is that machine2 is nfsrooted / network booted. It
> has local disks in it, and a local dmraid volume is mounted under /gluster
> on it (machine1 has a disk-backed root).
>
> So, on machine1:
> / is local disk
> on machine2:
> / is NFS
> /gluster is local disk
> /gluster/home is exported in the volume spec for AFR.
>
> If /gluster/home is newly created, it tends to get a little further, but
> still locks up pretty quickly. If I try to execute find /home once it is
> mounted, it will eventually hang, and the only thing of note I could see in
> the logs is that it said "active lock found" at the point where it

Do you see this error on server1 or server2? Any other clues in the logs?

Krishna
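
One way to tell which machine logs the message is to scan each machine's glusterfs log for the quoted string. The following is only a rough sketch: the helper name `find_lock_messages` is made up for illustration, and the log file paths are deliberately left as command-line arguments because their location depends on how glusterfs was started.

```python
import sys

# The message text quoted in the report above.
NEEDLE = "active lock found"


def find_lock_messages(lines, needle=NEEDLE):
    """Return (line_number, text) pairs for every line containing needle."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if needle in line:
            hits.append((number, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    # Log paths are given on the command line; no default location is assumed.
    for path in sys.argv[1:]:
        with open(path, errors="replace") as handle:
            for number, text in find_lock_messages(handle):
                print(f"{path}:{number}: {text}")
```

Running it on both machines against their respective log files, and comparing the surrounding lines of any hits, should show whether the message comes from machine1, machine2, or both.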