Re: Weird lock-ups

"Krishna Srinivas" <krishna@xxxxxxxxxxxxx> · Wed, 22 Oct 2008 17:16:27 +0530

On Tue, Oct 21, 2008 at 5:54 PM, Gordan Bobic <gordan@xxxxxxxxxx> wrote:
> I'm starting to see lock-ups when using a single-file client/server setup.
>
> machine1 (x86): =================================
> volume home2
>        type protocol/client
>        option transport-type tcp/client
>        option remote-host 192.168.3.1
>        option remote-subvolume home2
> end-volume
>
> volume home-store
>        type storage/posix
>        option directory /gluster/home
> end-volume
>
> volume home1
>        type features/posix-locks
>        subvolumes home-store
> end-volume
>
> volume server
>        type protocol/server
>        option transport-type tcp/server
>        subvolumes home1
>        option auth.ip.home1.allow 127.0.0.1,192.168.*
> end-volume
>
> volume home
>        type cluster/afr
>        subvolumes home1 home2
>        option read-subvolume home1
> end-volume
>
> machine2 (x86-64): =================================
> volume home1
>        type protocol/client
>        option transport-type tcp/client
>        option remote-host 192.168.0.1
>        option remote-subvolume home1
> end-volume
>
> volume home-store
>        type storage/posix
>        option directory /gluster/home
> end-volume
>
> volume home2
>        type features/posix-locks
>        subvolumes home-store
> end-volume
>
> volume server
>        type protocol/server
>        option transport-type tcp/server
>        subvolumes home2
>        option auth.ip.home2.allow 127.0.0.1,192.168.*
> end-volume
>
> volume home
>        type cluster/afr
>        subvolumes home1 home2
>        option read-subvolume home2
> end-volume
>
> ==================
>
> Do those configs look sane?
>
> When one machine is running on its own, it's fine. Other client-only
> machines can connect to it without any problems. However, as soon as the
> second client/server comes up, typically the first ls access on the
> directory will lock the whole thing up solid.
>
> Interestingly, on the x86 machine, the glusterfs process can always be
> killed. Not so on the x86-64 machine (the 2nd machine that comes up). kill
> -9 doesn't kill it. The only way to clear the lock-up is to reboot.
>
> Using the 1.3.12 release compiled into an RPM on both machines (CentOS 5.2).
>
> One thing worthy of note is that machine2 is nfsrooted / network booted. It
> has local disks in it, and a local dmraid volume is mounted under /gluster
> on it (machine1 has a disk-backed root).
>
> So, on machine1:
> / is local disk
> on machine2:
> / is NFS
> /gluster is local disk
> /gluster/home is exported in the volume spec for AFR.
>
> If /gluster/home is newly created, it tends to get a little further, but
> still locks up pretty quickly. If I try to execute find /home once it is
> mounted, it will eventually hang, and the only thing of note I could see in
> the logs is that it said "active lock found" at the point where it

Do you see this error on server1 or server2? Any other clues in the logs?
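
One way to check is to search the logs on both machines for that message. A minimal sketch, assuming the logs live under /var/log/glusterfs (that path and the helper name find_lock_msgs are assumptions, adjust them to your setup):

```shell
# Hypothetical helper: search a log directory for the lock message.
# The default directory is an assumption; pass the real one as $1.
find_lock_msgs() {
    log_dir="${1:-/var/log/glusterfs}"
    # -r: recurse into the directory
    # -n: print the line number of each match
    # -F: treat the pattern as a literal string, not a regex
    grep -rnF "active lock found" "$log_dir"
}
```

Running find_lock_msgs on machine1 and then on machine2 should show which side logs the message, along with the file and line number, so the surrounding entries can be pasted here.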

Krishna