Re: This bug hunt just gets weirder...

Gordan Bobic <gordan@xxxxxxxxxx> · Tue, 17 Feb 2009 19:39:41 +0000

OK, I've managed to resolve this, but it wasn't possible to resync the 
primary off the secondary. What I ended up doing was backing up the 
files that were changed since the primary went down, blanking the 
secondary, resyncing the secondary off the primary, and copying the 
backed-up files back into the file system.
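
For anyone wanting to repeat the backup step, here is a minimal sketch of "back up the files changed since the primary went down", assuming modification time is a good enough indicator of what changed. The function name, the backup location and the cutoff timestamp are all hypothetical; only /gluster/home comes from the spec files in this thread. The backup directory must live outside the backing store.

```python
import os
import shutil


def backup_changed_since(store_root, backup_root, cutoff_epoch):
    """Copy regular files under store_root whose mtime is later than
    cutoff_epoch into backup_root, keeping their relative paths.
    Returns the relative paths that were copied, sorted."""
    copied = []
    for dirpath, _dirnames, filenames in os.walk(store_root):
        for name in filenames:
            src = os.path.join(dirpath, name)
            # This sketch skips symlinks and special files.
            if os.path.islink(src) or not os.path.isfile(src):
                continue
            if os.lstat(src).st_mtime <= cutoff_epoch:
                continue
            rel = os.path.relpath(src, store_root)
            dst = os.path.join(backup_root, rel)
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.copy2(src, dst)  # copy2 also carries over mtime and mode
            copied.append(rel)
    return sorted(copied)


# Hypothetical usage on the secondary; the cutoff is whenever the primary died:
# backup_changed_since("/gluster/home", "/root/home-changed", 1234800000)
```

Files that were deleted or renamed while the primary was down are not captured by this approach, so it only covers additions and modifications.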

By primary and secondary here I am referring to the order in which they 
are listed in subvolumes.

So, to reiterate: syncing the primary off the secondary wasn't working, 
but syncing the secondary off the primary worked.

Can anyone hazard a guess as to how to debug this issue further? Since I 
have the backup of the old data on the secondary, I can probably have a 
go at re-creating the problem (I'm hoping it won't be re-creatable with 
the freshly synced data).

Gordan

Gordan Bobic wrote:
OK, now I'm completely stumped.

I just moved the backing store on the primary server away to a new 
directory and re-created the share's root directory, so it can resync 
from the secondary.

Only it doesn't. When the primary mounts the AFR volume, "ls -laR" shows 
the volume as empty. If I blind cd into a directory and ls it, the 
directory then gets created in the local store, and I can browse it.
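
The blind-cd observation suggests a workaround: look up every known path by name through the mount instead of relying on directory listings. The sketch below assumes a list of relative paths can be produced from the intact backing store (for example by running the first function on the secondary and carrying the list over). Both function names are hypothetical, and whether a by-name lookup heals every entry on this version is an assumption based only on the behaviour described above.

```python
import os
import stat


def list_relative_paths(root):
    """Return every directory and file under root as a path relative to
    root, with parents always listed before their children."""
    rels = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # makes the walk order deterministic
        for name in dirnames + sorted(filenames):
            rels.append(os.path.relpath(os.path.join(dirpath, name), root))
    return rels


def force_lookups(mount_root, rel_paths):
    """lstat each path through the mount and list each directory found.
    Returns the relative paths that could not be looked up."""
    missing = []
    for rel in rel_paths:
        path = os.path.join(mount_root, rel)
        try:
            st = os.lstat(path)
            if stat.S_ISDIR(st.st_mode):
                os.listdir(path)
        except OSError:
            missing.append(rel)
    return missing
```

Anything returned by force_lookups is a path that still did not appear through the mount, which would at least narrow down where the resync stops.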

The setup is CentOS 5.2 x86-64, glusterfs-2.0.0rc1, and Gluster-patched 
fuse 2.7.4.

Volume spec files are pasted here:

primary server:

-------------------------------------------

volume home3
        type protocol/client
        option transport-type socket
        option transport.address-family inet
        option remote-host 10.2.0.10
        option remote-port 6997
        option remote-subvolume home3
end-volume

volume home-store
        type storage/posix
        option directory /gluster/home
end-volume

volume home2
        type features/posix-locks
        subvolumes home-store
end-volume

volume server
        type protocol/server
        option transport-type socket
        option transport.address-family inet
        option transport.socket.listen-port 6997
        subvolumes home2
        option auth.addr.home2.allow 127.0.0.1,10.2.*
end-volume

volume home
        type cluster/afr
        subvolumes home2 home3
        option read-subvolume home2
end-volume

---------------------------------------------

secondary server:

-------------------------------------------

volume home2
        type protocol/client
        option transport-type socket
        option transport.address-family inet
        option remote-host 10.2.3.1
        option remote-port 6997
        option remote-subvolume home2
end-volume

volume home-store
        type storage/posix
        option directory /gluster/home
end-volume

volume home3
        type features/posix-locks
        subvolumes home-store
end-volume

volume server
        type protocol/server
        option transport-type socket
        option transport.address-family inet
        option transport.socket.listen-port 6997
        subvolumes home3
        option auth.addr.home3.allow 127.0.0.1,10.2.*
end-volume

volume home
        type cluster/afr
        subvolumes home2 home3
        option read-subvolume home3
end-volume

----------------------------------------
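
Since "primary" and "secondary" here just mean the order of the entries on the AFR subvolumes line, a small parser can pull that order and the read-subvolume setting out of each spec file for comparison. This is a rough sketch that only understands the volume / type / option / subvolumes / end-volume layout shown above; it is not the parser glusterfs itself uses, and the function names are made up for illustration.

```python
def parse_volfile(text):
    """Parse 'volume NAME ... end-volume' blocks into a dict keyed by name."""
    volumes = {}
    current = None
    for raw in text.splitlines():
        words = raw.split()
        if not words or words[0].startswith("#"):
            continue  # blank line or comment
        if words[0] == "volume" and len(words) == 2:
            current = {"type": None, "options": {}, "subvolumes": []}
            volumes[words[1]] = current
        elif words[0] == "end-volume":
            current = None
        elif current is not None:
            if words[0] == "type" and len(words) >= 2:
                current["type"] = words[1]
            elif words[0] == "subvolumes":
                current["subvolumes"] = words[1:]
            elif words[0] == "option" and len(words) >= 3:
                current["options"][words[1]] = " ".join(words[2:])
    return volumes


def afr_order(volumes):
    """Map each cluster/afr volume to its subvolumes in listed order."""
    return {
        name: vol["subvolumes"]
        for name, vol in volumes.items()
        if vol["type"] == "cluster/afr"
    }
```

Run against both spec files above, afr_order should report home2 before home3 on each server, with only read-subvolume differing between them.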

The only other thing of note is that I'm passing the 
--disable-direct-io-mode parameter (I wanted tail -f to work properly).

No error appears in the log when ls-ing the share from the "empty" node.

Am I doing/overlooking something silly here due to a caffeine underflow 
error? :-/

Gordan
