Re: This bug hunt just gets weirder...

It seems to be the same problem occurring for me (cf. my previous report).


----
Hello,
I'm using the latest gluster from git.
I think there is a problem with the lock server in AFR mode:

Test setup:
Servers A and B in AFR.

TEST 1
1 / Install A and B, then copy a file to A: synchronisation to B is perfect.
2 / Wipe server B completely and reinstall it: synchronisation is not
possible (nothing happens).

TEST 2
1 / Install A and B, then copy a file to A (on the gluster mount point):
synchronisation to B is perfect.
2 / Wipe A completely and reinstall it: synchronisation from B is perfect.

Now if I redo TEST 1, but in my replicate volume ("volume last") I
invert brick_10.98.98.1 and brick_10.98.98.2 in the subvolumes line, so
that 10.98.98.1 is now the lock server for AFR, then TEST 1 works and
TEST 2 does not.
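
In other words, the replicate volume on the client becomes roughly this
(only the order in the subvolumes line is changed; the other options stay
as in the full config below):

volume last
type cluster/replicate
subvolumes brick_10.98.98.1 brick_10.98.98.2   # inverted: 10.98.98.1 first, so it is now the lock server
option read-subvolume brick_10.98.98.2
option favorite-child brick_10.98.98.2
end-volume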

I think that in one of the two cases it tries to use the lock server on
the side where the file does not exist, and that is where the problem
occurs.
I tried to add 2 lock servers with
option data-lock-server-count 2
option entry-lock-server-count 2

without success.
I also tried with 0, without success.
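
Those options went into the replicate volume on the client, so the
section I tested looked roughly like this:

volume last
type cluster/replicate
subvolumes brick_10.98.98.2 brick_10.98.98.1
option read-subvolume brick_10.98.98.2
option favorite-child brick_10.98.98.2
option data-lock-server-count 2    # also tried 0
option entry-lock-server-count 2   # also tried 0
end-volume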


Client config file (the same for A and B):

volume brick_10.98.98.1
type protocol/client
option transport-type tcp/client
option transport-timeout 120
option remote-host 10.98.98.1
option remote-subvolume brick
end-volume


volume brick_10.98.98.2
type protocol/client
option transport-type tcp/client
option transport-timeout 120
option remote-host 10.98.98.2
option remote-subvolume brick
end-volume


volume last
type cluster/replicate
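# the first subvolume listed (brick_10.98.98.2) acts as the AFR lock server here (cf. above)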
subvolumes brick_10.98.98.2 brick_10.98.98.1
option read-subvolume brick_10.98.98.2
option favorite-child brick_10.98.98.2
end-volume

volume iothreads
type performance/io-threads
option thread-count 4
subvolumes last
end-volume

volume io-cache
type performance/io-cache
option cache-size 2048MB             # default is 32MB
option page-size  1MB             #128KB is default option
option cache-timeout 2  # default is 1
subvolumes iothreads
end-volume

volume writebehind
type performance/write-behind
option block-size 256KB # default is 0bytes
option cache-size 512KB
option flush-behind on      # default is 'off'
subvolumes io-cache
end-volume



Server config for A and B (the same except for the IP):


volume brickless
type storage/posix
option directory /mnt/disks/export
end-volume

volume brickthread
type features/posix-locks
option mandatory on          # enables mandatory locking on all files
subvolumes brickless
end-volume

volume brickcache
type performance/io-cache
option cache-size 1024MB
option page-size 1MB
option cache-timeout 2
subvolumes brickthread
end-volume

volume brick
type performance/io-threads
option thread-count 8
option cache-size 256MB
subvolumes brickcache
end-volume


volume server
type protocol/server
subvolumes brick
option transport-type tcp
option auth.addr.brick.allow 10.98.98.*
end-volume

On Tue, Mar 3, 2009 at 2:40 PM, Gordan Bobic <gordan@xxxxxxxxxx> wrote:
> On Tue, 3 Mar 2009 19:02:03 +0530, Anand Avati <avati@xxxxxxxxxxx> wrote:
>> On Wed, Feb 18, 2009 at 1:09 AM, Gordan Bobic <gordan@xxxxxxxxxx> wrote:
>>> OK, I've managed to resolve this, but it wasn't possible to resync the
>>> primary off the secondary. What I ended up doing was backing up the
> files
>>> that were changed since the primary went down, blanking the secondary,
>>> resyncing the secondary off the primary, and copying the backed up files
>>> back into the file system.
>>>
>>> By primary and secondary here I am referring to the order in which they
>>> are listed in subvolumes.
>>>
>>> So to re-iterate - syncing primary off the secondary wasn't working, but
>>> syncing secondary off the primary worked.
>>>
>>> Can anyone hazard a guess as to how to debug this issue further? Since I
>>> have the backup of the old data on the secondary, I can probably have a
>>> go
>>> at re-creating the problem (I'm hoping it won't be re-creatable with the
>>> freshly synced data).
>>
>> Did you happen to change the subvolume order in the vofile, or
>> add/remove subvols? Doing so may result in such unexpected behavior.
>
> Yes. I added a server/subvolume to the AFR cluster, and subsequently removed
> one of the servers. Are there any additional procedures that have to be
> followed when adding a node to a cluster?
>
> Gordan
>
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel@xxxxxxxxxx
> http://lists.nongnu.org/mailman/listinfo/gluster-devel
>



