Replies inline.

> 1) when a drive is failing and one of the AFR servers is dealing with
> IO errors, the other one freaks out and sometimes crashes, but
> doesn't seem to ever network timeout.

This was the same issue as (4).

> 2) when starting gluster on the server with the new empty drive, it
> gave me a bunch of errors about things being out of sync and to
> delete a file from all but the preferred server.
> this struck me as odd, since the thing was empty.
> so I used the favorite child, but this isn't a preferred solution long
> term.

Sure, this should not happen. It is not yet fixed; I will be looking at it today.

> 3) one of the directories had 20GB of data in it.... I went to do an
> ls of the directory and had to wait while it auto-healed all the
> files.. while this is helpful, it would be nice to have gotten back
> the directory listing without having to wait for 20GB of data to get
> sent over the network.

Currently this behavior is not going to change (at least until 1.4.0), because it can only happen while self-healing, and self-heal makes sure everything is consistent the first time it is accessed. Since that works correctly as it is, we don't want to risk a code change this close to a stable release. (See the note below my signature for a way to trigger the heal ahead of time, so an interactive ls doesn't have to wait for it.)

> 4) while the other server was down, the up server kept failing..
> signal 11? and I had to constantly remount the filesystem. It was
> giving me messages about the other node being down which was fine but
> then it'd just die after a while.. consistently.

This is fixed in tla. We have made a QA release to the internal team; once it passes basic tests, we will make the next 'RC' release.

Regards,
Amar
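P.S. Since self-heal runs on the first access to each file, one way to avoid paying the cost during an interactive ls is to crawl the mount in the background right after the replaced drive is back online. Roughly like the following (an untested sketch; adjust /mnt/glusterfs to your actual mount point):

    # read the first byte of every file on the glusterfs mount;
    # the open/read forces AFR to self-heal each file it touches
    find /mnt/glusterfs -type f -exec head -c1 '{}' \; > /dev/null

Once the crawl has finished, later directory listings and reads should not have to wait for healing.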