1.4.0RC6 AFR problems

stas.oskin at gmail.com (Stas Oskin) · Thu, 25 Dec 2008 00:45:44 +0200

Hi Keith.

Sorry for the previous email, it was a bit out of place.

Would you mind sharing how you recovered from this issue?

I'm going to stress test a solution based on GlusterFS next week, including
pulling a live disk offline in the middle of work, and would appreciate any
hints you might share regarding recovering from such failures.

Regards.

2008/12/23 Keith Freedman <freedman at freeformit.com>

> so, I had a drive failure on one of my boxes and it led to discovery
> of numerous issues today:
>
> 1) when a drive is failing and one of the AFR servers is dealing with
> IO errors, the other one freaks out and sometimes crashes, but
> doesn't seem to ever network timeout.
>
> 2) when starting gluster on the server with the new empty drive, it
> gave me a bunch of errors about things being out of sync and to
> delete a file from all but the preferred server.
> this struck me as odd, since the thing was empty.
> so I used the favorite child, but this isn't a preferred solution long
> term.
>
> 3) one of the directories had 20GB of data in it.... I went to do an
> ls of the directory and had to wait while it auto-healed all the
> files..  while this is helpful, it would be nice to have gotten back
> the directory listing without having to wait for 20GB of data to get
> sent over the network.
>
> 4) while the other server was down, the up server kept failing..
> signal 11?  and I had to constantly remount the filesystem.  It was
> giving me messages about the other node being down which was fine but
> then it'd just die after a while.. consistently.