Hi Keith.

Sorry for my previous email; it was a bit out of place.

Would you mind sharing how you recovered from this issue? Next week I'm going to stress test a solution based on GlusterFS, including pulling a live disk offline in the middle of work, and I would appreciate any hints you can share about recovering from such failures.

Regards.

2008/12/23 Keith Freedman <freedman at freeformit.com>

> So, I had a drive failure on one of my boxes, and it led to the discovery
> of numerous issues today:
>
> 1) When a drive is failing and one of the AFR servers is dealing with
> I/O errors, the other one freaks out and sometimes crashes, but it
> never seems to hit a network timeout.
>
> 2) When starting gluster on the server with the new empty drive, it
> gave me a bunch of errors about files being out of sync, telling me to
> delete a file from all but the preferred server.
> This struck me as odd, since the drive was empty.
> So I used the favorite child, but this isn't a good long-term
> solution.
>
> 3) One of the directories had 20GB of data in it. When I ran ls on
> the directory, I had to wait while it auto-healed all the files.
> While this is helpful, it would be nice to get the directory listing
> back without having to wait for 20GB of data to be sent over the
> network.
>
> 4) While the other server was down, the server that was up kept
> failing (signal 11?), and I had to constantly remount the filesystem.
> It gave me messages about the other node being down, which was fine,
> but then it would just die after a while, consistently.