Thanks very much for your input.

I'm a bit surprised that new files would hash to the failed brick. Isn't there a check to make sure that the assigned brick is responding, with a fallback to a ready brick? I can see that this would happen in the first few seconds of a failure, but after a short timeout, shouldn't this feed back to the hasher?

I'll explicitly test this when I bring up the new version today.

Thanks again
hjm

On Wednesday 26 October 2011 06:34:33 Jeff Darcy wrote:
> > - what happens in a distributed system if a node goes down? Does
> > the rest of the system keep working with the files on that
> > brick unavailable until it comes back or is the filesystem
> > corrupted? In my testing, it seemed that the system indeed kept
> > working and added files to the remaining systems, but that files
> > that were hashed to the failed volume were unavailable (of
> > course).
>
> Yes, this is what I would expect (and have always observed) when
> using just distribution without replication. Not only are
> existing files on the failed brick unavailable, but IMX attempts
> to create new files which would hash to that brick (effectively a
> random 1/N) also fail. That part, at least, is fixable. With
> replication, the single-brick failure would effectively be
> invisible to the distribution layer so even this glitch wouldn't
> occur.

--
Harry Mangalam - Research Computing, OIT, Rm 225 MSTB, UC Irvine
[ZOT 2225] / 92697 Google Voice Multiplexer: (949) 478-4487
MSTB Lat/Long: (33.642025,-117.844414) (paste into Google Maps)
--
This signature has been OCCUPIED!
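
For anyone following the thread, the behaviour under discussion can be sketched roughly as below. This is only a toy model and not GlusterFS's actual distribution code: the CRC32 hash, the brick names, and the functions `pick_brick` and `create_file` are all assumptions made up for illustration. With `fallback=False` it mimics what Jeff describes (creates that hash to the dead brick fail, roughly 1/N of them); with `fallback=True` it shows the kind of "fall back to a ready brick" check asked about above.

```python
import zlib


def pick_brick(filename, bricks):
    """Map a filename to one brick by hashing its name.

    CRC32 is a stand-in here; it is NOT the hash GlusterFS really uses.
    """
    index = zlib.crc32(filename.encode("utf-8")) % len(bricks)
    return bricks[index]


def create_file(filename, bricks, alive, fallback=False):
    """Return the brick a new file would land on, or None if the create fails.

    bricks: ordered list of all brick names (hypothetical)
    alive:  set of brick names currently responding
    """
    target = pick_brick(filename, bricks)
    if target in alive:
        return target
    if not fallback:
        # Behaviour reported in the thread: the create simply fails.
        return None
    # Hypothetical fallback: walk the brick list from the hashed position
    # and take the first brick that is responding.
    start = bricks.index(target)
    for offset in range(1, len(bricks)):
        candidate = bricks[(start + offset) % len(bricks)]
        if candidate in alive:
            return candidate
    return None  # every brick is down


if __name__ == "__main__":
    bricks = ["brick1", "brick2", "brick3", "brick4"]
    dead = pick_brick("results.dat", bricks)  # take down the brick this file maps to
    alive = set(bricks) - {dead}
    print(create_file("results.dat", bricks, alive))                 # create fails
    print(create_file("results.dat", bricks, alive, fallback=True))  # lands on a live brick
```

Note that the sketch ignores the harder half of the problem: once a file is placed on a brick other than the one its name hashes to, later lookups need some way to find it there.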