On Tue, 25 Oct 2011 19:01:33 -0700
Harry Mangalam <harry.mangalam at uci.edu> wrote:

> We're considering implementing gluster for a genomics cluster, and it
> seems to have some theoretical advantages that so far seem to have
> been borne out in some limited testing, mod some odd problems with an
> inability to delete dir trees. I'm about to test with the latest
> beta that was promised to clear up these bugs, but as I'm doing that,
> answers to these Qs would be appreciated...
>
> - what happens in a distributed system if a node goes down? Does the
> rest of the system keep working with the files on that brick
> unavailable until it comes back or is the filesystem corrupted? In
> my testing, it seemed that the system indeed kept working and added
> files to the remaining systems, but that files that were hashed to
> the failed volume were unavailable (of course).

Yes, this is what I would expect (and have always observed) when using
just distribution without replication. Not only are existing files on
the failed brick unavailable, but IMX attempts to create new files
which would hash to that brick (effectively a random 1/N) also fail.
That part, at least, is fixable. With replication, the single-brick
failure would effectively be invisible to the distribution layer, so
even this glitch wouldn't occur.

> - is there a head node? the system is distributed but you're
> mounting a specific node for the glusterfs mount - if that node goes
> down, is the whole filesystem hosed or is that node reference really
> a group reference and the gluster filesystem continues with the loss
> of that node's files? ie can any gluster node replace a mountpoint
> node and does that happen transparently? (I haven't tested this).

The node that you specify for the mount is really only used to fetch
the volfile, which contains the names of all bricks that are involved
in providing service for that volume. The mount node might not even be
one of those nodes itself (e.g. mount from gluster1 while the bricks
are actually on gluster2 and gluster3). Once the connections have been
made to each brick, they're all equal, and the failure of one will
have only a partial (if any) effect. (See the mount sketch at the end
of this message.)

> - can you intermix distributed and mirrored volumes? This is of
> particular interest since some of our users want to have replicated
> data and some don't care.

Every volume is inherently distributed (even if there's only one
brick), and can optionally be striped and/or replicated as well,
independently of what's being done for other volumes.
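
To make that concrete, here's a rough sketch of how you might carve
out a replicated volume for the users who want mirroring alongside a
plain distributed volume for those who don't, on the same pair of
servers. The server names, brick paths, and volume names below are
just placeholders, and the exact create syntax can vary a bit between
GlusterFS releases:

    # replicated volume (replica pairs; add more brick pairs later to
    # get distribute-over-replicate)
    gluster volume create mirrored-vol replica 2 transport tcp \
        gluster2:/data/brick1 gluster3:/data/brick1
    gluster volume start mirrored-vol

    # plain distributed volume on the same servers, no replication
    gluster volume create scratch-vol transport tcp \
        gluster2:/data/brick2 gluster3:/data/brick2
    gluster volume start scratch-vol

On the replicated volume a single brick going down should be largely
invisible to clients; on the plain distributed one, files hashed to
the dead brick simply become unavailable until it comes back.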
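
And to illustrate the point about the mount server only being used to
fetch the volfile, a sketch of a client mount, again with placeholder
names. Some mount.glusterfs versions also accept a backup volfile
server option so the initial volfile fetch isn't itself a single point
of failure; check whether your release supports it:

    # gluster1 is only contacted to fetch the volfile; the actual I/O
    # goes straight to the bricks on gluster2 and gluster3
    mount -t glusterfs gluster1:/mirrored-vol /mnt/mirrored-vol

    # if your version supports it, name a fallback volfile server too
    mount -t glusterfs -o backupvolfile-server=gluster2 \
        gluster1:/mirrored-vol /mnt/mirrored-vol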