On Wed, Feb 19, 2014 at 4:50 PM, Michael Peek <peek@xxxxxxxxxxx> wrote:
> Thanks for the quick reply.
>
> On 02/19/2014 03:15 PM, James wrote:
>> Short answer, it sounds like you'd benefit from playing with a test
>> cluster... Would I be correct in guessing that you haven't set up a
>> gluster pool yet? You might want to look at:
>> https://ttboj.wordpress.com/2014/01/08/automatically-deploying-glusterfs-with-puppet-gluster-vagrant/
>> This way you can try them out easily...
>
> You're close. I've got a test cluster up and running now, and I'm about
> to go postal on it to see just how many different ways I can break it,
> and what I need to know to bring it back to life.

"Go postal on it" -- I like this. Remember: if you break it, you get to
keep both pieces!

>
>> For some of those points... solve them with...
>>> Sort of a crib notes for things like:
>>>
>>> 1) What do you do if you see that a drive is about to fail?
>>>
>>> 2) What do you do if a drive has already failed?
>> RAID6
>
> Derp. Shoulda seen that one.

Typically on iron, people will have between 2 and N different bricks,
each composed of a RAID6 set. Other setups are possible depending on
what kind of engineering you're doing.

>
>>> 3) What do you do if a peer is about to fail?
>> Get a new peer ready...
>
> Here's what I think needs to happen, correct me if I've got this wrong:
> 1) Set up a new host with gluster installed
> 2) From the new host, probe one of the other peers (or from one of the
> other peers, probe the new host)

The pool has to probe the peer, not the other way around...

> 3) gluster volume replace-brick volname failing-host:/failing/brick
> new-host:/new/brick start

In the latest gluster, replace-brick is going away... it's turning into
add-brick/remove-brick... Try it out with a vagrant setup to get
comfortable with it! (There's a rough command sketch at the bottom of
this mail.)

>
> Find out how it's going with:
> gluster volume replace-brick volname failing-host:/failing/brick
> new-host:/new/brick status
>
>>> 4) What do you do if a peer has failed?
>> Replace with new peer...
>>
>
> Same steps as (3) above, then:
> 4) gluster volume heal volname
> to begin copying data over from a replica.
>
>>> 5) What do you do to reinstall a peer from scratch (i.e. what
>>> configuration files/directories do you need to restore to get the host
>>> back up and talking to the rest of the cluster)?
>> Bring up a new peer. Add to cluster... Same as failed peer...
>
>>> 6) What do you do with failed heals?
>>> 7) What do you do with split-brains?
>> These are more complex issues and a number of people have written
>> about them...
>> Eg: http://joejulian.name/blog/fixing-split-brain-with-glusterfs-33/
>
> This covers split-brain, but what about failed heals? Do you do the
> same thing?

Depends on what has happened... Look at the logs, see what's going on.
Oh, and make sure you aren't running out of disk space, because bad
things could happen... :P (A few diagnostic commands are at the bottom
of this mail as well.)

>
> Michael

HTH
James
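
P.S. Since you're going to break things anyway, here's roughly the
sequence I'd expect for swapping a new peer/brick in on a 3.4-era
volume. The hostnames, volume name, and brick paths below are just
placeholders, and replace-brick is in flux, so double-check the exact
syntax with "gluster volume help" on your version before trusting any
of it:

  # From a host that's already in the pool, add the new peer:
  gluster peer probe new-host
  gluster peer status

  # Migrate data off the failing brick onto the new one:
  gluster volume replace-brick volname failing-host:/failing/brick \
      new-host:/new/brick start

  # Watch progress, then commit once the migration is done:
  gluster volume replace-brick volname failing-host:/failing/brick \
      new-host:/new/brick status
  gluster volume replace-brick volname failing-host:/failing/brick \
      new-host:/new/brick commit

  # If the old brick is already dead (your case 4), the migration has
  # nothing to copy from, so force the swap and then heal so the
  # surviving replica repopulates the new brick:
  gluster volume replace-brick volname failing-host:/failing/brick \
      new-host:/new/brick commit force
  gluster volume heal volname

On newer releases the same job is done with add-brick followed by
remove-brick instead; the replica-count arguments differ by version, so
read the docs (or just try it on the vagrant cluster) before relying on
it in production.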
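
P.P.S. For poking at failed heals and split-brain, the self-heal
daemon's own view is the first thing I'd look at (volname is again a
placeholder, and the exact subcommands vary a bit between releases):

  # What still needs healing, and what's flagged as split-brain:
  gluster volume heal volname info
  gluster volume heal volname info split-brain

  # Some versions also report outright heal failures:
  gluster volume heal volname info heal-failed

  # Sanity checks: are all bricks actually up, and is anything out of
  # disk space?
  gluster volume status volname
  df -h /failing/brick /new/brick

  # The logs live under /var/log/glusterfs/ on each peer; the brick and
  # self-heal daemon logs are usually where the real story is.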