Re: Best Practices for different failure scenarios?

On 19.02.2014, at 21:15, James <purpleidea@xxxxxxxxx> wrote:

> On Wed, Feb 19, 2014 at 3:07 PM, Michael Peek <peek@xxxxxxxxxxx> wrote:
>> Is there a best practices document somewhere for how to handle standard
>> problems that crop up?
> 
> Short answer, it sounds like you'd benefit from playing with a test
> cluster... Would I be correct in guessing that you haven't set up a
> gluster pool yet?
> You might want to look at:
> https://ttboj.wordpress.com/2014/01/08/automatically-deploying-glusterfs-with-puppet-gluster-vagrant/
> This way you can try them out easily...
> For some of those points... solve them with...
> 
>> Sort of a crib notes for things like:
>> 
>> 1) What do you do if you see that a drive is about to fail?
> RAID6
or: ZoL (ZFS on Linux), raidz<x>
(open to critical comments)
or: brick remove && brick add && volume heal
(it's really just three commands, at least in my experience so far, touch wood; a sketch follows below)
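For illustration, a minimal sketch of that three-command cycle on a replica-2 volume; the volume name "gv0", host names and brick paths are invented, and on a plain distribute volume you would use the remove-brick start/status/commit sequence instead of force:

    # drop the failing brick, shrinking the replica count
    gluster volume remove-brick gv0 replica 1 server2:/export/brick1 force
    # add the replacement brick (a fresh, empty filesystem) back in
    gluster volume add-brick gv0 replica 2 server2:/export/brick2
    # walk the volume and rebuild the new brick from its surviving replica
    gluster volume heal gv0 full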
.
But Michael, I appreciate your _original_ question:
"Is there a best practices document?"
Nope, not that I am aware of.
.
It might be very helpful to have a wiki next to this mailing list,
where all the good experience and all the proven solutions for the "situations"
that come up here could be gathered in a more
permanent and structured way.
.
To your questions I would add:
what is best practice for setting volume options, for performance and/or integrity
(yeah, well, for which use case, under which conditions)?
A mailing list is very helpful for ad-hoc problems and questions,
but it would be nice to distill that knowledge into a permanent, searchable form; a small example of what I mean follows below.
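Purely as an illustration of the kind of thing such a document could pin down; the option names below are real, but the volume name and values are invented starting points, not recommendations:

    # bigger io-cache on the client side, trading memory for read speed
    gluster volume set gv0 performance.cache-size 256MB
    # fail over to a replica faster when a server disappears
    gluster volume set gv0 network.ping-timeout 10
    # show what is currently set on the volume
    gluster volume info gv0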
.
Sure, anybody could set up a wiki, but...
it would need the acceptance and participation of an active group
to get the best results.
So IMO the appropriate place would be somewhere close to gluster.org?
.
regards
Bernhard
 

> 
>> 2) What do you do if a drive has already failed?
> RAID6
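To make the RAID6 answer concrete: with Linux md RAID under the brick, swapping an already-failed disk goes roughly like this (the array and device names are invented; adapt to your layout):

    # mark the dead disk failed (if the kernel hasn't already) and remove it
    mdadm --manage /dev/md0 --fail /dev/sdb1
    mdadm --manage /dev/md0 --remove /dev/sdb1
    # add the replacement; the array rebuilds in the background
    mdadm --manage /dev/md0 --add /dev/sdc1
    # watch the rebuild progress
    cat /proc/mdstat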
> 
>> 3) What do you do if a peer is about to fail?
> Get a new peer ready...
> 
>> 4) What do you do if a peer has failed?
> Replace with new peer...
> 
>> 5) What do you do to reinstall a peer from scratch (i.e. what
>> configuration files/directories do you need to restore to get the host
>> back up and talking to the rest of the cluster)?
> Bring up a new peer. Add to cluster... Same as failed peer...
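A rough sketch of that rebuild, assuming the replacement reuses the old hostname/IP and a 3.4-era install, where the peer identity lives in /var/lib/glusterd/glusterd.info (older releases kept state under /etc/glusterd); host names here are invented:

    # on the rebuilt server, before starting glusterd: restore the old UUID
    echo "UUID=<uuid-of-the-dead-peer>" > /var/lib/glusterd/glusterd.info
    service glusterd start
    # on any healthy peer: re-probe the rebuilt host
    gluster peer probe server2
    # back on the rebuilt server: pull the volume definitions from a healthy peer
    gluster volume sync server1 all
    # then let self-heal repopulate the bricks
    gluster volume heal gv0 full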
> 
>> 6) What do you do with failed-heals?
>> 7) What do you do with split-brains?
> These are more complex issues and a number of people have written about them...
> Eg: http://joejulian.name/blog/fixing-split-brain-with-glusterfs-33/
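For what it's worth, the 3.3-era recipe from that post boils down to deleting the copy you judge to be bad directly on the brick, gfid hardlink included (the paths below are invented placeholders):

    # list the files currently in split-brain
    gluster volume heal gv0 info split-brain
    # on the brick holding the *bad* copy: read its gfid...
    getfattr -n trusted.gfid -e hex /export/brick1/path/to/file
    # ...then remove the file and the matching hardlink under .glusterfs
    rm /export/brick1/path/to/file
    rm /export/brick1/.glusterfs/<first-2-hex>/<next-2-hex>/<full-gfid>
    # trigger a heal so the good copy is replicated back
    gluster volume heal gv0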
> 
> Cheers,
> James
> 
> 
>> 
>> Michael
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-users



