> "Larry Bates" <larry.bates at vitalesafe.com> wrote on 01/24/2012 08:34:03 AM: > > I'll admit to not understanding your response and would really > > appreciate a little more explanation. I only have two servers > > with 8 x 2TB each in AFR-DHT so far, but we are growing and will > > continue to do so basically forever. I'm interested in experience of people using this model as well, preferably on larger systems. How do you find gluster handles individual drive failures? Is it possible to mark a single disk/brick as down without downing its replica? Do you need to keep spare drive slots in the chassis so that you can replace-brick <dud> <new> onto another drive in the same chassis? In fact, does replace-brick <dud> <new> even work if <dud> has died? If you have a whole bunch of bricks sharing the same underlying disk (e.g. /disk1/foo, /disk1/bar, /disk1/baz), then presumably you need to remember to replace-brick every one onto the new drive? I found http://gluster.org/community/documentation/index.php/Gluster_3.2:_Brick_Restoration_-_Replace_Crashed_Server but this is about a whole server failing, not a single brick within one server. Clearly there is a glusterd uuid, but I'm not sure if each brick also has a uuid. Unfortunately I can't find any information about handling individual brick failures in the General FAQ or the Technical FAQ either. ISTM that if you were relying on this as your sole method for handling drive failures, which are bound to happen from time to time, you'd need to be well-drilled in the procedure. Many thanks, Brian.