Kosher admin practices

Hi guys,

I have a replicated test cluster (four machines, two drives in each)
that I've been beating on.  I've just simulated one type of hardware
failure by remounting a drive read-only.

The manual covers many useful things: adding/removing peers;
starting/stopping, creating, expanding, shrinking, and deleting volumes;
etc.  But it doesn't cover how to replace a failed brick while
minimizing frustration and the chance of data loss.

I can't unmount the brick because glusterfs still has open files on it.

If I stop glusterfs-server, that takes the other brick in the machine
out of commission too.

Rebooting the machine has the same problem: it takes the other brick
out of service.

What's the correct way to deal with this?  Is there a way to tell
gluster to take a brick out of commission for replacement without
interrupting access to other bricks in the same machine?

Thanks for your help,

Michael Peek

