Hi guys,

I have a test cluster with replication (four machines, two drives in each) that I've been beating on. I've just simulated one type of hardware failure by remounting a drive read-only.

The manual covers many useful things: adding/removing peers; starting/stopping, creating, expanding, shrinking, and deleting volumes; etc. But it doesn't cover how to replace a failed brick in a way that minimizes frustration and the chance of data loss.

I can't unmount the brick because glusterfs still has open files on it. If I stop glusterfs-server, that takes the other brick in the machine out of commission too. Rebooting the machine has the same problem: the other brick goes out of service as well.

What's the correct way to deal with this? Is there a way to tell gluster to take a brick out of commission for replacement without interrupting access to the other bricks in the same machine?

Thanks for your help,

Michael Peek