new user, failed brick question

a.focardi at interconnessioni.it (Alessio Focardi) · Mon, 12 Dec 2011 17:15:13 +0100 (CET)

Hi,

I'm a new user of GlusterFS, experimenting with the file system for a large-scale research project.

I have successfully created a Distributed-Replicate volume, and performance seems to be pretty amazing. Still, I have an unanswered question about brick failure.

Let's say that my setup has 4 bricks (all the same size) on 4 servers forming a volume with a replica count of 2. Then a node goes down.

First question: how do I easily detect the failure? Running "peer status" is reasonable for 4 nodes, but what about 1,000?

I understand that the command can be scheduled and wrapped in a custom-made alerting daemon, but I was wondering whether there is something that can be used "out of the box" for alerting purposes.
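For illustration, the "custom-made" approach might look like the minimal sketch below. It assumes the plain-text `gluster peer status` layout of "Hostname:", "Uuid:" and "State:" lines per peer, with disconnected peers shown as "(Disconnected)"; that format can differ between GlusterFS versions, and the function names are hypothetical, not part of any GlusterFS tool.

```python
import subprocess
from typing import Dict, List


def parse_peer_status(output: str) -> Dict[str, str]:
    """Map each peer hostname to its state string (assumed output layout)."""
    peers: Dict[str, str] = {}
    hostname = None
    for raw in output.splitlines():
        line = raw.strip()
        if line.startswith("Hostname:"):
            hostname = line.split(":", 1)[1].strip()
        elif line.startswith("State:") and hostname is not None:
            # Pair the State line with the most recent Hostname line.
            peers[hostname] = line.split(":", 1)[1].strip()
            hostname = None
    return peers


def disconnected_peers(output: str) -> List[str]:
    """Return the sorted hostnames whose state mentions 'Disconnected'."""
    return sorted(h for h, s in parse_peer_status(output).items() if "Disconnected" in s)


def check_cluster() -> List[str]:
    """Run the gluster CLI on this node (requires the CLI and sufficient privileges)."""
    result = subprocess.run(
        ["gluster", "peer", "status"], capture_output=True, text=True, check=True
    )
    return disconnected_peers(result.stdout)
```

A cron job could call `check_cluster()` and send an alert whenever the returned list is non-empty. Note that `peer status` lists the other peers, not the local node, so it says nothing about whether the bricks' own processes are healthy.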

Second question: a node has gone down and my file system is "degraded". Is there an option to automatically start re-replicating the affected data onto the remaining bricks, or onto a set of "spare bricks", while waiting for the failed brick to come back up?

To clarify, this is somewhat similar to RAID implementations with automatic rebuild.

Thank you for any assistance you can provide!

P.S.

I apologize in advance if my questions have been answered before; I browsed the mailing list archive with no luck.

Alessio Focardi