Re: recovery

"Krishna Srinivas" <krishna@xxxxxxxxxxxxx> · Tue, 6 Mar 2007 09:27:14 +0530

Hi Christopher,

On 3/6/07, Christopher Hawkins <chawkins@xxxxxxxxxxxxxxxxxxxx> wrote:
> The last fellow to post mentioned recovery... I have a question also: If I
> had several storage servers and a number of clients accessing them, and I
> were to lose a storage server, how best to bring it back online? I would be
> using AFR to keep multiple copies of all files, so I know the cluster will
> not lose data. But when the node goes down, does the AFR translator figure
> out by itself that instead of the 3x copies I specified, there are now only
> 2x because I lost a storage node? Or does it only evaluate that at file
> creation time?

AFR is nothing more than an implementation of the open, read, write,
getattr, etc. calls. It invokes these calls on its children; if a child is
down, the call (from protocol/client) returns ENOTCONN to AFR, and AFR
ignores that error. So AFR does not care whether a child is up or down; it
is up to the child translator to pass these calls on to the servers if they
are up.
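
To illustrate the behaviour described above, here is a minimal sketch in Python. It is not GlusterFS code: the `Child` class and the `afr_write` function are hypothetical stand-ins for a child translator (such as protocol/client) and for AFR's write call, and only the ENOTCONN handling is taken from the description.

```python
import errno


class Child:
    """Hypothetical stand-in for a child translator such as protocol/client."""

    def __init__(self, name, up=True):
        self.name = name
        self.up = up
        self.files = {}  # path -> data held by this child's server

    def write(self, path, data):
        # A child whose server is unreachable reports ENOTCONN to its parent.
        if not self.up:
            raise OSError(errno.ENOTCONN, "child not connected")
        self.files[path] = data
        return len(data)


def afr_write(children, path, data):
    """Send the write to every child, ignoring children that are down."""
    results = []
    for child in children:
        try:
            results.append(child.write(path, data))
        except OSError as exc:
            # Only ENOTCONN is ignored; any other error is passed upward.
            if exc.errno != errno.ENOTCONN:
                raise
    return results


if __name__ == "__main__":
    children = [Child("server1"), Child("server2", up=False), Child("server3")]
    afr_write(children, "/data/file.txt", b"hello")
    for child in children:
        print(child.name, "/data/file.txt" in child.files)
```

In this sketch the write succeeds on the two children that are up, and the child that is down simply misses it. Nothing records that the copy count dropped, so a server that comes back later still holds whatever it had before it went down.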

> And when I bring the storage node back, say it takes me two
> days to fix it, I assume I should probably wipe the drives so as not to
> introduce old copies of files that are now out of date (or does AFR update
> them)? And the ALU scheduler will start using the blank space more heavily
> for new writes, because it is preferred as "less used" and the storage use
> will eventually even out again?

As of now we do not have any tool to bring the repaired machine up to date
with the other AFR servers. It is on our task list.

> Thanks for any answers!
> Chris

_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxx
http://lists.nongnu.org/mailman/listinfo/gluster-devel