Daniel van Ham Colchete writes:
> Krishna,
> in all the messages I'm seeing here on the list I only see 2 AFR
> subvolumes working. Is there any point in having 3 or more
> subvolumes in the AFR translator? In Tibor's config, shouldn't the
> translator create files on all three bricks? Shouldn't it use the
> third brick at least when the second was offline?
It is incorrect to fail over to the next brick. If a brick is
down, it will be restored by the self-heal functionality (1.4
version) when it comes back online. AFR has to preserve the natural
order. Redundancy is already handled by the other online brick.
> I know we will always have only half the total space available when
> we use AFR to make two copies of each file, but I think that the
> advantage of distributing a file's copies over different servers,
> like three in one AFR, is that the failed server's load
> average will also be distributed over the other 3 servers, instead
> of going to just the one that was designed to mirror the failed one.
Load balancing is handled by Unify's Scheduler. You need to combine
Unify with AFR for this purpose.
> The disadvantage of having 3 or more is that it's more complicated
> to get a server back online after one fails. I think it's more
> complicated because when you have only 2 servers it's easy to know
> exactly which files should be copied, and you can use a simple rsync
> to copy them. But when you have 3 or more servers, you have to check
> every server to see which files have only one copy. My
> second question is: how will the AFR FSCK deal with these situations?
FSCK will be part of the self-heal functionality in the 1.4 version.
Each translator will implement an extra function to recover from
errors. When a brick goes down, the translator will keep a log of
missed operations and recover when the failed brick comes back online.
Recovery with 3 or more bricks is handled this way: directory listings
from all three volumes are compared against each other, and missing
entries are synced.
Typically, if we are maintaining a log, recovery is a seamless
process. But if we want to force a thorough check, then AFR's
check function will first compare the listings, create the log of
missing files, and then issue the syncs in parallel.
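
As a rough illustration of that forced check (this is not GlusterFS code, only a sketch in Python), the log of missing files can be built from plain set comparisons and then drained in parallel. The names `build_missing_log`, `sync_all`, and the `copy_file` callback are hypothetical, and picking the lowest-numbered brick that holds a file as the copy source is an assumption made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def build_missing_log(listings):
    """Compare directory listings and return (name, source, target) entries.

    `listings` maps a brick id to the set of file names found on it.
    The source is the lowest-numbered brick holding the file (an
    assumption for this sketch, not something AFR is known to do).
    """
    log = []
    all_files = set().union(*listings.values())
    for name in sorted(all_files):
        holders = sorted(b for b, files in listings.items() if name in files)
        source = holders[0]
        for brick in sorted(listings):
            if name not in listings[brick]:
                log.append((name, source, brick))
    return log


def sync_all(listings, log, copy_file, workers=4):
    """Issue every logged copy in parallel, then update the listings."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() forces completion and re-raises any copy error.
        list(pool.map(lambda entry: copy_file(*entry), log))
    for name, _source, target in log:
        listings[target].add(name)
```

With the three bricks from the example below (1 holds A and B, 2 holds C, 3 holds B, C and D), the log would contain six entries, and every brick ends up with all four files once they are synced.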
This is a 3 brick recovery example:

check I - 1 to 2 & 3:

  1 2 3       1 2 3
  -----------------
  A X X       A A A
  B X B  ->   B B B
  X C C       X C C
  X X D       X X D

check II - 2 to 3 & 1:

  2 3 1       2 3 1
  -----------------
  A A A       A A A
  B B B  ->   B B B
  C C X       C C C
  X D X       X D X

check III - 3 to 1 & 2:

  3 1 2       3 1 2
  -----------------
  A A A       A A A
  B B B  ->   B B B
  C C C       C C C
  D X X       D D D
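
The three checks above can be replayed with a small Python simulation. It is only a sketch of the idea, assuming each brick is modelled as a set of file names and X marks a file missing from a brick; `run_check` and `row_view` are hypothetical helper names, not AFR functions.

```python
def run_check(bricks, source, targets):
    """Copy every file present on `source` to each target that lacks it."""
    for target in targets:
        bricks[target] |= bricks[source]


def row_view(bricks, order, files="ABCD"):
    """Render one row per file, with bricks in the given column order."""
    return [
        " ".join(name if name in bricks[b] else "X" for b in order)
        for name in files
    ]


# Initial state taken from the left-hand side of check I.
bricks = {1: {"A", "B"}, 2: {"C"}, 3: {"B", "C", "D"}}

# Each pass uses one brick as the source for the other two.
passes = [(1, (2, 3)), (2, (3, 1)), (3, (1, 2))]

history = []
for source, targets in passes:
    run_check(bricks, source, targets)
    history.append(row_view(bricks, (source, *targets)))

for number, rows in enumerate(history, start=1):
    print(f"after check {number}:")
    for row in rows:
        print("  " + row)
```

Each entry in `history` corresponds to the right-hand side of one check, with the columns in the same brick order as that check.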
--
Anand Babu
GPG Key ID: 0x62E15A31
Blog [http://ab.freeshell.org]
The GNU Operating System [http://www.gnu.org]