Hi all,
I'd like to come back to the issue of "self-healing" in an AFR situation.
Consider the rather complex situation below where A=AFR, S=stripe and
U=unify:
                                  /---- Server1 ----\
                   Client3       S----- Server2 -----S
                      |         /                     \
                  /---U--------<                       >--------U---\
                 /              \                     /              \
                /                S----- Server3 -----S                \
               /                  \---- Server4 ----/                  \
Client1 ---A--<                                                         >--A--- Client2
               \                  /---- Server5 ----\                  /
                \                S----- Server6 -----S                /
                 \              /                     \              /
                  \---U--------<                       >--------U---/
                      |         \                     /
                   Client4       S----- Server7 -----S
                                  \---- Server8 ----/
Client1 and 2: AFR of two separate unions of two separate stripes
Client3 and 4: Union of two separate stripes
I think this is quite a complex arrangement and could probably account
for 80% of large installation cases. The obvious question here is what
the method of healing would be after a server failure. Some thoughts:
1) As mentioned later on in this thread, the flexibility of gluster is
great, but it is somewhat ridiculous to imagine that this flexibility
frees one from using good cluster design. For instance, the following
configuration is probably of little use; the clients must have a useful
configuration, possibly like the larger one above:
               /---Server1---\
Client1---S---<               >---U---Client2
               \---Server2---/
2) If a server is replaced, healing must take place from any or all
clients, otherwise the distributed nature of the system is lost.
3) No client should exist below a striping such as:
          Client2
                 \ /---- Server1...
                  U---- Server2...
...Client1---S---<
                  U---- Server3....
                   \---- Server4....
Correct me if I'm wrong, but trying to read striped data as the above
drawing shows for client2 would not be very useful to client2.
4) A suggestion here is to have each AFR client with a self-heal
filter/translator. ONLY AFR clients should have self-healing for
replication. Other clients such as the union clients can have
self-healing filters but for different filesystem health checks. When a
server fails and is replaced, all AFR clients get stuck in and attempt
to reconstruct the data. Thus in this situation, Clients 1 and 2 will
heal the system. Clients 3 and 4 cannot because they don't have a full
set of data from which to work.
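
To make point 4 concrete, here is a minimal sketch of the large diagram as a tree of translators, with a check for which clients would be allowed to heal replication. The `Node` class, the `stripe`, `servers_under` and `can_heal_replication` helpers and the `upper`/`lower` names are all hypothetical and only illustrate the rule; nothing here is taken from the glusterfs code.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str                      # "afr", "unify", "stripe" or "server"
    name: str = ""
    children: list = field(default_factory=list)


def stripe(*server_names):
    # A stripe translator over the named servers.
    return Node("stripe", children=[Node("server", n) for n in server_names])


def servers_under(node):
    """List the names of all servers reachable from this translator."""
    if node.kind == "server":
        return [node.name]
    return [s for child in node.children for s in servers_under(child)]


def can_heal_replication(client_root):
    # Assumption from point 4: only a client whose top translator is AFR
    # sees both replicas and may therefore rebuild a replaced server.
    return client_root.kind == "afr"


# The two unions of two stripes from the large diagram.
upper = Node("unify", children=[stripe("Server1", "Server2"),
                                stripe("Server3", "Server4")])
lower = Node("unify", children=[stripe("Server5", "Server6"),
                                stripe("Server7", "Server8")])

clients = {
    "Client1": Node("afr", children=[upper, lower]),
    "Client2": Node("afr", children=[upper, lower]),
    "Client3": upper,
    "Client4": lower,
}

healers = [name for name, root in clients.items() if can_heal_replication(root)]
```

With this model, `healers` contains only Client1 and Client2, and `servers_under(clients["Client3"])` lists Server1 to Server4 only, which is one way to see why Client3 has nothing to rebuild from if one of those servers is replaced.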
5) Who is the dominant reconstruction client? A simple possible
solution is to have a "pre-healing" lock for each file to be
reconstructed. For instance, Client1 finds "hello.c" in bad shape
because of the failure. Client1 places a lock file in the directory
identifying itself with a timestamp. Client2 also notices that
"hello.c" is in bad shape and moves to fix it, but notices a lock file with
a timestamp on it, and so moves on to another file/folder. If
Client2 notices that the timestamp has not been updated in 20s or
something reasonable, that means that Client1 has crashed or failed in
some manner and is no longer healing "hello.c". Therefore Client2 will
take over healing "hello.c". Obviously, during healing, nothing else
should access the file for fear of further corruption. Comments on that
may run far, but so be it.
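
A minimal sketch of the "pre-healing" lock from point 5, assuming the lock is an ordinary file next to the file being healed and that exclusive file creation is atomic on the underlying filesystem (true locally, not guaranteed on every network filesystem). The `.heal-lock` suffix, the function names and the injectable `now` parameter are all made up for illustration; the 20 second timeout is the figure suggested above.

```python
import os
import time

STALE_AFTER = 20.0  # seconds without a heartbeat before a lock counts as abandoned


def lock_path(file_path):
    # Hypothetical naming scheme: the lock lives in the same directory.
    return file_path + ".heal-lock"


def read_lock(path):
    with open(path) as f:
        owner, stamp = f.read().split()
    return owner, float(stamp)


def write_lock(path, client_id, stamp):
    # Write to a temporary file and rename it over the lock, so a reader
    # never sees a half-written lock file.
    tmp = f"{path}.{client_id}.tmp"
    with open(tmp, "w") as f:
        f.write(f"{client_id} {stamp}\n")
    os.replace(tmp, path)


def try_claim(file_path, client_id, now=None):
    """Return True if this client may heal file_path, False otherwise."""
    now = time.time() if now is None else now
    path = lock_path(file_path)
    try:
        # O_EXCL: creation fails if the lock already exists.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        owner, stamp = read_lock(path)
        if owner != client_id and now - stamp < STALE_AFTER:
            return False  # someone else is making the eggs; go make the toast
        write_lock(path, client_id, now)  # stale lock: take over
        return True
    with os.fdopen(fd, "w") as f:
        f.write(f"{client_id} {now}\n")
    return True


def heartbeat(file_path, client_id, now=None):
    # The healing client calls this periodically to refresh its timestamp.
    write_lock(lock_path(file_path), client_id, time.time() if now is None else now)


def release(file_path, client_id):
    path = lock_path(file_path)
    if read_lock(path)[0] == client_id:
        os.remove(path)
```

The takeover of a stale lock is not atomic in this sketch: two clients that notice the same stale lock at the same moment could both believe they own it. A real implementation would need a stronger primitive for that step, and comparing timestamps across clients also assumes their clocks are reasonably in sync.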
6) What it all comes down to is: do not make the system's distributed
nature worthless; let all clients get stuck in as if they were all
trying to make breakfast. If someone is making the eggs, don't make
eggs, go make the toast. If the eggs start burning because the cook
went to the toilet, take over and finish the eggs. Soon enough, with
clever co-operation, the breakfast will be done.
Comments?
Regards,
Danson Joseph
Anand Avati wrote:
The concern here is the following though:
Two separate clients are identically configured to use AFR to two identical server configurations as follows:
             Server1
            /       \
Client1 ---           ---Client2
            \       /
             Server2
Client1 puts "hello.c" onto both Server1 and Server2 via AFR. Client2 then changes hello.c in some way.
Server1 goes down; its data is lost with no chance of recovery, and it is replaced by Server3, a brand new server with fresh disks.
In this case, how does the data get reconstructed from the client's side, given that you mentioned that automatic recovery was going to be on the glusterfs side? Client1 believes hello.c is something different from what Client2 believes. Which client will be responsible for reconstructing the data? Will the journaling of the remaining servers be used to reconstruct the data on the new server?
'changes' are done in sync on both server1 and server2 always
(write()s are sent to all child nodes). when server3 comes in place
of server1, the self-heal should detect that hello.c is missing on
server3 and sync it from server2.
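
a rough sketch of that detect-and-sync step, assuming each server's export can be modelled as a local directory. `heal_replica` is a made-up helper and not the glusterfs self-heal code; it only handles files that are missing on the new server and does not compare the contents of files that exist on both.

```python
import os
import shutil


def heal_replica(healthy_dir, fresh_dir):
    """Copy every file present under healthy_dir but missing under fresh_dir.

    Returns the sorted relative paths of the files that were copied.
    """
    healed = []
    for root, _dirs, files in os.walk(healthy_dir):
        rel = os.path.relpath(root, healthy_dir)
        target_root = os.path.join(fresh_dir, rel)
        os.makedirs(target_root, exist_ok=True)
        for name in files:
            dst = os.path.join(target_root, name)
            if not os.path.exists(dst):
                # copy2 also carries over timestamps and permission bits
                shutil.copy2(os.path.join(root, name), dst)
                healed.append(os.path.normpath(os.path.join(rel, name)))
    return sorted(healed)
```

calling `heal_replica(server2_export, server3_export)` once would copy hello.c onto the fresh server; calling it again would find nothing left to do.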
regards,
avati