On 05/22/2012 09:48 AM, Anand Avati wrote:
I must admit that I've read something about it, but I haven't had time to explore it in detail. If I understand it correctly, the self-heal daemon works as a client process but can be run on the server nodes. I suppose that multiple self-heal daemons can be running on different nodes. Each daemon then detects invalid files (I'm not sure exactly how) and replicates the changes from one good node to the bad nodes.

The problem is that in the translator I'm working on, the information is dispersed among multiple nodes, so no single server node contains the whole data. To repair a node, data must be read from at least two other nodes (the exact number depends on the configuration). From what I've read about AFR and the self-heal daemon, it's not straightforward to adapt them to this mechanism, because they would need to know a subset of nodes with consistent data, not just one. Each daemon would have to contact all the other nodes, read data from each one, determine which ones are valid, rebuild the data and send it to the bad nodes. This means that the daemon would have to be as complex as the clients.

My impression (but I may be wrong) is that AFR and the self-heal daemon are closely bound to the replication scheme, so it would be very hard to use them for other purposes.

The healing translator I'm writing tries to offer generic server-side helpers for the healing process, but it is the client side that really manages the healing operation (though heavily simplified), and it could use them to replicate data, to disperse data, or to implement some other scheme.

Xavi
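
As a rough illustration of the repair flow described above (read from every node, find a consistent subset, rebuild, write back to the bad nodes), here is a minimal Python sketch. It is not GlusterFS code and not the translator's actual design. Everything in it is an assumption made for the example: a toy layout of three nodes holding two data fragments plus an XOR parity fragment, a per-node version counter used to decide which fragments are valid, and the hypothetical names encode, rebuild and heal.

```python
from collections import Counter

K = 2  # assumption: toy 2+1 layout, any K=2 fragments rebuild the third


def xor(a, b):
    """Byte-wise XOR of two equal-length fragments."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(d0, d1):
    """Fragments for nodes 0, 1, 2: two data pieces plus their XOR parity."""
    return [d0, d1, xor(d0, d1)]


def rebuild(frag_a, frag_b):
    """In this toy layout, each fragment is the XOR of the other two."""
    return xor(frag_a, frag_b)


def heal(nodes):
    """Repair stale nodes in place and return the indices that were healed.

    nodes is a list of dicts with keys "version" and "frag" (hypothetical
    stand-ins for whatever metadata a real translator would keep).
    """
    # 1. Read the version from every node and look for a consistent subset
    #    of at least K nodes; one good node is not enough here.
    counts = Counter(n["version"] for n in nodes)
    candidates = [v for v, c in counts.items() if c >= K]
    if not candidates:
        raise RuntimeError("no consistent subset of K nodes; cannot heal")
    good_version = max(candidates)

    good = [n["frag"] for n in nodes if n["version"] == good_version]
    bad = [i for i, n in enumerate(nodes) if n["version"] != good_version]

    # 2. Rebuild each bad fragment from K good ones and write it back.
    for i in bad:
        nodes[i]["frag"] = rebuild(good[0], good[1])
        nodes[i]["version"] = good_version
    return bad


if __name__ == "__main__":
    frags = encode(b"ab", b"cd")
    nodes = [{"version": 2, "frag": f} for f in frags]
    nodes[1] = {"version": 1, "frag": b"zz"}  # node 1 missed an update
    healed = heal(nodes)
    print(healed, nodes[1]["frag"] == b"cd")
```

Even in this toy form, the healer has to talk to all nodes and understand the encoding, which is the same work a client already does. A real dispersed scheme would use a proper erasure code instead of XOR parity.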