Nope, we are just implementing a better approach to healing. BTW, "afr" will be
renamed to "replicate" (and still aliased as AFR for backward compatibility).

--
Anand Babu

Keith Freedman wrote:
> At 02:30 AM 1/5/2009, Anand Babu Periasamy wrote:
>> Christopher, the main issue with self-heal is its complexity. Handling
>> self-healing logic in a non-blocking asynchronous code path is difficult.
>> Replicating a missing file sounds simple, but holding off a lookup call,
>> initiating a new series of calls to heal the file, and then resuming
>> normal operation is tricky. Many of the bugs we faced in 1.3 are related
>> to self-heal. We have handled most of these cases over a period of time.
>> Self-healing is decent now, but not good enough. We feel that it has only
>> complicated the code base. It is hard to test and maintain this part of
>> the code base.
>>
>> The plan is to drop the self-heal code altogether once the active healing
>> tool is ready. Unlike self-healing, this active healing can be run by the
>> user on a mounted file system (online) at any time. By moving the code out
>> of the file system and into a tool (that is synchronous and linear), we
>> can implement sophisticated healing techniques.
>>
>> The code is not in the repository yet. Hopefully it will be ready for use
>> in a month. You can simply turn off self-heal and run this utility while
>> the file system is mounted.
>
> I realize this is perhaps a bit premature, but am I to understand you'll
> be doing away with auto self-healing in replicate?
> This seems to eliminate much of the value of Gluster's AFR component.
> If we have to manually heal with some tool, there's always a risk of a
> data integrity problem while this healing process is being executed after
> a server interruption.
>
> If it's going to be optional to turn on/off, that's fine, I suppose, but
> please, if you're considering removing this feature altogether,
> reconsider, unless this active healing tool is something that would be
> run automatically anytime there's a disconnect between AFR servers.
>
> While I certainly do realize that the self-heal code is a HUGE
> performance issue as it's currently written (at least that's what I'm
> noticing on my servers), its function is necessary to make AFR
> useful.
>