Maybe I'm missing something here, but if you take self-healing out of AFR,
then surely that makes the system completely useless and no better than
running rsync every 5 minutes. Since that can't be right, what am I missing?

Gordan

Anand Babu Periasamy wrote:
> Christopher, the main issue with self-heal is its complexity. Handling
> self-healing logic in a non-blocking asynchronous code path is difficult.
> Replicating a missing file sounds simple, but holding off a lookup call,
> initiating a new series of calls to heal the file, and then resuming
> normal operation is tricky. Many of the bugs we faced in 1.3 are related
> to self-heal. We have handled most of these cases over a period of time.
> Self-healing is decent now, but not good enough. We feel that it has only
> complicated the code base. It is hard to test and maintain this part of
> the code base.
>
> The plan is to drop the self-heal code altogether once the active healing
> tool is ready. Unlike self-healing, this active healing can be run by the
> user on a mounted file system (online) at any time. By moving the code out
> of the file system and into a tool (that is synchronous and linear), we
> can implement sophisticated healing techniques.
>
> The code is not in the repository yet. Hopefully in a month, it will be
> ready for use. You can simply turn off self-heal and run this utility
> while the file system is mounted.
>
> List-hacking is an internal list, mostly junk :). It is an internal
> company list. We don't discuss technical / architectural stuff there.
> Those discussions mostly happen over the phone and in in-person meetings.
> We do want to actively involve the community right from the design phase.
> A mailing list is cumbersome and slow for interactively brainstorming
> design. We can once in a while organize IRC sessions for this purpose.
>
> --
> Anand Babu
>
> Swank iest wrote:
>> Well,
>>
>> I guess this is getting outside of the bug.
>> I suppose you are going to mark it as not going to fix?
>>
>> I'm trying to put gluster into production right now, so may I ask:
>>
>> 1) What are the current issues with self-heal that require a full
>>    rewrite? Is there a place in the Wiki or elsewhere where it's being
>>    documented?
>> 2) May I see the new code? I must not be looking in the correct place
>>    in TLA?
>> 3) If it's not written yet, may I be included in the design
>>    discussion? (As I haven't put gluster into production yet, now would
>>    be a good time to know if it's not going to work in the near future.)
>> 4) May I be placed on the list-hacking at zresearch.com mailing list,
>>    please?
>>
>> Christopher.
>>
>> > Date: Mon, 5 Jan 2009 01:36:14 -0800
>> > From: ab at zresearch.com
>> > To: krishna at zresearch.com
>> > CC: swankier at msn.com; list-hacking at zresearch.com
>> > Subject: Re: [List-hacking] [bug #25207] an rm of a file should not cause that file to be replicated with afr self-heal.
>> >
>> > Krishna, leave it as is. Once self-heal ensures that the volumes are
>> > intact, rm will remove both the copies anyway. It is inefficient, but
>> > optimizing it in the current framework will be hacky.
>> >
>> > Swankier, we are replacing the current self-healing framework with an
>> > active healing tool. We can take care of it then.
>> >
>> > Krishna Srinivas wrote:
>> >> The current self-heal logic is built into the lookup of a file, and
>> >> a lookup is issued just before any file operation on a file. So the
>> >> lookup call does not know whether an open or an rm is going to be
>> >> done on the file.
>> >> Will get back to you if we can do anything about this, i.e. to avoid
>> >> the redundant copy of the file when it is going to be rm'ed.
>> >>
>> >> Krishna
>> >>
>> >> On Mon, Jan 5, 2009 at 12:19 PM, swankier <INVALID.NOREPLY at gnu.org> wrote:
>> >>> Follow-up Comment #2, bug #25207 (project gluster):
>> >>>
>> >>> What I am doing:
>> >>>
>> >>> 1) deleting the file from the POSIX file system beneath AFR on one side
>> >>> 2) running rm on the gluster file system
>> >>>
>> >>> The file is then replicated, followed by deletion.
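
For anyone following the thread, the sequence in the bug report can be
sketched as a toy model. This is only an illustration under assumptions: the
class and method names (ToyAFR, lookup, rm, heal_copies) are hypothetical,
replicas are plain dicts, and none of it is GlusterFS code. It shows why a
heal that lives inside lookup copies the file back just before rm deletes it.

```python
# Toy model only: hypothetical names, not GlusterFS code.
class ToyAFR:
    """Two replicas modelled as dicts mapping file name -> contents."""

    def __init__(self):
        self.replicas = [{}, {}]
        self.heal_copies = 0  # how many times self-heal copied a file

    def create(self, name, data):
        for replica in self.replicas:
            replica[name] = data

    def lookup(self, name):
        """Runs before every operation; it cannot see which one comes next."""
        holders = [r for r in self.replicas if name in r]
        if not holders:
            return False
        for replica in self.replicas:
            if name not in replica:
                # Self-heal: copy the file onto the replica that lacks it.
                replica[name] = holders[0][name]
                self.heal_copies += 1
        return True

    def rm(self, name):
        if not self.lookup(name):  # lookup (and any heal) happens first
            raise FileNotFoundError(name)
        for replica in self.replicas:
            del replica[name]


afr = ToyAFR()
afr.create("a.txt", b"payload")
del afr.replicas[0]["a.txt"]   # step 1: delete beneath AFR on one side
afr.rm("a.txt")                # step 2: rm through the mounted file system
print(afr.heal_copies)         # 1: the file was re-replicated, then deleted
print(afr.replicas)            # [{}, {}]
```

The end state is correct (both copies gone), which matches the "inefficient,
but intact" point made in the thread; the wasted work is the single heal copy.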