OK, this does make sense. I won't ask for details on implementation, but
it does seem to make sense to have it as a separate process.

At 09:27 PM 1/5/2009, Basavanagowda Kanur wrote:
>The HEAL tool will monitor glusterfs in the same way AFR currently
>does. The only difference is that HEAL is a separate process.
>HEAL will contain all the functionality of self-heal (inside AFR
>as it exists today).
>
>On Mon, Jan 5, 2009 at 11:25 PM, Gordan Bobic
><gordan at bobich.net> wrote:
>Maybe I'm missing something here, but if you take self-healing out
>of AFR, then surely that makes the system completely useless and no
>better than running rsync every 5 minutes. Since that can't be
>right, what am I missing?
>
>Gordan
>
>
>Anand Babu Periasamy wrote:
>Christopher, the main issue with self-heal is its complexity. Handling
>self-healing logic in a non-blocking asynchronous code path is
>difficult. Replicating a missing file sounds simple, but holding off a
>lookup call, initiating a new series of calls to heal the file, and
>then resuming normal operation is tricky. Many of the bugs we faced in
>1.3 are related to self-heal. We have handled most of these cases over
>a period of time. Self-healing is decent now, but not good enough. We
>feel that it has only complicated the code base. It is hard to test
>and maintain this part of the code base.
>
>The plan is to drop the self-heal code altogether once the active
>healing tool is ready. Unlike self-healing, this active healing can be
>run by the user on a mounted file system (online) at any time. By
>moving the code out of the file system and into a tool (that is
>synchronous and linear), we can implement sophisticated healing
>techniques.
>
>The code is not in the repository yet. Hopefully it will be ready for
>use in a month. You can simply turn off self-heal and run this utility
>while the file system is mounted.
>
>List-hacking is an internal list, mostly junk :). It is an internal
>company list.
>We don't discuss technical / architectural stuff there. Those
>discussions are mostly done over the phone and in in-person meetings.
>We do want to actively involve the community right from the design
>phase. A mailing list is cumbersome and slow for interactively
>brainstorming a design. We can organize IRC sessions once in a while
>for this purpose.
>
>--
>Anand Babu
>
>Swank iest wrote:
>Well,
>
>I guess this is getting outside of the bug. I suppose you are going
>to mark it as not going to fix?
>
>I'm trying to put gluster into production right now, so may I ask:
>
>1) What are the current issues with self-heal that require a full
>re-write? Is there a place in the Wiki or elsewhere where it's
>being documented?
>2) May I see the new code? I must not be looking in the correct
>place in TLA?
>3) If it's not written yet, may I be included in the design
>discussion? (As I haven't put gluster into production yet, now
>would be a good time to know if it's not going to work in the near
>future.)
>4) May I be placed on the list-hacking at zresearch.com mailing list,
>please?
>
> Christopher.
>
> > Date: Mon, 5 Jan 2009 01:36:14 -0800
> > From: ab at zresearch.com
> > To: krishna at zresearch.com
> > CC: swankier at msn.com; list-hacking at zresearch.com
> > Subject: Re: [List-hacking] [bug #25207] an rm of a file should
> > not cause that file to be replicated with afr self-heal.
> >
> > Krishna, leave it as is. Once self-heal ensures that the volumes
> > are intact, rm will remove both the copies anyways. It is
> > inefficient, but optimizing it in the current framework will be
> > hacky.
> >
> > Swaniker, we are ditching the current self-healing framework in
> > favor of an active healing tool. We can take care of it then.
> >
> >
> > Krishna Srinivas wrote:
> >> The current selfheal logic is built into the lookup of a file, and
> >> lookup is issued just before any file operation on a file. So the
> >> lookup call does not know whether an open or rm is going to be done
> >> on the file. Will get back to you if we can do anything about this,
> >> i.e. to avoid the redundant copy of the file when it is going to be
> >> rm'ed.
> >>
> >> Krishna
> >>
> >> On Mon, Jan 5, 2009 at 12:19 PM, swankier
> >> <INVALID.NOREPLY at gnu.org> wrote:
> >>> Follow-up Comment #2, bug #25207 (project gluster):
> >>>
> >>> I am:
> >>>
> >>> 1) deleting the file from the posix system beneath afr on one side
> >>> 2) running rm on the gluster file system
> >>>
> >>> The file is then replicated, followed by deletion.
> >
> >
>_______________________________________________
>Gluster-devel mailing list
>Gluster-devel at nongnu.org
>http://lists.nongnu.org/mailman/listinfo/gluster-devel
>
>
>--
>gowda
>_______________________________________________
>Gluster-users mailing list
>Gluster-users at gluster.org
>http://zresearch.com/cgi-bin/mailman/listinfo/gluster-users
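
For anyone reading this thread in the archive: the behaviour reported in
bug #25207 can be pictured with a toy model. This is only a sketch under
stated assumptions, not GlusterFS code. The names Replica, ToyAFR, lookup
and rm are hypothetical, each backend is modelled as a plain dict, and
the only thing taken from the thread is that healing runs inside lookup
and that lookup is issued before every file operation, including rm.

```python
class Replica:
    """One backend directory, modelled as a dict of name -> contents."""

    def __init__(self, name):
        self.name = name
        self.files = {}


class ToyAFR:
    """Hypothetical two-way replicator that heals inside lookup()."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.copies_made = 0  # number of heal copies performed

    def lookup(self, path):
        # lookup() cannot know which operation follows it, so it
        # always restores the file on replicas where it is missing.
        holders = [r for r in self.replicas if path in r.files]
        if not holders:
            return False
        source = holders[0]
        for replica in self.replicas:
            if path not in replica.files:
                replica.files[path] = source.files[path]
                self.copies_made += 1
        return True

    def rm(self, path):
        # Every operation starts with a lookup, so rm heals first
        # and only then deletes from all replicas.
        if not self.lookup(path):
            raise FileNotFoundError(path)
        for replica in self.replicas:
            del replica.files[path]


def demo():
    a, b = Replica("a"), Replica("b")
    afr = ToyAFR([a, b])
    a.files["f"] = b"data"
    b.files["f"] = b"data"
    del b.files["f"]   # step 1: delete behind the replicator's back
    afr.rm("f")        # step 2: rm through the replicated file system
    return afr.copies_made, a.files, b.files
```

In this model demo() performs one heal copy that is deleted right away,
which is the inefficiency the bug describes. A separate, synchronous
healing tool as proposed in the thread would leave rm free to skip that
copy, although how the real tool handles this case is not stated here.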