[Gluster-devel] Re: [List-hacking] [bug #25207] an rm of a file should not cause that file to be replicated with afr self-heal.

freedman at FreeFormIT.com (Keith Freedman) · Mon, 05 Jan 2009 22:22:42 -0800

ok, this does make sense.

I won't ask for details on the implementation, but it does seem to
make sense to have it as a separate process.

At 09:27 PM 1/5/2009, Basavanagowda Kanur wrote:
>The HEAL tool will monitor glusterfs in the same way AFR currently
>does. The only difference is that HEAL is a separate process.
>HEAL will contain all the functionality of self-heal (as it exists
>inside AFR today).
>
>On Mon, Jan 5, 2009 at 11:25 PM, Gordan Bobic <gordan at bobich.net> wrote:
>Maybe I'm missing something here, but if you take self-healing out 
>of AFR, then surely that makes the system completely useless and no 
>better than running rsync every 5 minutes. Since that can't be 
>right, what am I missing?
>
>Gordan
>
>
>Anand Babu Periasamy wrote:
>Christopher, the main issue with self-heal is its complexity. Handling
>self-healing logic in a non-blocking asynchronous code path is
>difficult. Replicating a missing file sounds simple, but holding off a
>lookup call, initiating a new series of calls to heal the file, and
>then resuming normal operation is tricky. Many of the bugs we faced in
>1.3 were related to self-heal. We have handled most of these cases
>over a period of time. Self-healing is decent now, but not good
>enough. We feel that it has only complicated the code base. It is hard
>to test and maintain this part of the code base.
>
>The plan is to drop the self-heal code altogether once the active
>healing tool is ready. Unlike self-healing, this active healing can be
>run by the user on a mounted file system (online) at any time. By
>moving the code out of the file system and into a tool (that is
>synchronous and linear), we can implement sophisticated healing
>techniques.
>
>The code is not in the repository yet. Hopefully it will be ready for
>use in a month. You can simply turn off self-heal and run this utility
>while the file system is mounted.
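
To make the "synchronous and linear" idea concrete, here is a minimal sketch of what one pass of an external healing tool might look like. It is not the tool described above, whose code is not in the repository yet. The names scan_replicas and demo are hypothetical, the replicas are plain local directories standing in for the backend exports, and the pass only reports differences. Deciding which side is correct (was the file deleted on purpose, or never replicated?) needs metadata that this thread does not describe, so the sketch deliberately copies nothing.

```python
import os
import tempfile


def scan_replicas(replicas):
    """Walk each replica once, in order, and map each relative file path
    to the replicas that lack it. Purely a report; nothing is copied."""
    seen = {}
    for replica in replicas:
        for dirpath, _dirs, files in os.walk(replica):
            for name in files:
                rel = os.path.relpath(os.path.join(dirpath, name), replica)
                seen.setdefault(rel, set()).add(replica)
    return {
        rel: sorted(set(replicas) - present)
        for rel, present in sorted(seen.items())
        if len(present) < len(set(replicas))
    }


def demo():
    """Two toy replicas; one file exists on only one of them."""
    with tempfile.TemporaryDirectory() as root:
        a = os.path.join(root, "replica_a")
        b = os.path.join(root, "replica_b")
        os.mkdir(a)
        os.mkdir(b)
        for replica, name in ((a, "both.txt"), (b, "both.txt"), (a, "only_a.txt")):
            with open(os.path.join(replica, name), "w") as fh:
                fh.write("data")
        report = scan_replicas([a, b])
        # Report directory basenames so the result does not depend on the temp path.
        return {rel: [os.path.basename(r) for r in missing]
                for rel, missing in report.items()}
```

Because the pass is a plain loop with no callbacks to suspend and resume, each step can be tested on its own, which seems to be the maintainability argument being made above.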
>
>List-hacking is an internal company list, mostly junk :). We don't
>discuss technical / architectural stuff there. Those discussions
>mostly happen over the phone and in in-person meetings. We do want to
>actively involve the community right from the design phase. A mailing
>list is cumbersome and slow for interactively brainstorming designs.
>We can organize IRC sessions for this purpose once in a while.
>
>--
>Anand Babu
>
>Swank iest wrote:
>Well,
>
>I guess this is getting outside of the bug.  I suppose you are going 
>to mark it as not going to fix?
>
>I'm trying to put gluster into production right now, so may I ask:
>
>1) What are the current issues with self-heal that require a full
>   re-write?  Is there a place in the Wiki or elsewhere where it's
>   being documented?
>2) May I see the new code?  I must not be looking in the correct
>   place in TLA?
>3) If it's not written yet, may I be included in the design
>   discussion?  (As I haven't put gluster into production yet, now
>   would be a good time to know if it's not going to work in the near
>   future.)
>4) May I be placed on the list-hacking at zresearch.com mailing list,
>   please?
>
>  Christopher.
>
>  > Date: Mon, 5 Jan 2009 01:36:14 -0800
>  > From: ab at zresearch.com
>  > To: krishna at zresearch.com
>  > CC: swankier at msn.com; list-hacking at zresearch.com
>  > Subject: Re: [List-hacking] [bug #25207] an rm of a file should not cause that file to be replicated with afr self-heal.
>  >
>  > Krishna, leave it as is. Once self-heal ensures that the volumes
>  > are intact, rm will remove both the copies anyway. It is
>  > inefficient, but optimizing it in the current framework will be
>  > hacky.
>  >
>  > Swankier, we are ditching the current self-healing framework in
>  > favor of an active healing tool. We can take care of it then.
>  >
>  >
>  > Krishna Srinivas wrote:
>  >> The current self-heal logic is built into the lookup of a file, and
>  >> a lookup is issued just before any file operation on that file. So
>  >> the lookup call does not know whether an open or an rm is going to
>  >> be done on the file. Will get back to you if we can do anything
>  >> about this, i.e. to avoid the redundant copy of the file when it
>  >> is going to be rm'ed.
>  >>
>  >> Krishna
>  >>
>  >> On Mon, Jan 5, 2009 at 12:19 PM, swankier <INVALID.NOREPLY at gnu.org> wrote:
>  >>> Follow-up Comment #2, bug #25207 (project gluster):
>  >>>
>  >>> I am:
>  >>>
>  >>> 1) deleting a file from the posix file system beneath afr on one side
>  >>> 2) running rm on the gluster file system
>  >>>
>  >>> The file is then replicated, and only then deleted.
>
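
The behaviour in the bug report follows directly from Krishna's explanation: the heal happens inside lookup, and lookup runs before rm without knowing that an rm is coming. The toy model below reproduces that sequence with two local directories standing in for the two AFR subvolumes. It is a simplified illustration, not GlusterFS code. The function names (lookup_with_self_heal, rm, demo) and the "heal from the first replica that has the file" rule are assumptions made for the sketch.

```python
import os
import shutil
import tempfile


def lookup_with_self_heal(replicas, name):
    """Toy lookup: if any replica has `name`, copy it to the replicas that
    lack it. Lookup has no idea which operation will follow."""
    have = [r for r in replicas if os.path.exists(os.path.join(r, name))]
    healed = []
    if have:
        source = os.path.join(have[0], name)
        for replica in replicas:
            target = os.path.join(replica, name)
            if not os.path.exists(target):
                shutil.copy2(source, target)  # the redundant copy from the bug
                healed.append(replica)
    return bool(have), healed


def rm(replicas, name):
    """Toy rm: a lookup is issued first, then the file is unlinked everywhere."""
    found, healed = lookup_with_self_heal(replicas, name)
    if not found:
        raise FileNotFoundError(name)
    for replica in replicas:
        os.remove(os.path.join(replica, name))
    return healed


def demo():
    with tempfile.TemporaryDirectory() as root:
        a = os.path.join(root, "side_a")
        b = os.path.join(root, "side_b")
        for replica in (a, b):
            os.mkdir(replica)
            with open(os.path.join(replica, "f.txt"), "w") as fh:
                fh.write("data")
        os.remove(os.path.join(b, "f.txt"))  # step 1: delete beneath afr on one side
        healed = rm([a, b], "f.txt")         # step 2: rm through the "file system"
        remaining = [os.path.exists(os.path.join(r, "f.txt")) for r in (a, b)]
        return healed == [b], remaining
```

In this model the end state is correct (the file is gone from both sides), which matches "rm will remove both the copies anyway"; the cost is one full copy of a file that is about to be deleted.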
>
>
>_______________________________________________
>Gluster-devel mailing list
>Gluster-devel at nongnu.org
>http://lists.nongnu.org/mailman/listinfo/gluster-devel
>
>
>
>
>--
>gowda