[Gluster-devel] Re: [List-hacking] [bug #25207] an rm of a file should not cause that file to be replicated with afr self-heal.

gowda at zresearch.com (Basavanagowda Kanur) · Tue, 6 Jan 2009 10:57:22 +0530

The HEAL tool will monitor glusterfs in the same way AFR currently does. The
only difference is that HEAL is a separate process.
HEAL will contain all the functionality of self-heal (as it exists inside AFR
today).
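
To make the behaviour discussed in the quoted thread concrete, here is a minimal
sketch in Python. It is a toy model only, not GlusterFS or HEAL code: the class
ToyReplica, its methods, and the two plain directories standing in for AFR
backends are all hypothetical names made up for illustration. It shows
heal-on-lookup copying a file back just before rm deletes it, alongside a
linear, synchronous pass of the kind an external tool might run.

```python
import os
import shutil
import tempfile


class ToyReplica:
    """Toy stand-in for mirrored backend directories (hypothetical, not GlusterFS)."""

    def __init__(self, backends):
        self.backends = backends
        self.copies = 0  # number of heal copies performed so far

    def _heal(self, name):
        # Copy the file to every backend that lacks it, from one that has it.
        have = [b for b in self.backends if os.path.exists(os.path.join(b, name))]
        if not have:
            return
        for b in self.backends:
            target = os.path.join(b, name)
            if not os.path.exists(target):
                shutil.copyfile(os.path.join(have[0], name), target)
                self.copies += 1

    def lookup(self, name):
        # Heal-on-lookup: the lookup cannot know which operation comes next.
        self._heal(name)
        return any(os.path.exists(os.path.join(b, name)) for b in self.backends)

    def rm(self, name):
        # Every operation is preceded by a lookup, so rm heals before deleting.
        if self.lookup(name):
            for b in self.backends:
                os.remove(os.path.join(b, name))

    def active_heal(self):
        # Linear, synchronous pass over all backends, run on demand.
        names = set()
        for b in self.backends:
            names.update(os.listdir(b))
        for name in sorted(names):
            self._heal(name)


def demo():
    root = tempfile.mkdtemp()
    a, b = os.path.join(root, "a"), os.path.join(root, "b")
    os.mkdir(a)
    os.mkdir(b)
    vol = ToyReplica([a, b])
    for d in (a, b):
        with open(os.path.join(d, "f.txt"), "w") as fh:
            fh.write("data")
    os.remove(os.path.join(a, "f.txt"))  # step 1: delete beneath the replica
    vol.rm("f.txt")                      # step 2: rm through the volume
    result = (vol.copies, os.listdir(a), os.listdir(b))
    shutil.rmtree(root)
    return result
```

Calling demo() reproduces the two steps from the bug report: one redundant copy
is made, and then both copies are removed. In this toy model, a tool calling
active_heal() on its own schedule would not need to sit in the lookup path at
all.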

On Mon, Jan 5, 2009 at 11:25 PM, Gordan Bobic <gordan at bobich.net> wrote:

> Maybe I'm missing something here, but if you take self-healing out of AFR,
> then surely that makes the system completely useless and no better than
> running rsync every 5 minutes. Since that can't be right, what am I missing?
>
> Gordan
>
>
> Anand Babu Periasamy wrote:
>
>> Christopher, the main issue with self-heal is its complexity. Handling
>> self-healing logic in a non-blocking asynchronous code path is difficult.
>> Replicating a missing file sounds simple, but holding off a lookup call,
>> initiating a new series of calls to heal the file, and then resuming
>> normal operation is tricky. Many of the bugs we faced in 1.3 are related
>> to self-heal. We have handled most of these cases over a period of time.
>> Self-healing is decent now, but not good enough. We feel that it has only
>> complicated the code base. It is hard to test and maintain this part of
>> the code base.
>>
>> The plan is to drop the self-heal code altogether once the active healing
>> tool is ready. Unlike self-healing, this active healing can be run by the
>> user on a mounted file system (online) at any time. By moving the code out
>> of the file system and into a tool (that is synchronous and linear), we
>> can implement sophisticated healing techniques.
>>
>> The code is not in the repository yet. Hopefully it will be ready for use
>> in a month. You can simply turn off self-heal and run this utility while
>> the file system is mounted.
>>
>> List-hacking is an internal company list, mostly junk :). We don't discuss
>> technical / architectural stuff there. Those discussions mostly happen
>> over the phone and in in-person meetings. We do want to actively involve
>> the community right from the design phase. A mailing list is cumbersome
>> and slow for interactively brainstorming design discussions. We can
>> organize IRC sessions once in a while for this purpose.
>>
>> --
>> Anand Babu
>>
>> Swank iest wrote:
>>
>>> Well,
>>>
>>> I guess this is getting outside of the bug.  I suppose you are going to
>>> mark it as not going to fix?
>>>
>>> I'm trying to put gluster into production right now, so may I ask:
>>>
>>> 1) What are the current issues with self-heal that require a full
>>> re-write?  Is there a place in the Wiki or elsewhere where it's being
>>> documented?
>>> 2) May I see the new code?  I must not be looking in the correct place in
>>> TLA?
>>> 3) If it's not written yet, may I be included in the design discussion?
>>>  (As I haven't put gluster into production yet, now would be a good time to
>>> know if it's not going to work in the near future.)
>>> 4) May I be placed on the list-hacking at zresearch.com mailing list,
>>> please?
>>>
>>>  Christopher.
>>>
>>>  > Date: Mon, 5 Jan 2009 01:36:14 -0800
>>>  > From: ab at zresearch.com
>>>  > To: krishna at zresearch.com
>>>  > CC: swankier at msn.com; list-hacking at zresearch.com
>>>  > Subject: Re: [List-hacking] [bug #25207] an rm of a file should not
>>> cause that file to be replicated with afr self-heal.
>>>  >
>>>  > Krishna, leave it as is. Once self-heal ensures that the volumes are
>>>  > intact, rm will remove both copies anyway. It is inefficient, but
>>>  > optimizing it in the current framework will be hacky.
>>>  >
>>>  > Swaniker, we are replacing the current self-healing framework with an
>>>  > active healing tool. We can take care of it then.
>>>  >
>>>  >
>>>  > Krishna Srinivas wrote:
>>>  >> The current self-heal logic is built into the lookup of a file, and a
>>>  >> lookup is issued just before any operation on a file. So the lookup
>>>  >> call does not know whether an open or an rm is going to be done on the
>>>  >> file. Will get back to you if we can do anything about this, i.e. to
>>>  >> avoid the redundant copy of the file when it is going to be rm'ed.
>>>  >>
>>>  >> Krishna
>>>  >>
>>>  >> On Mon, Jan 5, 2009 at 12:19 PM, swankier <INVALID.NOREPLY at gnu.org>
>>> wrote:
>>>  >>> Follow-up Comment #2, bug #25207 (project gluster):
>>>  >>>
>>>  >>> I am:
>>>  >>>
>>>  >>> 1) deleting the file from the posix system beneath afr on one side
>>>  >>> 2) running rm on the gluster file system
>>>  >>>
>>>  >>> The file is then replicated, followed by deletion.
>>>
>>
>

-- 
gowda