[Gluster-devel] Re: [List-hacking] [bug #25207] an rm of a file should not cause that file to be replicated with afr self-heal.

gordan at bobich.net (Gordan Bobic) · Mon, 05 Jan 2009 17:55:42 +0000

Maybe I'm missing something here, but if you take self-healing out of 
AFR, then surely that makes the system completely useless and no better 
than running rsync every 5 minutes. Since that can't be right, what am I 
missing?

Gordan

Anand Babu Periasamy wrote:
> Christopher, the main issue with self-heal is its complexity. Handling
> self-healing logic in a non-blocking asynchronous code path is difficult.
> Replicating a missing file sounds simple, but holding off a lookup call,
> initiating a new series of calls to heal the file, and then resuming
> normal operation is tricky. Many of the bugs we faced in 1.3 were related
> to self-heal. We have handled most of these cases over time. Self-healing
> is decent now, but not good enough. We feel that it has only complicated
> the code base. It is hard to test and maintain this part of the code base.
> 
> The plan is to drop the self-heal code altogether once the active healing
> tool is ready. Unlike self-healing, active healing can be run by the user
> on a mounted file system (online) at any time. By moving the code out of
> the file system and into a tool (that is synchronous and linear), we can
> implement sophisticated healing techniques.
> 
> The code is not in the repository yet. Hopefully it will be ready for use
> in a month. You can simply turn off self-heal and run this utility while
> the file system is mounted.
> 
> List-hacking is an internal company list, mostly junk :). We don't discuss
> technical / architectural stuff there. Those discussions mostly happen
> over the phone and in in-person meetings. We do want to actively involve
> the community right from the design phase. A mailing list is cumbersome
> and slow for interactive design brainstorming. We can organize IRC
> sessions once in a while for this purpose.
> 
> -- 
> Anand Babu
> 
> Swank iest wrote:
>> Well,
>>
>> I guess this is getting outside of the bug.  I suppose you are going 
>> to mark it as not going to fix?
>>
>> I'm trying to put gluster into production right now, so may I ask:
>>
>> 1) What are the current issues with self-heal that require a full 
>> re-write?  Is there a place in the Wiki or elsewhere where it's being 
>> documented?
>> 2) May I see the new code?  I must not be looking in the correct place 
>> in TLA?
>> 3) If it's not written yet, may I be included in the design 
>> discussion?  (As I haven't put gluster into production yet, now would 
>> be a good time to know if it's not going to work in the near future.)
>> 4) May I be placed on the list-hacking at zresearch.com mailing list, 
>> please?
>>
>>  Christopher.
>>
>>  > Date: Mon, 5 Jan 2009 01:36:14 -0800
>>  > From: ab at zresearch.com
>>  > To: krishna at zresearch.com
>>  > CC: swankier at msn.com; list-hacking at zresearch.com
>>  > Subject: Re: [List-hacking] [bug #25207] an rm of a file should not cause that file to be replicated with afr self-heal.
>>  >
>>  > Krishna, leave it as is. Once self-heal ensures that the volumes are
>>  > intact, rm will remove both copies anyway. It is inefficient, but
>>  > optimizing it in the current framework would be hacky.
>>  >
>>  > Swankier, we are replacing the current self-healing framework with an
>>  > active healing tool. We can take care of it then.
>>  >
>>  >
>>  > Krishna Srinivas wrote:
>>  >> The current self-heal logic is built into the lookup of a file, and a
>>  >> lookup is issued just before any operation on that file. So the lookup
>>  >> call does not know whether an open or an rm is going to follow.
>>  >> We will get back to you if we can do anything about this, i.e. avoid
>>  >> the redundant copy of the file when it is about to be rm'ed.
>>  >>
>>  >> Krishna
>>  >>
>>  >> On Mon, Jan 5, 2009 at 12:19 PM, swankier <INVALID.NOREPLY at gnu.org> wrote:
>>  >>> Follow-up Comment #2, bug #25207 (project gluster):
>>  >>>
>>  >>> I am:
>>  >>>
>>  >>> 1) deleting the file from the posix file system beneath afr on one side
>>  >>> 2) running rm on the gluster file system
>>  >>>
>>  >>> The file is then replicated and then deleted.
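
For readers following the thread, the behaviour reported in the bug can be
sketched as a toy model. This is only an illustration of the sequence
described above (lookup triggers self-heal, then rm unlinks on both sides);
the class and method names are hypothetical and are not GlusterFS APIs, and
each replica is modelled as a plain dict.

```python
# Toy model only: each replica is a dict mapping file name -> contents.
# Afr, lookup and rm are hypothetical names, not GlusterFS functions.

class Afr:
    def __init__(self, replicas):
        self.replicas = replicas
        self.log = []  # records every step, in order

    def lookup(self, name):
        """Lookup doubles as the self-heal trigger: a replica that is
        missing the file gets a copy from one that has it."""
        holders = [r for r in self.replicas if name in r]
        if not holders:
            return False
        for r in self.replicas:
            if name not in r:
                r[name] = holders[0][name]  # redundant if an rm follows
                self.log.append("heal " + name)
        return True

    def rm(self, name):
        """rm is preceded by a lookup, which cannot know that an unlink
        is coming next."""
        if not self.lookup(name):
            raise FileNotFoundError(name)
        for r in self.replicas:
            del r[name]
        self.log.append("unlink " + name)


left, right = {"a.txt": b"data"}, {"a.txt": b"data"}
afr = Afr([left, right])
del left["a.txt"]   # step 1: delete on the backend of one side
afr.rm("a.txt")     # step 2: rm through the mounted file system
print(afr.log)      # ['heal a.txt', 'unlink a.txt']
```

The end state is correct (both replicas are empty), which matches the
"leave it as is" reply: the heal step is wasted work, not a wrong result.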