Christopher, the main issue with self-heal is its complexity. Handling self-healing
logic in a non-blocking asynchronous code path is difficult. Replicating a missing
file sounds simple, but holding off a lookup call, initiating a new series of calls
to heal the file, and then resuming normal operation is tricky. Many of the
bugs we faced in 1.3 are related to self-heal. We have handled most of these cases
over time. Self-healing is decent now, but not good enough. We feel that
it has only complicated the code base. This part of the code base is hard to test
and maintain.
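To make the case from this bug concrete, here is a toy model of healing inside
lookup. It is only a sketch, not GlusterFS code: the class ToyReplicatedVolume,
its methods, and the copies_made counter are hypothetical names, and two plain
dicts stand in for the two backend copies. It shows how a lookup that heals
first ends up copying a file that rm is about to delete.

```python
class ToyReplicatedVolume:
    """Hypothetical model: two dicts stand in for the two backend copies."""

    def __init__(self):
        self.replicas = [{}, {}]
        self.copies_made = 0  # counts heal copies, for illustration only

    def lookup(self, name):
        # Lookup runs before every operation and cannot know which
        # operation follows, so it heals any missing copy first.
        holders = [r for r in self.replicas if name in r]
        if not holders:
            return False
        for replica in self.replicas:
            if name not in replica:
                replica[name] = holders[0][name]
                self.copies_made += 1
        return True

    def rm(self, name):
        # rm is preceded by lookup, which may already have re-created
        # the missing copy; rm then removes both copies.
        if not self.lookup(name):
            raise FileNotFoundError(name)
        for replica in self.replicas:
            del replica[name]
```

If a file exists on both sides, is then deleted directly from one backend dict,
and rm() is called on it, copies_made ends up at 1 even though both copies are
gone afterwards. That is the wasted work the bug describes.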
The plan is to drop the self-heal code altogether once the active healing tool is ready.
Unlike self-healing, active healing can be run by the user on a mounted file system
(online) at any time. By moving the code out of the file system and into a tool (that is
synchronous and linear), we can implement sophisticated healing techniques.
The code is not in the repository yet. Hopefully it will be ready for use in a month.
Then you can simply turn off self-heal and run this utility while the file system is mounted.
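As a rough illustration of what "synchronous and linear" buys us, here is a
minimal sketch of one healing pass. It is not the actual tool (that code does
not exist in the repository yet). The function names list_files and heal_pass
are hypothetical, and it assumes the two replicas are plain directories that
the tool can walk. It only handles files missing on one side. Conflicting
contents, metadata, and empty directories are left out.

```python
import os
import shutil


def list_files(root):
    """Return the set of file paths under root, relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.add(os.path.relpath(full, root))
    return found


def heal_pass(replica_a, replica_b):
    """Copy every file that is missing on one replica from the other.

    Returns a list of (relative_path, healed_replica_root) pairs,
    ordered by relative path.
    """
    files_a = list_files(replica_a)
    files_b = list_files(replica_b)
    healed = []
    # The symmetric difference is exactly the files present on one side only.
    for rel in sorted(files_a ^ files_b):
        if rel in files_a:
            src_root, dst_root = replica_a, replica_b
        else:
            src_root, dst_root = replica_b, replica_a
        dst = os.path.join(dst_root, rel)
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        shutil.copy2(os.path.join(src_root, rel), dst)
        healed.append((rel, dst_root))
    return healed
```

Each step finishes before the next one starts, so there is no lookup to hold
off and resume. Running heal_pass a second time on the same two directories
should find nothing left to copy.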
List-hacking is an internal company list, mostly junk :).
We don't discuss technical / architectural stuff there. Those discussions mostly happen
over the phone and in in-person meetings. We do want to actively involve the community right
from the design phase. A mailing list is cumbersome and slow for interactive
design brainstorming. We can organize IRC sessions once in a while
for this purpose.
--
Anand Babu
Swank iest wrote:
Well,
I guess this is getting outside of the bug. I suppose you are going to
mark it as not going to fix?
I'm trying to put gluster into production right now, so may I ask:
1) What are the current issues with self-heal that require a full
re-write? Is there a place in the Wiki or elsewhere where it's being
documented?
2) May I see the new code? I must not be looking in the correct place
in TLA?
3) If it's not written yet, may I be included in the design discussion?
(As I haven't put gluster into production yet, now would be a good time
to know if it's not going to work in the near future.)
4) May I be placed on the list-hacking@xxxxxxxxxxxxx mailing list, please?
Christopher.
> Date: Mon, 5 Jan 2009 01:36:14 -0800
> From: ab@xxxxxxxxxxxxx
> To: krishna@xxxxxxxxxxxxx
> CC: swankier@xxxxxxx; list-hacking@xxxxxxxxxxxxx
> Subject: Re: [List-hacking] [bug #25207] an rm of a file should not cause that file to be replicated with afr self-heal.
>
> Krishna, leave it as is. Once self-heal ensures that the volumes are intact, rm will
> remove both copies anyway. It is inefficient, but optimizing it in the current framework
> will be hacky.
>
> Swankier, we are replacing the current self-healing framework with an active healing tool.
> We can take care of it then.
>
>
> Krishna Srinivas wrote:
>> The current self-heal logic is built into the lookup of a file, and lookup is
>> issued just before any operation on a file. So the lookup call
>> does not know whether an open or an rm is going to be done on the file.
>> Will get back to you if we can do anything about this, i.e. to avoid making the
>> redundant copy of the file when it is going to be rm'ed.
>>
>> Krishna
>>
>> On Mon, Jan 5, 2009 at 12:19 PM, swankier <INVALID.NOREPLY@xxxxxxx> wrote:
>>> Follow-up Comment #2, bug #25207 (project gluster):
>>>
>>> I am doing the following:
>>>
>>> 1) delete the file from the POSIX file system beneath afr on one side
>>> 2) run rm on the gluster file system
>>>
>>> The file is then replicated, and then deleted.
>>>
>>> _______________________________________________________
>>>
>>> Reply to this item at:
>>>
>>> <http://savannah.nongnu.org/bugs/?25207>
>
> --
> Anand Babu Periasamy
> GPG Key ID: 0x62E15A31
> Blog [http://ab.freeshell.org]
> GlusterFS [http://www.gluster.org]
> The GNU Operating System [http://www.gnu.org]
>