Maybe a different approach could solve some of these problems and
improve responsiveness. It's an architectural change, so I'm not sure
whether this is the right moment to discuss it, but it could at least be
considered for the future. There are a lot of details to consider, so do
not take this as a full explanation, only a high-level overview.
The basic change is to implement a server-side healing helper (HH)
xlator living just under the lock xlator. Its purpose is not to heal the
file itself but to offer functionality that helps client-side xlators
heal a file.
When a client wants to heal a file, it will first ask the HH xlator for
healing access. If the file is not being healed by another client, the
access will be granted. Once a client has exclusive access to heal the
file, a full inode lock will be needed to heal the metadata at the
beginning and the end of the heal process (just like it's currently
done). Then all locks are removed and data recovery can proceed without
any locks.
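The heal-access grant could then be something as simple as this
(continuing the sketch above; it assumes the context is already
protected against concurrent access and skips error handling):

    #include <stdlib.h>   /* in addition to the headers above */

    /* Grant exclusive heal access if no other client is healing the
     * file. The initial pending segment covers the whole file (or a
     * range chosen by the client). */
    bool hh_try_grant_heal(struct hh_inode_ctx *ctx, uint64_t client_id,
                           uint64_t file_size)
    {
        if (ctx->heal_in_progress)
            return false;                 /* someone else is healing */

        ctx->heal_in_progress = true;
        ctx->healer_id        = client_id;

        struct hh_segment *seg = calloc(1, sizeof(*seg));
        seg->start   = 0;
        seg->end     = file_size;
        seg->next    = NULL;
        ctx->pending = seg;

        return true;
    }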
To be able to heal data without locks, the HH xlator needs to keep a
list of pending segments to heal. Initially the segment will go from
offset 0 to the file size (or another range defined by the client).
Since the HH xlator sits below the lock xlator, it can receive at most
one normal write and, possibly, one heal write at any given moment.
Normal writes always take precedence, and the written range is removed
from the pending segments. Heal writes are filtered through the pending
segments: if a heal write tries to modify an area not covered by any
pending segment, that area is not updated.
This strategy allows concurrent write operations with healing.
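For example, the pending-segment bookkeeping could look roughly like
this (still the same hypothetical sketch; in a real xlator this would
sit in the writev path and the list would be protected by the inode
context lock):

    /* A normal write removes its byte range from the pending segments.
     * Segments are kept sorted and non-overlapping. */
    void hh_normal_write(struct hh_inode_ctx *ctx, uint64_t off,
                         uint64_t len)
    {
        uint64_t end = off + len;
        struct hh_segment **pp = &ctx->pending;

        while (*pp != NULL) {
            struct hh_segment *s = *pp;

            if (s->end <= off || s->start >= end) {
                pp = &s->next;                    /* no overlap */
            } else if (s->start < off && s->end > end) {
                /* The write splits the segment in two. */
                struct hh_segment *tail = calloc(1, sizeof(*tail));
                tail->start = end;
                tail->end   = s->end;
                tail->next  = s->next;
                s->end      = off;
                s->next     = tail;
                break;
            } else if (s->start >= off && s->end <= end) {
                *pp = s->next;                    /* fully overwritten */
                free(s);
            } else if (s->start < off) {
                s->end = off;                     /* trim the tail */
                pp = &s->next;
            } else {
                s->start = end;                   /* trim the head */
                pp = &s->next;
            }
        }
    }

    /* A heal write is narrowed to the first pending piece of its range;
     * returns false if nothing in the range still needs healing. */
    bool hh_clip_heal_write(struct hh_inode_ctx *ctx, uint64_t *off,
                            uint64_t *len)
    {
        uint64_t end = *off + *len;

        for (struct hh_segment *s = ctx->pending; s != NULL; s = s->next) {
            if (s->end <= *off || s->start >= end)
                continue;
            uint64_t lo = *off > s->start ? *off : s->start;
            uint64_t hi = end < s->end ? end : s->end;
            *off = lo;
            *len = hi - lo;
            return true;
        }

        return false;
    }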
In this scheme a truncate request is easy to handle: the HH xlator
intercepts it and updates the pending segments, removing any segment (or
part of a segment) beyond the truncate offset. If this leaves the
pending list empty, the HH xlator tells the healing client that the
healing is complete.
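And the truncate path, with the same caveats as the sketches above,
could be something like:

    /* Truncate: drop everything pending at or beyond the new file size.
     * Returns true when no pending segments remain, i.e. the heal is
     * complete and the healing client can be notified. */
    bool hh_truncate(struct hh_inode_ctx *ctx, uint64_t offset)
    {
        struct hh_segment **pp = &ctx->pending;

        while (*pp != NULL) {
            struct hh_segment *s = *pp;

            if (s->start >= offset) {            /* entirely beyond EOF */
                *pp = s->next;
                free(s);
            } else {
                if (s->end > offset)
                    s->end = offset;             /* clip at the new EOF */
                pp = &s->next;
            }
        }

        return ctx->pending == NULL;
    }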
On 21/05/13 15:58, Jeff Darcy wrote:
On 05/21/2013 09:30 AM, Stephan von Krawczynski wrote:
I am not quite sure if I understood the issue in full detail. But are
you saying that you "split up" the current self-healing file in 128K
chunks with locking/unlocking (over the network)? It sounds a bit like
the locking takes more (cpu) time than the self-healing of the data
itself. I mean this can be a 10 G link where a complete file could be
healed in almost no time, even if the file is quite big. Sure WAN is
different, but I really would like to have at least an option to drop
the partial locking completely and lock the full file instead.
That's actually how it used to work, which led to many complaints from
users who would see stalls accessing large files (most often VM
images) over GigE while self-heal was in progress. Many considered it
a show-stopper, and the current "granular self-heal" approach was
implemented to address it. I'm not sure whether the old behavior is
still available as an option. If not (which is what I suspect) then
you're correct that it might be worth considering as an enhancement.