On 5/22/13 9:45 PM, Pranith Kumar Karampuri wrote:
----- Original Message -----
From: "Anand Avati" <anand.avati@xxxxxxxxx>
To: "Pranith Kumar Karampuri" <pkarampu@xxxxxxxxxx>
Cc: "Jeff Darcy" <jdarcy@xxxxxxxxxx>, "Anand Avati" <aavati@xxxxxxxxxx>, "Raghavan Pichai" <rpichai@xxxxxxxxxx>,
"Ravishankar Narayanankutty" <ranaraya@xxxxxxxxxx>, "devel" <gluster-devel@xxxxxxxxxx>
Sent: Thursday, May 23, 2013 7:02:09 AM
Subject: Re: Proposal to change locking in data-self-heal
On Wed, May 22, 2013 at 5:57 AM, Pranith Kumar Karampuri <
pkarampu@xxxxxxxxxx> wrote:
----- Original Message -----
From: "Anand Avati" <anand.avati@xxxxxxxxx>
To: "Jeff Darcy" <jdarcy@xxxxxxxxxx>
Cc: "Pranith Kumar Karampuri" <pkarampu@xxxxxxxxxx>, "Anand Avati" <
aavati@xxxxxxxxxx>, "Raghavan Pichai"
<rpichai@xxxxxxxxxx>, "Ravishankar Narayanankutty" <ranaraya@xxxxxxxxxx>,
"devel" <gluster-devel@xxxxxxxxxx>
Sent: Wednesday, May 22, 2013 1:19:19 AM
Subject: Re: Proposal to change locking in data-self-heal
On Tue, May 21, 2013 at 7:05 AM, Jeff Darcy <jdarcy@xxxxxxxxxx> wrote:
On 05/21/2013 09:10 AM, Pranith Kumar Karampuri wrote:
scenario-1 won't happen because there exists a chance for it to
acquire
truncate's full file lock after any 128k range sync happens.
Scenario-2 won't happen because extra self-heals that are launched on
the
same file will be blocked in self-heal-domain so the data-path's
locks are
not affected by this.
This is two changes for two scenarios:
* Change granular-lock order to avoid scenario 1.
* Add a new lock to avoid scenario 2.
I'm totally fine with the second part, but the first part makes me a
bit
uneasy. As I recall, the "chained" locking (lock second range before
unlocking the first) was a deliberate choice to avoid windows where
somebody could jump in with a truncate during self-heal. It certainly
wasn't the obvious thing to do, suggesting there was a specific reason.
Chained locking really was to avoid the start of a second self-heal while
one is in progress. By giving up the big lock (after holding the small
lock), regional operations can progress but the second self-heal waiting
to
hold the big lock is still held off. Truncating isn't really an issue as
long as the truncate transaction locks from new EOF till infinity (which
it
is) and self-heal will not race in those regions. Of course, with chained
locking, full-file truncates can starve.
It's not obvious what use case Pranith is facing where self-heal and
ftruncate() are competing. Is it just an artificial/hand-crafted test
case,
or a real-life access pattern?
artificial.
Until we recall what that reason was I don't think we can determine
whether this is a bug or a feature. If it's a feature, or if we don't
know, this change is likely to break something instead of fixing it.
The sole reason was to prevent the start of a second self-heal. Having
self-heals serialize in a separate domain solves the problem (except if
we
are looking at runtime compatibility across multiple versions in this
scenario.. aargh!)
So you guys are OK with this proposal if we solve version compatibility
issues?
I guess you can even just ignore the backward compatibility issue in this
case. Old servers will "safely" block themselves against the big data-lock
causing starvation just the way things work today. Once all servers are
upgraded, all self-heals will gracefully block themselves in the new
domain. There isn't much compatibility handling actually.
If afr with new locking approach triggers self-heal first then old afr can still get
the full-file lock. This is because new locking approach unlocks the range-lock without acquiring
next range-lock. So 2 self-heals will be in progress in parallel. This will lead to incorrect
extended attributes because of double decrements of changelogs one by the new afr self-heal
and the other with old afr self-heal. Let me know if I am missing something.
Since scenario-1(the one with truncate) is not widely seen, may be we should just solve
scenario-2 which can be done by acquiring a full-lock in self-heal domain then unlock after
completion of self-heal, without changing locking in data-domain.
Yes, that's what I intended to say. Let's not introduce a backward
compatibility "problem" for an artificially created scenario.
Avati