Re: Proposal to change locking in data-self-heal

Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx> · Thu, 23 May 2013 00:45:07 -0400 (EDT)

----- Original Message -----
> From: "Anand Avati" <anand.avati@xxxxxxxxx>
> To: "Pranith Kumar Karampuri" <pkarampu@xxxxxxxxxx>
> Cc: "Jeff Darcy" <jdarcy@xxxxxxxxxx>, "Anand Avati" <aavati@xxxxxxxxxx>, "Raghavan Pichai" <rpichai@xxxxxxxxxx>,
> "Ravishankar Narayanankutty" <ranaraya@xxxxxxxxxx>, "devel" <gluster-devel@xxxxxxxxxx>
> Sent: Thursday, May 23, 2013 7:02:09 AM
> Subject: Re: Proposal to change locking in data-self-heal
> 
> On Wed, May 22, 2013 at 5:57 AM, Pranith Kumar Karampuri <
> pkarampu@xxxxxxxxxx> wrote:
> 
> >
> >
> > ----- Original Message -----
> > > From: "Anand Avati" <anand.avati@xxxxxxxxx>
> > > To: "Jeff Darcy" <jdarcy@xxxxxxxxxx>
> > > Cc: "Pranith Kumar Karampuri" <pkarampu@xxxxxxxxxx>, "Anand Avati" <
> > aavati@xxxxxxxxxx>, "Raghavan Pichai"
> > > <rpichai@xxxxxxxxxx>, "Ravishankar Narayanankutty" <ranaraya@xxxxxxxxxx>,
> > "devel" <gluster-devel@xxxxxxxxxx>
> > > Sent: Wednesday, May 22, 2013 1:19:19 AM
> > > Subject: Re: Proposal to change locking in data-self-heal
> > >
> > > On Tue, May 21, 2013 at 7:05 AM, Jeff Darcy <jdarcy@xxxxxxxxxx> wrote:
> > >
> > > > On 05/21/2013 09:10 AM, Pranith Kumar Karampuri wrote:
> > > >
> > > >> scenario-1 won't happen because there exists a chance for it to
> > acquire
> > > >> truncate's full file lock after any 128k range sync happens.
> > > >>
> > > >> Scenario-2 won't happen because extra self-heals that are launched on
> > the
> > > >> same file will be blocked in self-heal-domain so the data-path's
> > locks are
> > > >> not affected by this.
> > > >>
> > > >
> > > > This is two changes for two scenarios:
> > > >
> > > > * Change granular-lock order to avoid scenario 1.
> > > >
> > > > * Add a new lock to avoid scenario 2.
> > > >
> > > > I'm totally fine with the second part, but the first part makes me a
> > bit
> > > > uneasy.  As I recall, the "chained" locking (lock second range before
> > > > unlocking the first) was a deliberate choice to avoid windows where
> > > > somebody could jump in with a truncate during self-heal.  It certainly
> > > > wasn't the obvious thing to do, suggesting there was a specific reason.
> > >
> > >
> > > Chained locking really was to avoid the start of a second self-heal while
> > > one is in progress. By giving up the big lock (after holding the small
> > > lock), regional operations can progress but the second self-heal waiting
> > to
> > > hold the big lock is still held off. Truncating isn't really an issue as
> > > long as the truncate transaction locks from new EOF till infinity (which
> > it
> > > is) and self-heal will not race in those regions. Of course, with chained
> > > locking, full-file truncates can starve.
> > >
> > > It's not obvious what use case Pranith is facing where self-heal and
> > > ftruncate() are competing. Is it just an artificial/hand-crafted test
> > case,
> > > or a real-life access pattern?
> >
> > artificial.
> >
> > >
> > >
> > > > Until we recall what that reason was I don't think we can determine
> > > > whether this is a bug or a feature.  If it's a feature, or if we don't
> > > > know, this change is likely to break something instead of fixing it.
> > >
> > >
> > > The sole reason was to prevent the start of a second self-heal. Having
> > > self-heals serialize in a separate domain solves the problem (except if
> > we
> > > are looking at runtime compatibility across multiple versions in this
> > > scenario.. aargh!)
> >
> > So you guys are OK with this proposal if we solve version compatibility
> > issues?
> >
> >
> I guess you can even just ignore the backward compatibility issue in this
> case. Old servers will "safely" block themselves against the big data-lock
> causing starvation just the way things work today. Once all servers are
> upgraded, all self-heals will gracefully block themselves in the new
> domain. There isn't much compatibility handling actually.

   If afr with new locking approach triggers self-heal first then old afr can still get
the full-file lock. This is because new locking approach unlocks the range-lock without acquiring
next range-lock. So 2 self-heals will be in progress in parallel. This will lead to incorrect
extended attributes because of double decrements of changelogs one by the new afr self-heal
and the other with old afr self-heal. Let me know if I am missing something.

   Since scenario-1(the one with truncate) is not widely seen, may be we should just solve
scenario-2 which can be done by acquiring a full-lock in self-heal domain then unlock after
completion of self-heal, without changing locking in data-domain. 

> 
> Avati
>