Re: Proposal to change locking in data-self-heal

Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx> · Wed, 22 May 2013 08:57:25 -0400 (EDT)

----- Original Message -----
> From: "Anand Avati" <anand.avati@xxxxxxxxx>
> To: "Jeff Darcy" <jdarcy@xxxxxxxxxx>
> Cc: "Pranith Kumar Karampuri" <pkarampu@xxxxxxxxxx>, "Anand Avati" <aavati@xxxxxxxxxx>, "Raghavan Pichai"
> <rpichai@xxxxxxxxxx>, "Ravishankar Narayanankutty" <ranaraya@xxxxxxxxxx>, "devel" <gluster-devel@xxxxxxxxxx>
> Sent: Wednesday, May 22, 2013 1:19:19 AM
> Subject: Re: Proposal to change locking in data-self-heal
> 
> On Tue, May 21, 2013 at 7:05 AM, Jeff Darcy <jdarcy@xxxxxxxxxx> wrote:
> 
> > On 05/21/2013 09:10 AM, Pranith Kumar Karampuri wrote:
> >
> >> scenario-1 won't happen because there exists a chance for it to acquire
> >> truncate's full file lock after any 128k range sync happens.
> >>
> >> Scenario-2 won't happen because extra self-heals that are launched on the
> >> same file will be blocked in self-heal-domain so the data-path's locks are
> >> not affected by this.
> >>
> >
> > This is two changes for two scenarios:
> >
> > * Change granular-lock order to avoid scenario 1.
> >
> > * Add a new lock to avoid scenario 2.
> >
> > I'm totally fine with the second part, but the first part makes me a bit
> > uneasy.  As I recall, the "chained" locking (lock second range before
> > unlocking the first) was a deliberate choice to avoid windows where
> > somebody could jump in with a truncate during self-heal.  It certainly
> > wasn't the obvious thing to do, suggesting there was a specific reason.
> 
> 
> Chained locking really was to avoid the start of a second self-heal while
> one is in progress. By giving up the big lock (after holding the small
> lock), regional operations can progress but the second self-heal waiting to
> hold the big lock is still held off. Truncating isn't really an issue as
> long as the truncate transaction locks from new EOF till infinity (which it
> is) and self-heal will not race in those regions. Of course, with chained
> locking, full-file truncates can starve.
> 
> It's not obvious what use case Pranith is facing where self-heal and
> ftruncate() are competing. Is it just an artificial/hand-crafted test case,
> or a real-life access pattern?

artificial.

> 
> 
> > Until we recall what that reason was I don't think we can determine
> > whether this is a bug or a feature.  If it's a feature, or if we don't
> > know, this change is likely to break something instead of fixing it.
> 
> 
> The sole reason was to prevent the start of a second self-heal. Having
> self-heals serialize in a separate domain solves the problem (except if we
> are looking at runtime compatibility across multiple versions in this
> scenario.. aargh!)

So you guys are OK with this proposal if we solve version compatibility issues?

> Avati