----- Original Message ----- > From: "Anand Avati" <anand.avati@xxxxxxxxx> > To: "Jeff Darcy" <jdarcy@xxxxxxxxxx> > Cc: "Pranith Kumar Karampuri" <pkarampu@xxxxxxxxxx>, "Anand Avati" <aavati@xxxxxxxxxx>, "Raghavan Pichai" > <rpichai@xxxxxxxxxx>, "Ravishankar Narayanankutty" <ranaraya@xxxxxxxxxx>, "devel" <gluster-devel@xxxxxxxxxx> > Sent: Wednesday, May 22, 2013 1:19:19 AM > Subject: Re: Proposal to change locking in data-self-heal > > On Tue, May 21, 2013 at 7:05 AM, Jeff Darcy <jdarcy@xxxxxxxxxx> wrote: > > > On 05/21/2013 09:10 AM, Pranith Kumar Karampuri wrote: > > > >> scenario-1 won't happen because there exists a chance for it to acquire > >> truncate's full file lock after any 128k range sync happens. > >> > >> Scenario-2 won't happen because extra self-heals that are launched on the > >> same file will be blocked in self-heal-domain so the data-path's locks are > >> not affected by this. > >> > > > > This is two changes for two scenarios: > > > > * Change granular-lock order to avoid scenario 1. > > > > * Add a new lock to avoid scenario 2. > > > > I'm totally fine with the second part, but the first part makes me a bit > > uneasy. As I recall, the "chained" locking (lock second range before > > unlocking the first) was a deliberate choice to avoid windows where > > somebody could jump in with a truncate during self-heal. It certainly > > wasn't the obvious thing to do, suggesting there was a specific reason. > > > Chained locking really was to avoid the start of a second self-heal while > one is in progress. By giving up the big lock (after holding the small > lock), regional operations can progress but the second self-heal waiting to > hold the big lock is still held off. Truncating isn't really an issue as > long as the truncate transaction locks from new EOF till infinity (which it > is) and self-heal will not race in those regions. Of course, with chained > locking, full-file truncates can starve. > > It's not obvious what use case Pranith is facing where self-heal and > ftruncate() are competing. Is it just an artificial/hand-crafted test case, > or a real-life access pattern? artificial. > > > > Until we recall what that reason was I don't think we can determine > > whether this is a bug or a feature. If it's a feature, or if we don't > > know, this change is likely to break something instead of fixing it. > > > The sole reason was to prevent the start of a second self-heal. Having > self-heals serialize in a separate domain solves the problem (except if we > are looking at runtime compatibility across multiple versions in this > scenario.. aargh!) So you guys are OK with this proposal if we solve version compatibility issues? > Avati