----- Original Message ----- > From: "Anand Avati" <anand.avati@xxxxxxxxx> > To: "Pranith Kumar Karampuri" <pkarampu@xxxxxxxxxx> > Cc: "Jeff Darcy" <jdarcy@xxxxxxxxxx>, "Anand Avati" <aavati@xxxxxxxxxx>, "Raghavan Pichai" <rpichai@xxxxxxxxxx>, > "Ravishankar Narayanankutty" <ranaraya@xxxxxxxxxx>, "devel" <gluster-devel@xxxxxxxxxx> > Sent: Thursday, May 23, 2013 7:02:09 AM > Subject: Re: Proposal to change locking in data-self-heal > > On Wed, May 22, 2013 at 5:57 AM, Pranith Kumar Karampuri < > pkarampu@xxxxxxxxxx> wrote: > > > > > > > ----- Original Message ----- > > > From: "Anand Avati" <anand.avati@xxxxxxxxx> > > > To: "Jeff Darcy" <jdarcy@xxxxxxxxxx> > > > Cc: "Pranith Kumar Karampuri" <pkarampu@xxxxxxxxxx>, "Anand Avati" < > > aavati@xxxxxxxxxx>, "Raghavan Pichai" > > > <rpichai@xxxxxxxxxx>, "Ravishankar Narayanankutty" <ranaraya@xxxxxxxxxx>, > > "devel" <gluster-devel@xxxxxxxxxx> > > > Sent: Wednesday, May 22, 2013 1:19:19 AM > > > Subject: Re: Proposal to change locking in data-self-heal > > > > > > On Tue, May 21, 2013 at 7:05 AM, Jeff Darcy <jdarcy@xxxxxxxxxx> wrote: > > > > > > > On 05/21/2013 09:10 AM, Pranith Kumar Karampuri wrote: > > > > > > > >> scenario-1 won't happen because there exists a chance for it to > > acquire > > > >> truncate's full file lock after any 128k range sync happens. > > > >> > > > >> Scenario-2 won't happen because extra self-heals that are launched on > > the > > > >> same file will be blocked in self-heal-domain so the data-path's > > locks are > > > >> not affected by this. > > > >> > > > > > > > > This is two changes for two scenarios: > > > > > > > > * Change granular-lock order to avoid scenario 1. > > > > > > > > * Add a new lock to avoid scenario 2. > > > > > > > > I'm totally fine with the second part, but the first part makes me a > > bit > > > > uneasy. As I recall, the "chained" locking (lock second range before > > > > unlocking the first) was a deliberate choice to avoid windows where > > > > somebody could jump in with a truncate during self-heal. It certainly > > > > wasn't the obvious thing to do, suggesting there was a specific reason. > > > > > > > > > Chained locking really was to avoid the start of a second self-heal while > > > one is in progress. By giving up the big lock (after holding the small > > > lock), regional operations can progress but the second self-heal waiting > > to > > > hold the big lock is still held off. Truncating isn't really an issue as > > > long as the truncate transaction locks from new EOF till infinity (which > > it > > > is) and self-heal will not race in those regions. Of course, with chained > > > locking, full-file truncates can starve. > > > > > > It's not obvious what use case Pranith is facing where self-heal and > > > ftruncate() are competing. Is it just an artificial/hand-crafted test > > case, > > > or a real-life access pattern? > > > > artificial. > > > > > > > > > > > > Until we recall what that reason was I don't think we can determine > > > > whether this is a bug or a feature. If it's a feature, or if we don't > > > > know, this change is likely to break something instead of fixing it. > > > > > > > > > The sole reason was to prevent the start of a second self-heal. Having > > > self-heals serialize in a separate domain solves the problem (except if > > we > > > are looking at runtime compatibility across multiple versions in this > > > scenario.. aargh!) > > > > So you guys are OK with this proposal if we solve version compatibility > > issues? > > > > > I guess you can even just ignore the backward compatibility issue in this > case. Old servers will "safely" block themselves against the big data-lock > causing starvation just the way things work today. Once all servers are > upgraded, all self-heals will gracefully block themselves in the new > domain. There isn't much compatibility handling actually. If afr with new locking approach triggers self-heal first then old afr can still get the full-file lock. This is because new locking approach unlocks the range-lock without acquiring next range-lock. So 2 self-heals will be in progress in parallel. This will lead to incorrect extended attributes because of double decrements of changelogs one by the new afr self-heal and the other with old afr self-heal. Let me know if I am missing something. Since scenario-1(the one with truncate) is not widely seen, may be we should just solve scenario-2 which can be done by acquiring a full-lock in self-heal domain then unlock after completion of self-heal, without changing locking in data-domain. > > Avati >