----- Original Message -----
> From: "Raghavendra Gowdappa" <rgowdapp@xxxxxxxxxx>
> To: "Gluster Devel" <gluster-devel@xxxxxxxxxxx>
> Cc: "Nithya Balachandran" <nbalacha@xxxxxxxxxx>, "Csaba Henk" <chenk@xxxxxxxxxx>, "Kotresh Hiremath Ravishankar"
> <khiremat@xxxxxxxxxx>
> Sent: Thursday, November 3, 2016 9:51:42 AM
> Subject: Re: dht renamedir transactions, failures and crash consistency
>
> + gluster-devel
>
> ----- Original Message -----
> > > >
> > > > Hi all,
> > > >
> > > > This mail is to consolidate three efforts that are in progress to fix
> > > > issues around renamedir codepath in dht:
> > > >
> > > > 1. Transactions by Kotresh [1]- this makes renamedir atomic (barring
> > > > failures and crash consistency issues) wrt ops like mkdir, lookup-heal,
> > > > rmdir.
> > > >
> > > Please note that transactions too are inadequate to address
> > > crash-consistency/snapshot related issues.
> > >
> > > >
> > > > 2. Rollback of renamedirs successfully completed on some subvols in case
> > > > of failed renamedir - Csaba is working on this (patch yet to be posted).

Rollback patch for renamedir can be found at http://review.gluster.org/15739

> > > > The idea discussed involves dht_renamedir remembering result of
> > > > renamedir from each subvol and rolling back the successful operations in
> > > > case of renamedir failure. Note that this approach won't solve the
> > > > issues with client crashing in the middle of a renamedir or issues with
> > > > taking snapshots (after restoring them) while a renamedir is in
> > > > progress.
> > > >
> > > > 3. A proposal by Nithya to fail mkdir during directory self-heal
> > > > initiated by dht_lookup codepath. This will
> > > >    3a. Solve the race between a lookup(src)/lookup (dst) and rename
> > > >        (src, dst) (as lookup won't be able to create src/dst).
> > > >    3b.
> > > >        Won't worsen the situation by messing up with gfid handles (on
> > > >        backend) due to lookup heal creating either src or dst or both
> > > >        after a failed renamedir.
> > > >
> > > > However solution 3 is a damage control and won't fix all things with a
> > > > failed renamedir.
> > > >
> > > > I think there is quite a bit of dependency among all the three
> > > > approaches.
> > > >
> > > > Problem 2 has dependency on 1 and 3 as:
> > > > 1. lookup heal could've already healed src/dst or both before we try to
> > > > roll-back
> > > > 2. transactions (by locking out lookup-heal) or proposal 3 (by failing
> > > > heal) make sure that directory namespace is not tampered till a
> > > > renamedir is complete and hence paving way for rollback.
> > > >
> > > > Also we can build on top of 3 to recover from crashed renamedir or
> > > > restored snapshots in lookup-heal (essentially solution 2, implemented
> > > > in lookup-heal to either rollback/rollforward). My thoughts are below:
> > > >
> > > > Once transactions for entry operations corresponding to directory are in
> > > > place, lookup-selfheal will be able to identify a failed renamedir
> > > > operation as:
> > > >
> > > > 1. It can figure out a gfid has been associated with more than one
> > > > directory. For this, we need to make either mkdir during healing fail
> > > > with EEXIST if directory exists - Proposal 3 above (and possibly return
> > > > the other path associated with gfid) or do a lookup on gfid and fetch
> > > > paths associated with gfid.
> > > > 2. No renamedir is in-progress (as we are in a transaction) and
> > > > renamedir is the only operation (apart from mkdir and rmdir) that
> > > > changes the association b/w a path and gfid for directories.
> > > >
> > > > Once we are able to identify a failed renamedir, we can possibly
> > > > rollback.
> > > > The ambiguous thing here is to figure out whether renamedir was a
> > > > failure (client crash scenario) or succeeded (snapshots). Since for
> > > > snapshots it doesn't make a difference whether renamedir succeeded or
> > > > failed, we can always assume the case of failure and implement rollback.
> >
> > After today's meeting, the following are the problems with rollback after
> > a crash of the client doing renamedir (or recovery of a snapshotted volume
> > with renamedir in progress):
> >
> > 1. Where to put recovery code?
> > The code has to be put in all places which modify the directory path,
> > i.e., rmdir, renamedir and lookup-heal. The reason is that another client
> > might've already issued a parallel operation and blocked on locks. The
> > moment the client with renamedir in progress crashes, the other
> > rmdir/renamedir/lookup-heal would get the lock and proceed. So, all these
> > fops should be able to identify a crashed renamedir op and recover from
> > it.
> >
> > 2. How to identify src/dst (of crashed renamedir) for rollback?
> > The preferred way is to store the src and dst on the brick and use that
> > information for rollback. There was a proposal to see whether JBR helps.
> >
> > We decided not to go ahead with providing crash consistency for renamedir
> > given the above complexity and also the relative infrequency of the
> > occurrence of this issue. However, if snapshots become popular we may have
> > to revisit the problem.
> >
> > The other three efforts will be continued.
> >
> > > >
> > > > In a nutshell, 1 and 3 are two relatively independent changes which can
> > > > be leveraged by 2.
> > > >
> > > > Comments?
> > > >
> > > > [1] http://review.gluster.org/15472
> > > >
> > > > regards,
> > > > Raghavendra
> > > >

_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel
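
The rollback idea from approach 2 in the thread (remember the result of renamedir on each subvol, and undo the successful ones if any subvol fails) can be illustrated with a minimal Python sketch. This is only an in-memory model, not Gluster code and not the patch under review: the Subvol class and the renamedir_with_rollback function are hypothetical names invented for illustration. As the thread notes, this scheme does not cover a client crashing mid-rename or a snapshot taken while a renamedir is in progress.

```python
class Subvol:
    """Hypothetical in-memory stand-in for one dht subvolume (not a Gluster API)."""

    def __init__(self, name, dirs, fail_rename=False):
        self.name = name
        self.dirs = set(dirs)           # directory paths present on this subvol
        self.fail_rename = fail_rename  # simulate a failing brick

    def rename(self, src, dst):
        if self.fail_rename:
            raise OSError(f"{self.name}: simulated renamedir failure")
        if src not in self.dirs:
            raise OSError(f"{self.name}: {src} does not exist")
        self.dirs.remove(src)
        self.dirs.add(dst)


def renamedir_with_rollback(subvols, src, dst):
    """Rename src to dst on every subvol; undo completed renames on failure.

    Returns True if all subvols succeeded, False if a failure triggered
    a rollback. A failure during the rollback itself is not handled here.
    """
    done = []  # subvols on which the rename has already succeeded
    for sv in subvols:
        try:
            sv.rename(src, dst)
        except OSError:
            # Roll back in reverse order: rename dst back to src.
            for prev in reversed(done):
                prev.rename(dst, src)
            return False
        done.append(sv)
    return True


if __name__ == "__main__":
    a = Subvol("subvol-a", {"/src"})
    b = Subvol("subvol-b", {"/src"}, fail_rename=True)
    ok = renamedir_with_rollback([a, b], "/src", "/dst")
    print(ok, sorted(a.dirs), sorted(b.dirs))
```

In this model the list of completed subvols lives only in the client's memory, which is exactly why a client crash loses the information needed for rollback and why the thread discusses storing src and dst on the brick instead.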