+ gluster-devel

----- Original Message -----
> > > Hi all,
> > >
> > > This mail is to consolidate three efforts that are in progress to fix
> > > issues around the renamedir codepath in dht:
> > >
> > > 1. Transactions by Kotresh [1] - this makes renamedir atomic (barring
> > > failures and crash consistency issues) wrt ops like mkdir,
> > > lookup-heal, rmdir.
> >
> > Please note that transactions too are inadequate to address
> > crash-consistency/snapshot related issues.
> >
> > >
> > > 2. Rollback of renamedirs successfully completed on some subvols in
> > > case of a failed renamedir - Csaba is working on this (patch yet to be
> > > posted). The idea discussed involves dht_renamedir remembering the
> > > result of renamedir from each subvol and rolling back the successful
> > > operations in case of renamedir failure. Note that this approach won't
> > > solve the issues with a client crashing in the middle of a renamedir,
> > > or issues with taking snapshots (after restoring them) while a
> > > renamedir is in progress.
> > >
> > > 3. A proposal by Nithya to fail mkdir during directory self-heal
> > > initiated by the dht_lookup codepath. This will:
> > > 3a. Solve the race between a lookup(src)/lookup(dst) and
> > > rename(src, dst) (as lookup won't be able to create src/dst).
> > > 3b. Avoid worsening the situation by messing up gfid handles (on the
> > > backend) due to lookup heal creating either src or dst or both after a
> > > failed renamedir.
> > >
> > > However, solution 3 is damage control and won't fix everything about a
> > > failed renamedir.
> > >
> > > I think there is quite a bit of dependency among the three approaches.
> > >
> > > Approach 2 depends on 1 and 3 because:
> > > 1. lookup heal could've already healed src/dst or both before we try
> > > to roll back.
> > > 2. transactions (by locking out lookup-heal) or proposal 3 (by failing
> > > heal) make sure that the directory namespace is not tampered with
> > > until a renamedir is complete, hence paving the way for rollback.
> > >
> > > Also, we can build on top of 3 to recover from a crashed renamedir or
> > > restored snapshots in lookup-heal (essentially solution 2, implemented
> > > in lookup-heal to either rollback/rollforward). My thoughts are below:
> > >
> > > Once transactions for entry operations corresponding to directories
> > > are in place, lookup-selfheal will be able to identify a failed
> > > renamedir operation as:
> > >
> > > 1. It can figure out that a gfid has been associated with more than
> > > one directory. For this, we need to either make mkdir during healing
> > > fail with EEXIST if the directory exists (Proposal 3 above, possibly
> > > also returning the other path associated with the gfid), or do a
> > > lookup on the gfid and fetch the paths associated with it.
> > > 2. No renamedir is in progress (as we are in a transaction) and
> > > renamedir is the only operation (apart from mkdir and rmdir) that
> > > changes the association b/w a path and gfid for directories.
> > >
> > > Once we are able to identify a failed renamedir, we can possibly
> > > roll back. The ambiguous thing here is to figure out whether renamedir
> > > was a failure (client crash scenario) or succeeded (snapshots). Since
> > > for snapshots it doesn't make a difference whether renamedir succeeded
> > > or failed, we can always assume the case of failure and implement
> > > rollback.
>
> After today's meeting, the following are the problems with rollback after
> a crash of the client doing renamedir (or recovery of a snapshotted volume
> with renamedir in progress):
>
> 1. Where to put recovery code?
> The code has to be put in all places which modify the directory path,
> i.e., rmdir, renamedir and lookup-heal. The reason is that another client
> might've already issued a parallel operation and blocked on locks. The
> moment the client with renamedir in progress crashes, the other
> rmdir/renamedir/lookup-heal would get the lock and proceed. So, all these
> fops should be able to identify a crashed renamedir op and recover from
> it.
>
> 2. How to identify src/dst (of crashed renamedir) for rollback?
> The preferred way is to store the src and dst on the brick and use that
> information for rollback. There was a proposal to see whether JBR helps.
>
> We decided not to go ahead with providing crash consistency for renamedir
> given the above complexity and also the relative infrequency of this
> issue. However, if snapshots become popular we may have to revisit the
> problem.
>
> The other three efforts will be continued.
>
> > >
> > > In a nutshell, 1 and 3 are two relatively independent changes which
> > > can be leveraged by 2.
> > >
> > > Comments?
> > >
> > > [1] http://review.gluster.org/15472
> > >
> > > regards,
> > > Raghavendra

_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel
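
For illustration, the rollback idea in item 2 (dht_renamedir remembering the renamedir result from each subvol and undoing the successful ones on failure) could be sketched roughly as below. This is Python rather than the C used in dht, and Subvol, renamedir_with_rollback and the in-memory path-to-gfid map are hypothetical names made up for the sketch, not GlusterFS APIs. It assumes the renames are issued one subvol at a time and, as noted in the thread, it does nothing for a client crash mid-rename or for a rollback that itself fails.

```python
class Subvol:
    """Hypothetical in-memory stand-in for one dht subvolume."""

    def __init__(self, name, dirs, fail_rename=False):
        self.name = name
        self.dirs = dict(dirs)          # path -> gfid (assumed model)
        self.fail_rename = fail_rename  # simulate a brick that rejects renames

    def renamedir(self, src, dst):
        if self.fail_rename:
            raise OSError(f"{self.name}: rename {src} -> {dst} failed")
        if src not in self.dirs:
            raise FileNotFoundError(f"{self.name}: {src} does not exist")
        if dst in self.dirs:
            raise FileExistsError(f"{self.name}: {dst} already exists")
        self.dirs[dst] = self.dirs.pop(src)


def renamedir_with_rollback(subvols, src, dst):
    """Rename src to dst on every subvol, undoing partial success on failure.

    Returns (True, None) on success, or (False, error) after rolling back.
    """
    succeeded = []  # subvols on which the rename has already completed
    for subvol in subvols:
        try:
            subvol.renamedir(src, dst)
        except OSError as err:
            # Roll back in reverse order by renaming dst back to src.
            for done in reversed(succeeded):
                done.renamedir(dst, src)
            return False, err
        succeeded.append(subvol)
    return True, None


if __name__ == "__main__":
    good = Subvol("subvol-0", {"/src": "gfid-1"})
    bad = Subvol("subvol-1", {"/src": "gfid-1"}, fail_rename=True)
    ok, err = renamedir_with_rollback([good, bad], "/src", "/dst")
    print(ok, err, good.dirs)
```

In the sketch the rename on subvol-0 succeeds and is then undone once subvol-1 fails, so both subvols still hold /src afterwards. This only works if nothing else (such as lookup-heal) recreates src or dst in between, which is the dependency on approaches 1 and 3 described in the thread.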