----- Original Message -----
> From: "Raghavendra Gowdappa" <rgowdapp@xxxxxxxxxx>
> To: "Gluster Devel" <gluster-devel@xxxxxxxxxxx>
> Cc: "Sakshi Bansal" <sabansal@xxxxxxxxxx>
> Sent: Thursday, August 20, 2015 10:24:46 AM
> Subject: Locking behavior vs rmdir/unlink of a directory/file
>
> Hi all,
>
> Most of the code currently treats the inode table (and the dentry
> structure associated with it) as the correct representation of the
> underlying backend file-system. While this is correct in most cases, the
> representation might be out of sync for small time-windows (for example,
> a file is deleted on disk, but its dentry and inode are not yet removed
> from our inode table). While working on locking directories in dht for
> better consistency we ran into one such issue. The issue is basically to
> make rmdir and directory creation during dht-selfheal mutually exclusive.
> The idea is to take a blocking inodelk on the inode before proceeding
> with rmdir or directory self-heal. However, consider the following
> scenario:
>
> 1. (dht_)rmdir acquires a lock.
> 2. lookup-selfheal tries to acquire a lock, but is blocked on the lock
>    acquired by rmdir.
> 3. rmdir deletes the directory and releases the lock. It's possible for
>    the inode to remain in the inode table, searchable through its gfid,
>    as long as there is a positive reference count on it. In this case
>    the lock request (by lookup) and the granted lock (to rmdir) keep the
>    inode in the inode table even after rmdir, as each of them holds a
>    refcount on the inode.
> 4. The lock request issued by lookup is granted.
>
> Note that at step 4, it's still possible that rmdir is in progress from
> dht's perspective (it has just completed on one node). However, this is
> precisely the situation we wanted to avoid, i.e., we wanted to block and
> fail dht-selfheal instead of allowing it to proceed.
>
> In this scenario at step 4, the directory is removed on the backend
> file-system, but its representation is still present in the inode table.
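
Steps 1 to 4 of the scenario above can be modelled with a small toy simulation. This is only a sketch in Python, not GlusterFS code: `ToyInode`, `ToyBrick` and all of their methods are hypothetical names, and the refcounting is reduced to "every granted or pending lock request pins the inode", which is the only property the scenario relies on.

```python
# Toy model only: none of these classes exist in GlusterFS.
from collections import deque


class ToyInode:
    def __init__(self, gfid):
        self.gfid = gfid
        self.refcount = 0        # refs held by granted and pending lock requests
        self.holder = None       # current lock owner, if any
        self.waiters = deque()   # blocked lock requests, FIFO


class ToyBrick:
    """One brick: what exists on disk plus an in-memory inode table."""

    def __init__(self):
        self.backend = set()     # gfids that really exist on disk
        self.table = {}          # gfid -> ToyInode (may lag behind the backend)

    def mkdir(self, gfid):
        self.backend.add(gfid)
        self.table[gfid] = ToyInode(gfid)

    def inodelk(self, gfid, owner):
        """Blocking lock request: True if granted now, False if queued."""
        inode = self.table[gfid]
        inode.refcount += 1      # both granted and queued requests pin the inode
        if inode.holder is None:
            inode.holder = owner
            return True
        inode.waiters.append(owner)
        return False

    def rmdir(self, gfid):
        self.backend.discard(gfid)   # the directory is gone on disk ...
        self._maybe_forget(gfid)     # ... but the table entry survives while refs > 0

    def unlock(self, gfid, owner):
        inode = self.table[gfid]
        assert inode.holder == owner
        inode.refcount -= 1
        # The next waiter gets the lock without the backend being consulted.
        inode.holder = inode.waiters.popleft() if inode.waiters else None
        self._maybe_forget(gfid)
        return inode.holder

    def _maybe_forget(self, gfid):
        inode = self.table.get(gfid)
        if inode is not None and inode.refcount == 0 and gfid not in self.backend:
            del self.table[gfid]


brick = ToyBrick()
brick.mkdir("gfid-1")
granted_rmdir = brick.inodelk("gfid-1", "rmdir")       # step 1: granted
granted_lookup = brick.inodelk("gfid-1", "selfheal")   # step 2: blocked
brick.rmdir("gfid-1")                                  # step 3: delete ...
new_holder = brick.unlock("gfid-1", "rmdir")           # ... and unlock
# Step 4: the lock is granted although the directory no longer exists on disk.
stale_grant = new_holder == "selfheal" and "gfid-1" not in brick.backend
print(granted_rmdir, granted_lookup, new_holder, stale_grant)
```

In this model the pending request from self-heal is what keeps the inode in the table across the rmdir, so the grant in step 4 happens against an inode whose backend directory is already gone.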
> We tried to solve this by doing a lookup on gfid before granting a lock
> [1]. However, because of [1]:
>
> 1. we no longer treat the inode table as the source of truth, unlike
>    other non-lookup code
> 2. we take a performance hit in the form of a lookup on the backend
>    file-system for _every_ granted lock. This may not be that costly
>    considering that there is no network call involved.
>
> There are other ways in which dht could have avoided the above scenario
> altogether, with different trade-offs that we didn't want to make. A few
> alternatives would have been:
> 1. Use entrylk during lookup-selfheal and rmdir. This fits naturally as
>    both are entry operations. However, dht-selfheal also sets layouts,
>    which should be synchronized with other operations where we don't
>    have name information. tl;dr: we wanted to avoid using entrylk for
>    reasons that are out of scope for this problem.
> 2. Use non-blocking inodelk in dht during lookup-selfheal. This solves
>    the problem for most practical cases, but theoretically the race can
>    still exist.
>
> To summarize, the problem of granted locks vs. unlink/rmdir still remains
> and I am not sure what exactly the behavior of posix-locks should be in
> that scenario. Inputs in the way of review on [1] are greatly
> appreciated.
>
> [1] http://review.gluster.org/#/c/11916/
>
> regards,
> Raghavendra.
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel@xxxxxxxxxxx
> http://www.gluster.org/mailman/listinfo/gluster-devel
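
The idea described for [1], a lookup on the gfid before a lock is granted, might look roughly like the sketch below. It is not the code from the patch: `ToyLockDomain` and the `exists_on_backend` callback are hypothetical, and failing the waiter with ESTALE is an assumption made for illustration. The `backend_lookups` counter is there only to make the cost from point 2 visible (one backend lookup per grant attempt).

```python
# Illustrative sketch only; not taken from the GlusterFS locks translator.
import errno
from collections import deque


class ToyLockDomain:
    """Hypothetical per-inode lock queue that re-validates before each grant."""

    def __init__(self, gfid, exists_on_backend):
        self.gfid = gfid
        self.exists_on_backend = exists_on_backend  # callable: gfid -> bool
        self.holder = None
        self.waiters = deque()
        self.backend_lookups = 0                    # counts the extra cost

    def _validate(self):
        self.backend_lookups += 1
        return self.exists_on_backend(self.gfid)

    def lock(self, owner):
        """Return 0 if granted, None if queued, or a negative errno."""
        if self.holder is not None:
            self.waiters.append(owner)
            return None
        if not self._validate():
            return -errno.ESTALE                    # assumed error code
        self.holder = owner
        return 0

    def unlock(self, owner):
        """Release the lock; return {waiter: result} for requests resolved now."""
        assert self.holder == owner
        self.holder = None
        results = {}
        while self.waiters and self.holder is None:
            waiter = self.waiters.popleft()
            if self._validate():
                self.holder = waiter
                results[waiter] = 0
            else:
                results[waiter] = -errno.ESTALE     # fail instead of granting
        return results


backend = {"gfid-1"}
dom = ToyLockDomain("gfid-1", lambda gfid: gfid in backend)
r1 = dom.lock("rmdir")          # granted after one backend lookup
r2 = dom.lock("selfheal")       # queued behind rmdir
backend.discard("gfid-1")       # rmdir removes the directory on disk
resolved = dom.unlock("rmdir")  # the waiter is failed rather than granted
print(r1, r2, resolved, dom.backend_lookups)
```

Here the queued self-heal request fails once rmdir has removed the directory, at the price of consulting the backend instead of trusting the inode table, which is exactly the trade-off listed in points 1 and 2.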