----- Original Message -----
> From: "Raghavendra Gowdappa" <rgowdapp@xxxxxxxxxx>
> To: "Gluster Devel" <gluster-devel@xxxxxxxxxxx>
> Cc: "Sakshi Bansal" <sabansal@xxxxxxxxxx>
> Sent: Monday, 17 August, 2015 10:39:38 AM
> Subject: Serialization of fops acting on same dentry on server
>
> All,
>
> Pranith and I were discussing the implementation of compound operations
> like "create + lock", "mkdir + lock", "open + lock" etc. These operations
> are useful in situations like:
>
> 1. To prevent locking on all subvols during directory creation as part of
> self heal in dht. Currently we are following the approach of locking _all_
> subvols by both rmdir and lookup-heal [1].

Correction. It should've been, "to prevent locking on all subvols during rmdir". The lookup self-heal should lock on all subvols (with compound "mkdir + lookup" if the directory is not present on a subvol). With this, rmdir/rename can lock on just any one subvol, and this will prevent any parallel lookup-heal from preventing directory creation.

> 2. To lock a file in advance so that there is less of a performance hit
> during transactions in afr.
>
> While thinking about implementing such compound operations, it occurred to me
> that one of the problems would be how we handle a racing mkdir/create and
> a (named lookup - simply referred to as lookup from now on - followed by lock).
> This is because
> 1. creation of the directory/file on the backend
> 2. linking of the inode with the gfid corresponding to that file/directory
>
> are not atomic. It is not guaranteed that the inode passed down during a
> mkdir/create call is the one that survives in the inode table. Since the
> posix-locks xlator maintains all the lock-state in the inode, it would be a
> problem if a different inode is linked in the inode table than the one passed
> during mkdir/create. One way to solve this problem is to serialize fops
> (like mkdir/create, lookup, rename, rmdir, unlink) that are happening on a
> particular dentry.
>
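
To make the idea of serializing fops on one dentry concrete, here is a minimal sketch in Python. It is not GlusterFS code. It assumes a dentry is identified by its parent inode plus basename, and that the per-dentry state hangs off the parent. `ParentInode`, `serialize`, `fop` and `run_racing_fops` are hypothetical names invented for this illustration, and the "backend-create" and "inode-link" log entries only stand in for the two non-atomic steps described above.

```python
import threading
import time


class ParentInode:
    """Hypothetical stand-in for a parent directory inode (not a GlusterFS type)."""

    def __init__(self, gfid):
        self.gfid = gfid
        self._guard = threading.Lock()  # protects the table below
        self._in_progress = {}          # basename -> [lock, number of users]

    def serialize(self, basename):
        """Return a context manager that admits one fop at a time per basename."""
        return _DentryTurn(self, basename)


class _DentryTurn:
    def __init__(self, parent, basename):
        self._parent = parent
        self._basename = basename
        self._entry = None

    def __enter__(self):
        p = self._parent
        with p._guard:
            # Register interest in this dentry, creating its entry on first use.
            entry = p._in_progress.setdefault(self._basename, [threading.Lock(), 0])
            entry[1] += 1
        entry[0].acquire()              # wait until no other fop holds this dentry
        self._entry = entry
        return self

    def __exit__(self, exc_type, exc, tb):
        p = self._parent
        self._entry[0].release()
        with p._guard:
            self._entry[1] -= 1
            if self._entry[1] == 0:     # last user gone: drop the bookkeeping
                del p._in_progress[self._basename]
        return False


def fop(parent, basename, op, log):
    """Run one fop; its two steps cannot interleave with another fop on the dentry."""
    with parent.serialize(basename):
        log.append((op, basename, "backend-create"))
        time.sleep(0.01)                # widen the window where a race could occur
        log.append((op, basename, "inode-link"))


def run_racing_fops():
    """Race a mkdir and a lookup on the same dentry and return the event log."""
    parent = ParentInode("hypothetical-parent-gfid")
    log = []
    threads = [
        threading.Thread(target=fop, args=(parent, "dir1", op, log))
        for op in ("mkdir", "lookup")
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return parent, log
```

Whichever fop wins the race, both of its steps appear in the log before the other fop's first step. `threading.Lock` makes no fairness promise, so this sketch serializes the fops but does not guarantee arrival order; an explicit queue would be needed for that.
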
> This serialization would also solve other bugs like:
>
> 1. issues solved by [2][3] and possibly many such issues.
> 2. Stale dentries left behind in bricks' inode table because of a racing
> lookup and dentry modification ops (like rmdir, unlink, rename etc).
>
> My initial idea is to maintain the fops in progress on a dentry in the parent
> inode (maybe in the resolver code in protocol/server). Based on this we can
> serialize the operations. Since we need to serialize _only_ operations on a
> dentry (we don't serialize nameless lookups), it is guaranteed that we
> always have a parent inode. Any comments/discussion on this would be
> appreciated.
>
> [1] http://review.gluster.org/11725
> [2] http://review.gluster.org/9913
> [3] http://review.gluster.org/5240
>
> regards,
> Raghavendra.
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel@xxxxxxxxxxx
> http://www.gluster.org/mailman/listinfo/gluster-devel