On Mon, Aug 17, 2015 at 01:09:38AM -0400, Raghavendra Gowdappa wrote:
> All,
>
> Pranith and I were discussing the implementation of compound
> operations like "create + lock", "mkdir + lock", "open + lock" etc.
> These operations are useful in situations like:
>
> 1. To prevent locking on all subvols during directory creation as part
>    of self heal in dht. Currently we follow the approach of locking
>    _all_ subvols in both rmdir and lookup-heal [1].
> 2. To lock a file in advance so that there is less of a performance hit
>    during transactions in afr.

I have an interest in compound/composite procedures too. My use-case is
a little different, and I was (and still am) planning to send more
details about it soon.

Basically, there are certain cases where libgfapi will not be able to
automatically pass the uid/gid in the RPC-header. A design for
supporting Kerberos will mainly use the standardized RPCSEC_GSS. If
there is no option to use the Kerberos credentials of the user doing I/O
(remote client, not using Kerberos to talk to samba/ganesha), the
username (or uid/gid) needs to be passed to the storage servers.

A compound/composite procedure would then look like this:

  [RPC header]
    [AUTH_GSS + Kerberos principal for libgfapi/samba/ganesha/...]

  [GlusterFS COMPOUND]
    [SETFSUID]
    [SETLOCKOWNER]
    [${FOP}]
    [.. more FOPs?]

This idea has not yet been reviewed or commented on by the Kerberos
experts that I want to involve. A more complete description of the
plans to support Kerberos will follow.

Do you think that this matches your ideas on compound operations?

Thanks,
Niels

>
> While thinking about implementing such compound operations, it
> occurred to me that one of the problems would be how we handle a
> racing mkdir/create and a (named lookup - simply referred to as lookup
> from now on - followed by lock). This is because
> 1. creation of the directory/file on the backend
> 2. linking of the inode with the gfid corresponding to that
>    file/directory
>
> are not atomic.
> It is not guaranteed that the inode passed down during the
> mkdir/create call is the one that survives in the inode table.
> Since the posix-locks xlator maintains all the lock-state in the inode,
> it would be a problem if a different inode is linked in the inode table
> than the one passed during mkdir/create. One way to solve this problem
> is to serialize fops (like mkdir/create, lookup, rename, rmdir, unlink)
> that are happening on a particular dentry. This serialization would
> also solve other bugs like:
>
> 1. issues solved by [2][3] and possibly many such issues.
> 2. Stale dentries left behind in the bricks' inode table because of a
>    racing lookup and dentry modification ops (like rmdir, unlink,
>    rename etc).
>
> My initial idea is to track the fops in progress on a dentry in the
> parent inode (maybe in the resolver code in protocol/server). Based on
> this we can serialize the operations. Since we need to serialize _only_
> operations on a dentry (we don't serialize nameless lookups), it is
> guaranteed that we always have a parent inode. Any
> comments/discussion on this would be appreciated.
>
> [1] http://review.gluster.org/11725
> [2] http://review.gluster.org/9913
> [3] http://review.gluster.org/5240
>
> regards,
> Raghavendra.
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel@xxxxxxxxxxx
> http://www.gluster.org/mailman/listinfo/gluster-devel
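
The per-dentry serialization proposed in the quoted text could be
sketched roughly as below. This is only a minimal Python illustration of
the idea, not GlusterFS code: ParentInode, dentry_fop and run_demo are
hypothetical names, and the real bookkeeping would presumably live in C
inside the inode table or the protocol/server resolver. It assumes that
every named fop first registers its basename in the parent inode and
waits while another fop on the same basename is still in progress.

```python
import threading
import time
from contextlib import contextmanager


class ParentInode:
    """Hypothetical stand-in for a directory inode on a brick.

    It records which child names (dentries) currently have a fop in
    progress, so a second fop on the same name waits for the first.
    """

    def __init__(self, gfid):
        self.gfid = gfid
        self._cond = threading.Condition()
        self._in_progress = set()  # basenames with a running fop

    @contextmanager
    def dentry_fop(self, basename):
        # Wait until no other fop is running on this dentry, then claim it.
        with self._cond:
            while basename in self._in_progress:
                self._cond.wait()
            self._in_progress.add(basename)
        try:
            yield
        finally:
            # Release the dentry and wake up any fops waiting on this parent.
            with self._cond:
                self._in_progress.discard(basename)
                self._cond.notify_all()


def run_demo():
    """Race a mkdir against a named lookup on the same dentry."""
    parent = ParentInode(gfid="parent-gfid")  # placeholder gfid
    log = []

    def fop(kind):
        with parent.dentry_fop("dir1"):
            log.append((kind, "start"))  # e.g. create entry on the backend
            time.sleep(0.01)             # window in which a race could occur
            log.append((kind, "end"))    # e.g. link inode in the inode table

    threads = [threading.Thread(target=fop, args=(k,))
               for k in ("mkdir", "lookup")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


if __name__ == "__main__":
    print(run_demo())
```

With this in place the two steps of one fop (backend creation and inode
linking) are never interleaved with the steps of another fop on the same
dentry, while fops on different names under the same parent do not block
each other. Nameless lookups have no parent/basename pair and would
bypass this path entirely, as the quoted proposal suggests.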