On Tue, Feb 28, 2017 at 11:21:55AM +0530, Mohammed Rafi K C wrote:
> Hi All,
>
> We discussed the problem $subject in the mail thread [1]. Based on the
> comments and suggestions I will summarize the design (Made as points for
> simplicity.)
>
> 1) As part of each fop, top layer will generate a time stamp and pass it
> to the down along with other param.
>
> 1.1) This will bring a dependency for NTP synced clients along with
> servers

What do you mean with "top layer"? Is this on the Gluster client, or
does the time get inserted on the bricks?

I think we should not require a hard dependency on NTP, but have it
strongly suggested. Having a synced time in a clustered environment is
always helpful for reading and matching logs.

> 1.2) There can be a diff in time if the fop stuck in the xlator for
> various reason, for ex: because of locks.

Or just slow networks? Blocking (mandatory?) locks should be handled
correctly. The time a FOP is blocked can be long.

> 2) On the server posix layer stores the value in the memory (inode ctx)
> and will sync the data periodically to the disk as an extended attr
>
> 2.1) of course sync call also will force it. And fop comes for an
> inode which is not linked, we do the sync immediately.

Does it need to be in the posix layer?

> 3) Each time when inodes are created or initialized it read the data
> from disk and store it.
>
> 4) Before setting to inode_ctx we compare the timestamp stored and the
> timestamp received, and only store if the stored value is lesser than
> the current value.
>
> 5) So in best case data will be stored and retrieved from the memory. We
> replace the values in iatt with the values in inode_ctx.
>
> 6) File ops that changes the parent directory attr time need to be
> consistent across all the distributed directories across the subvolumes.
> (for eg: a create call will change ctime and mtime of parent dir)
>
> 6.1) This has to handle separately because we only send the fop to
> the hashed subvolume.
>
> 6.2) We can asynchronously send the timeupdate setattr fop to the
> other subvoumes and change the values for parent directory if the file
> fops is successful on hashed subvolume.
>
> 6.3) This will have a window where the times are inconsistent
> across dht subvolume (Please provide your suggestions)

Isn't this the same problem for 'normal' AFR volumes? I guess self-heal
needs to know how to pick the right value for the [cm]time xattr.

> 7) Currently we have couple of mount options for time attributes like
> noatime, relatime , nodiratime etc. But we are not explicitly handled
> those options even if it is given as mount option when gluster mount. [2]

Where is the URL for [2]?

> 7.1) We always relay on back end storage layer behavior, if you
> have given those mount options when you mount your disk, you will get
> this behaviour

These options are for "not writing the atime", so if there is a client
that does not use these options for mounting, the atime will be updated
upon each access. Using these options on the brick-level, and not
through fuse, nfs or smb would prevent it for all clients. Those are two
use-cases, they probably need to be handled both in the future as well.

> 7.2) Now if we are taking effort to fix the consistency issue, do
> we need to honour those options by our own ?

I do not think you need to handle them, and just rely on the filesystems
(fuse, nfs and smb) to take care of it. However, check if Samba or
NFS-Ganesha have config options for these, in that case, we might need
to be able to tune it too.

> Please provide your comments and suggestions.

Please update https://bugzilla.redhat.com/show_bug.cgi?id=1318493 with
your findings too. When this is fixed, caching solutions (like FS-Cache
for NFS, SMB) will work much better. As mentioned in the BUG, we would
be able to add a "birth time" attribute as well.
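
To make points 4 and 5 of the proposal concrete, here is a minimal
sketch in Python (not the C of the actual xlators). It only illustrates
the "store if newer" comparison and the iatt override. The names
TimeCtx, update_time and fill_iatt are hypothetical and do not refer to
existing Gluster code, and the dirty flag is an assumption about how the
periodic sync from point 2 might be triggered.

```python
from dataclasses import dataclass


@dataclass
class TimeCtx:
    """Hypothetical stand-in for the per-inode context holding cached times."""
    ctime: float = 0.0
    mtime: float = 0.0
    dirty: bool = False  # assumed flag: memory is newer than the on-disk xattr


def update_time(ctx: TimeCtx, ctime=None, mtime=None) -> bool:
    """Point 4: keep a received timestamp only if it is newer than the stored one.

    Returns True when at least one stored value advanced.
    """
    changed = False
    if ctime is not None and ctime > ctx.ctime:
        ctx.ctime = ctime
        changed = True
    if mtime is not None and mtime > ctx.mtime:
        ctx.mtime = mtime
        changed = True
    if changed:
        ctx.dirty = True  # a periodic sync (point 2) could flush this later
    return changed


def fill_iatt(iatt: dict, ctx: TimeCtx) -> dict:
    """Point 5: overwrite the backend times in the reply with the cached ones."""
    out = dict(iatt)
    out["ctime"] = ctx.ctime
    out["mtime"] = ctx.mtime
    return out


if __name__ == "__main__":
    ctx = TimeCtx()
    update_time(ctx, ctime=100.0, mtime=100.0)
    update_time(ctx, ctime=90.0)   # older stamp from a delayed fop, ignored
    update_time(ctx, mtime=120.0)  # newer mtime is stored
    print(fill_iatt({"size": 42, "ctime": 1.0, "mtime": 1.0}, ctx))
```

A fop that was blocked for a long time (point 1.2) arrives with an old
stamp and is simply dropped by the comparison. The same "newest value
wins" rule might be one option for self-heal when replicas disagree on
the [cm]time xattr, but that is an assumption and not something the
proposal states.
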

Thanks,
Niels

> [1] :
> http://lists.gluster.org/pipermail/gluster-devel/2016-January/048003.html
>
> Regards
>
> Rafi KC
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel@xxxxxxxxxxx
> http://lists.gluster.org/mailman/listinfo/gluster-devel