On Wed, Aug 17, 2016 at 11:42:25AM +0530, Raghavendra G wrote:
> On Fri, Aug 12, 2016 at 10:29 AM, Raghavendra G <raghavendra@xxxxxxxxxxx> wrote:
> >
> > On Thu, Aug 11, 2016 at 9:31 AM, Raghavendra G <raghavendra@xxxxxxxxxxx> wrote:
> >
> >> A couple more areas to explore:
> >> 1. Purging the kernel dentry and/or page cache too. Because of patch
> >> [1], an upcall notification can result in a call to inode_invalidate,
> >> which results in an "invalidate" notification to the fuse kernel
> >> module. While I am sure that this notification will purge the page
> >> cache from the kernel, I am not sure about dentries. I assume that if
> >> an inode is invalidated, it should result in a lookup (from the kernel
> >> to glusterfs). But nevertheless, we should look into the differences
> >> between entry invalidation and inode invalidation and harness them
> >> appropriately.

I do not think fuse handles upcall yet. I think there is a patch for that
somewhere. It's been a while since I looked into that, but I think
invalidating the affected dentries was straightforward.

> >> 2. Granularity of invalidation. For example, we shouldn't purge the
> >> kernel page cache because of a change in an xattr used by an xlator
> >> (e.g., the dht layout xattr). We have to make sure that [1] handles
> >> this. We need to add more granularity to invalidation (internal xattr
> >> invalidation, user xattr invalidation, entry invalidation in the
> >> kernel, page-cache invalidation in the kernel, attribute/stat
> >> invalidation in the kernel, etc.) and use them judiciously, while
> >> making sure other cached data remains present.
> >
> > To stress the importance of this point, it should be noted that with
> > tier there can be constant migration of files, which can result in
> > spurious (from the application's perspective) invalidations, even
> > though the application is not doing any writes to the files [2][3][4].
> > Also, even if the application is writing to a file, there is no point
> > in invalidating the dentry cache. We should explore more ways to solve
> > [2][3][4].

Actually, upcall tracks the client/inode combination, and only sends upcall
events to clients that (recently/timeout?) accessed the inode. There should
not be any upcalls for inodes that the client did not access. So, when
promotion/demotion happens, only the process doing this should receive the
event, not any of the other clients that did not access the inode.

> > 3. We have a long-standing issue of spurious termination of the fuse
> > invalidation thread. Since the thread is not re-spawned after
> > termination, we would not be able to purge the kernel
> > entry/attribute/page cache. This issue was touched upon during a
> > discussion [5], though we didn't solve the problem then for lack of
> > bandwidth. Csaba has agreed to work on this issue.
>
> 4. Flooding of the network with upcall notifications. Is it a problem?
> If yes, does the upcall infra already solve it? Would NFS/SMB leases
> help here?

I guess some form of flooding is possible when two or more clients do many
directory operations in the same directory.

Hmm, now I wonder if a client gets an upcall event for something it did
itself. I guess that would (most often?) not be needed.
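Coming back to the entry vs. inode invalidation question in point 1, below
is a minimal sketch of the two notification primitives, assuming libfuse
3.x signatures (glusterfs's own fuse-bridge, as far as I remember, writes
the equivalent FUSE_NOTIFY_INVAL_ENTRY/FUSE_NOTIFY_INVAL_INODE messages to
/dev/fuse directly; the INVAL_* flags here are made up for illustration):

    #include <string.h>
    #include <fuse_lowlevel.h>

    #define INVAL_PAGES 0x1   /* data or attributes changed          */
    #define INVAL_ENTRY 0x2   /* a name under the parent dir changed */

    static int
    invalidate (struct fuse_session *se, fuse_ino_t parent,
                fuse_ino_t ino, const char *name, int flags)
    {
            int ret = 0;

            if (flags & INVAL_PAGES)
                    /* off=0, len=0 drops all cached pages and the
                     * cached attributes of 'ino'. */
                    ret = fuse_lowlevel_notify_inval_inode (se, ino, 0, 0);

            if (!ret && (flags & INVAL_ENTRY) && name)
                    /* Drops only the dentry 'name' under 'parent'; the
                     * next access triggers a fresh LOOKUP. */
                    ret = fuse_lowlevel_notify_inval_entry (se, parent, name,
                                                            strlen (name));
            return ret;
    }

The granularity work in point 2 then becomes a mapping exercise: deciding
which upcall event sets INVAL_PAGES, which sets only INVAL_ENTRY, and which
(e.g. an internal xattr change) should set neither.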
Niels

> >
> > [2] https://bugzilla.redhat.com/show_bug.cgi?id=1293967#c7
> > [3] https://bugzilla.redhat.com/show_bug.cgi?id=1293967#c8
> > [4] https://bugzilla.redhat.com/show_bug.cgi?id=1293967#c9
> > [5] http://review.gluster.org/#/c/13274/1/xlators/mount/fuse/src/fuse-bridge.c
> >
> >> [1] http://review.gluster.org/12951
> >>
> >> On Wed, Aug 10, 2016 at 10:35 PM, Dan Lambright <dlambrig@xxxxxxxxxx> wrote:
> >>
> >>> There have been recurring discussions within the gluster community
> >>> about building on the existing support for md-cache and upcalls to
> >>> help performance for small-file workloads. In certain cases, "lookup
> >>> amplification" dominates data transfers, i.e. the cumulative
> >>> round-trip times of multiple LOOKUPs from the client negate the
> >>> benefits of faster backend storage.
> >>>
> >>> To tackle this problem, one suggestion is to utilize md-cache to
> >>> cache inodes on the client more aggressively than is currently done.
> >>> The inodes would be cached until they are invalidated by the server.
> >>>
> >>> Several gluster development engineers within the DHT, NFS, and Samba
> >>> teams have been involved with related efforts, which have been
> >>> underway for some time now. At this juncture, comments are requested
> >>> from gluster developers.
> >>>
> >>> (1) .. help call out where additional upcalls would be needed to
> >>> invalidate stale client cache entries (in particular, we need
> >>> feedback from the DHT/AFR areas),
> >>>
> >>> (2) .. identify failure cases, when we cannot trust the contents of
> >>> md-cache, e.g. when an upcall may have been dropped by the network,
> >>>
> >>> (3) .. point out additional improvements which md-cache needs. For
> >>> example, it cannot be allowed to grow unbounded.
> >>>
> >>> Dan
> >>>
> >>> ----- Original Message -----
> >>> > From: "Raghavendra Gowdappa" <rgowdapp@xxxxxxxxxx>
> >>> >
> >>> > List of areas where we need invalidation notification:
> >>> > 1. Any changes to xattrs used by xlators to store metadata (like
> >>> > the dht layout xattr, afr xattrs, etc.).
> >>> > 2. Scenarios where an individual xlator feels it needs a lookup.
> >>> > For example, a failed directory creation on the non-hashed subvol
> >>> > in dht during mkdir. Though dht reports the mkdir as successful, it
> >>> > would be better not to cache this inode, as a subsequent lookup
> >>> > will heal the directory and make things better.
> >>> > 3. Removal of files.
> >>> > 4. writev on the brick (to invalidate the read cache on the
> >>> > client).
> >>> >
> >>> > Other questions:
> >>> > 5. Does md-cache have cache management, like LRU or an upper limit
> >>> > for the cache?
> >>> > 6. Network disconnects and invalidating the cache. When a network
> >>> > disconnect happens, we need to invalidate the cache for inodes
> >>> > present on that brick, as we might have missed some notifications.
> >>> > The current approach of purging the cache of all inodes might not
> >>> > be optimal, as it can roll back the benefits of caching. Also,
> >>> > please note that network disconnects are not rare events.
> >>> >
> >>> > regards,
> >>> > Raghavendra
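Regarding question 5 above: as far as I know, md-cache currently expires
entries on a timeout and has no size bound. If an upper limit is added,
the usual shape is an LRU list along these lines (a sketch with made-up
names, not md-cache's actual structures; entries must be zeroed before
their first touch):

    #include <stdlib.h>

    struct mdc_entry {
            struct mdc_entry *prev, *next;  /* LRU linkage            */
            /* ... cached iatt/xattrs would live here ... */
    };

    struct mdc_lru {
            struct mdc_entry *head, *tail;  /* head = most recent     */
            size_t            count, limit; /* limit = configured cap */
    };

    static void
    lru_unlink (struct mdc_lru *c, struct mdc_entry *e)
    {
            if (e->prev) e->prev->next = e->next; else c->head = e->next;
            if (e->next) e->next->prev = e->prev; else c->tail = e->prev;
            e->prev = e->next = NULL;
            c->count--;
    }

    static void
    lru_push (struct mdc_lru *c, struct mdc_entry *e)
    {
            e->prev = NULL;
            e->next = c->head;
            if (c->head) c->head->prev = e; else c->tail = e;
            c->head = e;
            c->count++;
    }

    /* On every hit or insert: move the entry to the front, then evict
     * from the tail until we are back under the configured limit. */
    static void
    lru_touch (struct mdc_lru *c, struct mdc_entry *e)
    {
            if (e->prev || c->head == e)
                    lru_unlink (c, e);
            lru_push (c, e);

            while (c->count > c->limit && c->tail) {
                    struct mdc_entry *victim = c->tail;
                    lru_unlink (c, victim);
                    free (victim);  /* drop the cached metadata */
            }
    }

For question 6, the same idea could be kept per brick, so that a
disconnect only drops the entries whose inodes live on that brick instead
of wiping the whole cache.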
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel