Re: md-cache improvements

Dan Lambright <dlambrig@xxxxxxxxxx> · Wed, 17 Aug 2016 08:59:35 -0400 (EDT)

----- Original Message -----
> From: "Niels de Vos" <ndevos@xxxxxxxxxx>
> To: "Raghavendra G" <raghavendra@xxxxxxxxxxx>
> Cc: "Dan Lambright" <dlambrig@xxxxxxxxxx>, "Gluster Devel" <gluster-devel@xxxxxxxxxxx>, "Csaba Henk"
> <csaba.henk@xxxxxxxxx>
> Sent: Wednesday, August 17, 2016 4:49:41 AM
> Subject: Re:  md-cache improvements
> 
> On Wed, Aug 17, 2016 at 11:42:25AM +0530, Raghavendra G wrote:
> > On Fri, Aug 12, 2016 at 10:29 AM, Raghavendra G <raghavendra@xxxxxxxxxxx>
> > wrote:
> > 
> > >
> > >
> > > On Thu, Aug 11, 2016 at 9:31 AM, Raghavendra G <raghavendra@xxxxxxxxxxx>
> > > wrote:
> > >
> > >> A couple more areas to explore:
> > >> 1. Purging the kernel dentry cache and/or page-cache too. Because of
> > >> patch [1], an upcall notification can result in a call to
> > >> inode_invalidate, which results in an "invalidate" notification to the
> > >> fuse kernel module. While I am sure that this notification will purge
> > >> the page-cache in the kernel, I am not sure about dentries. I assume
> > >> that if an inode is invalidated, it should result in a lookup (from
> > >> kernel to glusterfs). Nevertheless, we should look into the differences
> > >> between entry_invalidation and inode_invalidation and harness them
> > >> appropriately.
> 
> I do not think fuse handles upcall yet. I think there is a patch for
> that somewhere. It's been a while since I looked into that, but I think
> invalidating the affected dentries was straightforward.

Can the patch number be tracked down? I'd like to run some experiments with it and tiering.

> 
> > >> 2. Granularity of invalidation. For example, we shouldn't be purging
> > >> the page-cache in the kernel because of a change in an xattr used by an
> > >> xlator (e.g., the dht layout xattr). We have to make sure that [1]
> > >> handles this. We need to add more granularity to invalidation (internal
> > >> xattr invalidation, user xattr invalidation, entry invalidation in the
> > >> kernel, page-cache invalidation in the kernel, attribute/stat
> > >> invalidation in the kernel, etc.) and use them judiciously, while
> > >> making sure other cached data remains present.
> > >>
> > >
> > > To stress the importance of this point, note that with tiering there
> > > can be constant migration of files, which can result in spurious (from
> > > the application's perspective) invalidations even though the
> > > application is not writing to the files [2][3][4]. Also, even if the
> > > application is writing to a file, there is no point in invalidating the
> > > dentry cache. We should explore more ways to solve [2][3][4].
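A minimal Python sketch of the finer-grained invalidation proposed in point 2, assuming the cached state of an inode is split into parts that can be dropped independently. The names Inval and InodeCache are hypothetical and are not gluster or fuse APIs; the sketch only shows that a change to an internal xattr (such as the dht layout) can drop that part while page-cache and dentries stay cached.

```python
from enum import Flag, auto


class Inval(Flag):
    """Hypothetical invalidation scopes, one per kind listed in point 2."""
    INTERNAL_XATTR = auto()    # xattrs owned by xlators, e.g. the dht layout
    USER_XATTR = auto()        # user-visible xattrs
    KERNEL_ENTRY = auto()      # kernel dentry cache
    KERNEL_PAGECACHE = auto()  # kernel page-cache
    KERNEL_ATTR = auto()       # kernel attribute/stat cache


class InodeCache:
    """Per-inode cached state, kept per scope so each part can be dropped alone."""

    def __init__(self):
        self.parts = {}  # Inval member -> cached value

    def fill(self, scope, value):
        self.parts[scope] = value

    def invalidate(self, scopes):
        # Drop only the parts named in `scopes`; everything else stays cached.
        for scope in list(self.parts):
            if scope in scopes:
                del self.parts[scope]

    def cached(self):
        return set(self.parts)
```

For example, after filling INTERNAL_XATTR, KERNEL_PAGECACHE and KERNEL_ENTRY, calling invalidate(Inval.INTERNAL_XATTR) leaves the page-cache and dentry parts in place; several scopes can be combined with `|`.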
> 
> Actually, upcall tracks the client/inode combination, and only sends
> upcall events to clients that accessed the inode (recently, within some
> timeout?). There should not be any upcalls for inodes that the client
> did not access. So, when promotion/demotion happens, only the process
> doing it should receive the event, not any of the other clients that did
> not access the inode.
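A minimal sketch of the client/inode tracking described here, assuming a fixed access timeout and an injectable clock for testing. UpcallRegistry, record_access and clients_to_notify are hypothetical names, not the upcall xlator's actual interface. The optional originator argument shows one way to skip sending an event back to the client that made the change.

```python
import time


class UpcallRegistry:
    """Hypothetical server-side table of which clients recently accessed an inode.

    Events go only to clients that touched the inode within `timeout` seconds.
    """

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock
        self.access = {}  # inode -> {client: last access time}

    def record_access(self, inode, client):
        self.access.setdefault(inode, {})[client] = self.clock()

    def clients_to_notify(self, inode, originator=None):
        now = self.clock()
        clients = self.access.get(inode, {})
        # Forget clients whose last access is older than the timeout.
        for client, seen in list(clients.items()):
            if now - seen > self.timeout:
                del clients[client]
        # Optionally skip the client that made the change itself.
        return {c for c in clients if c != originator}
```

With this model, an inode touched only by the process doing a promotion/demotion yields no events for any other client.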
> 
> > > 3. We have a long-standing issue of spurious termination of the fuse
> > > invalidation thread. Since the thread is not re-spawned after
> > > termination, we would not be able to purge the kernel
> > > entry/attribute/page-cache. This issue was touched upon during a
> > > discussion [5], though we didn't solve the problem then for lack of
> > > bandwidth. Csaba has agreed to work on this issue.
> > >
> > 
> > 4. Flooding of the network with upcall notifications. Is it a problem? If
> > yes, does the upcall infra already solve it? Would NFS/SMB leases help here?
> 
> I guess some form of flooding is possible when two or more clients do
> many directory operations in the same directory. Hmm, now I wonder if a
> client gets an upcall event for something it did itself. I guess that
> would (most often?) not be needed.
> 
> Niels
> 
> 
> > 
> > 
> > > [2] https://bugzilla.redhat.com/show_bug.cgi?id=1293967#c7
> > > [3] https://bugzilla.redhat.com/show_bug.cgi?id=1293967#c8
> > > [4] https://bugzilla.redhat.com/show_bug.cgi?id=1293967#c9
> > > [5] http://review.gluster.org/#/c/13274/1/xlators/mount/fuse/src/fuse-bridge.c
> > >
> > >
> > >>
> > >> [1] http://review.gluster.org/12951
> > >>
> > >>
> > >> On Wed, Aug 10, 2016 at 10:35 PM, Dan Lambright <dlambrig@xxxxxxxxxx>
> > >> wrote:
> > >>
> > >>>
> > >>> There have been recurring discussions within the gluster community to
> > >>> build on existing support for md-cache and upcalls to help performance
> > >>> for small file workloads. In certain cases, "lookup amplification"
> > >>> dominates data transfers, i.e. the cumulative round trip times of
> > >>> multiple LOOKUPs from the client offset the benefits of faster backend
> > >>> storage.
> > >>>
> > >>> To tackle this problem, one suggestion is to use md-cache more
> > >>> aggressively than is currently done, caching inodes on the client
> > >>> until they are invalidated by the server.
> > >>>
> > >>> Several gluster development engineers within the DHT, NFS, and Samba
> > >>> teams have been involved with related efforts, which have been
> > >>> underway for some time now. At this juncture, comments are requested
> > >>> from gluster developers.
> > >>>
> > >>> (1) .. help call out where additional upcalls would be needed to
> > >>> invalidate stale client cache entries (in particular, we need feedback
> > >>> from the DHT/AFR areas),
> > >>>
> > >>> (2) .. identify failure cases in which we cannot trust the contents of
> > >>> md-cache, e.g. when an upcall may have been dropped by the network
> > >>>
> > >>> (3) .. point out additional improvements which md-cache needs. For
> > >>> example, it cannot be allowed to grow unbounded.
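A minimal sketch of a client-side cache that meets (3), assuming an in-memory map keyed by inode: entries are trusted until the server invalidates them, and the cache is capped with LRU eviction so it cannot grow unbounded. MdCache and its methods are hypothetical and do not describe md-cache's current code.

```python
from collections import OrderedDict


class MdCache:
    """Hypothetical metadata cache: valid until server invalidation, LRU-capped."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # inode -> cached stat/xattrs

    def get(self, inode):
        if inode not in self.entries:
            return None  # miss: the caller would send a LOOKUP to the server
        self.entries.move_to_end(inode)  # mark as most recently used
        return self.entries[inode]

    def put(self, inode, md):
        self.entries[inode] = md
        self.entries.move_to_end(inode)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used

    def invalidate(self, inode):
        # Called when an upcall from the server says this inode changed.
        self.entries.pop(inode, None)
```

Every hit avoids a LOOKUP round trip; the cost of the cap is that a working set larger than `capacity` falls back to LOOKUPs for evicted inodes.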
> > >>>
> > >>> Dan
> > >>>
> > >>> ----- Original Message -----
> > >>> > From: "Raghavendra Gowdappa" <rgowdapp@xxxxxxxxxx>
> > >>> >
> > >>> > List of areas where we need invalidation notification:
> > >>> > 1. Any changes to xattrs used by xlators to store metadata (like the
> > >>> > dht layout xattr, afr xattrs, etc.).
> > >>> > 2. Scenarios where an individual xlator feels it needs a lookup. For
> > >>> > example, a failed directory creation on a non-hashed subvol in dht
> > >>> > during mkdir. Though dht reports the mkdir as successful, it would be
> > >>> > better not to cache this inode, as a subsequent lookup will heal the
> > >>> > directory and make things better.
> > >>> > 3. Removal of files
> > >>> > 4. writev on a brick (to invalidate the read cache on the client)
> > >>> >
> > >>> > Other questions:
> > >>> > 5. Does md-cache have cache management, like LRU or an upper limit
> > >>> > on the cache size?
> > >>> > 6. Network disconnects and invalidating the cache. When a network
> > >>> > disconnect happens, we need to invalidate the cache for inodes present
> > >>> > on that brick, as we might have missed some notifications. The current
> > >>> > approach of purging the cache of all inodes might not be optimal, as
> > >>> > it might roll back the benefits of caching. Also, please note that
> > >>> > network disconnects are not rare events.
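A minimal sketch of the per-brick alternative raised in point 6, assuming the client knows which brick(s) each cached inode was fetched from. BrickAwareCache is hypothetical. As a simplification, an inode cached from several bricks is dropped when any one of them disconnects; whether that is too aggressive for replicated volumes is left open.

```python
class BrickAwareCache:
    """Hypothetical md-cache that purges only entries tied to a disconnected brick."""

    def __init__(self):
        self.entries = {}    # inode -> cached metadata
        self.bricks_of = {}  # inode -> set of bricks it was cached from
        self.inodes_on = {}  # brick -> set of inodes cached from it

    def put(self, inode, md, bricks):
        self.drop(inode)  # forget any older brick mapping for this inode
        self.entries[inode] = md
        self.bricks_of[inode] = set(bricks)
        for brick in bricks:
            self.inodes_on.setdefault(brick, set()).add(inode)

    def drop(self, inode):
        self.entries.pop(inode, None)
        for brick in self.bricks_of.pop(inode, set()):
            self.inodes_on.get(brick, set()).discard(inode)

    def on_disconnect(self, brick):
        # Upcalls from this brick may have been missed, so purge only the
        # inodes cached from it; everything else stays cached.
        for inode in list(self.inodes_on.get(brick, ())):
            self.drop(inode)
```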
> > >>> >
> > >>> > regards,
> > >>> > Raghavendra
> > >>> _______________________________________________
> > >>> Gluster-devel mailing list
> > >>> Gluster-devel@xxxxxxxxxxx
> > >>> http://www.gluster.org/mailman/listinfo/gluster-devel
> > >>>
> > >>
> > >>
> > >>
> > >> --
> > >> Raghavendra G
> > >>
> > >
> > >
> > >
> > > --
> > > Raghavendra G
> > >
> > 
> > 
> > 
> > --
> > Raghavendra G
> 