On Wed, Aug 17, 2016 at 11:42:25AM +0530, Raghavendra G wrote:
> On Fri, Aug 12, 2016 at 10:29 AM, Raghavendra G <raghavendra@xxxxxxxxxxx> wrote:
> >
> > On Thu, Aug 11, 2016 at 9:31 AM, Raghavendra G <raghavendra@xxxxxxxxxxx> wrote:
> >
> >> A couple more areas to explore:
> >> 1. Purging the kernel dentry and/or page cache too. Because of patch
> >> [1], an upcall notification can result in a call to inode_invalidate,
> >> which results in an "invalidate" notification to the fuse kernel
> >> module. While I am sure that this notification will purge the page
> >> cache from the kernel, I am not sure about dentries. I assume that if
> >> an inode is invalidated, it should result in a lookup (from the kernel
> >> to glusterfs). But nevertheless, we should look into the differences
> >> between entry invalidation and inode invalidation and harness them
> >> appropriately.

I do not think fuse handles upcall yet. I think there is a patch for that
somewhere. It's been a while since I looked into that, but I think
invalidating the affected dentries was straightforward.

> >> 2. Granularity of invalidation. For example, we shouldn't purge the
> >> kernel page cache because of a change in an xattr used by an xlator
> >> (e.g., the dht layout xattr). We have to make sure that [1] handles
> >> this. We need to add more granularity to invalidation (internal xattr
> >> invalidation, user xattr invalidation, entry invalidation in the
> >> kernel, page-cache invalidation in the kernel, attribute/stat
> >> invalidation in the kernel, etc.) and use them judiciously, while
> >> making sure other cached data remains present.
> >
> > To stress the importance of this point, it should be noted that with
> > tier there can be constant migration of files, which can result in
> > spurious (from the application's perspective) invalidations, even
> > though the application is not doing any writes to the files [2][3][4].
> > Also, even if the application is writing to a file, there is no point
> > in invalidating the dentry cache. We should explore more ways to solve
> > [2][3][4].

Actually, upcall tracks the client/inode combination, and only sends upcall
events to clients that (recently/timeout?) accessed the inode. There should
not be any upcalls for inodes that the client did not access. So, when
promotion/demotion happens, only the process doing this should receive the
event, not any of the other clients that did not access the inode.

> > 3. We have a long-standing issue of spurious termination of the fuse
> > invalidation thread. Since the thread is not re-spawned after
> > termination, we would not be able to purge the kernel
> > entry/attribute/page cache. This issue was touched upon during a
> > discussion [5], though we didn't solve the problem then for lack of
> > bandwidth. Csaba has agreed to work on this issue.
>
> 4. Flooding of the network with upcall notifications. Is it a problem?
> If yes, does the upcall infra already solve it? Would NFS/SMB leases
> help here?

I guess some form of flooding is possible when two or more clients do many
directory operations in the same directory.

Hmm, now I wonder if a client gets an upcall event for something it did
itself. I guess that would (most often?) not be needed.
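Coming back to the entry vs. inode invalidation question in point 1, below
is a minimal sketch of the two notification primitives, assuming libfuse
3.x signatures (glusterfs's own fuse-bridge, as far as I remember, writes
the equivalent FUSE_NOTIFY_INVAL_ENTRY/FUSE_NOTIFY_INVAL_INODE messages to
/dev/fuse directly; the INVAL_* flags here are made up for illustration):

    #include <string.h>
    #include <fuse_lowlevel.h>

    #define INVAL_PAGES 0x1   /* data or attributes changed          */
    #define INVAL_ENTRY 0x2   /* a name under the parent dir changed */

    static int
    invalidate (struct fuse_session *se, fuse_ino_t parent,
                fuse_ino_t ino, const char *name, int flags)
    {
            int ret = 0;

            if (flags & INVAL_PAGES)
                    /* off=0, len=0 drops all cached pages and the
                     * cached attributes of 'ino'. */
                    ret = fuse_lowlevel_notify_inval_inode (se, ino, 0, 0);

            if (!ret && (flags & INVAL_ENTRY) && name)
                    /* Drops only the dentry 'name' under 'parent'; the
                     * next access triggers a fresh LOOKUP. */
                    ret = fuse_lowlevel_notify_inval_entry (se, parent, name,
                                                            strlen (name));
            return ret;
    }

The granularity work in point 2 then becomes a mapping exercise: deciding
which upcall event sets INVAL_PAGES, which sets only INVAL_ENTRY, and which
(e.g. an internal xattr change) should set neither.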
Niels

> >
> > [2] https://bugzilla.redhat.com/show_bug.cgi?id=1293967#c7
> > [3] https://bugzilla.redhat.com/show_bug.cgi?id=1293967#c8
> > [4] https://bugzilla.redhat.com/show_bug.cgi?id=1293967#c9
> > [5] http://review.gluster.org/#/c/13274/1/xlators/mount/fuse/src/fuse-bridge.c
> >
> >> [1] http://review.gluster.org/12951
> >>
> >> On Wed, Aug 10, 2016 at 10:35 PM, Dan Lambright <dlambrig@xxxxxxxxxx> wrote:
> >>
> >>> There have been recurring discussions within the gluster community
> >>> about building on the existing support for md-cache and upcalls to
> >>> help performance for small-file workloads. In certain cases, "lookup
> >>> amplification" dominates data transfers, i.e. the cumulative
> >>> round-trip times of multiple LOOKUPs from the client negate the
> >>> benefits of faster backend storage.
> >>>
> >>> To tackle this problem, one suggestion is to utilize md-cache to
> >>> cache inodes on the client more aggressively than is currently done.
> >>> The inodes would be cached until they are invalidated by the server.
> >>>
> >>> Several gluster development engineers within the DHT, NFS, and Samba
> >>> teams have been involved with related efforts, which have been
> >>> underway for some time now. At this juncture, comments are requested
> >>> from gluster developers.
> >>>
> >>> (1) .. help call out where additional upcalls would be needed to
> >>> invalidate stale client cache entries (in particular, we need
> >>> feedback from the DHT/AFR areas),
> >>>
> >>> (2) .. identify failure cases, when we cannot trust the contents of
> >>> md-cache, e.g. when an upcall may have been dropped by the network,
> >>>
> >>> (3) .. point out additional improvements which md-cache needs. For
> >>> example, it cannot be allowed to grow unbounded.
> >>>
> >>> Dan
> >>>
> >>> ----- Original Message -----
> >>> > From: "Raghavendra Gowdappa" <rgowdapp@xxxxxxxxxx>
> >>> >
> >>> > List of areas where we need invalidation notification:
> >>> > 1. Any changes to xattrs used by xlators to store metadata (like
> >>> > the dht layout xattr, afr xattrs, etc.).
> >>> > 2. Scenarios where an individual xlator feels it needs a lookup.
> >>> > For example, a failed directory creation on the non-hashed subvol
> >>> > in dht during mkdir. Though dht reports the mkdir as successful, it
> >>> > would be better not to cache this inode, as a subsequent lookup
> >>> > will heal the directory and make things better.
> >>> > 3. Removal of files.
> >>> > 4. writev on the brick (to invalidate the read cache on the
> >>> > client).
> >>> >
> >>> > Other questions:
> >>> > 5. Does md-cache have cache management, like LRU or an upper limit
> >>> > for the cache?
> >>> > 6. Network disconnects and invalidating the cache. When a network
> >>> > disconnect happens, we need to invalidate the cache for inodes
> >>> > present on that brick, as we might have missed some notifications.
> >>> > The current approach of purging the cache of all inodes might not
> >>> > be optimal, as it can roll back the benefits of caching. Also,
> >>> > please note that network disconnects are not rare events.
> >>> >
> >>> > regards,
> >>> > Raghavendra
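Regarding question 5 above: as far as I know, md-cache currently expires
entries on a timeout and has no size bound. If an upper limit is added,
the usual shape is an LRU list along these lines (a sketch with made-up
names, not md-cache's actual structures; entries must be zeroed before
their first touch):

    #include <stdlib.h>

    struct mdc_entry {
            struct mdc_entry *prev, *next;  /* LRU linkage            */
            /* ... cached iatt/xattrs would live here ... */
    };

    struct mdc_lru {
            struct mdc_entry *head, *tail;  /* head = most recent     */
            size_t            count, limit; /* limit = configured cap */
    };

    static void
    lru_unlink (struct mdc_lru *c, struct mdc_entry *e)
    {
            if (e->prev) e->prev->next = e->next; else c->head = e->next;
            if (e->next) e->next->prev = e->prev; else c->tail = e->prev;
            e->prev = e->next = NULL;
            c->count--;
    }

    static void
    lru_push (struct mdc_lru *c, struct mdc_entry *e)
    {
            e->prev = NULL;
            e->next = c->head;
            if (c->head) c->head->prev = e; else c->tail = e;
            c->head = e;
            c->count++;
    }

    /* On every hit or insert: move the entry to the front, then evict
     * from the tail until we are back under the configured limit. */
    static void
    lru_touch (struct mdc_lru *c, struct mdc_entry *e)
    {
            if (e->prev || c->head == e)
                    lru_unlink (c, e);
            lru_push (c, e);

            while (c->count > c->limit && c->tail) {
                    struct mdc_entry *victim = c->tail;
                    lru_unlink (c, victim);
                    free (victim);  /* drop the cached metadata */
            }
    }

For question 6, the same idea could be kept per brick, so that a
disconnect only drops the entries whose inodes live on that brick instead
of wiping the whole cache.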
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel