On Wed, Aug 17, 2016 at 08:59:35AM -0400, Dan Lambright wrote:
> 
> ----- Original Message -----
> > From: "Niels de Vos" <ndevos@xxxxxxxxxx>
> > To: "Raghavendra G" <raghavendra@xxxxxxxxxxx>
> > Cc: "Dan Lambright" <dlambrig@xxxxxxxxxx>, "Gluster Devel" <gluster-devel@xxxxxxxxxxx>, "Csaba Henk" <csaba.henk@xxxxxxxxx>
> > Sent: Wednesday, August 17, 2016 4:49:41 AM
> > Subject: Re: md-cache improvements
> > 
> > On Wed, Aug 17, 2016 at 11:42:25AM +0530, Raghavendra G wrote:
> > > On Fri, Aug 12, 2016 at 10:29 AM, Raghavendra G <raghavendra@xxxxxxxxxxx> wrote:
> > > > On Thu, Aug 11, 2016 at 9:31 AM, Raghavendra G <raghavendra@xxxxxxxxxxx> wrote:
> > > >> A couple more areas to explore:
> > > >> 
> > > >> 1. Purging the kernel dentry and/or page-cache too. Because of patch [1], an upcall notification can result in a call to inode_invalidate, which results in an "invalidate" notification to the fuse kernel module. While I am sure that this notification will purge the page-cache in the kernel, I am not sure about dentries. I assume that if an inode is invalidated, it should result in a lookup (from the kernel to glusterfs). Nevertheless, we should look into the differences between entry invalidation and inode invalidation and harness them appropriately.
> > 
> > I do not think fuse handles upcall yet. I think there is a patch for
> > that somewhere. It's been a while since I looked into it, but I think
> > invalidating the affected dentries was straightforward.
> 
> Can the patch # be tracked down? I'd like to run some experiments with it + tiering..

I'll attach my attempt here. I don't remember the results of it though.

Niels

> > > >> 2. Granularity of invalidation. For example, we shouldn't purge the kernel page-cache because of a change to an xattr used by an xlator (e.g., the dht layout xattr). We have to make sure that [1] handles this. We need to add more granularity to invalidation (internal xattr invalidation, user xattr invalidation, entry invalidation in the kernel, page-cache invalidation in the kernel, attribute/stat invalidation in the kernel, etc.) and use these judiciously, while making sure other cached data remains present.
> > > > 
> > > > To stress the importance of this point, it should be noted that with tier there can be constant migration of files, which can result in spurious (from the perspective of the application) invalidations, even though the application is not doing any writes to the files [2][3][4]. Also, even if the application is writing to a file, there is no point in invalidating the dentry cache. We should explore more ways to solve [2][3][4].
> > 
> > Actually upcall tracks the client/inode combination, and only sends
> > upcall events to clients that (recently/timeout?) accessed the inode.
> > There should not be any upcalls for inodes that the client did not
> > access. So, when promotion/demotion happens, only the process doing it
> > should receive the event, not any of the other clients that did not
> > access the inode.
> > 
> > > > 3. We have a long-standing issue of spurious termination of the fuse invalidation thread. Since the thread is not re-spawned after termination, we would not be able to purge the kernel entry/attribute/page-cache. This issue was touched upon during a discussion [5], though we didn't solve the problem then for lack of bandwidth. Csaba has agreed to work on this issue.
> > > 
> > > 4. Flooding of the network with upcall notifications. Is it a problem? If yes, does the upcall infra already solve it? Would NFS/SMB leases help here?
> > 
> > I guess some form of flooding is possible when two or more clients do
> > many directory operations in the same directory. Hmm, now I wonder if a
> > client gets an upcall event for something it did itself. I guess that
> > would (most often?) not be needed.
> > 
> > Niels
> > 
> > > > [2] https://bugzilla.redhat.com/show_bug.cgi?id=1293967#c7
> > > > [3] https://bugzilla.redhat.com/show_bug.cgi?id=1293967#c8
> > > > [4] https://bugzilla.redhat.com/show_bug.cgi?id=1293967#c9
> > > > [5] http://review.gluster.org/#/c/13274/1/xlators/mount/fuse/src/fuse-bridge.c
> > > >> 
> > > >> [1] http://review.gluster.org/12951
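To make point 2 above concrete, here is a rough sketch of what flag-based invalidation granularity could look like on the client side. The INVAL_* bits and the four helper functions are hypothetical, not an existing gluster or upcall API; the point is only the dispatch: each notification says what changed, and only the affected caches get dropped.

#include <stdint.h>

/* Hypothetical granularity bits carried in an upcall payload. */
#define INVAL_INTERNAL_XATTR  0x01  /* e.g. dht layout xattr changed */
#define INVAL_USER_XATTR      0x02  /* a user.* xattr changed        */
#define INVAL_ATTR            0x04  /* stat/iatt changed             */
#define INVAL_PAGE_CACHE      0x08  /* file data changed             */
#define INVAL_ENTRY           0x10  /* dentry no longer valid        */

/* Hypothetical purge helpers; declarations only, for the sketch. */
extern void purge_mdcache_xattrs (uint64_t ino);
extern void purge_mdcache_iatt (uint64_t ino);
extern void notify_kernel_inval_inode (uint64_t ino);
extern void notify_kernel_inval_entry (uint64_t parent, const char *name);

static void
handle_cache_invalidation (uint64_t ino, uint64_t parent,
                           const char *name, uint32_t flags)
{
        /* An internal-xattr change (say, a dht layout update after a
         * tier migration) only needs the md-cache xattr copy dropped;
         * kernel dentries and page-cache stay warm. */
        if (flags & (INVAL_INTERNAL_XATTR | INVAL_USER_XATTR))
                purge_mdcache_xattrs (ino);

        if (flags & INVAL_ATTR)
                purge_mdcache_iatt (ino);

        /* Only data writes need to reach the kernel page-cache... */
        if (flags & INVAL_PAGE_CACHE)
                notify_kernel_inval_inode (ino);

        /* ...and only namespace changes need to drop the dentry; fuse
         * entry invalidation is keyed on <parent, name>, not on the
         * inode itself. */
        if (flags & INVAL_ENTRY)
                notify_kernel_inval_entry (parent, name);
}

With a scheme like this, a layout-xattr update after promotion/demotion would touch only the md-cache copy of the internal xattrs, which is exactly the spurious-invalidation case from [2][3][4].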
> > > >> On Wed, Aug 10, 2016 at 10:35 PM, Dan Lambright <dlambrig@xxxxxxxxxx> wrote:
> > > >>> 
> > > >>> There have been recurring discussions within the gluster community about building on the existing support for md-cache and upcalls to help performance for small-file workloads. In certain cases, "lookup amplification" dominates data transfers, i.e. the cumulative round-trip times of multiple LOOKUPs from the client cancel out the benefits of faster backend storage.
> > > >>> 
> > > >>> To tackle this problem, one suggestion is to utilize md-cache to cache inodes on the client more aggressively than is currently done. The inodes would be cached until they are invalidated by the server.
> > > >>> 
> > > >>> Several gluster development engineers within the DHT, NFS, and Samba teams have been involved with related efforts, which have been underway for some time now. At this juncture, comments are requested from gluster developers.
> > > >>> 
> > > >>> (1) .. help call out where additional upcalls would be needed to invalidate stale client cache entries (in particular, feedback is needed from the DHT/AFR areas),
> > > >>> 
> > > >>> (2) .. identify failure cases, when we cannot trust the contents of md-cache, e.g. when an upcall may have been dropped by the network,
> > > >>> 
> > > >>> (3) .. point out additional improvements which md-cache needs. For example, it cannot be allowed to grow unbounded.
> > > >>> 
> > > >>> Dan
> > > >>> 
> > > >>> ----- Original Message -----
> > > >>> > From: "Raghavendra Gowdappa" <rgowdapp@xxxxxxxxxx>
> > > >>> > 
> > > >>> > List of areas where we need invalidation notification:
> > > >>> > 1. Any changes to xattrs used by xlators to store metadata (like the dht layout xattr, afr xattrs, etc.).
> > > >>> > 2. Scenarios where an individual xlator feels it needs a lookup. For example, a failed directory creation on the non-hashed subvol in dht during mkdir. Though dht succeeds with the mkdir, it would be better not to cache this inode, as a subsequent lookup will heal the directory and make things better.
> > > >>> > 3. Removal of files.
> > > >>> > 4. writev on the brick (to invalidate the read cache on the client).
> > > >>> > 
> > > >>> > Other questions:
> > > >>> > 5. Does md-cache have cache management, like an LRU or an upper limit on the cache?
> > > >>> > 6. Network disconnects and invalidating the cache. When a network disconnect happens, we need to invalidate the cache for inodes present on that brick, as we might have missed some notifications. The current approach of purging the cache of all inodes might not be optimal, as it can roll back the benefits of caching. Also, please note that network disconnects are not rare events.
> > > >>> > 
> > > >>> > regards,
> > > >>> > Raghavendra
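On question 5 and Dan's point (3) above: an invalidation-driven cache cannot be allowed to grow unbounded, so some eviction discipline is needed on top of the current timeout. Below is a minimal sketch of a bounded LRU; the mdc_* names are illustrative only, not the actual md-cache structures, and the head must be TAILQ_INIT'ed and the limit set before use.

#include <stdlib.h>
#include <pthread.h>
#include <sys/queue.h>          /* BSD TAILQ macros (glibc ships them) */

struct mdc_entry {
        TAILQ_ENTRY (mdc_entry) lru;    /* position in the LRU list */
        /* ... cached iatt/xattr data would hang off here ... */
};

TAILQ_HEAD (mdc_lru_head, mdc_entry);

struct mdc_lru {
        struct mdc_lru_head head;       /* head = hottest, tail = coldest */
        size_t              count;
        size_t              limit;
        pthread_mutex_t     lock;
};

/* On every cache hit, move the entry to the hot end. */
static void
mdc_lru_touch (struct mdc_lru *c, struct mdc_entry *e)
{
        pthread_mutex_lock (&c->lock);
        TAILQ_REMOVE (&c->head, e, lru);
        TAILQ_INSERT_HEAD (&c->head, e, lru);
        pthread_mutex_unlock (&c->lock);
}

/* On insert, evict the coldest entry once the limit is crossed and
 * hand it back; the caller owns invalidating/freeing the victim. */
static struct mdc_entry *
mdc_lru_insert (struct mdc_lru *c, struct mdc_entry *e)
{
        struct mdc_entry *victim = NULL;

        pthread_mutex_lock (&c->lock);
        TAILQ_INSERT_HEAD (&c->head, e, lru);
        if (++c->count > c->limit) {
                victim = TAILQ_LAST (&c->head, mdc_lru_head);
                TAILQ_REMOVE (&c->head, victim, lru);
                c->count--;
        }
        pthread_mutex_unlock (&c->lock);
        return victim;
}

Returning the victim instead of destroying it under the lock keeps the eviction policy separate from whatever invalidation/teardown the caller needs to do.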
From a58caf0dd20d7652a4d5cbc4318b93419cd89416 Mon Sep 17 00:00:00 2001
From: Niels de Vos <ndevos@xxxxxxxxxx>
Date: Mon, 7 Dec 2015 17:40:27 +0100
Subject: [PATCH] fuse: basic upcall support for cache-invalidation

Q: Should we make this configurable with a mount option?
Q: Should we increase the timeout (1 sec currently) for inode caching?

Change-Id: Ica521dfd0386f28bd96a126d96d50fdd4a29d49d
Signed-off-by: Niels de Vos <ndevos@xxxxxxxxxx>
---
 xlators/mount/fuse/src/fuse-bridge.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/xlators/mount/fuse/src/fuse-bridge.c b/xlators/mount/fuse/src/fuse-bridge.c
index 5d6979a..548c88a 100644
--- a/xlators/mount/fuse/src/fuse-bridge.c
+++ b/xlators/mount/fuse/src/fuse-bridge.c
@@ -16,6 +16,7 @@
 #include "compat-errno.h"
 #include "glusterfs-acl.h"
 #include "syscall.h"
+#include "upcall-utils.h"
 
 #ifdef __NetBSD__
 #undef open /* in perfuse.h, pulled from mount-gluster-compat.h */
@@ -5203,6 +5204,24 @@ notify (xlator_t *this, int32_t event, void *data, ...)
                 fini (this);
                 break;
         }
+#if FUSE_KERNEL_MINOR_VERSION >= 11
+        case GF_EVENT_UPCALL:
+        {
+                /* At the moment we only care for cache-invalidation */
+                struct gf_upcall *ue = data;
+
+                if (ue->event_type == GF_UPCALL_CACHE_INVALIDATION) {
+                        struct gf_upcall_cache_invalidation *uci = ue->data;
+                        uint64_t fuse_ino = uci->stat.ia_ino;
+
+                        if (private->enable_ino32)
+                                fuse_ino = GF_FUSE_SQUASH_INO (fuse_ino);
+
+                        fuse_invalidate_entry (this, fuse_ino);
+                }
+                break;
+        }
+#endif
 
         default:
                 break;
-- 
2.7.4
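A note on the patch: it calls fuse_invalidate_entry(), and the entry-vs-inode distinction from point 1 of the thread is visible at the /dev/fuse protocol level, where the two are distinct reverse notifications. Below is a self-contained sketch of both, written directly against the kernel ABI in <linux/fuse.h>; this is essentially what libfuse's fuse_lowlevel_notify_inval_inode() and fuse_lowlevel_notify_inval_entry() do. Assumptions: fuse_fd is the already-open /dev/fuse descriptor, and error handling is omitted. Since the upcall payload above carries only the inode, an entry invalidation still has to derive the <parent, name> pairs from somewhere (if memory serves, gluster's fuse_invalidate_entry() walks the inode's dentry list for this).

#include <stdint.h>
#include <string.h>
#include <sys/uio.h>
#include <linux/fuse.h>

/* Reverse notifications are written to /dev/fuse with unique == 0;
 * the "error" field of the header carries the notification code. */
static int
send_notify (int fuse_fd, int code, struct iovec *iov, int count)
{
        struct fuse_out_header hdr = { .error = code, .unique = 0 };
        int i;

        hdr.len = sizeof (hdr);
        for (i = 1; i < count; i++)
                hdr.len += iov[i].iov_len;

        iov[0].iov_base = &hdr;
        iov[0].iov_len = sizeof (hdr);

        return (writev (fuse_fd, iov, count) < 0) ? -1 : 0;
}

/* Drop cached attributes and a page-cache range of an inode. A
 * negative off invalidates attributes only; len == 0 means "from
 * off to the end of the file". */
static int
notify_inval_inode (int fuse_fd, uint64_t ino, int64_t off, int64_t len)
{
        struct fuse_notify_inval_inode_out out = {
                .ino = ino, .off = off, .len = len,
        };
        struct iovec iov[2] = {
                { 0 }, { &out, sizeof (out) },
        };

        return send_notify (fuse_fd, FUSE_NOTIFY_INVAL_INODE, iov, 2);
}

/* Drop the dentry <parent, name>; the next access forces a fresh
 * LOOKUP from the kernel down to glusterfs. */
static int
notify_inval_entry (int fuse_fd, uint64_t parent, const char *name)
{
        struct fuse_notify_inval_entry_out out = {
                .parent  = parent,
                .namelen = strlen (name),
        };
        struct iovec iov[3] = {
                { 0 },
                { &out, sizeof (out) },
                { (void *) name, out.namelen + 1 },  /* include the NUL */
        };

        return send_notify (fuse_fd, FUSE_NOTIFY_INVAL_ENTRY, iov, 3);
}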
_______________________________________________ Gluster-devel mailing list Gluster-devel@xxxxxxxxxxx http://www.gluster.org/mailman/listinfo/gluster-devel