----- Original Message -----
> From: "Niels de Vos" <ndevos@xxxxxxxxxx>
> To: "Vijay Bellur" <vbellur@xxxxxxxxxx>
> Cc: "Poornima Gurusiddaiah" <pgurusid@xxxxxxxxxx>, "Dan Lambright"
> <dlambrig@xxxxxxxxxx>, "Nithya Balachandran" <nbalacha@xxxxxxxxxx>,
> "Raghavendra Gowdappa" <rgowdapp@xxxxxxxxxx>, "Soumya Koduri"
> <skoduri@xxxxxxxxxx>, "Pranith Kumar Karampuri" <pkarampu@xxxxxxxxxx>,
> "Gluster Devel" <gluster-devel@xxxxxxxxxxx>
> Sent: Thursday, August 18, 2016 9:32:34 AM
> Subject: Re: md-cache improvements
>
> On Mon, Aug 15, 2016 at 10:39:40PM -0400, Vijay Bellur wrote:
> > Hi Poornima, Dan -
> >
> > Let us have a hangout/bluejeans session this week to discuss the
> > planned md-cache improvements and proposed timelines, and sort out
> > any open questions.
> >
> > Would 11:00 UTC on Wednesday work for everyone on the To: list?
>
> I'd appreciate it if someone could send the meeting minutes. It'll
> make it easier to follow up, and we can provide better status details
> on the progress.

Adding to this thread the tracking bug for the feature: 1211863

> In any case, one of the points that Poornima mentioned was that upcall
> events (when enabled) get cached in gfapi until the application
> handles them. NFS-Ganesha is currently the only application that is
> interested in these events. Other use cases (like md-cache
> invalidation) would enable upcalls too, and then cause event caching
> even when it is not needed.
>
> This change should address that, and I'm waiting for feedback on it.
> There should be a bug report about these unneeded and uncleared
> caches, but I could not find one...
>
> gfapi: do not cache upcalls if the application is not interested
> http://review.gluster.org/15191
>
> Thanks,
> Niels
>
> > Thanks,
> > Vijay
> >
> > On 08/11/2016 01:04 AM, Poornima Gurusiddaiah wrote:
> > >
> > > My comments inline.
> > >
> > > Regards,
> > > Poornima
> > >
> > > ----- Original Message -----
> > > > From: "Dan Lambright" <dlambrig@xxxxxxxxxx>
> > > > To: "Gluster Devel" <gluster-devel@xxxxxxxxxxx>
> > > > Sent: Wednesday, August 10, 2016 10:35:58 PM
> > > > Subject: md-cache improvements
> > > >
> > > > There have been recurring discussions within the gluster
> > > > community about building on the existing support for md-cache
> > > > and upcalls to improve performance for small-file workloads. In
> > > > certain cases, "lookup amplification" dominates data transfers,
> > > > i.e. the cumulative round-trip times of multiple LOOKUPs from
> > > > the client outweigh the benefits of faster backend storage.
> > > >
> > > > To tackle this problem, one suggestion is to use md-cache to
> > > > cache inodes on the client more aggressively than is currently
> > > > done. The inodes would be cached until they are invalidated by
> > > > the server.
> > > >
> > > > Several gluster development engineers within the DHT, NFS, and
> > > > Samba teams have been involved with related efforts, which have
> > > > been underway for some time now. At this juncture, comments are
> > > > requested from gluster developers:
> > > >
> > > > (1) .. help call out where additional upcalls would be needed
> > > > to invalidate stale client cache entries (in particular,
> > > > feedback is needed from the DHT/AFR areas),
> > > >
> > > > (2) .. identify failure cases where we cannot trust the
> > > > contents of md-cache, e.g. when an upcall may have been dropped
> > > > by the network,
> > >
> > > Yes, this needs to be handled. It can happen only when there is a
> > > one-way disconnect, where the server cannot reach the client and
> > > the notify fails. We can retry the notification until the cache
> > > expiry time.
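A minimal sketch of that retry-until-expiry idea, on the server side,
in C. All names and values here (upcall_entry_t, upcall_notify_client(),
CACHE_EXPIRY_SEC) are illustrative assumptions, not the actual upcall
xlator code:

    #include <time.h>
    #include <unistd.h>

    #define CACHE_EXPIRY_SEC   60   /* assumed md-cache timeout on the client */
    #define RETRY_INTERVAL_SEC  5

    typedef struct {
        char   gfid[16];            /* inode the invalidation is for  */
        time_t first_attempt;       /* when the first notify was sent */
    } upcall_entry_t;

    /* hypothetical transport call; returns 0 once the client is reached */
    extern int upcall_notify_client (upcall_entry_t *up);

    /* Retry a failed invalidation, but only until the client-side
     * cache entry would have expired on its own. */
    static int
    upcall_retry_notify (upcall_entry_t *up)
    {
        while (time (NULL) - up->first_attempt < CACHE_EXPIRY_SEC) {
            if (upcall_notify_client (up) == 0)
                return 0;           /* delivered */
            sleep (RETRY_INTERVAL_SEC);
        }
        return -1;                  /* give up */
    }

Bounding the retries by the cache timeout works because, once the
client-side entry expires, the next access revalidates with a fresh
LOOKUP anyway, so a lost notification can no longer leave the cache
stale.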
> > > >
> > > > (3) .. point out additional improvements which md-cache needs.
> > > > For example, it cannot be allowed to grow unbounded.
> > >
> > > This is being worked on, and will be targeted for 3.9.
> > >
> > > > Dan
> > > >
> > > > ----- Original Message -----
> > > > > From: "Raghavendra Gowdappa" <rgowdapp@xxxxxxxxxx>
> > > > >
> > > > > List of areas where we need invalidation notification:
> > > > > 1. Any changes to xattrs used by xlators to store metadata
> > > > > (like the dht layout xattr, afr xattrs, etc.).
> > >
> > > Currently, md-cache will negotiate (using IPC) with the brick a
> > > list of xattrs that it needs invalidations for. Other xlators can
> > > add the xattrs they are interested in to the IPC, but these
> > > xlators then need to manage their own caching and process the
> > > invalidation requests themselves, as md-cache sits above all the
> > > cluster xlators.
> > > Reference: http://review.gluster.org/#/c/15002/
> > >
> > > > > 2. Scenarios where an individual xlator feels it needs a
> > > > > lookup. For example, a failed directory creation on the
> > > > > non-hashed subvol in dht during mkdir: though dht succeeds
> > > > > the mkdir, it would be better not to cache this inode, as a
> > > > > subsequent lookup will heal the directory and make things
> > > > > better.
> > >
> > > For this, these xlators can set an indicator in the dict of the
> > > fop cbk asking not to cache. This should be fairly simple to
> > > implement.
> > >
> > > > > 3. Removing of files.
> > >
> > > When an unlink is issued from the mount point, the cache is
> > > invalidated.
> > >
> > > > > 4. writev on the brick (to invalidate the read cache on the
> > > > > client).
> > >
> > > A writev on the brick from any other client will invalidate the
> > > metadata cache on all the other clients.
> > >
> > > > > Other questions:
> > > > > 5. Does md-cache have cache management, like LRU or an upper
> > > > > limit on the cache?
> > >
> > > Currently md-cache doesn't have any cache management; we will be
> > > targeting this for 3.9.
> > >
> > > > > 6. Network disconnects and invalidating the cache. When a
> > > > > network disconnect happens, we need to invalidate the cache
> > > > > for inodes present on that brick, as we might have missed
> > > > > some notifications. The current approach of purging the cache
> > > > > of all inodes might not be optimal, as it can roll back the
> > > > > benefits of caching. Also, please note that network
> > > > > disconnects are not rare events.
> > >
> > > Network disconnects are handled to a minimal extent: any brick
> > > going down will cause the whole of the cache to be invalidated.
> > > Invalidating only the list of inodes that belong to that
> > > particular brick will need support from the underlying cluster
> > > xlators.
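To make that last point concrete, here is a small self-contained C
sketch of what per-brick invalidation could look like, assuming the
cluster xlators could tell md-cache which brick each cached inode
belongs to (exactly the support that is missing today; mdc_entry_t,
mdc_invalidate_brick(), and the flat table are hypothetical, not
md-cache code):

    #define MAX_CACHED 4096

    typedef struct {
        char gfid[37];        /* uuid string of the cached inode       */
        int  brick_id;        /* brick this inode's metadata came from */
        int  valid;           /* cleared once the entry is invalidated */
    } mdc_entry_t;

    static mdc_entry_t mdc_table[MAX_CACHED];

    /* On a disconnect from one brick, drop only the entries served by
     * that brick instead of flushing the whole cache. */
    static void
    mdc_invalidate_brick (int down_brick_id)
    {
        int i;

        for (i = 0; i < MAX_CACHED; i++) {
            if (mdc_table[i].valid &&
                mdc_table[i].brick_id == down_brick_id)
                mdc_table[i].valid = 0;
        }
    }

The cost is the bookkeeping: DHT/AFR would have to populate brick_id
at lookup time, and a replicated file lives on several bricks at once,
so a real implementation would need a set of bricks per inode rather
than a single id.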
> > > > >
> > > > > regards,
> > > > > Raghavendra

_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel