----- Original Message -----
> From: "Niels de Vos" <ndevos@xxxxxxxxxx>
> To: "Vijay Bellur" <vbellur@xxxxxxxxxx>
> Cc: "Poornima Gurusiddaiah" <pgurusid@xxxxxxxxxx>, "Dan Lambright"
> <dlambrig@xxxxxxxxxx>, "Nithya Balachandran" <nbalacha@xxxxxxxxxx>,
> "Raghavendra Gowdappa" <rgowdapp@xxxxxxxxxx>, "Soumya Koduri"
> <skoduri@xxxxxxxxxx>, "Pranith Kumar Karampuri" <pkarampu@xxxxxxxxxx>,
> "Gluster Devel" <gluster-devel@xxxxxxxxxxx>
> Sent: Thursday, August 18, 2016 9:32:34 AM
> Subject: Re: md-cache improvements
>
> On Mon, Aug 15, 2016 at 10:39:40PM -0400, Vijay Bellur wrote:
> > Hi Poornima, Dan -
> >
> > Let us have a hangout/bluejeans session this week to discuss the
> > planned md-cache improvements and proposed timelines, and sort out
> > any open questions.
> >
> > Would 11:00 UTC on Wednesday work for everyone on the To: list?
>
> I'd appreciate it if someone could send the meeting minutes. It'll
> make it easier to follow up, and we can provide better status details
> on the progress.

Adding to this thread the tracking bug for the feature: 1211863

> In any case, one of the points that Poornima mentioned was that upcall
> events (when enabled) get cached in gfapi until the application
> handles them. NFS-Ganesha is currently the only application that is
> interested in these events. Other use cases (like md-cache
> invalidation) would enable upcalls too, and then cause event caching
> even when it is not needed.
>
> This change should address that, and I'm waiting for feedback on it.
> There should be a bug report about these unneeded and uncleared
> caches, but I could not find one...
>
> gfapi: do not cache upcalls if the application is not interested
> http://review.gluster.org/15191
>
> Thanks,
> Niels
>
> > Thanks,
> > Vijay
> >
> > On 08/11/2016 01:04 AM, Poornima Gurusiddaiah wrote:
> > >
> > > My comments inline.
> > >
> > > Regards,
> > > Poornima
> > >
> > > ----- Original Message -----
> > > > From: "Dan Lambright" <dlambrig@xxxxxxxxxx>
> > > > To: "Gluster Devel" <gluster-devel@xxxxxxxxxxx>
> > > > Sent: Wednesday, August 10, 2016 10:35:58 PM
> > > > Subject: md-cache improvements
> > > >
> > > > There have been recurring discussions within the gluster
> > > > community about building on the existing support for md-cache
> > > > and upcalls to improve performance for small-file workloads. In
> > > > certain cases, "lookup amplification" dominates data transfers,
> > > > i.e. the cumulative round-trip times of multiple LOOKUPs from
> > > > the client outweigh the benefits of faster backend storage.
> > > >
> > > > To tackle this problem, one suggestion is to use md-cache to
> > > > cache inodes on the client more aggressively than is currently
> > > > done. The inodes would be cached until they are invalidated by
> > > > the server.
> > > >
> > > > Several gluster development engineers within the DHT, NFS, and
> > > > Samba teams have been involved with related efforts, which have
> > > > been underway for some time now. At this juncture, comments are
> > > > requested from gluster developers:
> > > >
> > > > (1) .. help call out where additional upcalls would be needed
> > > > to invalidate stale client cache entries (in particular,
> > > > feedback is needed from the DHT/AFR areas),
> > > >
> > > > (2) .. identify failure cases where we cannot trust the
> > > > contents of md-cache, e.g. when an upcall may have been dropped
> > > > by the network,
> > >
> > > Yes, this needs to be handled. It can happen only when there is a
> > > one-way disconnect, where the server cannot reach the client and
> > > the notify fails. We can retry the notification until the cache
> > > expiry time.
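A minimal sketch of that retry-until-expiry idea, on the server side,
in C. All names and values here (upcall_entry_t, upcall_notify_client(),
CACHE_EXPIRY_SEC) are illustrative assumptions, not the actual upcall
xlator code:

    #include <time.h>
    #include <unistd.h>

    #define CACHE_EXPIRY_SEC   60   /* assumed md-cache timeout on the client */
    #define RETRY_INTERVAL_SEC  5

    typedef struct {
        char   gfid[16];            /* inode the invalidation is for  */
        time_t first_attempt;       /* when the first notify was sent */
    } upcall_entry_t;

    /* hypothetical transport call; returns 0 once the client is reached */
    extern int upcall_notify_client (upcall_entry_t *up);

    /* Retry a failed invalidation, but only until the client-side
     * cache entry would have expired on its own. */
    static int
    upcall_retry_notify (upcall_entry_t *up)
    {
        while (time (NULL) - up->first_attempt < CACHE_EXPIRY_SEC) {
            if (upcall_notify_client (up) == 0)
                return 0;           /* delivered */
            sleep (RETRY_INTERVAL_SEC);
        }
        return -1;                  /* give up */
    }

Bounding the retries by the cache timeout works because, once the
client-side entry expires, the next access revalidates with a fresh
LOOKUP anyway, so a lost notification can no longer leave the cache
stale.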
> > > >
> > > > (3) .. point out additional improvements which md-cache needs.
> > > > For example, it cannot be allowed to grow unbounded.
> > >
> > > This is being worked on, and will be targeted for 3.9.
> > >
> > > > Dan
> > > >
> > > > ----- Original Message -----
> > > > > From: "Raghavendra Gowdappa" <rgowdapp@xxxxxxxxxx>
> > > > >
> > > > > List of areas where we need invalidation notification:
> > > > > 1. Any changes to xattrs used by xlators to store metadata
> > > > > (like the dht layout xattr, afr xattrs, etc.).
> > >
> > > Currently, md-cache will negotiate (using IPC) with the brick a
> > > list of xattrs that it needs invalidations for. Other xlators can
> > > add the xattrs they are interested in to the IPC, but these
> > > xlators then need to manage their own caching and process the
> > > invalidation requests themselves, as md-cache sits above all the
> > > cluster xlators.
> > > Reference: http://review.gluster.org/#/c/15002/
> > >
> > > > > 2. Scenarios where an individual xlator feels it needs a
> > > > > lookup. For example, a failed directory creation on the
> > > > > non-hashed subvol in dht during mkdir: though dht succeeds
> > > > > the mkdir, it would be better not to cache this inode, as a
> > > > > subsequent lookup will heal the directory and make things
> > > > > better.
> > >
> > > For this, these xlators can set an indicator in the dict of the
> > > fop cbk asking not to cache. This should be fairly simple to
> > > implement.
> > >
> > > > > 3. Removing of files.
> > >
> > > When an unlink is issued from the mount point, the cache is
> > > invalidated.
> > >
> > > > > 4. writev on the brick (to invalidate the read cache on the
> > > > > client).
> > >
> > > A writev on the brick from any other client will invalidate the
> > > metadata cache on all the other clients.
> > >
> > > > > Other questions:
> > > > > 5. Does md-cache have cache management, like LRU or an upper
> > > > > limit on the cache?
> > >
> > > Currently md-cache doesn't have any cache management; we will be
> > > targeting this for 3.9.
> > >
> > > > > 6. Network disconnects and invalidating the cache. When a
> > > > > network disconnect happens, we need to invalidate the cache
> > > > > for inodes present on that brick, as we might have missed
> > > > > some notifications. The current approach of purging the cache
> > > > > of all inodes might not be optimal, as it can roll back the
> > > > > benefits of caching. Also, please note that network
> > > > > disconnects are not rare events.
> > >
> > > Network disconnects are handled to a minimal extent: any brick
> > > going down will cause the whole of the cache to be invalidated.
> > > Invalidating only the list of inodes that belong to that
> > > particular brick will need support from the underlying cluster
> > > xlators.
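To make that last point concrete, here is a small self-contained C
sketch of what per-brick invalidation could look like, assuming the
cluster xlators could tell md-cache which brick each cached inode
belongs to (exactly the support that is missing today; mdc_entry_t,
mdc_invalidate_brick(), and the flat table are hypothetical, not
md-cache code):

    #define MAX_CACHED 4096

    typedef struct {
        char gfid[37];        /* uuid string of the cached inode       */
        int  brick_id;        /* brick this inode's metadata came from */
        int  valid;           /* cleared once the entry is invalidated */
    } mdc_entry_t;

    static mdc_entry_t mdc_table[MAX_CACHED];

    /* On a disconnect from one brick, drop only the entries served by
     * that brick instead of flushing the whole cache. */
    static void
    mdc_invalidate_brick (int down_brick_id)
    {
        int i;

        for (i = 0; i < MAX_CACHED; i++) {
            if (mdc_table[i].valid &&
                mdc_table[i].brick_id == down_brick_id)
                mdc_table[i].valid = 0;
        }
    }

The cost is the bookkeeping: DHT/AFR would have to populate brick_id
at lookup time, and a replicated file lives on several bricks at once,
so a real implementation would need a set of bricks per inode rather
than a single id.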
> > > > >
> > > > > regards,
> > > > > Raghavendra

_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel