Re: md-cache improvements

Vijay Bellur <vbellur@xxxxxxxxxx> · Mon, 15 Aug 2016 22:39:40 -0400

Hi Poornima, Dan -

Let us have a hangout/bluejeans session this week to discuss the planned 
md-cache improvements, proposed timelines and sort out open questions if 
any.

Would 11:00 UTC on Wednesday work for everyone in the To: list?

Thanks,
Vijay

On 08/11/2016 01:04 AM, Poornima Gurusiddaiah wrote:

My comments inline.

Regards,
Poornima

----- Original Message -----
From: "Dan Lambright" <dlambrig@xxxxxxxxxx>
To: "Gluster Devel" <gluster-devel@xxxxxxxxxxx>
Sent: Wednesday, August 10, 2016 10:35:58 PM
Subject: md-cache improvements

There have been recurring discussions within the gluster community to build
on existing support for md-cache and upcalls to help performance for small
file workloads. In certain cases, "lookup amplification" dominates data
transfers, i.e. the cumulative round-trip time of multiple LOOKUPs from the
client negates the benefits of faster backend storage.

To tackle this problem, one suggestion is to more aggressively utilize
md-cache to cache inodes on the client than is currently done. The inodes
would be cached until they are invalidated by the server.
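
As a rough illustration of "cache until invalidated", here is a minimal
sketch in Python. It is not md-cache code: the class name, the use of a gfid
string as the key, and the timeout fallback value are all assumptions made
for the example.

```python
import time


class MetadataCache:
    """Toy client-side cache: an entry lives until the server invalidates it
    or a fallback timeout passes. All names here are hypothetical."""

    def __init__(self, timeout_s=600.0, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock
        self.entries = {}  # gfid -> (stat dict, time the entry was cached)

    def store(self, gfid, stat):
        self.entries[gfid] = (stat, self.clock())

    def lookup(self, gfid):
        """Return the cached stat, or None if a LOOKUP must go to the server."""
        hit = self.entries.get(gfid)
        if hit is None:
            return None
        stat, cached_at = hit
        if self.clock() - cached_at > self.timeout_s:
            del self.entries[gfid]  # too old to trust without an upcall
            return None
        return stat

    def invalidate(self, gfid):
        """Called when the server sends an upcall for this inode."""
        self.entries.pop(gfid, None)
```

A lookup that returns None stands for the case where the client has to pay
the network round trip; every other lookup is served locally.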

Several gluster development engineers within the DHT, NFS, and Samba teams
have been involved with related efforts, which have been underway for some
time now. At this juncture, comments are requested from gluster developers.

(1) .. help call out where additional upcalls would be needed to invalidate
stale client cache entries (in particular, need feedback from DHT/AFR
areas),

(2) .. identify failure cases, when we cannot trust the contents of md-cache,
e.g. when an upcall may have been dropped by the network

Yes, this needs to be handled.
It can happen only on a one-way disconnect, where the server cannot reach the
client and the notification fails. We can retry the notification until the
cache expiry time.
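
The retry idea could look roughly like the sketch below. It assumes a
hypothetical send() callable that returns True once the client acknowledges
the upcall; the function name, parameters, and fixed retry interval are
illustrative and not part of the upcall code.

```python
import time


def notify_with_retry(send, expiry_s, interval_s,
                      clock=time.monotonic, sleep=time.sleep):
    """Retry a failed upcall until the client's cache entry would have
    expired anyway. Returns True if delivered, False if we gave up."""
    deadline = clock() + expiry_s
    while True:
        if send():
            return True
        # Another attempt is pointless once the entry has timed out
        # on the client by itself.
        if clock() + interval_s >= deadline:
            return False
        sleep(interval_s)
```

Retrying past the expiry time buys nothing, since the client will revalidate
the entry on its own after that point.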

(3) .. point out additional improvements which md-cache needs. For example,
it cannot be allowed to grow unbounded.

This is being worked on, and will be targeted for 3.9.

Dan

----- Original Message -----
From: "Raghavendra Gowdappa" <rgowdapp@xxxxxxxxxx>

List of areas where we need invalidation notification:
1. Any changes to xattrs used by xlators to store metadata (like dht layout
xattr, afr xattrs etc).

Currently, md-cache negotiates (using ipc) with the brick a list of xattrs
for which it needs invalidation. Other xlators can add the xattrs they are
interested in to the ipc. However, these xlators then need to manage their own
caching and process the invalidation requests themselves, as md-cache sits
above all cluster xlators.
reference: http://review.gluster.org/#/c/15002/
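
To make the registration idea concrete, here is a small sketch. It does not
reflect the actual ipc payload or the patch referenced above; the registry
class, its method names, and the xattr names used with it are illustrative
assumptions.

```python
class XattrInterestRegistry:
    """Toy model: each xlator registers the xattrs it wants invalidation
    for, and the merged list is what would be sent to the brick."""

    def __init__(self):
        self._by_xlator = {}  # xlator name -> set of xattr names

    def register(self, xlator, xattrs):
        self._by_xlator.setdefault(xlator, set()).update(xattrs)

    def negotiated_list(self):
        """Union of all registered xattrs, sorted for a stable order."""
        merged = set()
        for names in self._by_xlator.values():
            merged |= names
        return sorted(merged)

    def interested(self, xattr):
        """Xlators that must handle an invalidation for this xattr."""
        return sorted(name for name, names in self._by_xlator.items()
                      if xattr in names)
```

For example, after registering "security.selinux" for md-cache and
"trusted.glusterfs.dht" for dht (names chosen only for illustration), an
invalidation for the second xattr would be routed to dht alone, which then
has to update its own cached state.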

2. Scenarios where an individual xlator decides it needs a lookup. For
example, failed directory creation on a non-hashed subvol in dht during mkdir.
Though dht reports mkdir as successful, it would be better not to cache this
inode, as a subsequent lookup will heal the directory.

For this, these xlators can specify an indicator in the dict of
the fop cbk, to not cache. This should be fairly simple to implement.
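
A sketch of how such an indicator might work, with the dict modelled as a
plain Python dict. The key name below is made up for the example and is not
an existing GlusterFS key; the function names are likewise hypothetical.

```python
NO_CACHE_KEY = "hypothetical.no-md-cache"  # illustrative, not a real key


def mkdir_cbk_xdata(all_subvols_ok):
    """What a cluster xlator might put in the cbk dict after mkdir."""
    xdata = {}
    if not all_subvols_ok:
        # mkdir succeeded overall, but a later lookup should heal the
        # directory, so ask the caching layer to skip this inode.
        xdata[NO_CACHE_KEY] = True
    return xdata


def maybe_cache(cache, gfid, stat, xdata):
    """Caching-layer side: honour the indicator. Returns True if cached."""
    if xdata.get(NO_CACHE_KEY):
        return False
    cache[gfid] = stat
    return True
```

The lower xlator only sets a flag; the decision to skip caching stays in one
place in the caching layer.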

3. Removal of files

When an unlink is issued from the mount point, the cache is invalidated.

4. writev on brick (to invalidate read cache on client)

writev on brick from any other client will invalidate the metadata cache on all
the other clients.

Other questions:
5. Does md-cache have cache management, such as LRU or an upper limit for
the cache?

Currently md-cache doesn't have any cache management; we will be targeting
this for 3.9.
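
For reference, an LRU policy with an upper limit can be sketched in a few
lines. This is only an illustration of the policy, not the planned md-cache
design; the class name and the entry-count limit are assumptions.

```python
from collections import OrderedDict


class LruMetadataCache:
    """Bounded cache that evicts the least recently used entry."""

    def __init__(self, max_entries):
        if max_entries < 1:
            raise ValueError("max_entries must be at least 1")
        self.max_entries = max_entries
        self._entries = OrderedDict()  # oldest first, newest last

    def get(self, gfid):
        if gfid not in self._entries:
            return None
        self._entries.move_to_end(gfid)  # mark as most recently used
        return self._entries[gfid]

    def put(self, gfid, stat):
        self._entries[gfid] = stat
        self._entries.move_to_end(gfid)
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # drop least recently used

    def __len__(self):
        return len(self._entries)
```

With a limit of two entries, storing a and b, reading a, and then storing c
evicts b, because a was touched more recently.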

6. Network disconnects and invalidating the cache. When a network disconnect
happens, we need to invalidate the cache for inodes present on that brick, as
we might have missed some notifications. The current approach of purging the
cache of all inodes might not be optimal, as it might roll back the benefits
of caching. Also, please note that network disconnects are not rare events.

Network disconnects are handled to a minimal extent: any brick going down
causes the whole cache to be invalidated. Invalidating only the inodes that
belong to that particular brick will need support from the underlying
cluster xlators.
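
The difference between the two approaches can be sketched as follows. The
sketch assumes the cluster xlators can report which bricks hold each inode,
which is exactly the support that does not exist yet; all names are
hypothetical.

```python
class BrickAwareCache:
    """Toy cache that remembers which bricks each cached inode lives on."""

    def __init__(self):
        self.entries = {}   # gfid -> stat
        self.by_brick = {}  # brick name -> set of gfids

    def store(self, gfid, stat, bricks):
        self.entries[gfid] = stat
        for brick in bricks:
            self.by_brick.setdefault(brick, set()).add(gfid)

    def purge_all(self):
        """Current behaviour: any brick down drops everything."""
        dropped = len(self.entries)
        self.entries.clear()
        self.by_brick.clear()
        return dropped

    def purge_brick(self, brick):
        """Selective behaviour: drop only inodes present on that brick."""
        gfids = self.by_brick.pop(brick, set())
        dropped = 0
        for gfid in gfids:
            if self.entries.pop(gfid, None) is not None:
                dropped += 1
        # Forget the dropped inodes in the other bricks' sets as well.
        for others in self.by_brick.values():
            others -= gfids
        return dropped
```

In this sketch an inode stored on two bricks is dropped when either of them
disconnects, which is the conservative choice when a notification may have
been missed.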

regards,
Raghavendra
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel
