On 05/30/2012 03:32 PM, Anand Avati wrote:
> Brian,
> You are right, today we hardly leverage the page cache in the kernel.
> When Gluster started and the performance translators were implemented,
> fuse invalidation support did not exist, and since that support was
> brought into upstream fuse we haven't leveraged it effectively. We
> could actually do a lot of smarter things using the invalidation
> changes.
>
> For the consistency concern where an open fd continues to refer to the
> local page cache - if that is a problem, today you need to mount with
> --enable-direct-io-mode to bypass the page cache altogether (this is
> very different from O_DIRECT open() support). On the other hand, to
> utilize the fuse invalidation APIs and promote use of the page cache
> while staying consistent, we need to gear up the glusterfs framework:
> first, implement server-originated messaging support; then build some
> kind of opportunistic locking or leases to notify glusterfs clients
> about modifications from a second client; and third, implement hooks
> in the client-side listener to do things like send fuse invalidations,
> purge pages in io-cache, or flush pending writes in write-behind.
> This needs to happen, but we're short on resources to prioritize it
> sooner :-)

Thanks for the context, Avati. The fuse patch I sent led to a similar
thought process with regard to finer-grained invalidation. So far it
seems well received, and as I understand it, we can also use that
mechanism to do full invalidations from gluster on older fuse modules
that don't have the fix. I'll look into incorporating that into what I
have so far and making it available for review.

Brian

> Avati
>
> On Wed, May 30, 2012 at 8:16 AM, Brian Foster <bfoster@xxxxxxxxxx>
> wrote:
>
> Hi all,
>
> I've been playing with a little hack recently to add a gluster mount
> option to support FOPEN_KEEP_CACHE, and I wanted to solicit some
> thoughts on whether there's value in finding an intelligent way to
> support this functionality. To provide some context:
>
> Our current behavior with regard to fuse is that the page cache is
> utilized by fuse, from what I can tell, in just about the same manner
> as for a typical local fs. The primary difference is that by default,
> the address space mapping for an inode is completely invalidated on
> open. So, for example, if process A opens and reads a file in a loop,
> subsequent reads are served from cache (bypassing fuse and gluster).
> If process B steps in and opens the same file, the cache is flushed
> and the next reads from either process are passed down through fuse.
> The FOPEN_KEEP_CACHE option simply disables this cache-flush-on-open
> behavior.
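>
> As a rough illustration (a sketch, not the actual patch, and
> demo_open() is a hypothetical handler), opting in from a fuse
> filesystem boils down to setting keep_cache in the lowlevel open
> handler:
>
>     /* Sketch: request FOPEN_KEEP_CACHE for an opened file so the
>      * kernel does not flush the inode's page cache on open.
>      * Assumes libfuse's lowlevel API. */
>     #include <fuse_lowlevel.h>
>
>     static void demo_open(fuse_req_t req, fuse_ino_t ino,
>                           struct fuse_file_info *fi)
>     {
>         fi->keep_cache = 1;  /* FOPEN_KEEP_CACHE in the open reply */
>         fuse_reply_open(req, fi);
>     }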
>
> The following are some notes on my experimentation thus far:
>
> - With FOPEN_KEEP_CACHE, fuse currently only invalidates on file size
> changes. This is a problem in that I can rewrite some or all of a file
> from another client and the cached client wouldn't notice. I've sent a
> patch to fuse-devel to also invalidate on mtime changes (similar to
> nfsv3 or cifs), so we'll see how well that is received. fuse also
> supports a range-based invalidation notification that we could take
> advantage of if necessary (see the sketch at the end of this message).
>
> - I can reproduce a measurable performance benefit in the local/cached
> read case. For example, running a kernel compile against a source tree
> in a gluster volume (no other xlators, build output to local storage)
> improves to 6 minutes from just under 8 minutes with the default graph
> (9.5 minutes with only the client xlator, and 1:09 locally).
>
> - Some specific differences from the current io-cache caching:
>   - io-cache supports time-based invalidation and tunables such as
>     cache size and priority. The page cache has no such controls.
>   - io-cache invalidates more frequently on various fops. It also
>     looks like we invalidate on writes and don't take advantage of
>     the write data most recently sent, whereas page cache writes are
>     cached (errors notwithstanding).
>   - The page cache obviously has tighter integration with the system
>     (i.e., drop_caches controls, more specific reporting, the ability
>     to drop cache when memory is needed).
>
> All in all, I'm curious what people think about enabling this cache
> behavior in gluster. We could support anything from the basic mount
> option I'm currently using (i.e., similar to attribute/dentry caching)
> to something integrated with io-cache (doing invalidations when
> necessary), or maybe even, eventually, something along the lines of
> the nfs weak cache consistency model, where the cache is validated
> after every fop based on file attributes.
>
> In general, are there other big issues/questions that would need to
> be explored before this is useful (e.g., the size invalidation issue)?
> Are there other performance tests that should be explored? Thoughts
> appreciated. Thanks.
>
> Brian
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel@xxxxxxxxxx
> https://lists.nongnu.org/mailman/listinfo/gluster-devel
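P.S. For reference, the range-based invalidation notification mentioned
above is exposed by libfuse as fuse_lowlevel_notify_inval_inode(). A
minimal sketch follows (purge_range() is a hypothetical wrapper; per
the libfuse documentation, a len of 0 invalidates from off to the end
of the file, and a negative off invalidates attributes only):

    #include <fuse_lowlevel.h>

    /* Ask the kernel to drop an inode's cached pages in [off, off+len).
     * Sketch only; returns 0 or a negative errno from libfuse. */
    static int purge_range(struct fuse_chan *ch, fuse_ino_t ino,
                           off_t off, off_t len)
    {
        return fuse_lowlevel_notify_inval_inode(ch, ino, off, len);
    }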