preface: i think this is interesting, important work.

> Steve Dickson <SteveD@xxxxxxxxxx> wrote:
> >
> > But the real saving, imho, is the fact those reads were measured after the
> > filesystem was umount then remounted. So system wise, there should be some
> > gain due to the fact that NFS is not using the network....

i expect to see those gains when either the network and server are slower
than the client's local disk, or when the cached files are significantly
larger than the client's local RAM.  these conditions will not always be the
case, so i'm interested to know how performance is affected when the system
is running outside this sweet spot.

> I tested md5sum read speed also.  My testbox is a dual 200MHz PPro.  It's
> got 128MB of RAM.  I've got a 100MB file on the NFS server for it to read.
>
> No Cache:    ~14s
> Cold Cache:  ~15s
> Warm Cache:  ~2s
>
> Now these numbers are approximate because they're from memory.

to benchmark this i think you need to explore the architectural weaknesses
of your approach.  how bad will it get using cachefs with badly designed
applications or client/server setups?  for instance:

what happens when the client's cache disk is much slower than the server
(high performance RAID with high speed networking)?  what happens when the
client's cache disk fills up so the disk cache is constantly turning over
(which files are kicked out of your backing cachefs to make room for new
data)?  what happens with multi-threaded I/O-bound applications when the
cachefs is on a single spindle?  is there any performance dependency on the
size of the backing cachefs?  do you also cache directory contents on disk?

remember that the application you designed this for (preserving cache
contents across client reboots) is only one way this will be used.  some of
us would like to use this facility to provide a high-performance local cache
larger than the client's RAM.  :^)

> Note that a cold cache is worse than no cache because CacheFS (a) has to
> check the disk before NFS goes to the server, and (b) has to journal the
> allocations of new data blocks.  It may also have to wait whilst pages are
> written to disk before it can get new ones rather than just dropping them
> (100MB is big enough wrt 128MB that this will happen) and 100MB is
> sufficient to cause it to start using single- and double-indirection
> pointers to find its blocks on disk, though these are cached in the page
> cache.

synchronous file system metadata management is the bane of every cachefs
implementation i know about.

have you measured what performance impact there is when cache files go from
no indirection to single indirect blocks, or from single to double
indirection?  have you measured how expensive it is to reuse a single cache
file because the cachefs file system is already full?  how expensive is it
to invalidate the data in the cache (say, if some other client changes a
file you already have cached in your cachefs)?

what about using an extent-based file system for the backing cachefs?  that
would probably not be too difficult because you already have a good
prediction of how large the file will be (just look at the file size on the
server).  how about using smallish chunks, like the AFS cache manager, to
avoid indirection entirely?  would there be any performance advantage to
caching small files in memory and large files on disk, or vice versa?
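
to put a number on the indirection point you quoted: here's a rough sketch
(assuming an ext2-style layout with 4KB blocks, 12 direct pointers and
4-byte block numbers; the real CacheFS on-disk format may well use different
constants) of where a cache file crosses from direct blocks into single and
then double indirection:

	#include <stdio.h>

	#define BLOCK_SIZE      4096ULL
	#define DIRECT_PTRS     12ULL
	#define PTRS_PER_BLOCK  (BLOCK_SIZE / 4)    /* 4-byte block numbers */

	int main(void)
	{
		/* largest file reachable with direct pointers only */
		unsigned long long direct_limit = DIRECT_PTRS * BLOCK_SIZE;

		/* ... plus one single-indirect block of pointers */
		unsigned long long single_limit = direct_limit +
						  PTRS_PER_BLOCK * BLOCK_SIZE;

		/* ... plus a double-indirect tree of pointer blocks */
		unsigned long long double_limit = single_limit +
						  PTRS_PER_BLOCK * PTRS_PER_BLOCK * BLOCK_SIZE;

		printf("direct blocks only:  up to %llu KB\n", direct_limit >> 10);
		printf("single indirection:  up to %llu KB (~%llu MB)\n",
		       single_limit >> 10, single_limit >> 20);
		printf("double indirection:  up to %llu MB (~%llu GB)\n",
		       double_limit >> 20, double_limit >> 30);
		return 0;
	}

with those assumptions a cache file needs single indirection past 48KB and
double indirection just past ~4MB, so your 100MB test file is well into
double-indirect territory on every cold-cache miss.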
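
and as a strawman for the "smallish chunks, like the AFS cache manager"
idea: something along these lines (the 32KB chunk size and the path layout
are made-up values for illustration, not anything from your patch) keeps
every backing file inside the 48KB direct-block range assumed above, so the
indirection question disappears entirely:

	#include <stdio.h>
	#include <stdint.h>

	#define CHUNK_SHIFT	15			/* 32KB chunks */
	#define CHUNK_SIZE	(1ULL << CHUNK_SHIFT)

	/* which chunk holds a given file offset */
	static uint64_t chunk_index(uint64_t file_offset)
	{
		return file_offset >> CHUNK_SHIFT;
	}

	/* where inside that chunk the offset falls */
	static uint64_t chunk_offset(uint64_t file_offset)
	{
		return file_offset & (CHUNK_SIZE - 1);
	}

	/* build a backing-store path for one chunk of one cached object */
	static void chunk_path(char *buf, size_t len, const char *cache_root,
			       uint64_t object_id, uint64_t file_offset)
	{
		snprintf(buf, len, "%s/%016llx/%08llx", cache_root,
			 (unsigned long long)object_id,
			 (unsigned long long)chunk_index(file_offset));
	}

	int main(void)
	{
		uint64_t offset = 100ULL * 1024 * 1024;	/* read 100MB in */
		char path[256];

		chunk_path(path, sizeof(path), "/cache", 42, offset);
		printf("offset %llu -> chunk %llu (offset %llu within it): %s\n",
		       (unsigned long long)offset,
		       (unsigned long long)chunk_index(offset),
		       (unsigned long long)chunk_offset(offset), path);
		return 0;
	}

the tradeoff, of course, is metadata: thousands of small chunk files per
large cached object instead of one big file, which moves the cost from block
indirection onto directory lookups and inode allocation in the backing
file system.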