Re: CephFS and page cache

Shinobu Kinjo <skinjo@xxxxxxxxxx> · Mon, 19 Oct 2015 04:34:25 -0400 (EDT)

What kind of HPC applications are you talking about?

Something like netCDF?

Caching is essential for some computational applications,
but that's not always the case.

It's not closely related to this topic, but I'm really interested in your
thoughts on using a Ceph cluster for HPC computation.

Shinobu 

----- Original Message -----
From: "Burkhard Linke" <Burkhard.Linke@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx>
To: ceph-users@xxxxxxxxxxxxxx
Sent: Monday, October 19, 2015 4:59:21 PM
Subject: Re:  CephFS and page cache

Hi,

On 10/19/2015 05:27 AM, Yan, Zheng wrote:
> On Sat, Oct 17, 2015 at 1:42 AM, Burkhard Linke
> <Burkhard.Linke@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
>> Hi,
>>
>> I've noticed that CephFS (both ceph-fuse and kernel client in version 4.2.3)
>> remove files from page cache as soon as they are not in use by a process
>> anymore.
>>
>> Is this intended behaviour? We use CephFS as a replacement for NFS in our
>> HPC cluster. It should serve large files which are read by multiple jobs on
>> multiple hosts, so keeping them in the page cache over the duration of
>> several job invocations is crucial.
> Yes. MDS needs resource to track the cached data. We don't want MDS
> use too much resource.
>
>> Mount options are defaults,noatime,_netdev (+ extra options for the kernel
>> client). Is there an option to keep data in page cache just like any other
>> filesystem?
> So far there is no option to do that. Later, we may add an option to
> keep the cached data for a few seconds.

This renders CephFS useless for almost any HPC cluster application. And 
keeping data for a few seconds is not a solution in most cases.

CephFS supports capabilities to manage access to objects, enforce
consistency of data etc. IMHO a sane way to handle the page cache is to
use a capability to inform the MDS about cached objects; as long as no
other client claims write access to an object or its metadata, the
cached copy is considered consistent. Upon write access by another
client, the caching client should drop the capability (and thus remove
the object from the page cache). If another process tries to access a
cached object with an intact 'cache' capability, it may be promoted to a
read/write capability.
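
To make the idea concrete, here is a rough sketch of the bookkeeping I
have in mind. It is a toy model in Python, not Ceph's actual capability
code: ToyMDS, ToyClient and all method names are made up for
illustration, and promotion to a read/write capability is only modelled
as a local cache hit.

```python
# Toy model of the proposed 'cache' capability. This is NOT Ceph's real
# capability code; ToyMDS, ToyClient and all method names are hypothetical.
from collections import defaultdict


class ToyClient:
    def __init__(self, name):
        self.name = name
        self.page_cache = {}  # path -> cached bytes

    def drop_cache(self, path):
        # Called by the MDS when the 'cache' capability is revoked.
        self.page_cache.pop(path, None)


class ToyMDS:
    def __init__(self):
        # path -> set of clients currently holding the 'cache' capability
        self.cache_caps = defaultdict(set)

    def grant_cache(self, client, path, data):
        # The client keeps the data after closing the file and tells the MDS.
        client.page_cache[path] = data
        self.cache_caps[path].add(client)

    def open_for_write(self, writer, path):
        # A writer invalidates every other client's cached copy.
        for holder in list(self.cache_caps[path]):
            if holder is not writer:
                holder.drop_cache(path)
                self.cache_caps[path].discard(holder)

    def open_for_read(self, client, path):
        # A cached copy with an intact capability is served locally.
        if client in self.cache_caps[path]:
            return client.page_cache[path]
        return None  # the caller would have to fetch the data again


mds = ToyMDS()
node_a, node_b = ToyClient("a"), ToyClient("b")
mds.grant_cache(node_a, "/data/genome.fa", b"ACGT")
hit = mds.open_for_read(node_a, "/data/genome.fa")       # served from cache
mds.open_for_write(node_b, "/data/genome.fa")            # revokes a's copy
miss = mds.open_for_read(node_a, "/data/genome.fa")      # cache is gone
```

The point is only that the MDS needs one small record per (client,
object) pair, and that a write by any other client is the single event
that invalidates a cached copy.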

I haven't dug into the details of either capabilities or the kernel page
cache, but the method described above should be very similar to the
existing read-only capability. I don't know whether there's a kind of
eviction callback in the page cache that CephFS can use to update
capabilities if an object is removed from the page cache (e.g. due to
memory pressure), but I'm pretty sure that other filesystems like NFS
also need to keep track of what's cached.

This approach will probably increase the resource usage of both the MDS
and the CephFS clients, but the benefits are obvious. For use cases with
limited resources, the MDS may refuse the 'cache' capability to a client
to reduce the memory footprint.
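
A minimal sketch of such a refusal policy, again as a toy Python model
and under my own assumptions: BudgetedMDS and max_cache_caps are
hypothetical names, not existing Ceph classes or options. The MDS tracks
at most a fixed number of 'cache' capabilities and refuses further
requests until one is released.

```python
# Toy model only: BudgetedMDS and max_cache_caps are hypothetical names,
# not Ceph classes or configuration options.
class BudgetedMDS:
    def __init__(self, max_cache_caps):
        self.max_cache_caps = max_cache_caps
        self.granted = set()  # (client, path) pairs currently tracked

    def request_cache_cap(self, client, path):
        key = (client, path)
        if key in self.granted:
            return True  # already tracked, nothing new to account for
        if len(self.granted) >= self.max_cache_caps:
            return False  # refused: the client must drop the pages on close
        self.granted.add(key)
        return True

    def release_cache_cap(self, client, path):
        # E.g. after the client evicted the pages under memory pressure.
        self.granted.discard((client, path))


mds = BudgetedMDS(max_cache_caps=1)
first = mds.request_cache_cap("node1", "/data/a")    # granted
second = mds.request_cache_cap("node2", "/data/a")   # refused, budget used up
mds.release_cache_cap("node1", "/data/a")
third = mds.request_cache_cap("node2", "/data/a")    # granted after release
```

A refused client simply behaves as CephFS does today and drops the data
when the file is closed.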

Just my 2 ct and regards,
Burkhard
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com