Hi,
On 03.11.18 10:31, jesper@xxxxxxxx wrote:
I suspect that mds asked client to trim its cache. Please run
following commands on an idle client.
In the meantime - we migrated to the RH Ceph version and gave the MDS
both SSDs and more memory, and the problem went away.
It still puzzles me a bit - why is there a connection between the
"client page cache" and MDS server performance? The only explanation
I can find is that if the MDS cannot cache the metadata, it needs to go
back to the Ceph metadata pool to fetch it, and then exposes the
data as "new" to the clients even though it is unchanged. If that is
the case, I would say there is significant room for performance
optimization here.
CephFS is a distributed system, so there is bookkeeping for every
file in use by any CephFS client. These bookkeeping entities are
'capabilities' (caps); they also implement things like distributed locking.
The MDS has to cache every capability it has assigned to a CephFS
client, in addition to the cache for inode information and other stuff.
The cache size is limited to control the memory consumption of the MDS
process. If an MDS runs out of cache, it tries to revoke
capabilities assigned to CephFS clients to free some memory for new
capabilities. This revoke process runs asynchronously from the MDS to the
CephFS client, similar to NFS delegations.
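If you want to see how many caps each client currently holds, the MDS
admin socket can show it. A quick check on the MDS host (mds.XYZ below
is just a placeholder for your MDS name):

  ceph daemon mds.XYZ session ls

Each session entry contains a 'num_caps' field, so you can spot clients
that hold an unusually large number of capabilities.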
If a CephFS client receives a cap release request and is able to comply
(no process is accessing the file at the moment), the client cleans up
its internal state and allows the MDS to release the cap.
This cleanup also involves removing file data from the page cache.
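You can watch this happening on a kernel client, assuming debugfs is
mounted and you use the in-kernel CephFS client (the wildcard just
matches the per-cluster directory under debugfs):

  cat /sys/kernel/debug/ceph/*/caps

The cap count reported there drops when the client releases caps in
response to such a request, along with the page cache pages of the
released files.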
If your MDS was running with too small a cache size, it had to revoke
caps over and over to stay within that limit, and the clients had to
clean up their caches over and over, too.
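If a client cannot keep up with these recalls, the cluster usually
flags it. Checking

  ceph health detail

for warnings along the lines of 'clients failing to respond to cache
pressure' (the exact wording depends on the release) is a quick way to
confirm that the MDS is recalling caps faster than the clients release
them.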
You did not mention any details about the MDS settings, especially the
cache size. I assume you increased the cache size after adding more
memory, since the problem seems to be solved now.
It actually is not solved, but only mitigated. If your working set size
increases or the number of clients increases, the MDS has to manage more
caps and will have to revoke caps more often. You will probably reach an
equilibrium at some point. The MDS is the most memory-hungry part of
Ceph, and it often catches people by surprise. We had the same problem in
our setup; even worse, the nightly backup also trashes the MDS cache.
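When you need to raise the limit, the knob on Luminous and newer is
mds_cache_memory_limit (in bytes); older releases use mds_cache_size
(an inode count) instead. A sketch with 16 GB as an example value,
assuming the usual ceph.conf / injectargs mechanisms:

  # in ceph.conf on the MDS host, persistent across restarts
  [mds]
  mds_cache_memory_limit = 17179869184

  # or at runtime, without restarting the MDS
  ceph tell mds.XYZ injectargs '--mds_cache_memory_limit=17179869184'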
The best way to monitor the MDS is using the 'ceph daemonperf mds.XYZ'
command on the MDS host. It gives you the current performance counters
including the inode and caps count. Our MDS is configured with a 40 GB
cache size and currently has 15 million inodes cached and is managing
3.1 million capabilities.
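If you prefer raw numbers over the live daemonperf view, the same
counters are available through the admin socket (mds.XYZ again being a
placeholder):

  ceph daemon mds.XYZ perf dump

The 'inodes' and 'caps' counters in the 'mds' section correspond to the
cached inode and managed capability counts mentioned above.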
TL;DR: MDS needs huge amounts of memory for its internal bookkeeping.
Hope this helps.
Regards,
Burkhard
If you can reproduce this issue, please send the kernel log to us.
Will do if/when it reappears.
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com