Hi,
On 03.11.18 10:31, jesper@xxxxxxxx wrote:
I suspect that mds asked client to trim its cache. Please run
following commands on an idle client.
In the meantime - we migrated to the RH Ceph version and gave the MDS
both SSDs and more memory, and the problem went away.
It still puzzles me a bit - why is there a connection between the
"client page cache" and MDS server performance? The only explanation
I can find is that if the MDS cannot cache the metadata, it needs to go
back to the Ceph metadata pool to fetch it, and then exposes the
data as "new" to the clients even though it is unchanged. If that is
the case, I would say there is significant room for performance
optimization here.
CephFS is a distributed system, so there is bookkeeping for every
file in use by any CephFS client. These bookkeeping entities are
'capabilities' (caps); they also implement things like distributed locking.
The MDS has to cache every capability it has assigned to a CephFS
client, in addition to the cache for inode information and other stuff.
The cache size is limited to control the memory consumption of the MDS
process. If an MDS runs out of cache, it tries to revoke
capabilities assigned to CephFS clients to free some memory for new
capabilities. This revoke process runs asynchronously from the MDS to the
CephFS client, similar to NFS delegations.
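If you want to see how many caps each client currently holds, the MDS
admin socket can show it. A quick check on the MDS host (mds.XYZ below
is just a placeholder for your MDS name):

  ceph daemon mds.XYZ session ls

Each session entry contains a 'num_caps' field, so you can spot clients
that hold an unusually large number of capabilities.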
If a CephFS client receives a cap release request and is able to comply
(no process is accessing the file at the moment), the client cleans up
its internal state and allows the MDS to release the cap.
This cleanup also involves removing file data from the page cache.
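You can watch this happening on a kernel client, assuming debugfs is
mounted and you use the in-kernel CephFS client (the wildcard just
matches the per-cluster directory under debugfs):

  cat /sys/kernel/debug/ceph/*/caps

The cap count reported there drops when the client releases caps in
response to such a request, along with the page cache pages of the
released files.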
If your MDS was running with too small a cache size, it had to revoke
caps over and over to stay within that limit, and the clients had to
clean up their caches over and over, too.
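If a client cannot keep up with these recalls, the cluster usually
flags it. Checking

  ceph health detail

for warnings along the lines of 'clients failing to respond to cache
pressure' (the exact wording depends on the release) is a quick way to
confirm that the MDS is recalling caps faster than the clients release
them.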
You did not mention any details about the MDS settings, especially the
cache size. I assume you increased the cache size after adding more
memory, since the problem seems to be solved now.
It actually is not solved, but only mitigated. If your working set size
increases or the number of clients increases, the MDS has to manage more
caps and will have to revoke caps more often. You will probably reach an
equilibrium at some point. The MDS is the most memory-hungry part of
Ceph, and it often catches people by surprise. We had the same problem in
our setup; even worse, the nightly backup also trashes the MDS cache.
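When you need to raise the limit, the knob on Luminous and newer is
mds_cache_memory_limit (in bytes); older releases use mds_cache_size
(an inode count) instead. A sketch with 16 GB as an example value,
assuming the usual ceph.conf / injectargs mechanisms:

  # in ceph.conf on the MDS host, persistent across restarts
  [mds]
  mds_cache_memory_limit = 17179869184

  # or at runtime, without restarting the MDS
  ceph tell mds.XYZ injectargs '--mds_cache_memory_limit=17179869184'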
The best way to monitor the MDS is using the 'ceph daemonperf mds.XYZ'
command on the MDS host. It gives you the current performance counters
including the inode and caps count. Our MDS is configured with a 40 GB
cache size and currently has 15 million inodes cached and is managing
3.1 million capabilities.
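If you prefer raw numbers over the live daemonperf view, the same
counters are available through the admin socket (mds.XYZ again being a
placeholder):

  ceph daemon mds.XYZ perf dump

The 'inodes' and 'caps' counters in the 'mds' section correspond to the
cached inode and managed capability counts mentioned above.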
TL;DR: MDS needs huge amounts of memory for its internal bookkeeping.
Hope this helps.
Regards,
Burkhard
If you can reproduce this issue, please send the kernel log to us.
Will do if/when it reappears.
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com