On Tue, Oct 17, 2017 at 6:36 AM Yoann Moulin <yoann.moulin@xxxxxxx> wrote:
Hello,
I have a Luminous (12.2.1) cluster with 3 nodes for CephFS (no RBD or RGW), and we hit the "X clients failing to respond to cache pressure" message.
I have 3 active MDS servers.
Is this something I have to worry about?
This message means
* the MDS has exceeded the size of its cache, and
* the MDS has asked clients to reduce the number of files they hold capabilities on (so the MDS can trim them out of cache), and
* the clients are not returning capabilities
It's entirely possible this is because the clients are actually holding references to all those files. If you haven't configured your cache size explicitly, you can probably increase it by a lot, and perhaps put this warning to bed.
-Greg
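For anyone wanting to try the cache-size suggestion above, here is a minimal sketch. It assumes that on Luminous the relevant option is `mds_cache_memory_limit` (a value in bytes) and that it can be read and changed at runtime through the MDS admin socket with `config get` / `config set`. The helper name `cache_limit_commands` and the 4 GiB figure are purely illustrative, not a recommendation from this thread. The function only builds the command lines; it does not run them.

```python
GIB = 1024 ** 3  # bytes in one GiB


def cache_limit_commands(mds_name, limit_gib, cluster="container"):
    """Build (but do not run) the admin-socket commands that read and then
    set mds_cache_memory_limit on one MDS daemon.

    Assumption: the option name is mds_cache_memory_limit and takes bytes.
    """
    base = ["ceph", "--cluster", cluster, "daemon", f"mds.{mds_name}", "config"]
    limit_bytes = int(limit_gib * GIB)
    return [
        base + ["get", "mds_cache_memory_limit"],
        base + ["set", "mds_cache_memory_limit", str(limit_bytes)],
    ]


# Hypothetical usage: print the commands for one of the MDS daemons in this thread.
for argv in cache_limit_commands("iccluster041.iccluster.epfl.ch", 4):
    print(" ".join(argv))
```

A value set through the admin socket only lasts until the daemon restarts; to keep it, the same option would also need to go into the `[mds]` section of the cluster's configuration file. Each command has to be run on the host where that MDS daemon lives.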
Here is some information about the cluster:
> root@iccluster054:~# ceph --cluster container -s
>   cluster:
>     id:     a294a95a-0baa-4641-81c1-7cd70fd93216
>     health: HEALTH_WARN
>             3 clients failing to respond to cache pressure
>
>   services:
>     mon: 3 daemons, quorum iccluster041.iccluster.epfl.ch,iccluster042.iccluster.epfl.ch,iccluster054.iccluster.epfl.ch
>     mgr: iccluster042(active), standbys: iccluster054
>     mds: cephfs-3/3/3 up {0=iccluster054.iccluster.epfl.ch=up:active,1=iccluster041.iccluster.epfl.ch=up:active,2=iccluster042.iccluster.epfl.ch=up:active}
>     osd: 18 osds: 18 up, 18 in
>
>   data:
>     pools:   3 pools, 544 pgs
>     objects: 2357k objects, 564 GB
>     usage:   2011 GB used, 65055 GB / 67066 GB avail
>     pgs:     544 active+clean
>
> root@iccluster041:~# ceph --cluster container daemon mds.iccluster041.iccluster.epfl.ch perf dump mds
> {
>     "mds": {
>         "request": 193508283,
>         "reply": 192815355,
>         "reply_latency": {
>             "avgcount": 192815355,
>             "sum": 457371.475011160,
>             "avgtime": 0.002372069
>         },
>         "forward": 692928,
>         "dir_fetch": 1717132,
>         "dir_commit": 43521,
>         "dir_split": 4197,
>         "dir_merge": 4244,
>         "inode_max": 2147483647,
>         "inodes": 11098,
>         "inodes_top": 7668,
>         "inodes_bottom": 3404,
>         "inodes_pin_tail": 26,
>         "inodes_pinned": 143,
>         "inodes_expired": 1386234444,
>         "inodes_with_caps": 87,
>         "caps": 239,
>         "subtrees": 15,
>         "traverse": 195425369,
>         "traverse_hit": 192867085,
>         "traverse_forward": 692723,
>         "traverse_discover": 476,
>         "traverse_dir_fetch": 1714684,
>         "traverse_remote_ino": 0,
>         "traverse_lock": 6,
>         "load_cent": 19465322425,
>         "q": 0,
>         "exported": 1211,
>         "exported_inodes": 845556,
>         "imported": 1082,
>         "imported_inodes": 1209280
>     }
> }
> root@iccluster054:~# ceph --cluster container daemon mds.iccluster054.iccluster.epfl.ch perf dump mds
> {
>     "mds": {
>         "request": 267620366,
>         "reply": 255792944,
>         "reply_latency": {
>             "avgcount": 255792944,
>             "sum": 42256.407340600,
>             "avgtime": 0.000165197
>         },
>         "forward": 11827411,
>         "dir_fetch": 183,
>         "dir_commit": 2607,
>         "dir_split": 27,
>         "dir_merge": 19,
>         "inode_max": 2147483647,
>         "inodes": 3740,
>         "inodes_top": 2517,
>         "inodes_bottom": 1149,
>         "inodes_pin_tail": 74,
>         "inodes_pinned": 143,
>         "inodes_expired": 2103018,
>         "inodes_with_caps": 57,
>         "caps": 272,
>         "subtrees": 8,
>         "traverse": 267626346,
>         "traverse_hit": 255796915,
>         "traverse_forward": 11826902,
>         "traverse_discover": 77,
>         "traverse_dir_fetch": 30,
>         "traverse_remote_ino": 0,
>         "traverse_lock": 0,
>         "load_cent": 26824996745,
>         "q": 3,
>         "exported": 1319,
>         "exported_inodes": 2037400,
>         "imported": 418,
>         "imported_inodes": 7347
>     }
> }
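To make the dumps above easier to compare, here is a small sketch that pulls a few cache-related counters out of `perf dump mds` output and derives two ratios. The counter values are copied from the two dumps in this message; the helper name `summarize` and the choice of counters are my own. I am reading `traverse_hit / traverse` as the share of path traversals served from the MDS cache, which is an interpretation rather than something stated in this thread.

```python
import json

# Subset of the "perf dump mds" output quoted above, one entry per MDS host.
PERF_DUMPS = {
    "iccluster041": json.loads("""
        {"mds": {"reply_latency": {"avgcount": 192815355, "sum": 457371.475011160},
                 "inode_max": 2147483647, "inodes": 11098,
                 "inodes_with_caps": 87, "caps": 239,
                 "traverse": 195425369, "traverse_hit": 192867085}}
    """),
    "iccluster054": json.loads("""
        {"mds": {"reply_latency": {"avgcount": 255792944, "sum": 42256.407340600},
                 "inode_max": 2147483647, "inodes": 3740,
                 "inodes_with_caps": 57, "caps": 272,
                 "traverse": 267626346, "traverse_hit": 255796915}}
    """),
}


def summarize(dump):
    """Return a few cache-related figures from one parsed perf dump."""
    mds = dump["mds"]
    latency = mds["reply_latency"]
    traverse = mds["traverse"]
    count = latency["avgcount"]
    return {
        "inodes": mds["inodes"],
        "inodes_with_caps": mds["inodes_with_caps"],
        "caps": mds["caps"],
        # Fraction of traversals counted as hits (0.0 if no traversals yet).
        "traverse_hit_ratio": mds["traverse_hit"] / traverse if traverse else 0.0,
        # Mean reply latency in seconds, recomputed from sum / avgcount.
        "avg_reply_latency_s": latency["sum"] / count if count else 0.0,
    }


for host, dump in PERF_DUMPS.items():
    s = summarize(dump)
    print(f"{host}: inodes={s['inodes']} caps={s['caps']} "
          f"hit_ratio={s['traverse_hit_ratio']:.4f} "
          f"avg_latency={s['avg_reply_latency_s']:.6f}s")
```

On a live cluster the same function could be fed the JSON printed by the `perf dump mds` command shown above, for example by piping that output into `json.load(sys.stdin)`.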
--
Yoann Moulin
EPFL IC-IT
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com