Re: Luminous : 3 clients failing to respond to cache pressure

On Tue, Oct 17, 2017 at 6:36 AM Yoann Moulin <yoann.moulin@xxxxxxx> wrote:
Hello,

I have a Luminous (12.2.1) cluster with 3 nodes for CephFS (no RBD or RGW), and we hit the "X clients failing to respond to cache pressure" message.
All 3 MDS servers are active.

Is this something I have to worry about?

This message means:
* the MDS has exceeded the size of its cache, and
* the MDS has asked clients to reduce the number of files they hold capabilities on (so the MDS can trim them out of cache), and
* the clients are not returning capabilities

It's entirely possible this is because the clients are actually holding references to all those files. If you haven't configured your cache size explicitly, you can probably increase it by a lot, and perhaps put this warning to bed.
-Greg
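
As a rough illustration of the three conditions above, here is a minimal sketch that pulls the cache-related counters out of the JSON printed by `perf dump mds`. The function name `summarize_cache` and the key `inodes_vs_inode_max` are hypothetical, and the sample values are copied from the iccluster041 dump quoted in this thread. Whether `inode_max` reflects the limit the MDS actually enforces depends on how the cache is configured, so treat the ratio as indicative only.

```python
import json


def summarize_cache(perf_dump_text):
    """Return the cache-related counters from `perf dump mds` JSON text."""
    mds = json.loads(perf_dump_text)["mds"]
    return {
        "inodes": mds["inodes"],
        "inode_max": mds["inode_max"],
        "inodes_with_caps": mds["inodes_with_caps"],
        "caps": mds["caps"],
        # Assumption: inode_max is the configured ceiling; it may not be
        # the limit that is actually enforced on a given cluster.
        "inodes_vs_inode_max": mds["inodes"] / mds["inode_max"],
    }


# Sample input: a subset of the iccluster041 counters from this thread.
sample = """
{"mds": {"inode_max": 2147483647, "inodes": 11098,
         "inodes_with_caps": 87, "caps": 239}}
"""

summary = summarize_cache(sample)
for key, value in summary.items():
    print(f"{key}: {value}")
```

In practice the input would be the raw output of the `ceph ... daemon mds.<name> perf dump mds` command shown in this thread, without the leading "> " quote markers.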
 

Here is some information about the cluster:

> root@iccluster054:~# ceph --cluster container -s
>   cluster:
>     id:     a294a95a-0baa-4641-81c1-7cd70fd93216
>     health: HEALTH_WARN
>             3 clients failing to respond to cache pressure
>
>   services:
>     mon: 3 daemons, quorum iccluster041.iccluster.epfl.ch,iccluster042.iccluster.epfl.ch,iccluster054.iccluster.epfl.ch
>     mgr: iccluster042(active), standbys: iccluster054
>     mds: cephfs-3/3/3 up  {0=iccluster054.iccluster.epfl.ch=up:active,1=iccluster041.iccluster.epfl.ch=up:active,2=iccluster042.iccluster.epfl.ch=up:active}
>     osd: 18 osds: 18 up, 18 in
>
>   data:
>     pools:   3 pools, 544 pgs
>     objects: 2357k objects, 564 GB
>     usage:   2011 GB used, 65055 GB / 67066 GB avail
>     pgs:     544 active+clean
>



> root@iccluster041:~# ceph --cluster container daemon mds.iccluster041.iccluster.epfl.ch perf dump mds
> {
>     "mds": {
>         "request": 193508283,
>         "reply": 192815355,
>         "reply_latency": {
>             "avgcount": 192815355,
>             "sum": 457371.475011160,
>             "avgtime": 0.002372069
>         },
>         "forward": 692928,
>         "dir_fetch": 1717132,
>         "dir_commit": 43521,
>         "dir_split": 4197,
>         "dir_merge": 4244,
>         "inode_max": 2147483647,
>         "inodes": 11098,
>         "inodes_top": 7668,
>         "inodes_bottom": 3404,
>         "inodes_pin_tail": 26,
>         "inodes_pinned": 143,
>         "inodes_expired": 1386234444,
>         "inodes_with_caps": 87,
>         "caps": 239,
>         "subtrees": 15,
>         "traverse": 195425369,
>         "traverse_hit": 192867085,
>         "traverse_forward": 692723,
>         "traverse_discover": 476,
>         "traverse_dir_fetch": 1714684,
>         "traverse_remote_ino": 0,
>         "traverse_lock": 6,
>         "load_cent": 19465322425,
>         "q": 0,
>         "exported": 1211,
>         "exported_inodes": 845556,
>         "imported": 1082,
>         "imported_inodes": 1209280
>     }
> }


> root@iccluster054:~# ceph --cluster container daemon mds.iccluster054.iccluster.epfl.ch perf dump mds
> {
>     "mds": {
>         "request": 267620366,
>         "reply": 255792944,
>         "reply_latency": {
>             "avgcount": 255792944,
>             "sum": 42256.407340600,
>             "avgtime": 0.000165197
>         },
>         "forward": 11827411,
>         "dir_fetch": 183,
>         "dir_commit": 2607,
>         "dir_split": 27,
>         "dir_merge": 19,
>         "inode_max": 2147483647,
>         "inodes": 3740,
>         "inodes_top": 2517,
>         "inodes_bottom": 1149,
>         "inodes_pin_tail": 74,
>         "inodes_pinned": 143,
>         "inodes_expired": 2103018,
>         "inodes_with_caps": 57,
>         "caps": 272,
>         "subtrees": 8,
>         "traverse": 267626346,
>         "traverse_hit": 255796915,
>         "traverse_forward": 11826902,
>         "traverse_discover": 77,
>         "traverse_dir_fetch": 30,
>         "traverse_remote_ino": 0,
>         "traverse_lock": 0,
>         "load_cent": 26824996745,
>         "q": 3,
>         "exported": 1319,
>         "exported_inodes": 2037400,
>         "imported": 418,
>         "imported_inodes": 7347
>     }
> }
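
To compare the two MDS dumps above side by side, a few derived figures can help. The sketch below computes the average reply latency (`sum / avgcount`), the fraction of path traversals served from cache, and the fraction of requests forwarded to another MDS. The function name `derived_metrics` and the metric names are hypothetical, the counters are copied from the dumps above, and it assumes the `reply_latency` sum is in seconds.

```python
def derived_metrics(mds):
    """Compute a few ratios from the "mds" section of a perf dump."""
    latency = mds["reply_latency"]
    return {
        # Assumption: "sum" is total reply time in seconds.
        "avg_reply_latency_s": latency["sum"] / latency["avgcount"],
        "traverse_hit_ratio": mds["traverse_hit"] / mds["traverse"],
        "forward_ratio": mds["forward"] / mds["request"],
    }


# Counters copied from the two perf dumps quoted above.
mds041 = {
    "request": 193508283,
    "reply_latency": {"avgcount": 192815355, "sum": 457371.475011160},
    "forward": 692928,
    "traverse": 195425369,
    "traverse_hit": 192867085,
}
mds054 = {
    "request": 267620366,
    "reply_latency": {"avgcount": 255792944, "sum": 42256.407340600},
    "forward": 11827411,
    "traverse": 267626346,
    "traverse_hit": 255796915,
}

results = {
    "iccluster041": derived_metrics(mds041),
    "iccluster054": derived_metrics(mds054),
}
for name, metrics in results.items():
    print(name)
    for key, value in metrics.items():
        print(f"  {key}: {value:.6f}")
```

The computed average latencies agree with the `avgtime` fields in the dumps (about 2.37 ms on iccluster041 and about 0.165 ms on iccluster054).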

--
Yoann Moulin
EPFL IC-IT
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com