Re: cephfs mds millions of caps

On Fri, Dec 15, 2017 at 1:18 AM, Webert de Souza Lima
<webert.boss@xxxxxxxxx> wrote:
> Hi,
>
> I've been looking at the ceph mds perf counters and I saw that one of my
> clusters was hugely different from the other in the number of caps:
>
> rlat inos caps | hsr hcs hcr | writ read actv | recd recy stry purg | segs evts subm
>    0 3.0M 5.1M |   0   0 595 |  304    4    0 |    0    0  13k    0 |   42  35k  893
>    0 3.0M 5.1M |   0   0 165 | 1.8k    4   37 |    0    0  13k    0 |   43  36k  302
>   16 3.0M 5.1M |   0   0 429 |  247    9    4 |    0    0  13k   58 |   38  32k 1.7k
>    0 3.0M 5.1M |   0   1 213 | 1.2k    0  857 |    0    0  13k    0 |   40  33k  766
>   23 3.0M 5.1M |   0   0 945 |  445    1    0 |    0    0  13k    0 |   41  34k 1.1k
>    0 3.0M 5.1M |   0   2 696 |  376   11    0 |    0    0  13k    0 |   43  35k 1.0k
>    3 2.9M 5.1M |   0   0 601 | 2.0k    6    0 |    0    0  13k   56 |   38  29k 1.2k
>    0 2.9M 5.1M |   0   0 394 |  272   11    0 |    0    0  13k    0 |   38  30k  758
>
> on another cluster running the same version:
>
> -----mds------ --mds_server-- ---objecter--- -----mds_cache----- ---mds_log----
> rlat inos caps | hsr hcs hcr | writ read actv | recd recy stry purg | segs evts subm
>    2 3.9M 380k |   0   1 266 | 1.8k    0  370 |    0    0  24k   44 |   37 129k 1.5k
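> (Columnar output like the above is presumably from something along the
> lines of "ceph daemonperf mds.a"; the exact command used here is an
> assumption.)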
>
>
> I did a perf dump on the active mds:
>
> ~# ceph daemon mds.a perf dump mds
> {
>     "mds": {
>         "request": 2245276724,
>         "reply": 2245276366,
>         "reply_latency": {
>             "avgcount": 2245276366,
>             "sum": 18750003.074118977
>         },
>         "forward": 0,
>         "dir_fetch": 20217943,
>         "dir_commit": 555295668,
>         "dir_split": 0,
>         "inode_max": 3000000,
>         "inodes": 3000276,
>         "inodes_top": 152555,
>         "inodes_bottom": 279938,
>         "inodes_pin_tail": 2567783,
>         "inodes_pinned": 2782064,
>         "inodes_expired": 308697104,
>         "inodes_with_caps": 2779658,
>         "caps": 5147887,
>         "subtrees": 2,
>         "traverse": 2582452087,
>         "traverse_hit": 2338123987,
>         "traverse_forward": 0,
>         "traverse_discover": 0,
>         "traverse_dir_fetch": 16627249,
>         "traverse_remote_ino": 29276,
>         "traverse_lock": 2507504,
>         "load_cent": 18446743868740589422,
>         "q": 27,
>         "exported": 0,
>         "exported_inodes": 0,
>         "imported": 0,
>         "imported_inodes": 0
>     }
> }
>
> and then ran a session ls to see which clients could be holding that many caps:
>
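> (This listing was presumably produced via the MDS admin socket, along the
> lines of "ceph daemon mds.a session ls".)
>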
>    {
>       "client_metadata" : {
>          "entity_id" : "admin",
>          "kernel_version" : "4.4.0-97-generic",
>          "hostname" : "suppressed"
>       },
>       "completed_requests" : 0,
>       "id" : 1165169,
>       "num_leases" : 343,
>       "inst" : "client.1165169 10.0.0.112:0/982172363",
>       "state" : "open",
>       "num_caps" : 111740,
>       "reconnecting" : false,
>       "replay_requests" : 0
>    },
>    {
>       "state" : "open",
>       "replay_requests" : 0,
>       "reconnecting" : false,
>       "num_caps" : 108125,
>       "id" : 1236036,
>       "completed_requests" : 0,
>       "client_metadata" : {
>          "hostname" : "suppressed",
>          "kernel_version" : "4.4.0-97-generic",
>          "entity_id" : "admin"
>       },
>       "num_leases" : 323,
>       "inst" : "client.1236036 10.0.0.113:0/1891451616"
>    },
>    {
>       "num_caps" : 63186,
>       "reconnecting" : false,
>       "replay_requests" : 0,
>       "state" : "open",
>       "num_leases" : 147,
>       "completed_requests" : 0,
>       "client_metadata" : {
>          "kernel_version" : "4.4.0-75-generic",
>          "entity_id" : "admin",
>          "hostname" : "suppressed"
>       },
>       "id" : 1235930,
>       "inst" : "client.1235930 10.0.0.110:0/2634585537"
>    },
>    {
>       "num_caps" : 2476444,
>       "replay_requests" : 0,
>       "reconnecting" : false,
>       "state" : "open",
>       "num_leases" : 0,
>       "completed_requests" : 0,
>       "client_metadata" : {
>          "entity_id" : "admin",
>          "kernel_version" : "4.4.0-75-generic",
>          "hostname" : "suppressed"
>       },
>       "id" : 1659696,
>       "inst" : "client.1659696 10.0.0.101:0/4005556527"
>    },
>    {
>       "state" : "open",
>       "replay_requests" : 0,
>       "reconnecting" : false,
>       "num_caps" : 2386376,
>       "id" : 1069714,
>       "client_metadata" : {
>          "hostname" : "suppressed",
>          "kernel_version" : "4.4.0-75-generic",
>          "entity_id" : "admin"
>       },
>       "completed_requests" : 0,
>       "num_leases" : 0,
>       "inst" : "client.1069714 10.0.0.111:0/1876172355"
>    },
>    {
>       "replay_requests" : 0,
>       "reconnecting" : false,
>       "num_caps" : 1726,
>       "state" : "open",
>       "inst" : "client.8394 10.0.0.103:0/3970353996",
>       "num_leases" : 0,
>       "id" : 8394,
>       "client_metadata" : {
>          "entity_id" : "admin",
>          "kernel_version" : "4.4.0-75-generic",
>          "hostname" : "suppressed"
>       },
>       "completed_requests" : 0
>    }
>
>
> Surprisingly, the two hosts that were holding 2M+ caps were the ones not in
> use. CephFS was mounted, but nothing was using the directories.
> I did a mount -o remount of cephfs on those two hosts and, after that, caps
> dropped significantly to less than 300k.
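> (The remount was presumably something along the lines of
> "mount -o remount /mnt/cephfs"; the mount point here is only an assumption.)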
>
>  "caps": 288489
>
>
> So, questions: does that really matter? What are the possible impacts? What
> could have caused these two hosts to hold so many capabilities?
> One of the hosts is for testing purposes, so traffic is close to zero. The
> other host wasn't using cephfs at all; all services were stopped.
>

The client holds so many capabilities because the kernel keeps lots of
inodes in its cache, and the kernel does not trim inodes by itself when it
is under no memory pressure. It seems you have set the mds_cache_size
config to a large value, so the MDS cache is not full enough to make the
MDS ask the clients to trim their inode caches either. This can affect
performance. We should make the MDS recognize idle clients and ask them to
trim their caps more aggressively.
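
For reference, you can confirm the configured cache limit and quickly spot
the sessions holding the most caps with something like the following
(assuming the admin socket for mds.a is reachable locally and jq is
installed):

~# ceph daemon mds.a config show | grep mds_cache_size
~# ceph daemon mds.a session ls | jq '.[] | {id, num_caps, inst}'

As an alternative to a full remount, asking the kernel on an idle client to
drop its reclaimable dentry/inode caches should also cause most of its caps
to be released:

~# sync; echo 2 > /proc/sys/vm/drop_caches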

Regards
Yan, Zheng

>
> :~# ceph -v
> ceph version 10.2.9-4-gbeaec39 (beaec397f00491079cd74f7b9e3e10660859e26b)
>
> ~# uname  -a
> Linux hostname_suppressed 4.4.0-75-generic #96~14.04.1-Ubuntu SMP Thu Apr 20
> 11:06:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
>
> ~# dpkg -l | grep ceph
> ii  ceph                                 10.2.9-4-gbeaec39-1trusty
> amd64        distributed storage and file system
> ii  ceph-base                            10.2.9-4-gbeaec39-1trusty
> amd64        common ceph daemon libraries and management tools
> ii  ceph-common                          10.2.9-4-gbeaec39-1trusty
> amd64        common utilities to mount and interact with a ceph storage
> cluster
> ii  ceph-fs-common                       10.2.9-4-gbeaec39-1trusty
> amd64        common utilities to mount and interact with a ceph file system
> ii  ceph-mds                             10.2.9-4-gbeaec39-1trusty
> amd64        metadata server for the ceph distributed file system
> ii  ceph-mon                             10.2.9-4-gbeaec39-1trusty
> amd64        monitor server for the ceph storage system
> ii  ceph-osd                             10.2.9-4-gbeaec39-1trusty
> amd64        OSD server for the ceph storage system
> ii  libcephfs1                           10.2.9-4-gbeaec39-1trusty
> amd64        Ceph distributed file system client library
> ii  python-cephfs                        10.2.9-4-gbeaec39-1trusty
> amd64        Python libraries for the Ceph libcephfs library
>
> Regards,
>
> Webert Lima
> DevOps Engineer at MAV Tecnologia
> Belo Horizonte - Brasil
> IRC NICK - WebertRLZ
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


