Hi there,

This thread contains some really insightful information. Thanks Eugen for sharing the explanation by the SUSE team. The docs can definitely be updated with this; it might indeed help a lot of people. Can you help create a tracker for this? I'd like to add the info to the docs and push a PR for the same.

On Wed, Aug 10, 2022 at 1:45 AM Malte Stroem <malte.stroem@xxxxxxxxx> wrote:
> Hello Eugen,
>
> thank you very much for the full explanation.
>
> This fixed our cluster and I am sure this helps a lot of people around the world, since this is a problem occurring everywhere.
>
> I think this should be added to the documentation:
>
> https://docs.ceph.com/en/latest/cephfs/cache-configuration/#mds-recall
>
> or better:
>
> https://docs.ceph.com/en/quincy/cephfs/health-messages/#mds-client-recall-mds-health-client-recall-many
>
> Best wishes!
> Malte
>
> On 09.08.22 at 16:34, Eugen Block wrote:
> > Hi,
> >
> >> did you have some success with modifying the mentioned values?
> >
> > yes, the SUSE team helped identify the issue, I can share the explanation:
> >
> > ---snip---
> > Every second (mds_cache_trim_interval config param) the mds runs its "cache trim" procedure. One of the steps of this procedure is "recall client state". During this step it checks every client (session) to see whether it needs to recall caps. There are several criteria for this:
> >
> > 1) the cache is full (exceeds mds_cache_memory_limit) and needs some inodes to be released;
> > 2) the client exceeds mds_max_caps_per_client (1M by default);
> > 3) the client is inactive.
> >
> > To determine a client's (session's) inactivity, the session's cache_liveness parameter is checked and compared with the value:
> >
> > (num_caps >> mds_session_cache_liveness_magnitude)
> >
> > where mds_session_cache_liveness_magnitude is a config param (10 by default). If cache_liveness is smaller than this calculated value, the session is considered inactive and the mds sends a "recall caps" request for all cached caps (actually the recall value is `num_caps - mds_min_caps_per_client(100)`).
> >
> > And if the client is not releasing the caps fast enough, the mds repeats this the next second, i.e. it will send "recall caps" with a high value again and so on, and the "total" counter of "recall caps" for the session will grow, eventually exceeding the mon warning limit. There is a throttling mechanism, controlled by the mds_recall_max_decay_threshold parameter (126K by default), which should reduce the rate of "recall caps" counter growth, but it looks like it is not enough for this case.
> >
> > From the collected sessions, I see that during that 30 minute period the total num_caps for that client decreased by about 3500.
> > ...
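To make the criteria above concrete, here is a short Python sketch of the inactivity check and the resulting recall size as described in the explanation (the cap at mds_recall_max_caps is covered in the worked example that follows). The names mirror the Ceph config options and the numbers come from the session dump further down in this thread, but the code itself is only an illustration, not the actual MDS implementation:

    # Illustrative sketch of the "recall client state" decision described above.
    # Values are the defaults quoted in this thread, not authoritative.
    MDS_SESSION_CACHE_LIVENESS_MAGNITUDE = 10
    MDS_MIN_CAPS_PER_CLIENT = 100
    MDS_RECALL_MAX_CAPS = 30000  # 30k default, per the example below

    def session_is_inactive(num_caps: int, cache_liveness: float) -> bool:
        # Inactive if liveness is below (num_caps >> magnitude).
        return cache_liveness < (num_caps >> MDS_SESSION_CACHE_LIVENESS_MAGNITUDE)

    def caps_to_recall(num_caps: int) -> int:
        # Recall everything above mds_min_caps_per_client, capped at mds_recall_max_caps.
        return min(max(num_caps - MDS_MIN_CAPS_PER_CLIENT, 0), MDS_RECALL_MAX_CAPS)

    # Numbers from the 'session ls' output at the bottom of this thread:
    print(session_is_inactive(16158, 12.9))  # True: 12.9 < (16158 >> 10) == 15
    print(caps_to_recall(16158))             # 16058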
> > Here is an example. A client has 20k caps cached. At some moment the server decides the client is inactive (because the session's cache_liveness value is low). It starts to ask the client to release caps down to the mds_min_caps_per_client value (100 by default). For this, every second it sends recall_caps asking to release `caps_num - mds_min_caps_per_client` caps (but not more than `mds_recall_max_caps`, which is 30k by default). The client starts to release, but only at a rate of e.g. 100 caps per second.
> >
> > So in the first second the mds sends recall_caps = 20k - 100,
> > the second second recall_caps = (20k - 100) - 100,
> > the third second recall_caps = (20k - 200) - 100,
> > and so on.
> >
> > And every time it sends recall_caps it updates the session's recall_caps value, which reflects how many recall_caps were sent during the last minute. I.e. the counter grows quickly, eventually exceeding mds_recall_warning_threshold, which is 128K by default, and ceph starts to report the "failing to respond to cache pressure" warning in the status.
> >
> > Now, after we set mds_recall_max_caps to 3K, in this situation the mds server sends only 3K recall_caps per second, and the maximum value the session's recall_caps value may have (if the mds is sending 3K every second for at least one minute) is 60 * 3K = 180K. I.e. it is still possible to reach mds_recall_warning_threshold, but only if a client is not "responding" for a long period, and as your experiments show, that is not the case.
> > ---snip---
> >
> > So what helped us here was to decrease mds_recall_max_caps in 1k steps, starting with 10000. This didn't reduce the warnings, so I decreased it to 3000 and I haven't seen those warnings since. Also, I decreased the mds_cache_memory_limit again; it wasn't helping here.
> >
> > Regards,
> > Eugen
> >
> >
> > Quoting Malte Stroem <malte.stroem@xxxxxxxxx>:
> >
> >> Hello Eugen,
> >>
> >> did you have some success with modifying the mentioned values?
> >>
> >> Or some others from:
> >>
> >> https://docs.ceph.com/en/latest/cephfs/cache-configuration/
> >>
> >> Best,
> >> Malte
> >>
> >> On 15.06.22 at 14:12, Eugen Block wrote:
> >>> Hi *,
> >>>
> >>> I finally caught some debug logs during the cache pressure warnings. In the meantime I had doubled the mds_cache_memory_limit to 128 GB, which decreased the number of cache pressure messages significantly, but they still appear a few times per day.
> >>>
> >>> Turning on debug logs for a few seconds results in a 1 GB file, but I found this message:
> >>>
> >>> 2022-06-15 10:07:34.254 7fdbbd44a700  2 mds.beacon.stmailmds01b-8 Session chead015:cephfs_client (2757628057) is not releasing caps fast enough. Recalled caps at 390118 > 262144 (mds_recall_warning_threshold).
> >>>
> >>> So now I know which limit is reached here; the question is what to do about it. Should I increase the mds_recall_warning_threshold (default 256k) or should I maybe increase mds_recall_max_caps (currently at 60k, default is 50k)? Any other suggestions? I'd appreciate any comments.
> >>>
> >>> Thanks,
> >>> Eugen
> >>>
> >>>
> >>> Quoting Eugen Block <eblock@xxxxxx>:
> >>>
> >>>> Hi,
> >>>>
> >>>> I'm currently debugging a recurring issue with multi-active MDS. The cluster is still on Nautilus and can't be upgraded at this time. There have been many discussions about "cache pressure" and I was able to find the right settings a couple of times, but before I change too much in this setup I'd like to ask for your opinion. I'll add some information at the end.
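As a rough sanity check of the numbers quoted above (the 60 * 3K = 180K ceiling from the explanation and the 262144 threshold from the debug log), here is a small sketch using the same "about one minute's worth of recalls" simplification; the real recall_caps value is a decaying counter with a 60 s halflife, so these are ballpark figures only:

    # Rough ceiling of the session's recall_caps counter if the MDS recalls at
    # full rate every second, using the one-minute simplification from the
    # explanation above (the real counter decays exponentially, halflife 60 s).
    MDS_RECALL_WARNING_THRESHOLD = 262144  # value seen in the debug log above

    def recall_caps_ceiling(mds_recall_max_caps: int, window_s: int = 60) -> int:
        return mds_recall_max_caps * window_s

    for max_caps in (50000, 30000, 10000, 3000):
        ceiling = recall_caps_ceiling(max_caps)
        state = "over" if ceiling > MDS_RECALL_WARNING_THRESHOLD else "under"
        print(f"mds_recall_max_caps={max_caps}: ~{ceiling} ({state} the warning threshold)")

With the default or larger values a single slow client can exceed the threshold within a minute, while with 3000 the ceiling stays at roughly 180K, below 262144; this is also consistent with the recall_caps value of roughly 789k visible in the session dump further down.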
> >>>> So we have 16 active MDS daemons spread over 2 servers for one cephfs (8 daemons per server) with mds_cache_memory_limit = 64GB. The MDS servers are mostly idle except for some short peaks. Each of the MDS daemons uses around 2 GB according to 'ceph daemon mds.<MDS> cache status', so we're nowhere near the 64GB limit. There are currently 25 servers that mount the cephfs as clients. Watching the ceph health I can see that the reported clients with cache pressure change, so they are not actually stuck but just don't respond as quickly as the MDS would like them to (I assume). For some of the mentioned clients I see high values for .recall_caps.value in the 'daemon session ls' output (at the bottom).
> >>>>
> >>>> The docs basically state this:
> >>>>> When the MDS needs to shrink its cache (to stay within mds_cache_size), it sends messages to clients to shrink their caches too. The client is unresponsive to MDS requests to release cached inodes. Either the client is unresponsive or has a bug
> >>>>
> >>>> To me it doesn't seem like the MDS servers are near the cache size limit, so it has to be the clients, right? In a different setup it helped to decrease the client_oc_size from 200MB to 100MB, but then there's also client_cache_size with a 16K default. I'm not sure what the best approach would be here. I'd appreciate any comments on how to size the various cache/caps/threshold configurations.
> >>>>
> >>>> Thanks!
> >>>> Eugen
> >>>>
> >>>>
> >>>> ---snip---
> >>>> # ceph daemon mds.<MDS> session ls
> >>>>
> >>>>   "id": 2728101146,
> >>>>   "entity": {
> >>>>     "name": {
> >>>>       "type": "client",
> >>>>       "num": 2728101146
> >>>>     },
> >>>>     [...]
> >>>>       "nonce": 1105499797
> >>>>     }
> >>>>   },
> >>>>   "state": "open",
> >>>>   "num_leases": 0,
> >>>>   "num_caps": 16158,
> >>>>   "request_load_avg": 0,
> >>>>   "uptime": 1118066.210318422,
> >>>>   "requests_in_flight": 0,
> >>>>   "completed_requests": [],
> >>>>   "reconnecting": false,
> >>>>   "recall_caps": {
> >>>>     "value": 788916.8276369586,
> >>>>     "halflife": 60
> >>>>   },
> >>>>   "release_caps": {
> >>>>     "value": 8.814981576458962,
> >>>>     "halflife": 60
> >>>>   },
> >>>>   "recall_caps_throttle": {
> >>>>     "value": 27379.27162576508,
> >>>>     "halflife": 1.5
> >>>>   },
> >>>>   "recall_caps_throttle2o": {
> >>>>     "value": 5382.261925615086,
> >>>>     "halflife": 0.5
> >>>>   },
> >>>>   "session_cache_liveness": {
> >>>>     "value": 12.91841737465921,
> >>>>     "halflife": 300
> >>>>   },
> >>>>   "cap_acquisition": {
> >>>>     "value": 0,
> >>>>     "halflife": 10
> >>>>   },
> >>>>   [...]
> >>>> "used_inos": [], > >>>> "client_metadata": { > >>>> "features": "0x0000000000003bff", > >>>> "entity_id": "cephfs_client", > >>>> > >>>> > >>>> # ceph fs status > >>>> > >>>> cephfs - 25 clients > >>>> ====== > >>>> +------+--------+----------------+---------------+-------+-------+ > >>>> | Rank | State | MDS | Activity | dns | inos | > >>>> +------+--------+----------------+---------------+-------+-------+ > >>>> | 0 | active | stmailmds01d-3 | Reqs: 89 /s | 375k | 371k | > >>>> | 1 | active | stmailmds01d-4 | Reqs: 64 /s | 386k | 383k | > >>>> | 2 | active | stmailmds01a-3 | Reqs: 9 /s | 403k | 399k | > >>>> | 3 | active | stmailmds01a-8 | Reqs: 23 /s | 393k | 390k | > >>>> | 4 | active | stmailmds01a-2 | Reqs: 36 /s | 391k | 387k | > >>>> | 5 | active | stmailmds01a-4 | Reqs: 57 /s | 394k | 390k | > >>>> | 6 | active | stmailmds01a-6 | Reqs: 50 /s | 395k | 391k | > >>>> | 7 | active | stmailmds01d-5 | Reqs: 37 /s | 384k | 380k | > >>>> | 8 | active | stmailmds01a-5 | Reqs: 39 /s | 397k | 394k | > >>>> | 9 | active | stmailmds01a | Reqs: 23 /s | 400k | 396k | > >>>> | 10 | active | stmailmds01d-8 | Reqs: 74 /s | 402k | 399k | > >>>> | 11 | active | stmailmds01d-6 | Reqs: 37 /s | 399k | 395k | > >>>> | 12 | active | stmailmds01d | Reqs: 36 /s | 394k | 390k | > >>>> | 13 | active | stmailmds01d-7 | Reqs: 80 /s | 397k | 393k | > >>>> | 14 | active | stmailmds01d-2 | Reqs: 56 /s | 414k | 410k | > >>>> | 15 | active | stmailmds01a-7 | Reqs: 25 /s | 390k | 387k | > >>>> +------+--------+----------------+---------------+-------+-------+ > >>>> +-----------------+----------+-------+-------+ > >>>> | Pool | type | used | avail | > >>>> +-----------------+----------+-------+-------+ > >>>> | cephfs_metadata | metadata | 25.4G | 16.1T | > >>>> | cephfs_data | data | 2078G | 16.1T | > >>>> +-----------------+----------+-------+-------+ > >>>> +----------------+ > >>>> | Standby MDS | > >>>> +----------------+ > >>>> | stmailmds01b-5 | > >>>> | stmailmds01b-2 | > >>>> | stmailmds01b-3 | > >>>> | stmailmds01b | > >>>> | stmailmds01b-7 | > >>>> | stmailmds01b-8 | > >>>> | stmailmds01b-6 | > >>>> | stmailmds01b-4 | > >>>> +----------------+ > >>>> MDS version: ceph version 14.2.22-404-gf74e15c2e55 > >>>> (f74e15c2e552b3359f5a51482dfd8b049e262743) nautilus (stable) > >>>> ---snip--- > >>> > >>> > >>> > >>> _______________________________________________ > >>> ceph-users mailing list -- ceph-users@xxxxxxx > >>> To unsubscribe send an email to ceph-users-leave@xxxxxxx > > > > > > > > _______________________________________________ > > ceph-users mailing list -- ceph-users@xxxxxxx > > To unsubscribe send an email to ceph-users-leave@xxxxxxx > _______________________________________________ > ceph-users mailing list -- ceph-users@xxxxxxx > To unsubscribe send an email to ceph-users-leave@xxxxxxx > -- *Dhairya Parmar* He/Him/His Associate Software Engineer, CephFS Red Hat Inc. <https://www.redhat.com/> dparmar@xxxxxxxxxx <https://www.redhat.com/> _______________________________________________ ceph-users mailing list -- ceph-users@xxxxxxx To unsubscribe send an email to ceph-users-leave@xxxxxxx