Re: Multi-active MDS cache pressure

Hi there,

This thread contains some really insightful information. Thanks, Eugen, for
sharing the explanation from the SUSE team. The documentation can definitely
be updated with this; it would help a lot of people.
Could you create a tracker for this? I'd like to add the information to the
docs and push a PR for it.

On Wed, Aug 10, 2022 at 1:45 AM Malte Stroem <malte.stroem@xxxxxxxxx> wrote:

> Hello Eugen,
>
> thank you very much for the full explanation.
>
> This fixed our cluster and I am sure it will help a lot of people around
> the world, since this is a problem occurring everywhere.
>
> I think this should be added to the documentation:
>
> https://docs.ceph.com/en/latest/cephfs/cache-configuration/#mds-recall
>
> or better:
>
>
> https://docs.ceph.com/en/quincy/cephfs/health-messages/#mds-client-recall-mds-health-client-recall-many
>
> Best wishes!
> Malte
>
> On 09.08.22 16:34, Eugen Block wrote:
> > Hi,
> >
> >> did you have some success with modifying the mentioned values?
> >
> > yes, the SUSE team helped identify the issue. I can share their
> > explanation:
> >
> > ---snip---
> > Every second (mds_cache_trim_interval config param) the mds is running
> > "cache trim" procedure. One of the steps of this procedure is "recall
> > client state". During this step it checks every client (session) if it
> > needs to recall caps. There are several criteria for this:
> >
> > 1) the cache is full (exceeds mds_cache_memory_limit) and needs some
> > inodes to be released;
> > 2) the client exceeds mds_max_caps_per_client (1M by default);
> > 3) the client is inactive.
> >
> > To determine a client's (session's) inactivity, the session's
> > cache_liveness parameter is checked and compared with the value:
> >
> >    (num_caps >> mds_session_cache_liveness_magnitude)
> >
> > where mds_session_cache_liveness_magnitude is a config param (10 by
> > default).
> > If cache_liveness is smaller than this calculated value, the session is
> > considered inactive and the mds sends a "recall caps" request for all
> > cached caps (actually the recall value is `num_caps -
> > mds_min_caps_per_client(100)`).
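A minimal sketch of the inactivity check described above (my own model, not
Ceph code; the constants are the defaults quoted in this explanation):

```python
# Sketch of the MDS "recall client state" decision described above.
# Defaults per the explanation: magnitude 10, min caps per client 100.
MDS_SESSION_CACHE_LIVENESS_MAGNITUDE = 10
MDS_MIN_CAPS_PER_CLIENT = 100

def is_session_inactive(num_caps: int, cache_liveness: float) -> bool:
    # A session counts as inactive when its liveness falls below
    # num_caps >> mds_session_cache_liveness_magnitude.
    return cache_liveness < (num_caps >> MDS_SESSION_CACHE_LIVENESS_MAGNITUDE)

def recall_amount(num_caps: int) -> int:
    # For an inactive session the MDS recalls everything above the minimum.
    return max(num_caps - MDS_MIN_CAPS_PER_CLIENT, 0)

# A client holding 20k caps with low liveness (e.g. ~12.9, as in the
# session dump later in this thread) is asked to drop 19900 caps:
print(is_session_inactive(20000, 12.9))  # 12.9 < (20000 >> 10) == 19 -> True
print(recall_amount(20000))              # 19900
```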
> >
> > And if the client is not releasing the caps fast enough, the mds repeats
> > this the next second, i.e. it will send "recall caps" with a high value
> > again and so on, and the "total" counter of "recall caps" for the session
> > will grow, eventually exceeding the mon warning limit.
> > There is a throttling mechanism, controlled by the
> > mds_recall_max_decay_threshold parameter (126K by default), which should
> > reduce the growth rate of the "recall caps" counter, but it looks like it
> > is not enough in this case.
> >
> > From the collected sessions, I see that during that 30-minute period
> > the total num_caps for that client decreased by about 3500.
> > ...
> > Here is an example. A client has 20k caps cached. At some moment
> > the server decides the client is inactive (because the session's
> > cache_liveness value is low). It starts to ask the client to release
> > caps down to the mds_min_caps_per_client value (100 by default). For this,
> > every second it sends recall_caps asking to release `caps_num -
> > mds_min_caps_per_client` caps (but not more than `mds_recall_max_caps`,
> > which is 30k by default). The client starts to release, but only at a
> > rate of e.g. 100 caps per second.
> >
> > So in the first second the mds sends recall_caps = 20k - 100
> > the second second recall_caps = (20k - 100) - 100
> > the third second recall_caps = (20k - 200) - 100
> > and so on
> >
> > And every time it sends recall_caps it updates the session's recall_caps
> > value, which tracks how many recall_caps were sent in the last
> > minute. I.e. the counter grows quickly, eventually exceeding
> > mds_recall_warning_threshold, which is 128K by default, and ceph starts
> > to report the "failing to respond to cache pressure" warning in the status.
> >
> > Now, after we set mds_recall_max_caps to 3K, in this situation the mds
> > server sends only 3K recall_caps per second, and the maximum value the
> > session's recall_caps value may reach (if the mds is sending 3K every
> > second for at least one minute) is 60 * 3K = 180K. I.e. it is still
> > possible to reach mds_recall_warning_threshold, but only if a client is
> > not "responding" for a long period, and as your experiments show, that
> > is not the case.
> > ---snip---
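To make the arithmetic in the explanation concrete, here is a small
simulation of the described feedback loop (my own sketch: it models the
recall_caps counter as a simple sliding one-minute sum rather than Ceph's
actual decaying counter, and uses the defaults quoted above):

```python
# Simulate the per-second recall loop described above: a client holding
# 20k caps releases only 100 caps/s while the MDS keeps asking every second.
# recall_caps is approximated as the sum of requests sent in the last 60s.
from collections import deque

def seconds_until_warning(start_caps, release_rate, recall_max_caps,
                          warning_threshold=131072, min_caps=100):
    caps = start_caps
    window = deque()  # recall requests sent within the last 60 seconds
    for t in range(1, 3600):
        want = min(caps - min_caps, recall_max_caps)
        if want <= 0:
            return None  # client caught up before the warning triggered
        window.append(want)
        if len(window) > 60:
            window.popleft()
        if sum(window) > warning_threshold:
            return t  # counter exceeds mds_recall_warning_threshold here
        caps -= min(release_rate, caps - min_caps)
    return None

# Default mds_recall_max_caps (30k): the counter blows past 128K in seconds.
print(seconds_until_warning(20000, 100, 30000))  # -> 7

# With mds_recall_max_caps = 3000 the window sum is bounded by 60 * 3000 =
# 180K, so the threshold is reached only after a sustained ~44s of sending.
print(seconds_until_warning(20000, 100, 3000))   # -> 44
```

In this simplified model the warning fires within seven seconds at the
default recall rate, but only after about 44 seconds of continuous
non-response once the per-second recall is capped at 3K, which matches the
behaviour described above.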
> >
> > So what helped us here was to decrease mds_recall_max_caps in 1k steps,
> > starting with 10000. This didn't reduce the warnings, so I decreased it
> > to 3000 and haven't seen those warnings since. I also decreased the
> > mds_cache_memory_limit again, as it wasn't helping here.
> >
> > Regards,
> > Eugen
> >
> >
> > Quoting Malte Stroem <malte.stroem@xxxxxxxxx>:
> >
> >> Hello Eugen,
> >>
> >> did you have some success with modifying the mentioned values?
> >>
> >> Or some others from:
> >>
> >> https://docs.ceph.com/en/latest/cephfs/cache-configuration/
> >>
> >> Best,
> >> Malte
> >>
> >> On 15.06.22 14:12, Eugen Block wrote:
> >>> Hi *,
> >>>
> >>> I finally caught some debug logs during the cache pressure warnings.
> >>> In the meantime I had doubled the mds_cache_memory_limit to 128 GB,
> >>> which decreased the number of cache pressure messages significantly,
> >>> but they still appear a few times per day.
> >>>
> >>> Turning on debug logs for a few seconds results in a 1 GB file, but I
> >>> found this message:
> >>>
> >>> 2022-06-15 10:07:34.254 7fdbbd44a700  2 mds.beacon.stmailmds01b-8
> >>> Session chead015:cephfs_client (2757628057) is not releasing caps
> >>> fast enough. Recalled caps at 390118 > 262144
> >>> (mds_recall_warning_threshold).
> >>>
> >>> So now I know which limit is reached here; the question is what to do
> >>> about it. Should I increase mds_recall_warning_threshold (default
> >>> 256k), or should I maybe increase mds_recall_max_caps (currently at
> >>> 60k; default is 50k)? Any other suggestions? I'd appreciate any
> >>> comments.
> >>>
> >>> Thanks,
> >>> Eugen
> >>>
> >>>
> >>> Quoting Eugen Block <eblock@xxxxxx>:
> >>>
> >>>> Hi,
> >>>>
> >>>> I'm currently debugging a recurring issue with multi-active MDS.
> >>>> The cluster is still on Nautilus and can't be upgraded at this time.
> >>>> There have been many discussions about "cache pressure" and I was
> >>>> able to find the right settings a couple of times, but before I
> >>>> change too much in this setup I'd like to ask for your opinion. I'll
> >>>> add some information at the end.
> >>>> So we have 16 active MDS daemons spread over 2 servers for one
> >>>> cephfs (8 daemons per server) with mds_cache_memory_limit = 64GB,
> >>>> the MDS servers are mostly idle except for some short peaks. Each of
> >>>> the MDS daemons uses around 2 GB according to 'ceph daemon mds.<MDS>
> >>>> cache status', so we're nowhere near the 64GB limit. There are
> >>>> currently 25 servers that mount the cephfs as clients.
> >>>> Watching ceph health I can see that the reported clients with
> >>>> cache pressure change, so they are not actually stuck but just don't
> >>>> respond as quickly as the MDS would like them to (I assume). For
> >>>> some of the mentioned clients I see high values for
> >>>> .recall_caps.value in the 'daemon session ls' output (at the bottom).
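Sessions like that can be picked out of the `session ls` JSON without
scrolling through the full dump. A small sketch (my own helper, not part of
Ceph; field names as in the Nautilus output further down, threshold taken
from the warning message above):

```python
import json

# Flag sessions whose recall_caps decay counter exceeds a threshold --
# the pattern visible in the session dump below.
def noisy_sessions(session_ls_json: str, threshold: float = 262144):
    sessions = json.loads(session_ls_json)
    return [(s["id"], s["recall_caps"]["value"])
            for s in sessions
            if s["recall_caps"]["value"] > threshold]

# Abbreviated example record, shaped like the 'session ls' output below.
sample = json.dumps([{
    "id": 2728101146,
    "num_caps": 16158,
    "recall_caps": {"value": 788916.83, "halflife": 60},
    "release_caps": {"value": 8.81, "halflife": 60},
}])
print(noisy_sessions(sample))  # [(2728101146, 788916.83)]
```

Feeding it the output of `ceph daemon mds.<MDS> session ls` would list the
client IDs the MDS is recalling most aggressively.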
> >>>>
> >>>> The docs basically state this:
> >>>>> When the MDS needs to shrink its cache (to stay within
> >>>>> mds_cache_size), it sends messages to clients to shrink their
> >>>>> caches too. The client is unresponsive to MDS requests to release
> >>>>> cached inodes. Either the client is unresponsive or has a bug
> >>>>
> >>>> To me it doesn't seem like the MDS servers are near the cache size
> >>>> limit, so it has to be the clients, right? In a different setup it
> >>>> helped to decrease the client_oc_size from 200MB to 100MB, but then
> >>>> there's also client_cache_size with 16K default. I'm not sure what
> >>>> the best approach would be here. I'd appreciate any comments on how
> >>>> to size the various cache/caps/threshold configurations.
> >>>>
> >>>> Thanks!
> >>>> Eugen
> >>>>
> >>>>
> >>>> ---snip---
> >>>> # ceph daemon mds.<MDS> session ls
> >>>>
> >>>>     "id": 2728101146,
> >>>>     "entity": {
> >>>>       "name": {
> >>>>         "type": "client",
> >>>>         "num": 2728101146
> >>>>       },
> >>>> [...]
> >>>>         "nonce": 1105499797
> >>>>       }
> >>>>     },
> >>>>     "state": "open",
> >>>>     "num_leases": 0,
> >>>>     "num_caps": 16158,
> >>>>     "request_load_avg": 0,
> >>>>     "uptime": 1118066.210318422,
> >>>>     "requests_in_flight": 0,
> >>>>     "completed_requests": [],
> >>>>     "reconnecting": false,
> >>>>     "recall_caps": {
> >>>>       "value": 788916.8276369586,
> >>>>       "halflife": 60
> >>>>     },
> >>>>     "release_caps": {
> >>>>       "value": 8.814981576458962,
> >>>>       "halflife": 60
> >>>>     },
> >>>>     "recall_caps_throttle": {
> >>>>       "value": 27379.27162576508,
> >>>>       "halflife": 1.5
> >>>>     },
> >>>>     "recall_caps_throttle2o": {
> >>>>       "value": 5382.261925615086,
> >>>>       "halflife": 0.5
> >>>>     },
> >>>>     "session_cache_liveness": {
> >>>>       "value": 12.91841737465921,
> >>>>       "halflife": 300
> >>>>     },
> >>>>     "cap_acquisition": {
> >>>>       "value": 0,
> >>>>       "halflife": 10
> >>>>     },
> >>>> [...]
> >>>>     "used_inos": [],
> >>>>     "client_metadata": {
> >>>>       "features": "0x0000000000003bff",
> >>>>       "entity_id": "cephfs_client",
> >>>>
> >>>>
> >>>> # ceph fs status
> >>>>
> >>>> cephfs - 25 clients
> >>>> ======
> >>>> +------+--------+----------------+---------------+-------+-------+
> >>>> | Rank | State  |      MDS       |    Activity   |  dns  |  inos |
> >>>> +------+--------+----------------+---------------+-------+-------+
> >>>> |  0   | active | stmailmds01d-3 | Reqs:   89 /s |  375k |  371k |
> >>>> |  1   | active | stmailmds01d-4 | Reqs:   64 /s |  386k |  383k |
> >>>> |  2   | active | stmailmds01a-3 | Reqs:    9 /s |  403k |  399k |
> >>>> |  3   | active | stmailmds01a-8 | Reqs:   23 /s |  393k |  390k |
> >>>> |  4   | active | stmailmds01a-2 | Reqs:   36 /s |  391k |  387k |
> >>>> |  5   | active | stmailmds01a-4 | Reqs:   57 /s |  394k |  390k |
> >>>> |  6   | active | stmailmds01a-6 | Reqs:   50 /s |  395k |  391k |
> >>>> |  7   | active | stmailmds01d-5 | Reqs:   37 /s |  384k |  380k |
> >>>> |  8   | active | stmailmds01a-5 | Reqs:   39 /s |  397k |  394k |
> >>>> |  9   | active |  stmailmds01a  | Reqs:   23 /s |  400k |  396k |
> >>>> |  10  | active | stmailmds01d-8 | Reqs:   74 /s |  402k |  399k |
> >>>> |  11  | active | stmailmds01d-6 | Reqs:   37 /s |  399k |  395k |
> >>>> |  12  | active |  stmailmds01d  | Reqs:   36 /s |  394k |  390k |
> >>>> |  13  | active | stmailmds01d-7 | Reqs:   80 /s |  397k |  393k |
> >>>> |  14  | active | stmailmds01d-2 | Reqs:   56 /s |  414k |  410k |
> >>>> |  15  | active | stmailmds01a-7 | Reqs:   25 /s |  390k |  387k |
> >>>> +------+--------+----------------+---------------+-------+-------+
> >>>> +-----------------+----------+-------+-------+
> >>>> |       Pool      |   type   |  used | avail |
> >>>> +-----------------+----------+-------+-------+
> >>>> | cephfs_metadata | metadata | 25.4G | 16.1T |
> >>>> |   cephfs_data   |   data   | 2078G | 16.1T |
> >>>> +-----------------+----------+-------+-------+
> >>>> +----------------+
> >>>> |  Standby MDS   |
> >>>> +----------------+
> >>>> | stmailmds01b-5 |
> >>>> | stmailmds01b-2 |
> >>>> | stmailmds01b-3 |
> >>>> |  stmailmds01b  |
> >>>> | stmailmds01b-7 |
> >>>> | stmailmds01b-8 |
> >>>> | stmailmds01b-6 |
> >>>> | stmailmds01b-4 |
> >>>> +----------------+
> >>>> MDS version: ceph version 14.2.22-404-gf74e15c2e55
> >>>> (f74e15c2e552b3359f5a51482dfd8b049e262743) nautilus (stable)
> >>>> ---snip---
> >>>
> >>>
> >>>
> >>> _______________________________________________
> >>> ceph-users mailing list -- ceph-users@xxxxxxx
> >>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>


-- 
*Dhairya Parmar*

He/Him/His

Associate Software Engineer, CephFS

Red Hat Inc. <https://www.redhat.com/>

dparmar@xxxxxxxxxx


