Hi Frank,

That's unfortunate! Most of those options relax warnings and relax when a client is considered to have too many caps. The mds_recall_max_caps option might be CPU intensive -- the MDS would be busy recalling caps if you indeed have clients that are hammering the MDSs with metadata workloads.

What is your current `ceph fs status` output? If you have very active users, perhaps you can ask them to temporarily slow down and see the impact on your cluster?

I'm not aware of any relation between caps recall and snap trimming. We don't use snapshots (so far only some pacific tests), so I can't say whether that is relevant to this issue.

-- dan

On Mon, Sep 6, 2021 at 11:18 AM Frank Schilder <frans@xxxxxx> wrote:
>
> Hi Dan,
>
> unfortunately, setting these parameters crashed the MDS cluster and we now have severe performance issues. Particularly bad is mds_recall_max_decay_rate. Even just setting it to the default value immediately makes all MDS daemons unresponsive, and they get failed by the MONs. I already set the MDS beacon timeout to 10 minutes to avoid MDS daemons getting marked down too early when they need to trim a large (oversized) cache. The daemons that were active and then failed never recover; I have to restart them manually to get them back as stand-bys.
>
> We are running mimic-13.2.10. Does explicitly setting mds_recall_max_decay_rate enable a different code path in this version?
>
> I tried to fix the situation by removing all modified config parameters again (ceph config rm ...) and doing a full restart of all daemons, first all stand-bys and then the active ones one by one. Unfortunately, this did not help. In addition, it looks like one of our fs data pools does not purge snapshots any more:
>
> pool 12 'con-fs2-meta1' no removed_snaps list shown
> pool 13 'con-fs2-meta2' removed_snaps [2~18e,191~2c,1be~144,303~1,305~1,307~1,309~1,30b~1,30d~1,30f~1,311~1,313~1,315~1]
> pool 14 'con-fs2-data' removed_snaps [2~18e,191~2c,1be~144,303~1,305~1,307~1,309~1,30b~1,30d~1,30f~1,311~1,313~1,315~1]
> pool 17 'con-fs2-data-ec-ssd' removed_snaps [2~18e,191~2c,1be~144,303~1,305~1,307~1,309~1,30b~1,30d~1,30f~1,311~1,313~1,315~1]
> pool 19 'con-fs2-data2' removed_snaps [2d6~1,2d8~1,2da~1,2dc~1,2de~1,2e0~1,2e2~1,2e4~1,2e6~1,2e8~1,2ea~18,303~1,305~1,307~1,309~1,30b~1,30d~1,30f~1,311~1,313~1,315~1]
>
> con-fs2-meta2 is the primary data pool. It does not store actual file data; we have con-fs2-data2 set as the data pool on the fs root. It's the new recommended 3-pool layout, with both the metadata pool and the primary data pool storing metadata only.
>
> The MDS daemons report 12 snapshots, and if I interpret the removed_snaps info correctly, the pools con-fs2-meta2, con-fs2-data and con-fs2-data-ec-ssd store 12 snapshots. However, pool con-fs2-data2 has at least 20. We use rolling snapshots, and it looks like snapshots have not been purged since I tried setting the MDS trimming parameters. This, in turn, is potentially a reason for the performance degradation we are experiencing at the moment.
>
> I would be most grateful if you could provide some pointers as to what to look for with regard to why snapshots don't disappear and/or what might have happened to our MDS daemons performance-wise.
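A generic way to double-check both points -- whether the recall overrides are really gone and whether the OSDs are still trimming snapshots -- could look like the sketch below. This is not output from this cluster; the MDS name mds.ceph-24 is only an example taken from the status output quoted further down, and the admin-socket command has to be run on the host where that MDS runs:

    # confirm no recall overrides remain in the config database
    ceph config dump | grep mds_recall

    # confirm the running daemon picked the defaults up again
    ceph daemon mds.ceph-24 config show | grep mds_recall

    # count PGs that are snap-trimming or waiting to trim
    ceph pg dump pgs_brief 2>/dev/null | grep -c snaptrim

    # compare removed_snaps intervals across the fs pools
    ceph osd pool ls detail | grep removed_snaps

If the snaptrim count stays at zero while the removed_snaps intervals keep growing on con-fs2-data2, the problem is more likely on the OSD/snapshot side than in the caps-recall settings.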
>
> Thanks and best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: Frank Schilder <frans@xxxxxx>
> Sent: 31 August 2021 16:23:15
> To: Dan van der Ster
> Cc: ceph-users
> Subject: Re: MDS daemons stuck in resolve, please help
>
> Hi Dan,
>
> I'm running the latest mimic version.
>
> Thanks for the link to the PR, this looks good.
>
> Directory pinning does not work in mimic; I had another case on that. The required xattrs are not implemented, although they are documented. The default load balancing seems to work quite well for us - I saw the warnings about possible performance impacts in the documentation. I think I scaled the MDS cluster up to the right size; the MDS daemons usually manage to trim their cache well below the reservation point and can take peak loads without moving clients around. All MDSes have about the same average request load. With the reorganised metadata pool, the aggregated performance is significantly better than with a single MDS. I would say that most of the time it scales with the MDS count.
>
> Of course, the find over the entire FS tree did lead to a lot of fun. Fortunately, users don't do that.
>
> Thanks and best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: Dan van der Ster <dan@xxxxxxxxxxxxxx>
> Sent: 31 August 2021 15:26:17
> To: Frank Schilder
> Cc: ceph-users
> Subject: Re: Re: MDS daemons stuck in resolve, please help
>
> Hi Frank,
>
> It helps if you start threads by reminding us which version you're running.
>
> During nautilus, the caps recall issue (which is AFAIK the main cause of MDS cache overruns) should be solved with this PR:
> https://github.com/ceph/ceph/pull/39134/files
> If you're not running >= 14.2.17, then you should probably just apply these settings all together. (Don't worry about the order in which they are set -- just make the changes within a short window.)
>
> Also, to try to understand your MDS issues -- are you using pinning, or letting metadata move around between MDSs? A find / might wreak havoc if you aren't pinning.
>
> -- dan
>
>
> On Tue, Aug 31, 2021 at 2:13 PM Frank Schilder <frans@xxxxxx> wrote:
> >
> > I seem to be hit by the problem discussed here: https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/AOYWQSONTFROPB4DXVYADWW7V25C3G6Z/
> >
> > In my case, what helped get the cache size growth somewhat under control was
> >
> > ceph config set mds mds_recall_max_caps 10000
> >
> > I'm not sure about the options mds_recall_max_decay_threshold and mds_recall_max_decay_rate.
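As an aside, a cautious, generic way to probe these recall throttles (a sketch only; the step value and the daemon name mds.ceph-16 are placeholders, and the thread itself gives no recommendation) is to change one option at a time in small steps and watch the per-session cap counts before touching the next one:

    # raise the per-recall cap limit by a small step
    ceph config set mds mds_recall_max_caps 12500

    # watch the largest per-session cap counts on one active MDS
    ceph daemon mds.ceph-16 session ls | grep num_caps | sort -k2 -n | tail -5

If the cap counts of the busiest sessions start shrinking without the MDS falling behind on its beacons, the next step can be applied; if the daemon becomes unresponsive, the change can be reverted with ceph config rm.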
> > The description I found is quite vague about the effect of these two options, and the defaults also don't match (mimic output):
> >
> > # ceph config help mds_recall_max_caps
> > mds_recall_max_caps - maximum number of caps to recall from client session in single recall
> >   (size_t, advanced)
> >   Default: 5000
> >   Can update at runtime: true
> >   Services: [mds]
> >
> > # ceph config help mds_recall_max_decay_threshold
> > mds_recall_max_decay_threshold - decay threshold for throttle on recalled caps on a session
> >   (size_t, advanced)
> >   Default: 16384
> >   Can update at runtime: true
> >   Services: [mds]
> >
> > # ceph config help mds_recall_max_decay_rate
> > mds_recall_max_decay_rate - decay rate for throttle on recalled caps on a session
> >   (double, advanced)
> >   Default: 2.500000
> >   Can update at runtime: true
> >   Services: [mds]
> >
> > I assume a higher mds_recall_max_decay_threshold and a lower mds_recall_max_decay_rate increase the speed of caps recall? What increments would be safe to use? For example, is it really a good idea to go from 16384 to the new default of 131072 in one go?
> >
> > Thanks for any advice and best regards,
> > =================
> > Frank Schilder
> > AIT Risø Campus
> > Bygning 109, rum S14
> >
> > ________________________________________
> > From: Frank Schilder <frans@xxxxxx>
> > Sent: 30 August 2021 21:37:18
> > To: ceph-users
> > Subject: Re: MDS daemons stuck in resolve, please help
> >
> > The MDS cluster came back up again, but I lost a number of standby MDS daemons. I cleared the OSD blacklist, but they do not show up as standby daemons again. The daemons themselves are running, but they do not seem to rejoin the cluster. The log shows:
> >
> > 2021-08-30 21:32:34.896 7fc9e22f8700 1 heartbeat_map is_healthy 'MDSRank' had timed out after 15
> > 2021-08-30 21:32:39.896 7fc9e22f8700 1 heartbeat_map is_healthy 'MDSRank' had timed out after 15
> > 2021-08-30 21:32:44.896 7fc9e22f8700 1 heartbeat_map is_healthy 'MDSRank' had timed out after 15
> > 2021-08-30 21:32:49.897 7fc9e22f8700 1 heartbeat_map is_healthy 'MDSRank' had timed out after 15
> >
> > I just had another frenzy of MDS failovers and am running out of standby daemons. A restart of a "missing" daemon brings it back to life, but I would prefer this to work by itself. Any ideas on what's going on are welcome.
> >
> > Thanks and best regards,
> > =================
> > Frank Schilder
> > AIT Risø Campus
> > Bygning 109, rum S14
> >
> > ________________________________________
> > From: Frank Schilder <frans@xxxxxx>
> > Sent: 30 August 2021 21:12:53
> > To: ceph-users
> > Subject: MDS daemons stuck in resolve, please help
> >
> > Hi all,
> >
> > our MDS cluster got degraded after an MDS had an oversized cache and crashed.
> > Other MDS daemons followed suit and now they are stuck in this state:
> >
> > [root@gnosis ~]# ceph fs status
> > con-fs2 - 1640 clients
> > =======
> > +------+---------+---------+---------------+-------+-------+
> > | Rank |  State  |   MDS   |    Activity   |  dns  |  inos |
> > +------+---------+---------+---------------+-------+-------+
> > |  0   | resolve | ceph-24 |               | 22.1k | 22.0k |
> > |  1   | resolve | ceph-13 |               |  769k |  758k |
> > |  2   |  active | ceph-16 |  Reqs: 0 /s   |  255k |  255k |
> > |  3   | resolve | ceph-09 |               |  5624 |  5619 |
> > +------+---------+---------+---------------+-------+-------+
> > +---------------------+----------+-------+-------+
> > |         Pool        |   type   |  used | avail |
> > +---------------------+----------+-------+-------+
> > |    con-fs2-meta1    | metadata | 1828M | 1767G |
> > |    con-fs2-meta2    |   data   |    0  | 1767G |
> > |     con-fs2-data    |   data   | 1363T | 6049T |
> > | con-fs2-data-ec-ssd |   data   |  239G | 4241G |
> > |    con-fs2-data2    |   data   | 10.2T | 5499T |
> > +---------------------+----------+-------+-------+
> > +-------------+
> > | Standby MDS |
> > +-------------+
> > |   ceph-12   |
> > |   ceph-08   |
> > |   ceph-23   |
> > |   ceph-11   |
> > +-------------+
> >
> > I tried to set max_mds to 1 to no avail. How can I get the MDS daemons back up?
> >
> > Thanks and best regards,
> > =================
> > Frank Schilder
> > AIT Risø Campus
> > Bygning 109, rum S14
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
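For completeness, a generic checklist for getting failed daemons back as standbys and inspecting stuck ranks -- a sketch only, with placeholder names; the unit name ceph-mds@ceph-12 assumes a package-based, non-containerized deployment, and none of this is taken from the thread:

    # summary of ranks, states and available standbys
    ceph mds stat
    ceph fs status con-fs2

    # MDS instances blacklisted during the failovers
    ceph osd blacklist ls

    # restart a 'missing' standby on its host
    systemctl restart ceph-mds@ceph-12

    # once all ranks are active again, restore the desired rank count
    ceph fs set con-fs2 max_mds 4

The value 4 simply mirrors the four ranks shown in the status output above.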