Hi Janek,

My understanding is that the recall thresholds (see my list below) should be scaled proportionally. OTOH, I haven't played with the decay rates (and don't know if there's any significant value in tuning those).

We have a recall tuning script that we use to deploy different scale factors whenever there are caps recall issues:

    X=$1
    echo "Scaling MDS Recall by ${X}x"
    ceph tell mds.* injectargs -- \
        --mds_recall_max_decay_threshold $((X*16*1024)) \
        --mds_recall_max_caps $((X*5000)) \
        --mds_recall_global_max_decay_threshold $((X*64*1024)) \
        --mds_recall_warning_threshold $((X*32*1024)) \
        --mds_cache_trim_threshold $((X*64*1024))

We currently run with all of those options scaled up to 6x the defaults, and we almost never see caps recall warnings these days, with a couple thousand CephFS clients.

In the past month I've seen two different cases of a client not releasing caps even with these options:

1. A user had ceph-fuse mounted /cephfs/ on top of a second ceph-fuse /cephfs. The outer (i.e. lower) mountpoint/process held several thousand caps that could never be released until the user cleaned up their mounts.

2. A user running VSCodium, keeping 15k caps open. The opportunistic caps recall eventually starts recalling those, but the (el7 kernel) client won't release them. Stopping Codium seems to be the only way to release them.

Otherwise, 4 GB is normally sufficient in our environment for mds_cache_memory_limit (3 active MDSs); however, this is highly workload dependent. If several clients are actively taking hundreds of thousands of caps, then a 4 GB MDS has to be extremely busy recalling caps, and latency increases. We saw this live a couple of weeks ago: a few users started doing intensive rsyncs, and some other users noticed a metadata latency increase; it was fixed immediately just by raising the memory limit to 8 GB.

I agree that some sort of tuning best practices should be documented somehow, even though the topic is complex and rather delicate.
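If anyone wants to sanity-check the numbers before injecting them, the arithmetic in the script can be reproduced standalone. This is just a sketch that prints the values a given scale factor produces without touching the cluster (it assumes the thresholds are scaled from the stock 16K/5000/64K/32K/64K figures shown in the script):

```shell
#!/bin/sh
# Print the values the recall tuning script would inject for a given
# scale factor, without running ceph. X=6 mirrors the factor we use.
X=${1:-6}
echo "mds_recall_max_decay_threshold        $((X*16*1024))"
echo "mds_recall_max_caps                   $((X*5000))"
echo "mds_recall_global_max_decay_threshold $((X*64*1024))"
echo "mds_recall_warning_threshold          $((X*32*1024))"
echo "mds_cache_trim_threshold              $((X*64*1024))"
```

For X=6 that works out to 98304, 30000, 393216, 196608 and 393216 respectively.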
-- Dan

On Sat, Jan 25, 2020 at 5:54 PM Janek Bevendorff <janek.bevendorff@xxxxxxxxxxxxx> wrote:
>
> Hello,
>
> Over the last week I have tried optimising the performance of our MDS
> nodes for the large number of files and concurrent clients we have. It
> turns out that despite various stability fixes in recent releases, the
> default configuration still doesn't appear to be optimal for keeping
> the cache size under control and avoiding intermittent I/O blocks.
>
> Unfortunately, it is very hard to tweak the configuration to something
> that works, because the tuning parameters needed are largely
> undocumented or only described in very technical terms in the source
> code, making them quite unapproachable for administrators not familiar
> with all the CephFS internals. I would therefore like to ask whether it
> would be possible to document the "advanced" MDS settings more clearly:
> what they do, and in which direction they have to be tuned for more or
> less aggressive cap recall, for instance (sometimes it is not clear
> whether a threshold is a min or a max threshold).
>
> I am in the very (un)fortunate situation of having folders with several
> hundred thousand direct subfolders or files (and one extreme case with
> almost 7 million dentries), which makes a pretty good benchmark for
> measuring cap growth while performing operations on them. For the time
> being, I came up with this configuration, which seems to work for me,
> but is still far from optimal:
>
> mds basic    mds_cache_memory_limit    10737418240
> mds advanced mds_cache_trim_threshold  131072
> mds advanced mds_max_caps_per_client   500000
> mds advanced mds_recall_max_caps       17408
> mds advanced mds_recall_max_decay_rate 2.000000
>
> The parameters I am least sure about---because I understand the least
> how they actually work---are mds_cache_trim_threshold and
> mds_recall_max_decay_rate.
> Despite reading the description in
> src/common/options.cc, I understand only half of what they're doing,
> and I am also not quite sure in which direction to tune them for
> optimal results.
>
> Another point where I am struggling is the correct configuration of
> mds_recall_max_caps. The default of 5K doesn't work too well for me,
> but values above 20K also don't seem to be a good choice. While high
> values result in fewer blocked ops and better performance without
> destabilising the MDS, they also lead to slow but unbounded cache
> growth, which seems counter-intuitive. 17K was the maximum I could go.
> Higher values work for most use cases, but when listing very large
> folders with millions of dentries, the MDS cache size slowly starts to
> exceed the limit after a few hours, since the MDSs fail to keep clients
> below mds_max_caps_per_client despite not showing any "failing to
> respond to cache pressure" warnings.
>
> With the configuration above, I no longer have cache size issues, but
> this comes at the cost of performance and slow/blocked ops. A few hints
> as to how I could optimise my settings for better client performance
> would be much appreciated, and so would additional documentation for
> all the "advanced" MDS settings.
>
> Thanks a lot
> Janek
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
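A footnote on chasing down which clients are actually holding the caps when the cache exceeds its limit: one approach is to rank sessions by their cap count from the MDS session list. This is only a sketch; it assumes a Nautilus-era `ceph tell mds.<rank> session ls` whose JSON output includes a num_caps field per session, and that jq is available on the admin host:

```shell
#!/bin/sh
# Sketch: show the ten sessions holding the most caps on mds.0, to see
# which clients are pushing toward mds_max_caps_per_client.
# Assumes `session ls` emits a JSON array with .num_caps and
# .entity.name.{type,num} per session (check your release's output).
ceph tell mds.0 session ls \
  | jq -r '.[] | "\(.num_caps) \(.entity.name.type).\(.entity.name.num)"' \
  | sort -rn | head -10
```

That would have made the two stuck clients above (the nested ceph-fuse mount and the VSCodium user) immediately visible at the top of the list.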