On Mon, 2022-08-22 at 11:13 +0000, Frank Schilder wrote:
> Hi Chris.
>
> > Interestingly, when duration gets long and performance gets bad ...
>
> This observation is likely due to MDS and client cache. My experience
> with ceph's cache implementations is that, well, they seem not that
> great. I think what we both observe is, that with an empty cache
> everything works fine. As soon as the cache starts saturating, memory
> needs to be freed. This seems to result in heavy fragmentation,
> making certain operations slower and slower (for example, alloc and
> free).

OK thanks, makes sense. The interesting thing to me was how long it
takes to acquire locks specifically, which suggests the problem is more
on the client side than the MDS side (though I'm not certain).

> There was a ceph-user thread discussing performance as a function of
> cache size and the finding was that performance increases with
> reducing cache size.

Right, this is the kind of thing I was wondering about too - that
reducing the cache might actually help keep things under control
earlier. Thanks!

> Since then I use the following mds settings:
>
> client_cache_size = 8192
> mds_cache_memory_limit = 17179869184
> mds_cache_reservation = 0.500000
> mds_max_caps_per_client = 65536
> mds_min_caps_per_client = 4096
> mds_recall_max_caps = 32768

Thanks a lot. I plan to start slowly testing some changes with these
settings and monitor the results (I've put a rough sketch of how I'd
apply them at the end of this mail). Hopefully I can arrive at values
that work best for this cluster.

> I'm thinking about increasing the mid-point (mds_cache_reservation)
> even further to have a large amount of free cache mem to absorb load
> bursts and release when the burst is over.

Right, that makes sense. Your setting is currently 50%, right? Mine is
at the default 5%, so it definitely seems like a good thing to
increase. If I understand correctly, this reserves more memory for new
ops, which should mean those ops don't come under as much pressure when
the cache is already full (assuming the MDS can clear out the cache
fast enough).

> About your graphs: to have IO-peaks after MDS failover is expected.
> Clients actually continue to make IO requests that go to system's
> buffer. Once the MDS is back up, these buffered OPS get applied,
> leading to a temporary increase of load. The only exceptional raise
> is the last one, which might have coincided with someone starting to
> do a lot of IO.

OK, so my analysis was probably wrong then: it's not that the clients
write out their caches because the MDS is going away, it's that while
the MDS is away they keep caching, and then we see more writes when it
comes back. The last peak is interesting in that the MDS had been up
for a much longer time, which I thought explained why it is higher -
i.e. there was more client data cached that got flushed...

Finally, do you know of anything I can do on the client side to help
keep its cache down? I'm also thinking of testing mounts with the
noatime and nodiratime options, which might at least reduce the number
of MDS ops (example at the end of this mail), but I'm not sure of any
other client-side settings or tweaks that might help.

Thanks again for sticking with this thread and helping with all your
answers. I have learned a lot (still lots to go!). I guess the next
steps for me are to see whether manipulating the MDS cache and caps
settings helps at all. But ultimately I need to upgrade, expand and
tune the cluster...
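In case it helps anyone following along, here is a rough sketch of how
I plan to try those values - assuming a release new enough to use the
centralized config database (ceph config set); on an older cluster I'd
put the same keys under [mds] in ceph.conf and restart the daemons. I
believe client_cache_size is actually a client-side option
(ceph-fuse/libcephfs), so I've put it under the client section - please
correct me if that's wrong:

    # MDS cache sizing / caps recall settings (values from Frank's mail)
    ceph config set mds mds_cache_memory_limit  17179869184
    ceph config set mds mds_cache_reservation   0.500000
    ceph config set mds mds_max_caps_per_client 65536
    ceph config set mds mds_min_caps_per_client 4096
    ceph config set mds mds_recall_max_caps     32768
    # client-side metadata cache (ceph-fuse/libcephfs clients)
    ceph config set client client_cache_size    8192

And for the mount-option test, something like the following for the
kernel client - the monitor address, mount point, user name and secret
file below are just placeholders for my setup:

    mount -t ceph mon1:6789:/ /mnt/cephfs \
        -o name=cephfs_user,secretfile=/etc/ceph/cephfs.secret,noatime,nodiratime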
Cheers,
-c

> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx