On Tue, 2022-08-16 at 10:52 +0000, Frank Schilder wrote:
> Hi Chris,
>
> I would strongly advise not to use multi-MDS with 5000 clients on
> luminous. I enabled it on mimic with ca. 1750 clients and it was
> extremely dependent on luck whether it converged to a stable
> distribution of dirfrags or ended up doing export_dir operations all
> the time, completely killing the FS performance. Also, even in mimic,
> where multi-MDS is no longer experimental, it still has a lot of
> bugs. You will need to monitor the cluster tightly and might be
> forced to intervene regularly, including going back and forth between
> single- and multi-MDS.
>

Hi Frank,

Thanks a lot for passing on your experience, that's really valuable
info for a CephFS n00b like me. I have been wary of enabling multi-MDS
as I figured I'd end up hitting a lot of issues on Luminous, plus I'd
be in even deeper over my head...

> My recommendation would be to upgrade to octopus as fast as possible.
> It's the first version that supports ephemeral pinning, which I would
> say is pretty much the most useful multi-MDS mode, because it uses a
> static dirfrag distribution over all MDSes, avoiding the painful
> export_dir operations.
>

OK yeah, I was just reading about ephemeral pinning, actually. Sounds
like the best plan is to move to Octopus and then also ensure we have
a solid upgrade plan going forward. I only inherited this a couple of
months ago and it's still the same original Luminous cluster.

> You are in the unlucky situation that you will need 2 upgrades. I
> think going L->M->O might be the least painful as it requires only 1
> OSD conversion. If you are a bit more adventurous, you could also aim
> for L->N->P. Nautilus will probably not solve your performance issue
> and any path including nautilus will have an extra OSD conversion.
> However, in case you are using filestore, you might want to go this
> route and change from filestore to bluestore with a re-deployment of
> OSDs when you are on pacific. You will get out of some performance
> issues with upgraded OSDs, and pacific has fixes for a boatload of FS
> snapshot issues.
>

I am wary of upgrading between releases in general; I've looked into
this a bit and have noticed a number of people hitting some strange
issues. I guess the fortunate thing is that most people have probably
experienced them already and solutions are probably relatively easy to
find - on the downside, I'm not sure many people will be able to help,
as this cluster is so old that people have probably forgotten or moved
on. But I guess I don't really have any other choice: it's either
upgrade, or perhaps build a brand new cluster and migrate the data.

Yeah, the cluster is also using filestore and it would be good to get
onto bluestore at some point. The cache is already on NVMe at least,
so that's helped.

> In the meantime, can you roll out something like ganglia on all
> client and storage nodes and collect network traffic stats? I found
> the packet report combined with bytes-in/out extremely useful to hunt
> down rogue FS clients. If you use snapshots, kworker CPU and wait-IO
> on the client node are also indicative of problems with this client.
>

That's a good idea, I'll look into that.

Thanks again for the input, it's really helpful!

Cheers,
-c

> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
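
P.S. For when we do get to Octopus: from what I've read, the ephemeral
pinning setup would look roughly like the below. Untested on my part,
so treat the fs name (cephfs), the mount path and the chosen
directories as placeholders for whatever we actually have:

    # allow a second active MDS rank
    ceph fs set cephfs max_mds 2

    # ephemeral distributed pinning is off by default (in Octopus at least)
    ceph config set mds mds_export_ephemeral_distributed true

    # hash the immediate children of /home across the active MDS ranks
    setfattr -n ceph.dir.pin.distributed -v 1 /mnt/cephfs/home

    # or plain static pinning of a whole subtree to one rank
    setfattr -n ceph.dir.pin -v 0 /mnt/cephfs/projects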
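
P.P.S. Alongside ganglia, I was also planning to poke the MDS admin
socket to spot noisy clients; something like this, run on the active
MDS host (the daemon name is just a guess at our naming scheme):

    # list client sessions with their caps and mount info
    ceph daemon mds.$(hostname -s) session ls

    # see which requests the MDS is currently chewing on
    ceph daemon mds.$(hostname -s) dump_ops_in_flight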