On Tue, Feb 28, 2023 at 12:56 PM Reed Dier <reed.dier@xxxxxxxxxxx> wrote:

> I think a few other things that could help would be `ceph osd df tree`
> which will show the hierarchy across different crush domains.

Good idea: https://pastebin.com/y07TKt52

> And if you're doing something like erasure coded pools, or something other
> than replication 3, maybe `ceph osd crush rule dump` may provide some
> further context with the tree output.

No erasure coded pools - all replication.

> Also, the cluster is running Luminous (12) which went EOL 3 years ago
> tomorrow
> <https://docs.ceph.com/en/latest/releases/index.html#archived-releases>.
> So there are also likely a good bit of improvements all around under the
> hood to be gained by moving forward from Luminous.

Yes, nobody here wants to touch upgrading this at all - everyone is too
terrified of breaking things. This Ceph deployment is serving several
hundred VMs. The general feeling is that we're stuck on Luminous and that
upgrading to anything else would be destructive. I refuse to believe that
is true. At the very least, if we upgraded everything to 12.2.3 we'd have
the 'balancer' module, which I believe arrived in 12.2.2.

What would you recommend upgrading Luminous to?

> Though, I would say take care of the scrub errors prior to doing any major
> upgrades, as well as checking your upgrade path (can only upgrade two
> releases at a time, if you have filestore OSDs, etc).

Yeah, there seems to be a fear here that attempting to repair those will
hurt performance even further. I disagree and think we should repair them
immediately.

Also, there seems to be a belief that bluestore is an 'all-or-nothing'
proposition and that it's impossible to migrate from filestore to
bluestore. Yet from what I can see you can run a mixture of both in one
deployment, and migrating OSDs from filestore to bluestore is indeed
possible.

TL;DR -- there is a *lot* of fear of touching this thing because nobody
here is truly an 'expert' in it at the moment. But not touching it is how
we have ended up with broken things and horrendous performance.

Thanks Reed!
-Dave

> -Reed
>
> On Feb 28, 2023, at 11:12 AM, Dave Ingram <dave@xxxxxxxxxxxx> wrote:
>
> There is a lot of variability in drive sizes - two different sets of
> admins added disks sized between 6TB and 16TB and I suspect this and
> imbalanced weighting is to blame.
>
> CEPH OSD DF:
>
> (not going to paste that all in here): https://pastebin.com/CNW5RKWx
>
> What else am I missing in terms of what to share with you all?
>
> Thanks all,
> -Dave

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
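
[Editor's note] For reference, repairing the scrub errors discussed above
is a per-PG operation and can be done one PG at a time to limit any
performance impact. A minimal sketch for a Luminous cluster; the PG id
2.1a below is only a placeholder, use the ids that `ceph health detail`
actually reports:

    # list the PGs currently flagged inconsistent and the OSDs they map to
    ceph health detail | grep inconsistent

    # optionally inspect what is actually wrong inside one of them
    rados list-inconsistent-obj 2.1a --format=json-pretty

    # ask the primary OSD to repair that PG (runs as a deep-scrub)
    ceph pg repair 2.1a

    # watch the cluster log until the inconsistency clears
    ceph -w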
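
On the balancer point: once the mgr daemons are on a Luminous release that
ships the balancer module, it can be enabled without any manual
reweighting. A rough sketch, assuming default settings; crush-compat mode
is generally the safer choice if pre-Luminous clients may still be
connected, while upmap mode requires all clients to speak Luminous:

    ceph mgr module enable balancer
    ceph balancer mode crush-compat     # or 'upmap' once all clients are Luminous
    ceph balancer eval                  # current distribution score (lower is better)
    ceph balancer on                    # start background optimization
    ceph balancer status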
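
And on the filestore question: the upstream bluestore migration procedure
is indeed per-OSD, so a mixed filestore/bluestore cluster is a normal
intermediate state. A sketch of converting a single OSD, loosely following
the documented "mark out and replace" approach; the OSD id 23 and /dev/sdX
are placeholders, and backfill should be allowed to finish before moving
on to the next OSD:

    ID=23
    DEVICE=/dev/sdX

    # drain the OSD and wait until its data is fully replicated elsewhere
    ceph osd out $ID
    while ! ceph osd safe-to-destroy osd.$ID ; do sleep 60 ; done

    # stop and tear down the old filestore OSD
    systemctl stop ceph-osd@$ID
    umount /var/lib/ceph/osd/ceph-$ID
    ceph osd destroy $ID --yes-i-really-mean-it
    ceph-volume lvm zap $DEVICE

    # recreate it as bluestore, reusing the same OSD id
    ceph-volume lvm create --bluestore --data $DEVICE --osd-id $ID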