On Fri, 7 Mar 2025 at 17:05, Nicola Mori <mori@xxxxxxxxxx> wrote:
>
> Dear Ceph users,
>
> after upgrading from 19.2.0 to 19.2.1 (via cephadm) my cluster started
> showing some warnings never seen before:
>
> 29 OSD(s) experiencing slow operations in BlueStore
> 13 OSD(s) experiencing stalled read in db device of BlueFS
>
> I searched for these messages but didn't find much. I also noticed that
> when browsing the CephFS folders (using the kernel module for the
> client) sometimes the client gets stuck for a long time before showing
> the folder content; however I don't know if this can be related to the
> above warnings.

Would it be possible for the people who implement these warnings (or who lower the thresholds so much that they suddenly trigger) to put something visible somewhere?

It seems like these kinds of warnings (like "too many PGs per OSD" around Luminous, "Large omaps" a bit later) pop out of nowhere for us admins in a minor release. While I could find https://github.com/ceph/ceph/pull/59464/files by really, really looking through the Changelog for 19.2.1, it is by no means easy for Nicola here to know whether this warning has been in there for 3 major releases or whether the condition appeared randomly at the same time as the minor upgrade.

Is this the way we ceph admins are expected to "learn" how these things work, and to wonder whether it was related to what someone recently did, or whether it indicates a bad set of drives, or just new ceph code that isn't correctly tuned yet? It seems a bit like a pattern to just drop surprises like this on us with no info on what to do. I would like to think that this is just a series of "random" accidents that look very much alike, but there seem to be few good explanations for why it happens so often.
As the pull request shows, someone wrote a lot about the rationale for the addition and about the values chosen, yet all the changelog had was this line, hidden among all the other changes:

"squid: os/bluestore: Warning added for slow operations and stalled read (pr#59464, Md Mahamudur Rahaman Sajib)"

If we want people to dare to run the latest releases so we can notice the real bugs early, we need to be able to get information like "you might see the text 'experiencing slow operations in BlueStore', and this means you should read URL-GOES-HERE" or something similar.

Otherwise we look at https://docs.ceph.com/en/latest/releases/squid/#notable-changes and see nothing. Googling for this error message will be super useful 2.5 years from now, when "everyone" has had time to post on reddit and the mailing list and paste it on slack/IRC. But for the early adopters of ceph 19.2.x, having hard-to-find warnings suddenly pop up feels like a bad way to start validating a cluster upgrade.

-- 
May the most significant bit of your life be positive.