Hi,

I've managed to replicate it in a new testbed environment & logged it on
the tracker - https://tracker.ceph.com/issues/70129

"If the fullest OSD is between the values of "backfillfull_ratio" and
"nearfull_ratio", the CephFS cluster stalls with minimal throughput. On
production it drops from ~25 GBps to roughly 100 Mbps, and on the testbed
from 700 Mbps to 100 Mbps."

Regards

On Mon, Feb 17, 2025 at 11:54 PM Eugen Block <eblock@xxxxxx> wrote:

> Hi,
>
> that's an interesting observation, I haven't heard anything like that
> yet. More responses inline...
>
> Zitat von Jeremi-Ernst Avenant <jeremi@xxxxxxxxxx>:
>
> > Hi all,
> >
> > I recently migrated my Ceph cluster from *ceph-ansible* to *cephadm*
> > (about five months ago) and upgraded from *Pacific 16.2.11* to *Quincy*
> > (the latest at the time), followed by an upgrade to *Reef 18.2.4* two
> > months later, due to running an unsupported version of Ceph. Since this
> > migration and upgrade, I've noticed unexpected behavior in the cluster,
> > particularly related to OSD state awareness and balancer efficiency.
> >
> > *1. OSD Nearfull Not Reported Until Restart*
> >
> > I had an OSD exceed its configured nearfull threshold, but *Ceph did
> > not detect or report it* via ceph status. As a result, the cluster
> > entered a degraded state without any warnings. Only after manually
> > restarting the affected OSD did Ceph recognize the nearfull state and
> > update the corresponding pools accordingly. This behavior did not occur
> > in *Pacific/ceph-ansible* - Ceph would previously detect and act on the
> > nearfull condition without requiring a restart. This has been a common
> > recurrence since the migration/upgrade.
>
> During a cluster upgrade (via cephadm), OSD daemons are restarted as
> well; it's a bit unclear to me how that would be different from a
> restart you do later, after the upgrade. Do you find anything in the
> OSD logs? Usually I would suspect a misbehaving MGR, as has often been
> the case in recent years. Every now and then, a MGR failover "fixes"
> things like a false PG status etc. But then again, an upgrade restarts
> the MGRs as well, so it becomes less likely that a mgr fail would help.
> It can't really hurt either, though, so maybe try that anyway.
>
> > *2. injectargs Not Taking Effect Until OSD Restart*
> >
> > I've also observed that ceph tell osd.X injectargs --command ... often
> > has no effect. The OSD does not seem to apply the new arguments until
> > it is *manually restarted*, at which point I can modify values via
> > injectargs as expected. However, after a few hours or days, the issue
> > reappears, requiring another restart to modify runtime settings.
>
> You don't have to use the injectargs command anymore, you can just use
> 'ceph config set osd.<OSD_ID> <CONFIG> <VALUE>' to change runtime
> configurations. Have you tried that as well here? Does that at least
> work? Are the same OSDs always involved, or does it affect all of them?
>
> > *3. Ceph Balancer and PG Remapping Issues*
> >
> > The Ceph balancer appears to be operating, but its behavior seems
> > inefficient compared to what we experienced on *Pacific*. It often
> > fails to optimize data distribution effectively, and I have to rely on
> > the *pgremapper* tool to intervene manually. Restarting OSDs seems to
> > improve the balancer's effectiveness temporarily, suggesting that stale
> > OSD state information may be contributing to the issue.
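
As a side note on the injectargs/config behaviour above: a quick way to
see whether a runtime change actually reaches a daemon is to compare the
mon config database with what the OSD itself reports - a rough sketch,
with osd.12 and debug_osd as placeholder examples only:

  # set a value in the central config database
  ceph config set osd.12 debug_osd 5/5

  # what the config database thinks the value is
  ceph config get osd.12 debug_osd

  # what the running daemon itself reports; if this disagrees with the
  # value above, the OSD hasn't applied the change
  ceph tell osd.12 config get debug_osd

  # and a quick look at the balancer's own view at the same time
  ceph balancer status
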
> That sounds strange, I don't have a good explanation for that atm, but
> it sounds like it could be related to the OSDs not correctly reporting
> their status. Is there a chance that during the migration to cephadm,
> some OSDs weren't entirely migrated? Meaning that there are two systemd
> units targeting the same OSD? There have been reports on this list about
> that, and it could explain why sometimes the "wrong" unit is targeted.
> Basically, you would need to check /var/lib/ceph/osd.X on the affected
> host and see if there's still an active OSD. With cephadm, all OSD data
> would be underneath /var/lib/ceph/{CEPH_FSID}/osd.X.

(A quick way to check for that is sketched at the end of this mail.)

> > Since this is a *high-performance computing (HPC) environment*,
> > manually restarting OSDs on a regular basis is not a viable solution.
> > These issues did not occur when we were running *Pacific with
> > ceph-ansible*, and I'm wondering if others have experienced similar
> > problems after migrating to *cephadm* and/or upgrading to *Quincy/Reef*.
> >
> > I noticed people on reddit with the same issue, but their resolution
> > was: "I switch off my whole ceph cluster & switch it back on - to get
> > it working 100% again - DAILY".
> >
> > Has anyone else encountered these behaviors? Are there any known bugs
> > or workarounds that could help restore expected OSD state tracking and
> > balancer efficiency?
> >
> > Any insights would be greatly appreciated!
> >
> > Thanks,
> >
> > --
> >
> > *Jeremi-Ernst Avenant, Mr.*
> > Cloud Infrastructure Specialist
> > Inter-University Institute for Data Intensive Astronomy
> > 5th Floor, Department of Physics and Astronomy,
> > University of Cape Town
> >
> > Tel: 021 959 4137
> > Web: www.idia.ac.za | www.ilifu.ac.za
> > E-mail (IDIA): jeremi@xxxxxxxxxx
> > Rondebosch, Cape Town, 7600, South Africa

--

*Jeremi-Ernst Avenant, Mr.*
Cloud Infrastructure Specialist
Inter-University Institute for Data Intensive Astronomy
5th Floor, Department of Physics and Astronomy,
University of Cape Town

Tel: 021 959 4137
Web: www.idia.ac.za | www.ilifu.ac.za
E-mail (IDIA): jeremi@xxxxxxxxxx
Rondebosch, Cape Town, 7600, South Africa
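
P.S. For completeness, ruling out the doubly-managed OSD scenario Eugen
mentions would look roughly like this on an affected host - a minimal
sketch, assuming the usual unit naming, with <FSID> and the OSD id as
placeholders:

  # legacy (pre-cephadm) units - ideally none should still be loaded
  systemctl list-units --all 'ceph-osd@*'

  # cephadm-managed units for the same OSDs
  systemctl list-units --all 'ceph-<FSID>@osd.*'

  # daemons cephadm itself knows about on this host
  cephadm ls

  # data directories: legacy layout vs cephadm layout
  ls /var/lib/ceph/osd/
  ls /var/lib/ceph/<FSID>/

If an OSD shows up both as ceph-osd@<ID> and ceph-<FSID>@osd.<ID>, that
would match the "two systemd units targeting the same OSD" case.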