Hi,

I've managed to replicate it in a new testbed environment & logged it on
the tracker - https://tracker.ceph.com/issues/70129

"If the fullest OSD is between the values of "backfillfull_ratio" and
"nearfull_ratio", the CephFS cluster stalls with minimal throughput. On
production it drops from ~25 GBps to roughly 100 Mbps, and on the testbed
from 700 Mbps to 100 Mbps."

Regards

On Mon, Feb 17, 2025 at 11:54 PM Eugen Block <eblock@xxxxxx> wrote:

> Hi,
>
> that's an interesting observation, I haven't heard anything like that
> yet. More responses inline...
>
> Zitat von Jeremi-Ernst Avenant <jeremi@xxxxxxxxxx>:
>
> > Hi all,
> >
> > I recently migrated my Ceph cluster from *ceph-ansible* to *cephadm*
> > (about five months ago) and upgraded from *Pacific 16.2.11* to *Quincy*
> > (the latest at the time), followed by an upgrade to *Reef 18.2.4* two
> > months later, due to running an unsupported version of Ceph. Since this
> > migration and upgrade, I've noticed unexpected behavior in the cluster,
> > particularly related to OSD state awareness and balancer efficiency.
> >
> > *1. OSD Nearfull Not Reported Until Restart*
> >
> > I had an OSD exceed its configured nearfull threshold, but *Ceph did
> > not detect or report it* via ceph status. As a result, the cluster
> > entered a degraded state without any warnings. Only after manually
> > restarting the affected OSD did Ceph recognize the nearfull state and
> > update the corresponding pools accordingly. This behavior did not occur
> > in *Pacific/ceph-ansible* - Ceph would previously detect and act on the
> > nearfull condition without requiring a restart. This has been a common
> > recurrence since the migration/upgrade.
>
> During a cluster upgrade (via cephadm), OSD daemons are restarted as
> well; it's a bit unclear to me how that would be different from a
> restart you do later, after the upgrade. Do you find anything in the
> OSD logs? Usually I would suspect a misbehaving MGR, as has often been
> the case in recent years. Every now and then, a MGR failover "fixes"
> things like a false PG status etc. But then again, an upgrade restarts
> the MGRs as well, so it becomes less likely that a mgr fail would help.
> It can't really hurt either, though, so maybe try that anyway.
>
> > *2. injectargs Not Taking Effect Until OSD Restart*
> >
> > I've also observed that ceph tell osd.X injectargs --command ... often
> > has no effect. The OSD does not seem to apply the new arguments until
> > it is *manually restarted*, at which point I can modify values via
> > injectargs as expected. However, after a few hours or days, the issue
> > reappears, requiring another restart to modify runtime settings.
>
> You don't have to use the injectargs command anymore, you can just use
> 'ceph config set osd.<OSD_ID> <CONFIG> <VALUE>' to change runtime
> configurations. Have you tried that as well here? Does that at least
> work? Are the same OSDs always involved, or does it affect all of them?
>
> > *3. Ceph Balancer and PG Remapping Issues*
> >
> > The Ceph balancer appears to be operating, but its behavior seems
> > inefficient compared to what we experienced on *Pacific*. It often
> > fails to optimize data distribution effectively, and I have to rely on
> > the *pgremapper* tool to intervene manually. Restarting OSDs seems to
> > improve the balancer's effectiveness temporarily, suggesting that stale
> > OSD state information may be contributing to the issue.
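
As a side note on the injectargs/config behaviour above: a quick way to
see whether a runtime change actually reaches a daemon is to compare the
mon config database with what the OSD itself reports - a rough sketch,
with osd.12 and debug_osd as placeholder examples only:

  # set a value in the central config database
  ceph config set osd.12 debug_osd 5/5

  # what the config database thinks the value is
  ceph config get osd.12 debug_osd

  # what the running daemon itself reports; if this disagrees with the
  # value above, the OSD hasn't applied the change
  ceph tell osd.12 config get debug_osd

  # and a quick look at the balancer's own view at the same time
  ceph balancer status
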
> That sounds strange, I don't have a good explanation for that atm, but
> it sounds like it could be related to the OSDs not correctly reporting
> their status. Is there a chance that during the migration to cephadm,
> some OSDs weren't entirely migrated? Meaning that there are two systemd
> units targeting the same OSD? There have been reports on this list about
> that, and it could explain why sometimes the "wrong" unit is targeted.
> Basically, you would need to check /var/lib/ceph/osd.X on the affected
> host and see if there's still an active OSD. With cephadm, all OSD data
> would be underneath /var/lib/ceph/{CEPH_FSID}/osd.X.

(A quick way to check for that is sketched at the end of this mail.)

> > Since this is a *high-performance computing (HPC) environment*,
> > manually restarting OSDs on a regular basis is not a viable solution.
> > These issues did not occur when we were running *Pacific with
> > ceph-ansible*, and I'm wondering if others have experienced similar
> > problems after migrating to *cephadm* and/or upgrading to *Quincy/Reef*.
> >
> > I noticed people on reddit with the same issue, but their resolution
> > was: "I switch off my whole ceph cluster & switch it back on - to get
> > it working 100% again - DAILY".
> >
> > Has anyone else encountered these behaviors? Are there any known bugs
> > or workarounds that could help restore expected OSD state tracking and
> > balancer efficiency?
> >
> > Any insights would be greatly appreciated!
> >
> > Thanks,
> >
> > --
> >
> > *Jeremi-Ernst Avenant, Mr.*
> > Cloud Infrastructure Specialist
> > Inter-University Institute for Data Intensive Astronomy
> > 5th Floor, Department of Physics and Astronomy,
> > University of Cape Town
> >
> > Tel: 021 959 4137
> > Web: www.idia.ac.za | www.ilifu.ac.za
> > E-mail (IDIA): jeremi@xxxxxxxxxx
> > Rondebosch, Cape Town, 7600, South Africa

--

*Jeremi-Ernst Avenant, Mr.*
Cloud Infrastructure Specialist
Inter-University Institute for Data Intensive Astronomy
5th Floor, Department of Physics and Astronomy,
University of Cape Town

Tel: 021 959 4137
Web: www.idia.ac.za | www.ilifu.ac.za
E-mail (IDIA): jeremi@xxxxxxxxxx
Rondebosch, Cape Town, 7600, South Africa
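
P.S. For completeness, ruling out the doubly-managed OSD scenario Eugen
mentions would look roughly like this on an affected host - a minimal
sketch, assuming the usual unit naming, with <FSID> and the OSD id as
placeholders:

  # legacy (pre-cephadm) units - ideally none should still be loaded
  systemctl list-units --all 'ceph-osd@*'

  # cephadm-managed units for the same OSDs
  systemctl list-units --all 'ceph-<FSID>@osd.*'

  # daemons cephadm itself knows about on this host
  cephadm ls

  # data directories: legacy layout vs cephadm layout
  ls /var/lib/ceph/osd/
  ls /var/lib/ceph/<FSID>/

If an OSD shows up both as ceph-osd@<ID> and ceph-<FSID>@osd.<ID>, that
would match the "two systemd units targeting the same OSD" case.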