Hi,
that's an interesting observation, I haven't heard anything like that
yet. More responses inline...
Quoting Jeremi-Ernst Avenant <jeremi@xxxxxxxxxx>:
Hi all,
I recently migrated my Ceph cluster from *ceph-ansible* to *cephadm* (about
five months ago) and upgraded from *Pacific 16.2.11* to *Quincy (latest at
the time)*, followed by an upgrade to *Reef 18.2.4* two months later - due
to running an unsupported version of Ceph. Since this migration and
upgrade, I’ve noticed unexpected behavior in the cluster, particularly
related to OSD state awareness and balancer efficiency.
*1. OSD Nearfull Not Reported Until Restart*
I had an OSD exceed its configured nearfull threshold, but *Ceph did not
detect or report it* via ceph status. As a result, the cluster entered a
degraded state without any warnings. Only after manually restarting the
affected OSD did Ceph recognize the nearfull state and update the
corresponding pools accordingly. This behavior did not occur on
*Pacific/ceph-ansible*: Ceph would previously detect and act on the nearfull
condition without requiring a restart. It has recurred frequently since the
migration/upgrade.
During a cluster upgrade (via cephadm), the OSD daemons are restarted as
well, so it's a bit unclear to me how a manual restart later on would be
any different. Do you find anything in the OSD logs?
Usually I would suspect a misbehaving MGR, as that has often been the
case over the last years. Every now and then, a MGR failover "fixes"
things like a false PG status etc. But then again, an upgrade restarts
the MGRs as well, so it becomes less likely that a MGR failover would
help. It can't really hurt either, though, so maybe try that anyway.
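If you want to try that, a rough sketch of what I would run (without an
argument, 'ceph mgr fail' fails over the currently active MGR):

  # fail over to a standby MGR
  ceph mgr fail
  # then check whether fullness is now reported correctly
  ceph health detail
  ceph osd df tree
  # the configured ratios are part of the osdmap
  ceph osd dump | grep ratio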
*2. injectargs Not Taking Effect Until OSD Restart*
I've also observed that ceph tell osd.X injectargs --command ... often has
no effect. The OSD does not seem to apply the new arguments until it is
*manually restarted*, at which point I can modify values via injectargs as
expected. However, after a few hours or days, the issue reappears, requiring
another restart to modify runtime settings.
You don't have to use the injectargs command anymore; you can just use
'ceph config set osd.<OSD_ID> <CONFIG> <VALUE>' to change runtime
configuration. Have you tried that here as well? Does that at least
work? Are the same OSDs always involved, or does it affect all of them?
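For example (osd.12 and osd_max_backfills are just placeholders here):

  # set the option centrally in the MON config database
  ceph config set osd.12 osd_max_backfills 2
  # what the config database has stored
  ceph config get osd.12 osd_max_backfills
  # what the running daemon actually reports
  ceph config show osd.12 osd_max_backfills

If 'config get' and 'config show' disagree, that would point to the daemon
not picking up runtime changes at all, which would match what you're seeing
with injectargs.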
*3. Ceph Balancer and PG Remapping Issues*
The Ceph balancer appears to be operating, but its behavior seems
inefficient compared to what we experienced on *Pacific*. It often fails to
optimize data distribution effectively, and I have to rely on the
*pgremapper* tool to manually intervene. Restarting OSDs seems to improve
the balancer’s effectiveness temporarily, suggesting that stale OSD state
information may be contributing to the issue.
That sounds strange; I don't have a good explanation for that at the
moment, but it sounds like it could be related to the OSDs not correctly
reporting their status. Is there a chance that during the migration to
cephadm some OSDs weren't entirely migrated, meaning that there are two
systemd units targeting the same OSD? There have been reports about that
on this list, and it could explain why sometimes the "wrong" unit is
targeted. Basically, you would need to check the legacy data directory
/var/lib/ceph/osd/ceph-X on the affected host and see if there's still an
active OSD there. With cephadm, all OSD data would be underneath
/var/lib/ceph/{CEPH_FSID}/osd.X.
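A quick way to check for leftovers on a suspect host would be something
like this (OSD id and <fsid> are placeholders):

  # legacy (pre-cephadm) units
  systemctl list-units --all 'ceph-osd@*'
  # cephadm-managed units
  systemctl list-units --all 'ceph-<fsid>@osd.*'
  # cephadm's view: "style" should be cephadm:v1, not legacy
  cephadm ls | grep -E '"name"|"style"'
  # data directories: legacy vs. cephadm
  ls /var/lib/ceph/osd/
  ls /var/lib/ceph/<fsid>/

If both a ceph-osd@X unit and a ceph-<fsid>@osd.X unit exist for the same
OSD, that would be a strong hint for the double-unit scenario above.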
Since this is a *high-performance computing (HPC) environment*, manually
restarting OSDs on a regular basis is not a viable solution. These issues
did not occur when we were running *Pacific with ceph-ansible*, and I’m
wondering if others have experienced similar problems after migrating to
*cephadm* and/or upgrading to *Quincy/Reef*.
I noticed people on Reddit with the same issue, but their "resolution" was:
"I switch off my whole ceph cluster & switch it back on - to get it working
100% again - DAILY."
Has anyone else encountered these behaviors? Are there any known bugs or
workarounds that could help restore expected OSD state tracking and
balancer efficiency?
Any insights would be greatly appreciated!
Thanks,
--
*Jeremi-Ernst Avenant, Mr.*
Cloud Infrastructure Specialist
Inter-University Institute for Data Intensive Astronomy
5th Floor, Department of Physics and Astronomy,
University of Cape Town
Tel: 021 959 4137
Web: www.idia.ac.za | www.ilifu.ac.za
E-mail (IDIA): jeremi@xxxxxxxxxx
Rondebosch, Cape Town, 7600, South Africa
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx