Hi,
as expected, the issue is not resolved; it turned up again a couple of
hours later. Here's the tracker issue:
https://tracker.ceph.com/issues/67702
I also attached a log snippet from one OSD with debug_osd 10 to the
tracker. Let me know if you need anything else; I'll stay in touch
with Giovanna.
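For reference, this is roughly how I raised the debug level and grabbed
the snippet (osd.16 is just a placeholder ID, adjust to the affected OSD):

  # raise the OSD debug level at runtime
  ceph tell osd.16 config set debug_osd 10/10
  # ... wait for / reproduce the stuck snaptrim ...
  # reset to the default afterwards
  ceph tell osd.16 config set debug_osd 1/5
  # the snippet itself comes from the OSD log on the host, e.g.
  # /var/log/ceph/ceph-osd.16.log (or under /var/log/ceph/<fsid>/ with cephadm)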
Thanks!
Eugen
Quoting Sridhar Seshasayee <sseshasa@xxxxxxxxxx>:
Hi Eugen,
On Fri, Aug 23, 2024 at 1:37 PM Eugen Block <eblock@xxxxxx> wrote:
Hi again,
I have a couple of questions about this.
What exactly happened to the PGs? They were queued for snaptrimming,
but we didn't see any progress. Let's assume the average object size
in that pool was around 2 MB (I don't have the actual numbers). Does
that mean that if osd_snap_trim_cost (1M default) was too low, those
overly large objects weren't trimmed? And then, after we split the PGs
and reduced the average object size to 1 MB, those objects could be
trimmed? Does this explanation make sense?
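Just to illustrate where my numbers come from: a rough way to estimate
the average object size of a pool and compare it to the configured cost
would be something like this (I don't have the actual figures anymore):

  # per-pool STORED bytes and OBJECTS; average object size ~= STORED / OBJECTS
  ceph df detail
  # the cost currently used for snaptrim scheduling (1M by default)
  ceph config get osd osd_snap_trim_cost
  # with ~2 MB average objects, the 1M default would understate the
  # per-object work by roughly a factor of two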
If you have the OSD logs, I can take a look and see why the snaptrim ops
did not make progress. The cost is one contributing factor to the position
of the op in the queue. Therefore, even if the cost doesn't accurately
represent the actual average size of the objects in the PG, the op should
still be scheduled based on the configured cost and the profile allocations.
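If you want to double-check what the scheduler is actually working with,
the active profile and the cost/allocation settings can be read off a
running OSD (osd.0 is just an example ID):

  # active mClock profile and the snaptrim cost the OSD currently uses
  ceph config show osd.0 osd_mclock_profile
  ceph config show osd.0 osd_snap_trim_cost
  # per-class reservation/weight/limit allocations derived from the profile
  ceph config show-with-defaults osd.0 | grep osd_mclock_scheduler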
From the thread, I understand that the OSDs are NVMe-based. Based on the
actions taken to resolve the situation (increasing pg_num to 64), I think
something else was going on in the cluster. For an NVMe-based cluster, the
current cost shouldn't cause snaptrim ops to stall. I'd suggest raising an
upstream tracker with your observations and OSD logs so this can be
investigated further.
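For the tracker, it also helps to capture which PGs are affected and how
long their snap trim queues are, e.g.:

  # PGs currently trimming or waiting to trim
  ceph pg ls snaptrim
  ceph pg ls snaptrim_wait
  # the SNAPTRIMQ_LEN column shows the per-PG snap trim queue length
  ceph pg dump pgs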
I just browsed through the changes. If I understand the fix correctly,
the average object size is now calculated automatically, right? That
makes a lot of sense to me: as an operator I don't want to care too
much about average object sizes, since Ceph should know them better
than me. ;-)
Yes, that's correct. This fix was part of the ongoing effort to
incrementally bring background OSD operations under mClock scheduling.