Hi again,
I have a couple of questions about this.
What exactly happened to the PGs? They were queued for snaptrimming, but we didn't see any progress. Let's assume the average object size in that pool was around 2 MB (I don't have the actual numbers). Does that mean that if osd_snap_trim_cost (1M default) was too low, those overly large objects weren't trimmed? And that once we split the PGs, reducing the average object size to 1 MB, these objects could then be trimmed? Does this explanation make sense?
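By the way, to get actual numbers instead of assumptions I'd probably just derive them from ceph df. A rough, untested Python sketch (assuming the "stored" and "objects" fields I see in the JSON output on recent releases):

    import json
    import subprocess

    # Rough sketch: average object size per pool from "ceph df --format json".
    # The field names ("stored", "objects") may differ between releases.
    df = json.loads(subprocess.check_output(["ceph", "df", "--format", "json"]))
    for pool in df["pools"]:
        stats = pool["stats"]
        if stats.get("objects"):
            avg = stats["stored"] / stats["objects"]
            print(f"{pool['name']}: ~{avg / 1048576:.2f} MiB average object size")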
I just browsed through the changes; if I understand the fix correctly, the average object size is now calculated automatically, right? That makes a lot of sense to me: as an operator I don't want to care too much about average object sizes, since Ceph should know them better than I do. ;-)
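Just to illustrate how I picture it (this is not the actual code from the PR; the function name and clamping bounds below are made up), the per-PG cost derivation might look roughly like this:

    def snaptrim_cost(pg_stored_bytes, pg_num_objects,
                      floor=64 * 1024, ceiling=4 * 1024 * 1024):
        # Hypothetical illustration: derive the per-op snaptrim cost from the
        # PG's average object size instead of a fixed osd_snap_trim_cost,
        # clamped so a single unusual PG can't dominate the mClock queue.
        if pg_num_objects == 0:
            return floor
        avg_object_size = pg_stored_bytes // pg_num_objects
        return max(floor, min(avg_object_size, ceiling))

    # e.g. a PG with 1024 objects and 2 GiB stored -> ~2 MiB cost per op
    print(snaptrim_cost(2 * 1024**3, 1024))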
Thanks!
Eugen
Quoting Sridhar Seshasayee <sseshasa@xxxxxxxxxx>:
Hi Eugen,
There was a PR (https://github.com/ceph/ceph/pull/55040) related to mClock and snaptrim that was backported and is available from v18.2.4. The fix more accurately determines the cost (instead of the priority used with wpq) of a snaptrim operation based on the average size of the objects in the PG. Depending on the active mClock profile, this should help move the snaptrim queue.
To prevent the cluster from getting into a similar situation again, you could try changing the config option osd_snap_trim_cost (I think it's set to 1 MiB by default) to a value that more accurately reflects the average object size of the PGs undergoing snaptrim and see if it helps. In general with mClock, lower-cost ops spend less time in the queue.
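For example, if the measured average turns out to be around 2 MiB, something along these lines should do it (the value here is just an illustration, adjust it to your actual numbers):

    ceph config set osd osd_snap_trim_cost 2097152
    ceph config get osd osd_snap_trim_cost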
-Sridhar