Hi again,
I have a couple of questions about this.
What exactly happened to the PGs? They were queued for snaptrimming, but we didn't see any progress. Let's assume the average object size in that pool was around 2 MB (I don't have the actual numbers). Does that mean that if osd_snap_trim_cost (1M default) was too low, those overly large objects weren't trimmed? And that once we split the PGs, reducing the average object size to 1 MB, these objects could then be trimmed? Does this explanation make sense?
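By the way, to get actual numbers instead of assumptions I'd probably just derive them from ceph df. A rough, untested Python sketch (assuming the "stored" and "objects" fields I see in the JSON output on recent releases):

    import json
    import subprocess

    # Rough sketch: average object size per pool from "ceph df --format json".
    # The field names ("stored", "objects") may differ between releases.
    df = json.loads(subprocess.check_output(["ceph", "df", "--format", "json"]))
    for pool in df["pools"]:
        stats = pool["stats"]
        if stats.get("objects"):
            avg = stats["stored"] / stats["objects"]
            print(f"{pool['name']}: ~{avg / 1048576:.2f} MiB average object size")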
I just browsed through the changes; if I understand the fix correctly, the average object size is now calculated automatically, right? That makes a lot of sense to me: as an operator I don't want to care too much about average object sizes, since Ceph should know them better than I do. ;-)
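Just to illustrate how I picture it (this is not the actual code from the PR; the function name and clamping bounds below are made up), the per-PG cost derivation might look roughly like this:

    def snaptrim_cost(pg_stored_bytes, pg_num_objects,
                      floor=64 * 1024, ceiling=4 * 1024 * 1024):
        # Hypothetical illustration: derive the per-op snaptrim cost from the
        # PG's average object size instead of a fixed osd_snap_trim_cost,
        # clamped so a single unusual PG can't dominate the mClock queue.
        if pg_num_objects == 0:
            return floor
        avg_object_size = pg_stored_bytes // pg_num_objects
        return max(floor, min(avg_object_size, ceiling))

    # e.g. a PG with 1024 objects and 2 GiB stored -> ~2 MiB cost per op
    print(snaptrim_cost(2 * 1024**3, 1024))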
Thanks!
Eugen
Quoting Sridhar Seshasayee <sseshasa@xxxxxxxxxx>:
Hi Eugen,
There was a PR (https://github.com/ceph/ceph/pull/55040) related to mClock and snaptrim that was backported and is available from v18.2.4. The fix more accurately determines the cost (instead of the priority used with wpq) of a snaptrim operation based on the average size of the objects in the PG. Depending on the active mClock profile, this should help move the snaptrim queue.
To prevent the cluster from getting into a similar situation again, you could try changing the config option osd_snap_trim_cost (I think it's set to 1 MiB by default) to a value that more accurately reflects the average object size of the PGs undergoing snaptrim and see if it helps. In general with mClock, lower-cost ops spend less time in the queue.
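For example, if the measured average turns out to be around 2 MiB, something along these lines should do it (the value here is just an illustration, adjust it to your actual numbers):

    ceph config set osd osd_snap_trim_cost 2097152
    ceph config get osd osd_snap_trim_cost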
-Sridhar