Hi,
On 11/17/21 16:09, Francois Legrand wrote:
Now we are investigating this snapshot issue, and I noticed that as long
as we remove one snapshot at a time, things seem to go well (only some
PGs in "unknown" state, but no global warning, slow ops, OSDs down or
crashes). But if we remove several snapshots at the same time (I tried
with 2 for the moment), then we start to have some slow ops. I guess
that if I remove 4 or 5 snapshots at the same time I will end up with
OSDs marked down and/or crashing, as we had just after the upgrade (I am
not sure I want to try that with our production cluster).
Maybe you want to try to tweak `osd_snap_trim_sleep`. On Octopus/Pacific
with hybrid OSDs, snapshot deletions seem pretty stable in our testing.
Out of curiosity, are your OSDs on SSDs? I suspect that the default
setting of `osd_snap_trim_sleep` for SSD OSDs could affect
performance [1].
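If you want to experiment with it, something along these lines should
work (the value of 2 seconds is only an illustration, matching the
hybrid default; adjust it to your cluster):

  # check the value currently in effect on one OSD
  ceph config get osd.0 osd_snap_trim_sleep_ssd
  # add a small sleep between snap trim operations on SSD OSDs
  ceph config set osd osd_snap_trim_sleep_ssd 2

Note that the generic `osd_snap_trim_sleep` overrides the
per-device-class options (`osd_snap_trim_sleep_hdd`,
`osd_snap_trim_sleep_ssd`, `osd_snap_trim_sleep_hybrid`) when it is set
to a non-zero value.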
Cheers,
[1]:
https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/message/FPRB2DW4N427U25LEHYICOKI4C37BKSO/
--
Arthur Outhenin-Chalandre