Hello,
about four weeks ago I upgraded my cluster (144 4 TB HDD OSDs, 9
hosts) from 14.2.16 to 14.2.22. The upgrade did not cause any trouble.
The cluster is healthy. One thing is however new since the upgrade and
somewhat irritating:
Every weekend, in the night from Saturday to Sunday, I now see health
warnings about slow ops on some OSDs that I never saw while running
14.2.16. The affected OSDs are not always the same, and I found no
hints in the SMART values or logs that would indicate a failing disk.
On this list I recently saw several other posts, on Nautilus as well
as Octopus, reporting the very same issue.
Is there a way to get around the slow ops warnings, or is this a bug?
Can I check whether Ceph really succeeds in trimming removed
snapshots, or whether it perhaps aborts trimming because of the slow ops?
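While the warnings are active I could probably check directly whether
trimming is still running, e.g. by listing PGs that are in one of the
snaptrim states (assuming I understand the state names correctly):

  # list PGs currently trimming snapshots or waiting to do so
  ceph pg ls snaptrim snaptrim_wait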
In "ceph osd pool health detail" I see a list for one pool that has
about 30 snapshots created and also 30 snapshots deleted each week that
now has 65 removed snaps entries shown as [1~6a,6c~30,9d~2d,cc~a, ...]
in the output. Can I assume that trimming works if this
[1~6a,6c~30,9d~2d,cc~a, ...] list does not get longer each week? Is
there another way to check if trimming works?
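Otherwise I would simply record the removed_snaps line once a week and
compare the number of intervals over time; a rough sketch ('mypool'
and the log path are just placeholders):

  # append the current removed_snaps intervals of one pool to a log file
  ceph osd dump | grep "pool.*'mypool'" | grep -o 'removed_snaps \[[^]]*\]' \
      >> /root/removed_snaps-mypool.log

  # count the intervals in the newest entry; if trimming keeps up,
  # this number should stay roughly constant from week to week
  tail -1 /root/removed_snaps-mypool.log | tr ',' '\n' | wc -l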
Thanks for any hints
Rainer
--
Rainer Krienke, Uni Koblenz, Rechenzentrum, A22, Universitaetsstrasse 1
56070 Koblenz, Web: http://www.uni-koblenz.de/~krienke, Tel: +49261287 1312
PGP: http://www.uni-koblenz.de/~krienke/mypgp.html, Fax: +49261287 1001312