On 07/08/2023 18:04, Patrick Donnelly wrote:
>> I'm trying to figure out what's happening to my backup cluster that
>> often grinds to a halt when cephfs automatically removes snapshots.
> CephFS does not "automatically" remove snapshots. Do you mean the
> snap_schedule mgr module?
Yup.
>> Almost all OSDs go to 100% CPU, ceph complains about slow ops, and
>> CephFS stops doing client I/O.
> What health warnings do you see? You can try configuring snap trim:
> https://docs.ceph.com/en/latest/rados/configuration/osd-config-ref/#confval-osd_snap_trim_sleep
Mostly a lot of SLOW_OPS. And, I guess as a result of that,
MDS_CLIENT_LATE_RELEASE, MDS_CLIENT_OLDEST_TID, MDS_SLOW_METADATA_IO,
and MDS_TRIM warnings.
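For reference, the detail behind these warnings comes from something like
(the grep pattern is just my own filter):

```shell
# Show the full detail behind the current health warnings
ceph health detail

# Filter for the warnings mentioned above
ceph health detail | grep -E 'SLOW_OPS|MDS_CLIENT_LATE_RELEASE|MDS_CLIENT_OLDEST_TID|MDS_SLOW_METADATA_IO|MDS_TRIM'
```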
>> That won't explain why my cluster bogs down, but at least it gives
>> some visibility. Running 17.2.6 everywhere by the way.
>
> Please let us know how configuring snaptrim helps or not.
>
When I set nosnaptrim, all I/O immediately resumes. When I unset
nosnaptrim, I/O stops again.
One of the symptoms is that OSDs go to about 350% CPU per daemon.
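For the record, I'm toggling the flag cluster-wide with the usual commands:

```shell
# Pause snapshot trimming on all OSDs; client I/O recovers immediately
ceph osd set nosnaptrim

# Resume trimming; I/O stalls again shortly after
ceph osd unset nosnaptrim
```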
For a while I had the feeling that setting osd_snap_trim_sleep_ssd to 1
helped. I have 120 HDD OSDs with WAL/DB on SSD; does it even use this
value? Everything seemed stable, but after another few days, removing a
snapshot suddenly brought the cluster down again. So I guess that wasn't
the cause.
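If I read the option descriptions correctly, an OSD with HDD data and SSD
WAL/DB counts as "hybrid", so osd_snap_trim_sleep_ssd may simply not apply
to my disks at all. As I understand it (this is my own reading of the docs,
not something I've verified in the code), the relevant knobs are:

```shell
# Explicit override; when non-zero it takes precedence over the per-device-type values
ceph config set osd osd_snap_trim_sleep 2.0

# Per-device-type values, used when osd_snap_trim_sleep is 0 (the default):
ceph config set osd osd_snap_trim_sleep_hdd 5.0     # HDD data + HDD WAL/DB
ceph config set osd osd_snap_trim_sleep_ssd 1.0     # SSD data
ceph config set osd osd_snap_trim_sleep_hybrid 2.0  # HDD data + SSD WAL/DB (my case, if I'm right)
```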
What I'm trying now is setting osd_max_trimming_pgs to 0 for all OSDs,
then slowly setting it back to 1 for a few of them. This seems to work
for a while, but it still brings the cluster down every now and then,
and even when it doesn't, the cluster is so slow it's almost unusable.
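In case it matters, I'm flipping the per-OSD values like this (osd.12 is
just an example id):

```shell
# Disable snap trimming concurrency on all OSDs first
ceph config set osd osd_max_trimming_pgs 0

# Then re-enable it on a handful of OSDs, one at a time
ceph config set osd.12 osd_max_trimming_pgs 1
```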
This whole troubleshooting process is taking weeks. I just noticed that
when the problem occurs, a lot of OSDs on a single host (15 OSDs per
host) start using a lot of CPU, even though only 3 OSDs on that machine
have osd_max_trimming_pgs set to 1; the rest are at 0. Disk doesn't seem
to be the bottleneck.
Restarting the daemons seems to solve the problem for a while, although
the high CPU usage pops up on a different OSD node every time.
I am at a loss here. I'm almost thinking it's some kind of bug in the
OSD daemons, but I have no idea how to troubleshoot this.
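So far the only digging I've done on the busy daemons is via the admin
socket and the PG states, along these lines (osd.12 again just an example;
run the daemon commands on the host where that OSD lives):

```shell
# See what a busy OSD is actually working on right now, and recently
ceph daemon osd.12 dump_ops_in_flight
ceph daemon osd.12 dump_historic_ops

# List PGs currently in a snaptrim / snaptrim_wait state
ceph pg dump pgs_brief 2>/dev/null | grep snaptrim
```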
Angelo.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx