How many OSDs do you have per NVMe drive? We increased from 2 per NVMe to 4 per NVMe and it improved the snap trimming quite a lot. I guess the utilisation of the NVMes is not at 100% when you snaptrim.

Istvan Szabo
Senior Infrastructure Engineer
---------------------------------------------------
Agoda Services Co., Ltd.
e: istvan.szabo@xxxxxxxxx<mailto:istvan.szabo@xxxxxxxxx>
---------------------------------------------------

On 2021. Nov 10., at 16:15, Christoph Adomeit <Christoph.Adomeit@xxxxxxxxxxx> wrote:

I upgraded my Ceph cluster to Pacific in August and updated to Pacific 16.2.6 in September without problems. I had no performance issues at all; the cluster has 3 nodes with 64 cores each, 15 blazing-fast Samsung PM1733 NVMe OSDs, a 25 GBit/s network and around 100 VMs. The cluster was really fast. I never saw anything like "snaptrim" in the ceph status output, but the cluster seemed to slowly "eat" storage space.

So yesterday I decided to add 3 more NVMes, one for each node. The second I added the first NVMe as a Ceph OSD, the cluster started crashing. I had high load on all OSDs and the OSDs were dying again and again until I set nodown, noout, noscrub and nodeep-scrub and removed the new OSD. Then the cluster recovered, but it had slow I/O and lots of PGs in snaptrim and snaptrim_wait state. I made this smoother by setting --osd_snap_trim_sleep=3.0.

Overnight the snaptrim_wait PGs went down to 0 and I had 15% more free space in the Ceph cluster. But during the day the snaptrim_wait count kept increasing. I then set osd_snap_trim_sleep back to 0.0 and most VMs had extremely high iowait or crashed. Now I did a ceph osd set nosnaptrim and the cluster is flying again: iowait 0 on all VMs, but the count of snaptrim_wait PGs is slowly increasing.

How can I get the snaptrims to run fast without affecting Ceph I/O performance?

My theory is that until yesterday the snaptrims were not running for some reason, and therefore the cluster was "eating" storage space. After the crash yesterday the snaptrims were restarted and kicked in. In the logs I cannot find any information on what is going on. From what I read on the mailing lists and in forums, I suppose the problem might have something to do with the OSD omaps, compaction and the RocksDB format, or maybe with the OSD on-disk format?

Any ideas what the next steps could be?
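
For reference, the settings and commands discussed above look roughly like this (a sketch, untested here; osd.0 and /dev/nvme0n1 are placeholders, and the values are examples rather than recommendations):

    # Throttle snap trimming instead of disabling it outright:
    ceph config set osd osd_snap_trim_sleep 3.0
    ceph config set osd osd_pg_max_concurrent_snap_trims 1
    ceph osd unset nosnaptrim

    # Compact one OSD's RocksDB at a time if omap bloat is suspected:
    ceph tell osd.0 compact

    # Deploy 4 OSDs per NVMe (on an empty device), as suggested in the reply above:
    ceph-volume lvm batch --osds-per-device 4 /dev/nvme0n1

Doing the compaction, or any OSD re-deployment, one OSD or one host at a time keeps redundancy intact while each OSD is briefly unavailable.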