I tried it with offline compaction, and it didn't help one bit. It took ages per
OSD, and starting the OSD afterwards wasn't fast either.
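For reference, since offline compaction comes up repeatedly in this thread: it is
run per OSD while the daemon is stopped. A minimal sketch, assuming a
non-containerized systemd deployment and the default OSD data path (OSD id 12 is
just an example, adjust for your deployment):

    systemctl stop ceph-osd@12
    # compact the OSD's RocksDB while the daemon is down
    ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-12 compact
    systemctl start ceph-osd@12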
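Likewise, the nosnaptrim flag mentioned in the quoted messages below is a
cluster-wide OSD flag, toggled with the standard CLI:

    ceph osd set nosnaptrim      # pause snapshot trimming cluster-wide
    ceph osd unset nosnaptrim    # resume snapshot trimming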
> On 23.08.2024 at 18:16, Özkan Göksu <ozkangksu@xxxxxxxxx> wrote:
>
> I have 12+12 = 24 servers with 8 x 4TB SAS SSDs on each node.
> I will use the weekend: I will start compaction on 12 servers on Saturday
> and on the other 12 on Sunday. When the compaction is complete, I will
> unset nosnaptrim and let the cluster clean up the two weeks of leftover
> snaps.
>
> Thank you for the advice, I will share the results when it's done.
>
> Regards.
>
> On Fri, 23 Aug 2024 at 18:48, Oliver Freyermuth
> <freyermuth@xxxxxxxxxxxxxxxxxx> wrote:
>
>> Hi Özkan,
>>
>> in our case, we tried online compaction first, and it resolved the issue
>> completely. I first tested with a single OSD daemon (i.e. online
>> compaction of only that single OSD) and checked that the load of that
>> daemon went down significantly (that was while snaptrims with a high
>> sleep value were still going on).
>> Then I went in batches of 10 % of the cluster's OSDs, and they finished
>> rather fast (a few minutes), so I could actually do it without downtime.
>>
>> In older threads on this list, snaptrim issues which seemed similar (but
>> not clearly related to an upgrade) required heavier operations (either
>> offline compaction or OSD recreation).
>> Since online compaction is comparatively "cheap", I'd always try it
>> first. In our case each OSD took less than 2-3 minutes, but of course
>> your mileage may vary.
>>
>> Cheers,
>> Oliver
>>
>>> On 23.08.24 at 17:42, Özkan Göksu wrote:
>>> Hello Oliver.
>>>
>>> Thank you so much for the answer!
>>>
>>> I was thinking of re-creating the OSDs, but if you are sure compaction
>>> is the solution here, then it's worth trying.
>>> I'm planning to shut down all the VMs, and when the cluster is safe I
>>> will try OSD compaction.
>>> May I ask: did you do online or offline compaction?
>>>
>>> I have 2 sides, so I could shut down 1 entire rack, do the offline
>>> compaction there, and then do the same on the other side when it's
>>> done. What do you think?
>>>
>>> Regards.
>>>
>>> On Fri, 23 Aug 2024 at 18:06, Oliver Freyermuth
>>> <freyermuth@xxxxxxxxxxxxxxxxxx> wrote:
>>>
>>> Hi Özkan,
>>>
>>> FWIW, we observed something similar after upgrading from Mimic =>
>>> Nautilus => Octopus and starting to trim snapshots afterwards.
>>>
>>> The size of our cluster was a bit smaller, but the effect was the same:
>>> when snapshot trimming started, OSDs went into high load and RBD I/O
>>> was extremely slow.
>>>
>>> We first tried:
>>>     ceph tell osd.* injectargs '--osd-snap-trim-sleep 10'
>>> which helped, but of course snapshots kept piling up.
>>>
>>> Then we performed only RocksDB compactions via:
>>>
>>>     for A in {0..5}; do ceph tell osd.$A compact | sed 's/^/'$A': /' & done
>>>
>>> for batches of OSDs, and their load went down heavily. Once we had
>>> churned through all OSDs, I/O load was low again, and we could go back
>>> to the default:
>>>     ceph tell osd.* injectargs '--osd-snap-trim-sleep 0'
>>>
>>> After this, the situation stabilized for us. So my guess would be that
>>> the RocksDBs grew too much after the OMAP format conversion and the
>>> compaction shrank them again.
>>>
>>> Maybe that also helps in your case?
>>>
>>> Interestingly, we did not observe this on other clusters (one mainly
>>> for CephFS, another one with mirrored RBD volumes) which took the same
>>> upgrade path.
>>>
>>> Cheers,
>>> Oliver
>>>
>>> On 23.08.24 at 16:46, Özkan Göksu wrote:
>>>> Hello folks.
>>>>
>>>> We have a Ceph cluster with 2000+ RBD drives on 20 nodes.
>>>>
>>>> We upgraded the cluster from 14.2.16 to 15.2.14, and after the upgrade
>>>> we started to see snaptrim issues.
>>>> Without the "nosnaptrim" flag, the system is not usable right now.
>>>>
>>>> I think the problem is caused by the omap conversion at the Octopus
>>>> upgrade. The Octopus release notes say:
>>>>
>>>> "Note that the first time each OSD starts, it will do a format
>>>> conversion to improve the accounting for "omap" data. This may take a
>>>> few minutes to as much as a few hours (for an HDD with lots of omap
>>>> data). You can disable this automatic conversion with:"
>>>>
>>>> What should I do to solve this problem?
>>>>
>>>> Thanks.
>>>
>>> --
>>> Oliver Freyermuth
>>> Universität Bonn
>>> Physikalisches Institut, Raum 1.047
>>> Nußallee 12
>>> 53115 Bonn
>>> --
>>> Tel.: +49 228 73 2367
>>> Fax: +49 228 73 7869
>>> --

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
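As a closing reference: Oliver's compaction one-liner above covers OSDs 0-5.
Scaled to a whole cluster it might look like the sketch below. This is not from
the original posts; it assumes bash, contiguous OSD ids 0-191 (matching the
24 x 8 = 192 OSDs described above), and an arbitrary batch size of 16:

    for START in $(seq 0 16 176); do
        # compact one batch of 16 OSDs in parallel, prefixing output with the id
        for A in $(seq "$START" $((START + 15))); do
            ceph tell "osd.$A" compact | sed "s/^/$A: /" &
        done
        wait    # let the whole batch finish before starting the next one
    done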
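Finally, the sentence quoted from the Octopus release notes above is cut off
after "You can disable this automatic conversion with:". In the published
release notes it continues with the config command below; double-check against
the notes for your exact version, and note that disabling only defers the
conversion rather than avoiding its cost:

    ceph config set osd bluestore_fsck_quick_fix_on_mount false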