Hello Oliver.

I confirm your solution works. Compaction took about 2 minutes per SSD, and the whole cluster took me 8 hours. While compaction was running I kept the nosnaptrim flag set. When the compaction completed I ran "ceph tell osd.* injectargs '--osd-snap-trim-sleep 10'" and unset nosnaptrim. Snap trimming took 1 day to clear 2 weeks of snaps, and thanks to '--osd-snap-trim-sleep 10' I didn't see any slowdown while the snaps were being trimmed.

Thank you for the advice.
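For the archives, the end-to-end sequence looks roughly like this (a sketch rather than the exact commands I typed: I compacted the hosts in batches over the weekend instead of the serial loop shown, and the final reset of the sleep value to 0 is the default Oliver mentions below):

    ceph osd set nosnaptrim                                  # snap trimming stays paused (the flag was already set here)
    for id in $(ceph osd ls); do
        ceph tell osd.$id compact                            # online RocksDB compaction, ~2 minutes per SAS SSD OSD
    done
    ceph tell osd.* injectargs '--osd-snap-trim-sleep 10'    # throttle trimming before re-enabling it
    ceph osd unset nosnaptrim                                # let the 2-week backlog drain (took about 1 day)
    # once no PGs are in snaptrim/snaptrim_wait any more, go back to the default:
    ceph tell osd.* injectargs '--osd-snap-trim-sleep 0'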
On Fri, 23 Aug 2024 at 19:24, Boris <bb@xxxxxxxxx> wrote:

> I tried it with offline compaction, and it didn't help a bit.
>
> It took ages per OSD, and starting the OSDs afterwards wasn't fast either.
>
>
> > On 23.08.2024 at 18:16, Özkan Göksu <ozkangksu@xxxxxxxxx> wrote:
> >
> > I have 12+12 = 24 servers with 8 x 4TB SAS SSDs on each node.
> > I will use the weekend: I will start compaction on 12 servers on Saturday and on the other 12 on Sunday, and when the compaction is complete I will unset nosnaptrim and let the cluster clean up the 2 weeks of leftover snaps.
> >
> > Thank you for the advice, I will share the results when it's done.
> >
> > Regards.
> >
> > On Fri, 23 Aug 2024 at 18:48, Oliver Freyermuth <freyermuth@xxxxxxxxxxxxxxxxxx> wrote:
> >
> >> Hi Özkan,
> >>
> >> in our case, we tried online compaction first, and it resolved the issue completely. I first tested with a single OSD daemon (i.e. only online compaction of that single OSD) and checked that the load of that daemon went down significantly
> >> (that was while snaptrims with a high sleep value were still going on).
> >> Then I went in batches of 10 % of the cluster's OSDs, and they finished rather fast (a few minutes), so I could actually do it without a downtime.
> >>
> >> In older threads on this list, snaptrim issues which seemed similar (but not clearly related to an upgrade) required heavier operations (either offline compaction or OSD recreation).
> >> Since online compaction is comparatively "cheap", I'd always try this first. In my case, each OSD took less than 2-3 minutes, but of course your mileage may vary.
> >>
> >> Cheers,
> >> Oliver
> >>
> >>> On 23.08.24 at 17:42, Özkan Göksu wrote:
> >>> Hello Oliver.
> >>>
> >>> Thank you so much for the answer!
> >>>
> >>> I was thinking of re-creating the OSDs, but if you are sure compaction is the solution here, then it's worth trying.
> >>> I'm planning to shut down all the VMs, and when the cluster is safe I will try OSD compaction.
> >>> May I ask whether you did online or offline compaction?
> >>>
> >>> Because I have 2 sides, I can shut down 1 entire rack, do the offline compaction, and do the same on the other side when it's done.
> >>> What do you think?
> >>>
> >>> Regards.
> >>>
> >>> On Fri, 23 Aug 2024 at 18:06, Oliver Freyermuth <freyermuth@xxxxxxxxxxxxxxxxxx> wrote:
> >>>
> >>> Hi Özkan,
> >>>
> >>> FWIW, we observed something similar after upgrading from Mimic => Nautilus => Octopus and starting to trim snapshots afterwards.
> >>>
> >>> The size of our cluster was a bit smaller, but the effect was the same: when snapshot trimming started, OSDs went into high load and RBD I/O was extremely slow.
> >>>
> >>> We tried to use:
> >>> ceph tell osd.* injectargs '--osd-snap-trim-sleep 10'
> >>> first, which helped, but of course snapshots kept piling up.
> >>>
> >>> Finally, we performed only RocksDB compactions via:
> >>>
> >>> for A in {0..5}; do ceph tell osd.$A compact | sed 's/^/'$A': /' & done
> >>>
> >>> for some batches of OSDs, and their load went down heavily. Finally, after we'd churned through all OSDs, I/O load was low again, and we could go back to the default:
> >>> ceph tell osd.* injectargs '--osd-snap-trim-sleep 0'
> >>>
> >>> After this, the situation has stabilized for us. So my guess would be that the RocksDBs grew too much after the OMAP format conversion, and the compaction shrank them again.
> >>>
> >>> Maybe that also helps in your case?
> >>>
> >>> Interestingly, we did not observe this on other clusters (one mainly for CephFS, another one with mirrored RBD volumes), which took the same upgrade path.
> >>>
> >>> Cheers,
> >>> Oliver
> >>>
> >>> On 23.08.24 at 16:46, Özkan Göksu wrote:
> >>>> Hello folks.
> >>>>
> >>>> We have a Ceph cluster with 2000+ RBD drives on 20 nodes.
> >>>>
> >>>> We upgraded the cluster from 14.2.16 to 15.2.14, and after the upgrade we started to see snap trim issues.
> >>>> Without the "nosnaptrim" flag, the system is not usable right now.
> >>>>
> >>>> I think the problem is because of the omap conversion at the Octopus upgrade:
> >>>>
> >>>> Note that the first time each OSD starts, it will do a format conversion to improve the accounting for "omap" data. This may take a few minutes to as much as a few hours (for an HDD with lots of omap data). You can disable this automatic conversion with:
> >>>>
> >>>> What should I do to solve this problem?
> >>>>
> >>>> Thanks.
> >>>> _______________________________________________
> >>>> ceph-users mailing list -- ceph-users@xxxxxxx
> >>>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
> >>>
> >>> --
> >>> Oliver Freyermuth
> >>> Universität Bonn
> >>> Physikalisches Institut, Raum 1.047
> >>> Nußallee 12
> >>> 53115 Bonn
> >>> --
> >>> Tel.: +49 228 73 2367
> >>> Fax: +49 228 73 7869
> >>> --
> >>
> >> --
> >> Oliver Freyermuth
> >> Universität Bonn
> >> Physikalisches Institut, Raum 1.047
> >> Nußallee 12
> >> 53115 Bonn
> >> --
> >> Tel.: +49 228 73 2367
> >> Fax: +49 228 73 7869
> >> --
> >
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@xxxxxxx
> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
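A batch-friendly variant of Oliver's online compaction loop above, generalized to every OSD in the cluster (a sketch only: the batch size of 8 and the use of 'ceph osd ls' to enumerate OSD IDs are illustrative assumptions, not taken from the thread):

    # Compact all OSDs online, at most 8 at a time, prefixing output with the OSD id.
    n=0
    for id in $(ceph osd ls); do
        ceph tell osd.$id compact | sed "s/^/osd.$id: /" &
        n=$((n + 1))
        if [ $((n % 8)) -eq 0 ]; then wait; fi   # let the current batch finish before starting the next
    done
    wait                                         # wait for the last partial batch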