I have 12+12 = 24 servers with 8 x 4 TB SAS SSDs on each node. I will use
the weekend: I will start compaction on 12 servers on Saturday and on the
other 12 on Sunday, and when the compaction is complete I will unset
nosnaptrim and let the cluster clean up the two weeks of leftover snaps.
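
For reference, this is roughly what I have in mind for each weekend batch
(just a sketch; the host names below are placeholders, and I am assuming my
hostnames match the CRUSH host buckets so that "ceph osd ls-tree <host>"
returns each node's OSD ids):

for HOST in node01 node02 node03; do              # the 12 hosts of this batch
    for OSD in $(ceph osd ls-tree "$HOST"); do    # all OSD ids on this host
        ceph tell osd.$OSD compact | sed 's/^/'$OSD': /' &   # online RocksDB compaction
    done
    wait    # let this host's 8 OSDs finish before moving to the next host
done

# only once both batches (all 24 hosts) are compacted:
ceph osd unset nosnaptrim

That compacts one host's OSDs in parallel but only one host at a time, so
the rest of the cluster keeps serving I/O while the compaction runs.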
Thank you for the advice, I will share the results when it's done.

Regards.

On Fri, 23 Aug 2024 at 18:48, Oliver Freyermuth
<freyermuth@xxxxxxxxxxxxxxxxxx> wrote:
> Hi Özkan,
>
> in our case, we tried online compaction first, and it helped to resolve
> the issue completely. I first tested with a single OSD daemon (i.e. only
> online compaction of that single OSD) and checked that the load of that
> daemon went down significantly (that was while snaptrims with a high
> sleep value were still going on). Then I went in batches of 10 % of the
> cluster's OSDs, and they finished rather fast (a few minutes), so I
> could actually do it without a downtime.
>
> In older threads on this list, snaptrim issues which seemed similar
> (but not clearly related to an upgrade) required heavier operations
> (either offline compaction or OSD recreation).
> Since online compaction is comparatively "cheap", I'd always try this
> first. In my case, each OSD took less than 2-3 minutes, but of course
> your mileage may vary.
>
> Cheers,
> Oliver
>
> On 23.08.24 at 17:42, Özkan Göksu wrote:
> > Hello Oliver.
> >
> > Thank you so much for the answer!
> >
> > I was thinking of re-creating the OSDs, but if you are sure that
> > compaction is the solution here, then it's worth trying.
> > I'm planning to shut down all the VMs, and when the cluster is safe I
> > will try OSD compaction.
> > May I ask: did you do online or offline compaction?
> >
> > Because I have 2 sides, I can shut down 1 entire rack, do the offline
> > compaction, and then do the same on the other side when it's done.
> > What do you think?
> >
> > Regards.
> >
> > On Fri, 23 Aug 2024 at 18:06, Oliver Freyermuth
> > <freyermuth@xxxxxxxxxxxxxxxxxx> wrote:
> >
> > Hi Özkan,
> >
> > FWIW, we observed something similar after upgrading from Mimic =>
> > Nautilus => Octopus and starting to trim snapshots afterwards.
> >
> > The size of our cluster was a bit smaller, but the effect was the
> > same: when snapshot trimming started, OSDs went into high load and
> > RBD I/O was extremely slow.
> >
> > We tried:
> > ceph tell osd.* injectargs '--osd-snap-trim-sleep 10'
> > first, which helped, but of course snapshots kept piling up.
> >
> > Finally, we performed only RocksDB compactions via:
> >
> > for A in {0..5}; do ceph tell osd.$A compact | sed 's/^/'$A': /' & done
> >
> > for some batches of OSDs, and their load went down heavily. After
> > we'd churned through all OSDs, I/O load was low again, and we could
> > go back to the default:
> > ceph tell osd.* injectargs '--osd-snap-trim-sleep 0'
> >
> > After this, the situation has stabilized for us. So my guess would be
> > that the RocksDBs grew too much after the OMAP format conversion and
> > the compaction shrank them again.
> >
> > Maybe that also helps in your case?
> >
> > Interestingly, we did not observe this on other clusters (one mainly
> > for CephFS, another one with mirrored RBD volumes), which took the
> > same upgrade path.
> >
> > Cheers,
> > Oliver
> >
> > On 23.08.24 at 16:46, Özkan Göksu wrote:
> > > Hello folks.
> > >
> > > We have a ceph cluster and we have 2000+ RBD drives on 20 nodes.
> > >
> > > We upgraded the cluster from 14.2.16 to 15.2.14, and after the
> > > upgrade we started to see snap trim issues.
> > > Without the "nosnaptrim" flag, the system is not usable right now.
> > >
> > > I think the problem is caused by the omap conversion during the
> > > Octopus upgrade:
> > >
> > > Note that the first time each OSD starts, it will do a format
> > > conversion to improve the accounting for "omap" data. This may take
> > > a few minutes to as much as a few hours (for an HDD with lots of
> > > omap data). You can disable this automatic conversion with:
> > >
> > > What should I do to solve this problem?
> > >
> > > Thanks.
>
> --
> Oliver Freyermuth
> Universität Bonn
> Physikalisches Institut, Raum 1.047
> Nußallee 12
> 53115 Bonn
> --
> Tel.: +49 228 73 2367
> Fax: +49 228 73 7869
> --

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx