Hello Oliver.

I confirm your solution works. Compaction took about 2 minutes per SSD, and the whole cluster took me 8 hours. While compaction was running I kept the nosnaptrim flag set. When the compaction completed I ran "ceph tell osd.* injectargs '--osd-snap-trim-sleep 10'" and unset nosnaptrim. Snap trimming took 1 day to clear 2 weeks of snaps, and thanks to '--osd-snap-trim-sleep 10' I didn't see any slowdown while the snaps were being trimmed.

Thank you for the advice.
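For the archives, the end-to-end sequence looks roughly like this (a sketch rather than the exact commands I typed: I compacted the hosts in batches over the weekend instead of the serial loop shown, and the final reset of the sleep value to 0 is the default Oliver mentions below):

    ceph osd set nosnaptrim                                  # snap trimming stays paused (the flag was already set here)
    for id in $(ceph osd ls); do
        ceph tell osd.$id compact                            # online RocksDB compaction, ~2 minutes per SAS SSD OSD
    done
    ceph tell osd.* injectargs '--osd-snap-trim-sleep 10'    # throttle trimming before re-enabling it
    ceph osd unset nosnaptrim                                # let the 2-week backlog drain (took about 1 day)
    # once no PGs are in snaptrim/snaptrim_wait any more, go back to the default:
    ceph tell osd.* injectargs '--osd-snap-trim-sleep 0'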
On Fri, 23 Aug 2024 at 19:24, Boris <bb@xxxxxxxxx> wrote:

> I tried it with offline compaction, and it didn't help a bit.
>
> It took ages per OSD, and starting the OSDs afterwards wasn't fast either.
>
>
> > On 23.08.2024 at 18:16, Özkan Göksu <ozkangksu@xxxxxxxxx> wrote:
> >
> > I have 12+12 = 24 servers with 8 x 4TB SAS SSDs on each node.
> > I will use the weekend: I will start compaction on 12 servers on Saturday and on the other 12 on Sunday, and when the compaction is complete I will unset nosnaptrim and let the cluster clean up the 2 weeks of leftover snaps.
> >
> > Thank you for the advice, I will share the results when it's done.
> >
> > Regards.
> >
> > On Fri, 23 Aug 2024 at 18:48, Oliver Freyermuth <freyermuth@xxxxxxxxxxxxxxxxxx> wrote:
> >
> >> Hi Özkan,
> >>
> >> in our case, we tried online compaction first, and it resolved the issue completely. I first tested with a single OSD daemon (i.e. only online compaction of that single OSD) and checked that the load of that daemon went down significantly
> >> (that was while snaptrims with a high sleep value were still going on).
> >> Then I went in batches of 10 % of the cluster's OSDs, and they finished rather fast (a few minutes), so I could actually do it without a downtime.
> >>
> >> In older threads on this list, snaptrim issues which seemed similar (but not clearly related to an upgrade) required heavier operations (either offline compaction or OSD recreation).
> >> Since online compaction is comparatively "cheap", I'd always try this first. In my case, each OSD took less than 2-3 minutes, but of course your mileage may vary.
> >>
> >> Cheers,
> >> Oliver
> >>
> >>> On 23.08.24 at 17:42, Özkan Göksu wrote:
> >>> Hello Oliver.
> >>>
> >>> Thank you so much for the answer!
> >>>
> >>> I was thinking of re-creating the OSDs, but if you are sure compaction is the solution here, then it's worth trying.
> >>> I'm planning to shut down all the VMs, and when the cluster is safe I will try OSD compaction.
> >>> May I ask whether you did online or offline compaction?
> >>>
> >>> Because I have 2 sides, I can shut down 1 entire rack, do the offline compaction, and do the same on the other side when it's done.
> >>> What do you think?
> >>>
> >>> Regards.
> >>>
> >>> On Fri, 23 Aug 2024 at 18:06, Oliver Freyermuth <freyermuth@xxxxxxxxxxxxxxxxxx> wrote:
> >>>
> >>> Hi Özkan,
> >>>
> >>> FWIW, we observed something similar after upgrading from Mimic => Nautilus => Octopus and starting to trim snapshots afterwards.
> >>>
> >>> The size of our cluster was a bit smaller, but the effect was the same: when snapshot trimming started, OSDs went into high load and RBD I/O was extremely slow.
> >>>
> >>> We tried to use:
> >>> ceph tell osd.* injectargs '--osd-snap-trim-sleep 10'
> >>> first, which helped, but of course snapshots kept piling up.
> >>>
> >>> Finally, we performed only RocksDB compactions via:
> >>>
> >>> for A in {0..5}; do ceph tell osd.$A compact | sed 's/^/'$A': /' & done
> >>>
> >>> for some batches of OSDs, and their load went down heavily. Finally, after we'd churned through all OSDs, I/O load was low again, and we could go back to the default:
> >>> ceph tell osd.* injectargs '--osd-snap-trim-sleep 0'
> >>>
> >>> After this, the situation has stabilized for us. So my guess would be that the RocksDBs grew too much after the OMAP format conversion, and the compaction shrank them again.
> >>>
> >>> Maybe that also helps in your case?
> >>>
> >>> Interestingly, we did not observe this on other clusters (one mainly for CephFS, another one with mirrored RBD volumes), which took the same upgrade path.
> >>>
> >>> Cheers,
> >>> Oliver
> >>>
> >>> On 23.08.24 at 16:46, Özkan Göksu wrote:
> >>>> Hello folks.
> >>>>
> >>>> We have a Ceph cluster with 2000+ RBD drives on 20 nodes.
> >>>>
> >>>> We upgraded the cluster from 14.2.16 to 15.2.14, and after the upgrade we started to see snap trim issues.
> >>>> Without the "nosnaptrim" flag, the system is not usable right now.
> >>>>
> >>>> I think the problem is because of the omap conversion at the Octopus upgrade:
> >>>>
> >>>> Note that the first time each OSD starts, it will do a format conversion to improve the accounting for "omap" data. This may take a few minutes to as much as a few hours (for an HDD with lots of omap data). You can disable this automatic conversion with:
> >>>>
> >>>> What should I do to solve this problem?
> >>>>
> >>>> Thanks.
> >>>> _______________________________________________
> >>>> ceph-users mailing list -- ceph-users@xxxxxxx
> >>>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
> >>>
> >>> --
> >>> Oliver Freyermuth
> >>> Universität Bonn
> >>> Physikalisches Institut, Raum 1.047
> >>> Nußallee 12
> >>> 53115 Bonn
> >>> --
> >>> Tel.: +49 228 73 2367
> >>> Fax: +49 228 73 7869
> >>> --
> >>
> >> --
> >> Oliver Freyermuth
> >> Universität Bonn
> >> Physikalisches Institut, Raum 1.047
> >> Nußallee 12
> >> 53115 Bonn
> >> --
> >> Tel.: +49 228 73 2367
> >> Fax: +49 228 73 7869
> >> --
> >
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@xxxxxxx
> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
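A batch-friendly variant of Oliver's online compaction loop above, generalized to every OSD in the cluster (a sketch only: the batch size of 8 and the use of 'ceph osd ls' to enumerate OSD IDs are illustrative assumptions, not taken from the thread):

    # Compact all OSDs online, at most 8 at a time, prefixing output with the OSD id.
    n=0
    for id in $(ceph osd ls); do
        ceph tell osd.$id compact | sed "s/^/osd.$id: /" &
        n=$((n + 1))
        if [ $((n % 8)) -eq 0 ]; then wait; fi   # let the current batch finish before starting the next
    done
    wait                                         # wait for the last partial batch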