Re: Snaptrim issue after nautilus to octopus upgrade

I tried it with offline compaction, and it didn't help one bit.

It took ages per OSD, and starting the OSDs afterwards wasn't fast either.
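
For reference, offline compaction of a single OSD goes roughly like this (just a sketch: ceph-kvstore-tool and the default data path are assumptions, the OSD ID is a placeholder, and the OSD has to be stopped first):

   systemctl stop ceph-osd@12
   ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-12 compact
   systemctl start ceph-osd@12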



> On 23.08.2024 at 18:16, Özkan Göksu <ozkangksu@xxxxxxxxx> wrote:
> 
> I have 12+12 = 24 servers with 8 x 4TB SAS SSDs on each node.
> I will use the weekend: I will start compaction on 12 servers on Saturday
> and on the other 12 on Sunday. When the compaction is complete, I will
> unset nosnaptrim and let the cluster clean up the two weeks of leftover snapshots.
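> 
> Roughly what I have in mind, as a sketch (the OSD ID ranges per batch are placeholders, 96 OSDs per side):
> 
>    # Saturday: compact all OSDs on the first 12 servers
>    for A in {0..95}; do ceph tell osd.$A compact; done
>    # Sunday: the other 12 servers
>    for A in {96..191}; do ceph tell osd.$A compact; done
>    # once both sides are done, allow snapshot trimming again
>    ceph osd unset nosnaptrim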
> 
> Thank you for the advice, I will share the results when it's done.
> 
> Regards.
> 
> Oliver Freyermuth <freyermuth@xxxxxxxxxxxxxxxxxx> wrote on Fri, 23 Aug 2024 at 18:48:
> 
>> Hi Özkan,
>> 
>> in our case, we tried online compaction first, and it resolved
>> the issue completely. I first tested with a single OSD daemon (i.e. online
>> compaction of only that single OSD) and checked that the load of that
>> daemon went down significantly
>> (that was while snaptrims with a high sleep value were still going on).
>> Then I went through the cluster in batches of 10 % of the OSDs, and they
>> finished rather fast (a few minutes each), so I could do it without downtime.
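>> 
>> A rough sketch of one such batch (the OSD IDs are placeholders; pick roughly 10 % of your OSDs per round):
>> 
>>    for A in {20..29}; do ceph tell osd.$A compact | sed 's/^/'$A': /' & done; wait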
>> 
>> In older threads on this list, snaptrim issues which seemed similar (but
>> were not clearly related to an upgrade) required heavier operations (either
>> offline compaction or OSD recreation).
>> Since online compaction is comparatively "cheap", I'd always try this
>> first. In my case, each OSD took less than 2-3 minutes, but of
>> course your mileage may vary.
>> 
>> Cheers,
>>        Oliver
>> 
>>> On 23.08.24 at 17:42, Özkan Göksu wrote:
>>> Hello Oliver.
>>> 
>>> Thank you so much for the answer!
>>> 
>>> I was thinking of re-creating the OSDs, but if you are sure that
>>> compaction is the solution here, then it's worth trying.
>>> I'm planning to shut down all the VMs, and when the cluster is safe,
>>> I will try OSD compaction.
>>> May I ask whether you did online or offline compaction?
>>> 
>>> Because I have 2 sides, I can shut down 1 entire rack, do the
>>> offline compaction, and then do the same on the other side when it's done.
>>> What do you think?
>>> 
>>> Regards.
>>> 
>>> 
>>> 
>>> 
>>> 
>>> Oliver Freyermuth <freyermuth@xxxxxxxxxxxxxxxxxx> wrote on Fri, 23 Aug
>>> 2024 at 18:06:
>>> 
>>>    Hi Özkan,
>>> 
>>>    FWIW, we observed something similar after upgrading from Mimic =>
>>>    Nautilus => Octopus and then starting to trim snapshots.
>>> 
>>>    The size of our cluster was a bit smaller, but the effect was the
>>>    same: when snapshot trimming started, OSDs went into high load and
>>>    RBD I/O was extremely slow.
>>> 
>>>    We tried to use:
>>>       ceph tell osd.* injectargs '--osd-snap-trim-sleep 10'
>>>    first, which helped, but of course snapshots kept piling up.
>>> 
>>>    Finally, we performed only RocksDB compactions via:
>>> 
>>>       for A in {0..5}; do ceph tell osd.$A compact | sed 's/^/'$A': /' & done
>>> 
>>>    for some batches of OSDs, and their load went down heavily. Finally,
>>>    after we'd churned through all OSDs, I/O load was low again, and we
>>>    could go back to the default:
>>>       ceph tell osd.* injectargs '--osd-snap-trim-sleep 0'
>>> 
>>>    After this, the situation has stabilized for us. So my guess would
>>>    be that the RocksDBs grew too much after the OMAP format conversion
>>>    and the compaction shrank them again.
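>>> 
>>>    A simple way to check whether that is happening on your cluster (just
>>>    a sketch; the OMAP and META columns in "ceph osd df" appeared around
>>>    Nautilus):
>>> 
>>>       ceph osd df            # note the OMAP and META sizes per OSD
>>>       ceph tell osd.0 compact
>>>       ceph osd df            # the META (RocksDB) size of osd.0 should have shrunk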
>>> 
>>>    Maybe that also helps in your case?
>>> 
>>>    Interestingly, we did not observe this on other clusters (one mainly
>>>    for CephFS, another one with mirrored RBD volumes), which took the
>>>    same upgrade path.
>>> 
>>>    Cheers,
>>>             Oliver
>>> 
>>>    On 23.08.24 at 16:46, Özkan Göksu wrote:
>>>> Hello folks.
>>>> 
>>>> We have a Ceph cluster with 2000+ RBD drives on 20 nodes.
>>>> 
>>>> We upgraded the cluster from 14.2.16 to 15.2.14, and after the upgrade
>>>> we started to see snaptrim issues.
>>>> Without the "nosnaptrim" flag, the system is not usable right now.
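>>>> 
>>>> For reference, snapshot trimming is currently paused cluster-wide via:
>>>> 
>>>>    ceph osd set nosnaptrim
>>>> 
>>>> and would be re-enabled later with "ceph osd unset nosnaptrim".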
>>>> 
>>>> I think the problem is caused by the OMAP conversion during the Octopus
>>>> upgrade.
>>>> 
>>>> Note that the first time each OSD starts, it will do a format conversion to
>>>> improve the accounting for “omap” data. This may take a few minutes to as
>>>> much as a few hours (for an HDD with lots of omap data). You can disable
>>>> this automatic conversion with:
>>>> 
>>>> What should I do to solve this problem?
>>>> 
>>>> Thanks.
>>> 
>>>    --
>>>    Oliver Freyermuth
>>>    Universität Bonn
>>>    Physikalisches Institut, Raum 1.047
>>>    Nußallee 12
>>>    53115 Bonn
>>>    --
>>>    Tel.: +49 228 73 2367
>>>    Fax:  +49 228 73 7869
>>>    --
>>> 
>> 
>> --
>> Oliver Freyermuth
>> Universität Bonn
>> Physikalisches Institut, Raum 1.047
>> Nußallee 12
>> 53115 Bonn
>> --
>> Tel.: +49 228 73 2367
>> Fax:  +49 228 73 7869
>> --
>> 
>> 
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



