Hi, are you sure this can be "solved" via offline compaction? I had a crashed
OSD yesterday which had been added to the cluster a couple of hours earlier and
was still in the process of syncing in. @Igor, did you manage to fix the
problem or find a workaround?

On Thu, 11 Nov 2021 at 09:23, Szabo, Istvan (Agoda) <Istvan.Szabo@xxxxxxxxx> wrote:

> Yeah, tried it, the osd just crashed after a couple of hours.
>
> Istvan Szabo
> Senior Infrastructure Engineer
> ---------------------------------------------------
> Agoda Services Co., Ltd.
> e: istvan.szabo@xxxxxxxxx
> ---------------------------------------------------
>
> On 2021. Nov 11., at 0:16, Сергей Процун <prosergey07@xxxxxxxxx> wrote:
>
> No, you can not do online compaction.
>
> On Fri, 5 Nov 2021 at 17:22, Szabo, Istvan (Agoda) <Istvan.Szabo@xxxxxxxxx> wrote:
>
> Seems like it can help, but after 1-2 days it comes back on a different osd,
> and in some cases on the same osd as well.
> Is there any other way to compact online as it compacts offline?
>
> From: Szabo, Istvan (Agoda)
> Sent: Friday, October 29, 2021 8:43 PM
> To: Igor Fedotov <igor.fedotov@xxxxxxxx>
> Cc: Ceph Users <ceph-users@xxxxxxx>
> Subject: Re: slow operation observed for _collection_list
>
> I can give it a try again, but before I migrated all DBs back to the data
> devices I did a compaction on all osds.
>
> On 2021. Oct 29., at 15:02, Igor Fedotov <igor.fedotov@xxxxxxxx> wrote:
>
> Please manually compact the DB using ceph-kvstore-tool for all the
> affected OSDs (or preferably every OSD in the cluster). Highly likely
> you're facing RocksDB performance degradation caused by prior bulk data
> removal. Setting bluefs_buffered_io to true (if not yet set) might be
> helpful as well.
>
> On 10/29/2021 3:22 PM, Szabo, Istvan (Agoda) wrote:
>
> Hi,
>
> Having slow ops and laggy pgs because an osd is not accessible (octopus
> 15.2.14, and 15.2.10 as well).
> At the time the slow ops started, in the osd log I can see:
>
> "7f2a8d68f700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread
> 0x7f2a70de5700' had timed out after 15"
>
> And this blocks the IO until the radosgateway restarts itself.
> Is this a bug or something else?
> In the ceph.log I can also see that the specific osd is reported failed by
> other osds:
>
> 2021-10-29T05:49:34.386857+0700 mon.server-3s01 (mon.0) 3576376 : cluster [DBG] osd.7 reported failed by osd.31
> 2021-10-29T05:49:34.454037+0700 mon.server-3s01 (mon.0) 3576377 : cluster [DBG] osd.7 reported failed by osd.22
> 2021-10-29T05:49:34.666758+0700 mon.server-3s01 (mon.0) 3576379 : cluster [DBG] osd.7 reported failed by osd.6
> 2021-10-29T05:49:34.807714+0700 mon.server-3s01 (mon.0) 3576382 : cluster [DBG] osd.7 reported failed by osd.11
>
> Here is the osd log: https://justpaste.it/4x4h2
> Here is the ceph.log itself: https://justpaste.it/5bk8k
> Here is some additional information regarding memory usage and
> backtrace...: https://justpaste.it/1tmjg
>
> Thank you
>
> --
> Igor Fedotov
> Ceph Lead Developer
>
> Looking for help with your Ceph cluster? Contact us at https://croit.io
>
> croit GmbH, Freseniusstr. 31h, 81247 Munich
> CEO: Martin Verges - VAT-ID: DE310638492
> Com. register: Amtsgericht Munich HRB 231263
> Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx

--
The self-help group "UTF-8 problems" will, as an exception, meet in the large hall this time.

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
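For reference, the offline compaction Igor describes is normally run with the
OSD stopped so its RocksDB instance is not in use. A minimal sketch, assuming a
non-containerized deployment, the default data path /var/lib/ceph/osd/ceph-<id>,
and osd.7 as the affected OSD (adjust to your environment):

  # stop the affected OSD so the DB can be opened by the tool
  systemctl stop ceph-osd@7

  # compact the BlueStore RocksDB; this can take a while on large DBs
  ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-7 compact

  # optionally enable buffered BlueFS IO, as suggested in the thread
  ceph config set osd bluefs_buffered_io true

  # bring the OSD back online
  systemctl start ceph-osd@7

This is only a sketch of the procedure discussed above, not an endorsed
one-size-fits-all fix; as the thread notes, the slow _collection_list symptoms
may return after further bulk deletions until the underlying degradation is
addressed.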