Hi, are you sure this can be "solved" via offline compaction? I had a crashed
OSD yesterday which had been added to the cluster a couple of hours earlier and
was still in the process of syncing in. @Igor, did you manage to fix the
problem or find a workaround?

On Thu, 11 Nov 2021 at 09:23, Szabo, Istvan (Agoda) <Istvan.Szabo@xxxxxxxxx> wrote:

> Yeah, tried it, the osd just crashed after a couple of hours.
>
> Istvan Szabo
> Senior Infrastructure Engineer
> ---------------------------------------------------
> Agoda Services Co., Ltd.
> e: istvan.szabo@xxxxxxxxx
> ---------------------------------------------------
>
> On 2021. Nov 11., at 0:16, Сергей Процун <prosergey07@xxxxxxxxx> wrote:
>
> No, you can not do online compaction.
>
> On Fri, 5 Nov 2021 at 17:22, Szabo, Istvan (Agoda) <Istvan.Szabo@xxxxxxxxx> wrote:
>
> Seems like it can help, but after 1-2 days it comes back on a different osd,
> and in some cases on the same osd as well.
> Is there any other way to compact online as it compacts offline?
>
> From: Szabo, Istvan (Agoda)
> Sent: Friday, October 29, 2021 8:43 PM
> To: Igor Fedotov <igor.fedotov@xxxxxxxx>
> Cc: Ceph Users <ceph-users@xxxxxxx>
> Subject: Re: slow operation observed for _collection_list
>
> I can give it a try again, but before I migrated all DBs back to the data
> devices I did a compaction on all osds.
>
> On 2021. Oct 29., at 15:02, Igor Fedotov <igor.fedotov@xxxxxxxx> wrote:
>
> Please manually compact the DB using ceph-kvstore-tool for all the
> affected OSDs (or preferably every OSD in the cluster). Highly likely
> you're facing RocksDB performance degradation caused by prior bulk data
> removal. Setting bluefs_buffered_io to true (if not yet set) might be
> helpful as well.
>
> On 10/29/2021 3:22 PM, Szabo, Istvan (Agoda) wrote:
>
> Hi,
>
> Having slow ops and laggy pgs because an osd is not accessible (octopus
> 15.2.14, and 15.2.10 as well).
> At the time the slow ops started, in the osd log I can see:
>
> "7f2a8d68f700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread
> 0x7f2a70de5700' had timed out after 15"
>
> And this blocks the IO until the radosgateway restarts itself.
> Is this a bug or something else?
> In the ceph.log I can also see that the specific osd is reported failed by
> other osds:
>
> 2021-10-29T05:49:34.386857+0700 mon.server-3s01 (mon.0) 3576376 : cluster [DBG] osd.7 reported failed by osd.31
> 2021-10-29T05:49:34.454037+0700 mon.server-3s01 (mon.0) 3576377 : cluster [DBG] osd.7 reported failed by osd.22
> 2021-10-29T05:49:34.666758+0700 mon.server-3s01 (mon.0) 3576379 : cluster [DBG] osd.7 reported failed by osd.6
> 2021-10-29T05:49:34.807714+0700 mon.server-3s01 (mon.0) 3576382 : cluster [DBG] osd.7 reported failed by osd.11
>
> Here is the osd log: https://justpaste.it/4x4h2
> Here is the ceph.log itself: https://justpaste.it/5bk8k
> Here is some additional information regarding memory usage and
> backtrace...: https://justpaste.it/1tmjg
>
> Thank you
>
> --
> Igor Fedotov
> Ceph Lead Developer
>
> Looking for help with your Ceph cluster? Contact us at https://croit.io
>
> croit GmbH, Freseniusstr. 31h, 81247 Munich
> CEO: Martin Verges - VAT-ID: DE310638492
> Com. register: Amtsgericht Munich HRB 231263
> Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx

--
The self-help group "UTF-8 problems" will, as an exception, meet in the large hall this time.

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
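For reference, the offline compaction Igor describes is normally run with the
OSD stopped so its RocksDB instance is not in use. A minimal sketch, assuming a
non-containerized deployment, the default data path /var/lib/ceph/osd/ceph-<id>,
and osd.7 as the affected OSD (adjust to your environment):

  # stop the affected OSD so the DB can be opened by the tool
  systemctl stop ceph-osd@7

  # compact the BlueStore RocksDB; this can take a while on large DBs
  ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-7 compact

  # optionally enable buffered BlueFS IO, as suggested in the thread
  ceph config set osd bluefs_buffered_io true

  # bring the OSD back online
  systemctl start ceph-osd@7

This is only a sketch of the procedure discussed above, not an endorsed
one-size-fits-all fix; as the thread notes, the slow _collection_list symptoms
may return after further bulk deletions until the underlying degradation is
addressed.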