Re: radosgw hang under pressure

From
https://swamireddy.wordpress.com/2019/10/23/ceph-sharding-the-rados-gateway-bucket-index/

*Since the index is stored in a single RADOS object, only a single
operation can be performed on it at any given time. As the number of
objects increases, the index stored in that RADOS object grows. Because a
single index handles a large number of objects, and the number of
operations is likely to grow as well, the lack of parallelism can become a
bottleneck: multiple operations have to wait in a queue, since only one
operation is possible at a time.*

Do you know whether this is still the case? The article is from 2019.
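
For what it's worth, newer releases shard the bucket index and, as far as I
know, reshard it dynamically by default, so the single-object bottleneck
should mostly apply to unsharded (or under-sharded) buckets. On our cluster
I can at least inspect the current shard layout like this (the bucket name
is just an example):

radosgw-admin bucket stats --bucket=mybucket | grep num_shards
radosgw-admin bucket limit check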




On Sun, Jun 25, 2023 at 6:22 PM Szabo, Istvan (Agoda) <
Istvan.Szabo@xxxxxxxxx> wrote:

> Hi,
>
> Can you check the read and write latencies of your OSDs?
> Maybe it hangs because it's waiting for PGs; the PGs may be under
> scrub or something else. Also, with many small objects, don't rely on the
> PG autoscaler: it might not tell you to increase pg_num even when it
> should.
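>
> For a quick check, something like this should do (the index pool name and
> pg_num value below are only examples for a default setup):
>
> ceph osd perf                      # per-OSD commit/apply latency
> ceph -s                            # shows whether PGs are scrubbing
> ceph osd pool autoscale-status     # what the autoscaler currently thinks
> ceph osd pool set default.rgw.buckets.index pg_num 128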
>
> Istvan Szabo
> Staff Infrastructure Engineer
> ---------------------------------------------------
> Agoda Services Co., Ltd.
> e: istvan.szabo@xxxxxxxxx
> ---------------------------------------------------
>
> On 2023. Jun 23., at 19:12, Rok Jaklič <rjaklic@xxxxxxxxx> wrote:
>
> We are experiencing something similar (slow GET responses) when sending,
> for example, 1k delete requests, on Ceph v16.2.13.
>
> Rok
>
> On Mon, Jun 12, 2023 at 7:16 PM grin <grin@xxxxxxx> wrote:
>
> Hello,
>
>
> ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)
>
>
> There is a single (test) radosgw serving plenty of test traffic. When
> under heavy load ("heavy" in a relative sense, about 1k req/s) it pretty
> reliably hangs: low-traffic threads seem to keep working (like handling
> occasional PUTs), but GETs are completely nonresponsive, and all attention
> seems to be spent on futexes.
>
>
> The effect is extremely similar to
>
> https://ceph-users.ceph.narkive.com/I4uFVzH9/radosgw-civetweb-hangs-once-around-850-established-connections
>
> (subject: "Radosgw (civetweb) hangs once around 850 established
> connections"), except this is Quincy, so it's beast instead of civetweb.
> The effect is the same as described there, except the cluster is much
> smaller (about 20-40 OSDs).
>
>
> I observed that when I start radosgw -f with debug 20/20 it almost never
> hangs, so my guess is some ugly race condition. However, I am a bit
> clueless about how to actually debug it, since debugging makes the problem
> go away. Debug 1 (the default) with -d seems to hang after a while, but
> it's not that simple to induce; I'm still testing under 4/4.
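>
> (One thing I can try: the debug level can also be changed on the running
> daemon through the admin socket, so logging can be turned up only after
> the hang instead of from the start. The socket path below is the usual
> default and may differ on other setups:
>
> ceph daemon /var/run/ceph/ceph-client.rgw.<name>.asok config set debug_rgw 20
> ceph daemon /var/run/ceph/ceph-client.rgw.<name>.asok config set debug_ms 1
> )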
>
>
> Also, I do not see much to configure about beast.
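>
> (The few knobs I have found are these; the values shown are, as far as I
> can tell, the defaults, not a recommendation:
>
> rgw_frontends = beast port=7480
> rgw_thread_pool_size = 512
> rgw_max_concurrent_requests = 1024
>
> Given the number of futex-waiting threads mentioned below,
> rgw_thread_pool_size at least seems worth experimenting with.)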
>
>
> To answer the questions in the original (2016) thread:
>
> - Debian stable
> - no visible limits issue
> - no obvious memory leak observed
> - no other visible resource shortage
> - strace says everyone is waiting on futexes, about 600-800 threads, apart
>   from the one serving occasional PUTs
> - the TCP port doesn't respond.
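>
> (Next time it hangs I plan to grab full thread backtraces rather than just
> strace output; something along these lines, assuming gdb and debug symbols
> are installed:
>
> gdb -p $(pidof radosgw) -batch -ex 'thread apply all bt' > rgw-threads.txt
>
> That should at least show which locks the futex-waiting threads are
> blocked on.)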
>
>
> IRC didn't react. ;-)
>
>
> Thanks,
> Peter
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



