Re: radosgw hang under pressure

Hi,

Can you check the read and write latency of your OSDs?
Maybe it hangs because it's waiting on PGs, and those PGs are under scrub or something else.
Also, with many small objects, don't rely on the PG autoscaler; it might not tell you to increase pg_num even when it should.
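
If it helps, a quick first look could be something like this (standard ceph CLI commands, but double-check the output fields against your release):

    # per-OSD commit/apply latency as tracked by the monitors
    ceph osd perf

    # any PGs currently scrubbing or deep-scrubbing?
    ceph pg dump pgs_brief | grep -i scrub

    # what the autoscaler suggests vs. the current pg_num of each pool
    ceph osd pool autoscale-status

If autoscale-status keeps suggesting the same pg_num despite lots of small objects, you can still raise it manually with "ceph osd pool set <pool> pg_num <n>".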

Istvan Szabo
Staff Infrastructure Engineer
---------------------------------------------------
Agoda Services Co., Ltd.
e: istvan.szabo@xxxxxxxxx
---------------------------------------------------

On 2023. Jun 23., at 19:12, Rok Jaklič <rjaklic@xxxxxxxxx> wrote:


We are experiencing something similar (slow GET responses) when sending
e.g. 1k delete requests, on ceph v16.2.13.

Rok

On Mon, Jun 12, 2023 at 7:16 PM grin <grin@xxxxxxx> wrote:

Hello,

ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy
(stable)

There is a single (test) radosgw serving plenty of test traffic. When
under heavy req/s ("heavy" in a low sense, about 1k rq/s) it pretty
reliably hangs: low-traffic threads seem to keep working (like handling
occasional PUTs), but GETs become completely unresponsive; all attention
seems to be spent on futexes.

The effect is extremely similar to

https://ceph-users.ceph.narkive.com/I4uFVzH9/radosgw-civetweb-hangs-once-around-850-established-connections
(subject: Radosgw (civetweb) hangs once around 850 established connections),
except that this is quincy, so it's beast instead of civetweb. The effect is
the same as described there, though this cluster is much smaller (about 20-40
OSDs).

I observed that when I start radosgw -f with debug 20/20 it almost never
hangs, so my guess is some ugly race condition. However, I am a bit clueless
about how to actually debug it, since debugging makes it go away. Debug 1
(the default) with -d seems to hang after a while, but it's not that simple
to induce; I'm still testing under 4/4.
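
One thing that might dodge the heisenbug (untested, and the socket path below is a placeholder for the actual rgw instance name) is to leave the daemon running at debug 1 and only raise the level once it has hung, via the admin socket, so startup timing stays untouched:

    # bump logging on the already-running daemon
    ceph daemon /var/run/ceph/ceph-client.rgw.<name>.asok config set debug_rgw 20/20
    ceph daemon /var/run/ceph/ceph-client.rgw.<name>.asok config set debug_ms 1/1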

Also I do not see much to configure about beast.
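
For reference, the only knobs I'm aware of are the frontend line and the generic rgw threading options, roughly like this in ceph.conf (defaults quoted from the quincy docs, worth verifying, and rgw_max_concurrent_requests may behave differently across releases):

    [client.rgw.<name>]
    rgw_frontends = beast port=8080
    # worker threads for synchronous request work; together with internal
    # threads this is in the ballpark of the 600-800 threads strace shows
    rgw_thread_pool_size = 512
    # cap on in-flight requests before new ones are throttled
    rgw_max_concurrent_requests = 1024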

To answer the questions in the original (2016) thread:
- Debian stable
- no visible limits issue
- no obvious memory leak observed
- no other visible resource shortage
- strace says everyone is waiting on futexes, about 600-800 threads, apart
from the one serving occasional PUTs (see the thread-dump sketch after this
list)
- the TCP port doesn't respond.
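
For the record, a snapshot of all thread backtraces can be taken from the hung process without restarting it (assuming gdb or elfutils plus ceph debug symbols are installed):

    # full backtrace of every thread in the hung radosgw
    gdb -p "$(pidof radosgw)" -batch -ex 'thread apply all bt' > rgw-threads.txt

    # lighter-weight alternative from elfutils
    eu-stack -p "$(pidof radosgw)" > rgw-threads.txt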

IRC didn't react. ;-)

Thanks,
Peter
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



