Hi,

Can you check the read and write latency of your OSDs? Maybe it hangs because it is waiting for PGs that are under scrub or busy with something else. Also, with many small objects, don't rely on the PG autoscaler: it may not tell you to increase pg_num even when it should be increased.

Istvan Szabo
Staff Infrastructure Engineer
---------------------------------------------------
Agoda Services Co., Ltd.
e: istvan.szabo@xxxxxxxxx
---------------------------------------------------

On 2023. Jun 23., at 19:12, Rok Jaklič <rjaklic@xxxxxxxxx> wrote:

We are experiencing something similar (slow GET responses) when sending, for example, 1k delete requests in Ceph v16.2.13.

Rok

On Mon, Jun 12, 2023 at 7:16 PM grin <grin@xxxxxxx> wrote:

Hello,

ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)

There is a single (test) radosgw serving plenty of test traffic. When under heavy req/s ("heavy" in a low sense, about 1k req/s) it pretty reliably hangs: low-traffic threads seem to keep working (like handling occasional PUTs), but GETs are completely nonresponsive, and all attention seems to be spent on futexes.

The effect is extremely similar to
https://ceph-users.ceph.narkive.com/I4uFVzH9/radosgw-civetweb-hangs-once-around-850-established-connections
(subject: "Radosgw (civetweb) hangs once around 850 established connections"), except this is Quincy, so it's Beast instead of civetweb. The effect is the same as described there, except the cluster is much smaller (about 20-40 OSDs).

I observed that when I start radosgw -f with debug 20/20 it almost never hangs, so my guess is some ugly race condition. However, I am a bit clueless about how to actually debug it, since debugging makes the problem go away. Debug 1 (the default) with -d seems to hang after a while, but it's not that simple to induce; I'm still testing under 4/4. Also, I do not see much to configure about Beast.

To answer the questions in the original (2016) thread:
- Debian stable
- no visible limits issue
- no obvious memory leak observed
- no other visible resource shortage
- strace says everyone is waiting on futexes, about 600-800 threads, apart from the one serving occasional PUTs
- the TCP port doesn't respond

IRC didn't react. ;-)

Thanks,
Peter
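
[Editor's note: as a rough companion to Istvan's suggestion above (check per-OSD latency and whether PGs are scrubbing), here is a minimal sketch that wraps the ceph CLI. It is not from the thread; the 50 ms threshold is an arbitrary assumption, and the JSON field names returned by "ceph osd perf" differ slightly between releases, so the parsing is deliberately defensive.]

#!/usr/bin/env python3
"""Sketch: flag OSDs with high commit/apply latency and list PGs that are
currently scrubbing, by shelling out to the ceph CLI. Assumes it runs on a
node with client.admin access; threshold and field names are assumptions."""
import json
import subprocess

LATENCY_THRESHOLD_MS = 50  # assumption: pick a value that matches your hardware


def ceph(*args: str) -> str:
    """Run a ceph CLI subcommand and return its stdout."""
    return subprocess.run(
        ("ceph", *args), check=True, capture_output=True, text=True
    ).stdout


def report_slow_osds() -> None:
    perf = json.loads(ceph("osd", "perf", "--format", "json"))
    # Some releases wrap the list in an "osdstats" object, others do not.
    infos = perf.get("osdstats", perf).get("osd_perf_infos", [])
    for info in infos:
        stats = info.get("perf_stats", {})
        commit = stats.get("commit_latency_ms", 0)
        apply_ = stats.get("apply_latency_ms", 0)
        if commit > LATENCY_THRESHOLD_MS or apply_ > LATENCY_THRESHOLD_MS:
            print(f"osd.{info.get('id')}: commit={commit}ms apply={apply_}ms")


def report_scrubbing_pgs() -> None:
    # Grep the plain-text output rather than parsing JSON, since the JSON
    # layout of "pg dump" has changed between releases.
    for line in ceph("pg", "dump", "pgs_brief").splitlines():
        if "scrub" in line:
            print(line)


if __name__ == "__main__":
    report_slow_osds()
    report_scrubbing_pgs()

For the autoscaler point, the current per-pool recommendation can be inspected with "ceph osd pool autoscale-status", and pg_num can be raised manually with "ceph osd pool set <pool> pg_num <n>" if the autoscaler is not suggesting an increase that the workload (many small objects) seems to need.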