radosgw hang under pressure

Hello,

ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)

There is a single (test) radosgw serving plenty of test traffic. Under a heavy request rate ("heavy" in a modest sense, about 1k req/s) it hangs fairly reliably: low-traffic threads seem to keep working (e.g. handling occasional PUTs), but GETs are completely unresponsive and all the time seems to be spent waiting on futexes.
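To tell from the client side whether the gateway is in the hung state, rather than relying on a single curl, a small probe can issue concurrent GETs with a per-request timeout and count how many never get an HTTP answer. This is a minimal sketch, assuming Python 3.7+ and a hypothetical object URL; `probe` and `probe_once` are illustrative names, not part of Ceph or any S3 SDK. Unauthenticated GETs against a private bucket will typically come back as 403, which still counts as an answer here, since the only question is whether beast responds at all.

```python
import concurrent.futures
import http.client
import time
import urllib.error
import urllib.request
from typing import Dict, Optional


def probe_once(url: str, timeout: float) -> Optional[float]:
    """Send one GET; return latency in seconds, or None if no HTTP answer arrived.

    Note: urllib's timeout applies per socket operation, not as a total deadline.
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()
    except urllib.error.HTTPError:
        # Any HTTP status (403, 404, ...) still proves the gateway answered.
        pass
    except (OSError, http.client.HTTPException):
        # URLError, refused/reset connections, socket timeouts, malformed replies.
        return None
    return time.monotonic() - start


def probe(url: str, total: int, concurrency: int, timeout: float) -> Dict[str, object]:
    """Fire `total` GETs from `concurrency` worker threads and summarise them."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(lambda _: probe_once(url, timeout), range(total)))
    answered = [r for r in results if r is not None]
    return {
        "sent": total,
        "answered": len(answered),
        "no_answer": total - len(answered),
        "max_latency_s": max(answered, default=None),
    }
```

For example, `probe("http://rgw-test.example:7480/testbucket/testobj", total=2000, concurrency=200, timeout=10.0)` (hypothetical host and object) run periodically while the load is applied; a jump in `no_answer` marks the moment GETs stop being served.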

The effect is extremely similar to the one described in
https://ceph-users.ceph.narkive.com/I4uFVzH9/radosgw-civetweb-hangs-once-around-850-established-connections (subject: Radosgw (civetweb) hangs once around),
except that this is quincy, so the frontend is beast rather than civetweb, and the cluster is much smaller (about 20-40 OSDs).

I observed that when I start radosgw -f with debug 20/20 it almost never hangs, so my guess is an ugly race condition. However, I'm at a loss as to how to debug it, since enabling debugging makes it go away. With the default debug level (1) and -d it does hang after a while, but the hang is harder to induce; I'm still testing at 4/4.

Also, there doesn't seem to be much to configure in beast.

To answer the questions asked in the original (2016) thread:
- Debian stable
- no visible limits issue
- no obvious memory leak observed
- no other visible resource shortage
- strace shows every thread (about 600-800 of them) waiting on futexes, apart from the one serving the occasional PUTs
- the TCP port doesn't respond.
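The futex observation can also be tallied without attaching strace to hundreds of threads: on Linux, /proc/<pid>/task/<tid>/wchan names the kernel function each thread is sleeping in. This is a minimal sketch with hypothetical helper names. The exact futex symbol varies by kernel version (e.g. futex_wait_queue_me on older kernels, futex_wait_queue on newer ones), so it matches on the substring "futex". wchan reads as "0" for a running thread or when the caller lacks ptrace access to the process, so it should be run as root or as the radosgw user.

```python
import collections
import os
from typing import Dict, Iterable, List


def summarize_wchan(values: Iterable[str]) -> Dict[str, int]:
    """Count threads per wait channel; "0" means running or not readable."""
    counts = collections.Counter(v.strip() or "0" for v in values)
    return dict(counts.most_common())


def read_wchans(pid: int) -> List[str]:
    """Read the wchan of every thread of `pid` from /proc (Linux only)."""
    task_dir = f"/proc/{pid}/task"
    values = []
    for tid in os.listdir(task_dir):
        try:
            with open(os.path.join(task_dir, tid, "wchan")) as f:
                values.append(f.read())
        except OSError:
            # The thread exited between listdir() and open(); skip it.
            continue
    return values


def futex_waiters(counts: Dict[str, int]) -> int:
    """Number of threads whose wait channel looks like a futex wait."""
    return sum(n for name, n in counts.items() if "futex" in name)


def report(pid: int) -> str:
    """Human-readable summary: total threads, futex waiters, then per-wchan counts."""
    counts = summarize_wchan(read_wchans(pid))
    lines = [f"threads: {sum(counts.values())}, futex waits: {futex_waiters(counts)}"]
    lines += [f"{n:6d}  {name}" for name, n in counts.items()]
    return "\n".join(lines)
```

Calling `print(report(<radosgw pid>))` once while the gateway is healthy and again while it is hung makes it easy to compare the thread counts and wait channels between the two states.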

Nobody on IRC reacted. ;-)

Thanks,
Peter
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
