rgw: coroutine stack protection

Gal was able to track down some memory bugs with AddressSanitizer and identify them as overflows of the coroutine stacks used by the beast frontend. Unlike pthread stacks, these heap-allocated coroutine stacks have no memory protection to catch such overflows.

The boost::asio::spawn() [1] function that creates these coroutines does allow you to pass in coroutine attributes to control the stack size (which defaults to 128K), and we were able to increase that in testing and see the errors go away. But since these coroutines are how we expect the beast frontend to scale to thousands of connections, we really don't want to raise the stack size permanently and limit how many we can fit in memory.

Unfortunately, the boost::coroutine library used by boost::asio doesn't give us any flexibility beyond the stack size, though the underlying boost::context library provides several different options for stack allocation [2]. And despite boost::coroutine being long deprecated in favor of boost::coroutine2, boost::asio has never removed its dependency on the former. The maintainer of these coroutine libraries even submitted a pull request against boost::asio in 2017 to use boost::context directly [3], which would allow boost::asio::spawn() to use any of its stack allocators. My sense is that the asio maintainer is more interested in integration with the C++ Coroutines TS than in continuing to support these stackful coroutines, which aren't part of the C++ Networking TS.

I rebased this stale pull request and added some test coverage in https://github.com/cbodley/asio/commits/wip-spawn-context. Then, since spawn() is relatively independent from the rest of boost::asio, I forked that part into a new repo at https://github.com/cbodley/spawn. Using this fork, radosgw could continue using boost::asio as it is (and eventually the Networking TS), but call spawn::spawn() instead for direct control over the coroutine stack allocation.

Of the available stack allocators, segmented_stack looked promising: it would let us start with a small stack and grow it as more space is needed. However, it would require building our own Boost, since segmented stacks aren't enabled by default, and it doesn't help us bound the total stack size used per connection.

So the consensus is to use the protected_fixedsize allocator, based on mmap/mprotect, with the existing 128K limit. That way we can catch these stack overflows in testing and treat excessive stack usage itself as a bug.

Casey


[1] https://www.boost.org/doc/libs/1_71_0/doc/html/boost_asio/reference/spawn.html

[2] https://www.boost.org/doc/libs/1_71_0/libs/context/doc/html/context/stack.html

[3] https://github.com/boostorg/asio/pull/55
_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx


