Re: rgw: coroutine stack protection

On 11/10/19 4:34 AM, Yuval Lifshitz wrote:
On Tue, Nov 5, 2019 at 10:33 PM Casey Bodley <cbodley@xxxxxxxxxx> wrote:
Gal was able to track down some memory bugs with address sanitizer and
identify them as overflows of the coroutine stack used by the beast
frontend. Unlike pthread stacks, these heap-allocated coroutine stacks
don't have any memory protection to catch these overflows.

The boost::asio::spawn() [1] function that creates these coroutines does
allow you to pass in coroutine attributes to control the stack size
(which defaults to 128K), and we were able to increase that in testing
and see the errors go away. But since these coroutines are how we expect
the beast frontend to scale to thousands of connections, we really don't
want to raise the stack size permanently and limit how many we can fit
in memory.
what is the required stack size that does not cause an overflow?

As far as I know, 1MB is all we tested (via https://github.com/cbodley/ceph/commit/d23507bd1295a29ccae3ae36187194d44c9a6438) just to confirm that stack overflow was the root cause.

Unfortunately, the boost::coroutine library used by boost::asio doesn't
give us any flexibility outside of the stack size, though the underlying
boost::context library provides several different options for the stack
allocation [2]. And despite boost::coroutine having long been deprecated in
favor of boost::coroutine2, boost::asio has never removed its dependency on
the former. The maintainer of these coroutine libraries even submitted a
pull request against boost::asio in 2017 to use boost::context directly
[3], allowing boost::asio::spawn() to use any of its stack allocators.
My sense is that the asio maintainer is more interested in integration
with the C++ Coroutines TS, rather than continuing to support these
stackful coroutines that aren't part of the C++ Networking TS.

I rebased this stale pull request and added some test coverage in
https://github.com/cbodley/asio/commits/wip-spawn-context. Then, since
spawn() is relatively independent from the rest of boost::asio, I forked
that part into a new repo at https://github.com/cbodley/spawn. Using
this fork, radosgw could continue using boost::asio as it is (and
eventually the Networking TS), but call spawn::spawn() instead for
direct control over the coroutine stack allocation.

Of the available stack allocators, the segmented_stack looked promising;
that would allow us to start with a small stack, and grow as more space
was needed. However, this would require us to build our own boost as
segmented stacks aren't enabled by default. It also doesn't help us
control the total stack size used per connection.
if this is rare (hard to reproduce), why don't we use pooled_fixedsize_stack?
this would limit the overall size, but would allow the specific code
path to use more memory?

The pooled_fixedsize_stack appears to have the same two limitations as the current fixedsize_stack: each stack has a fixed size (128K by default), and there's no memory protection to detect stack overflows. The only difference seems to be that freed stacks are kept in a boost::pool and reused for later stack allocations.

So the consensus is to use the protected_fixedsize allocator, based on
mmap/mprotect, with the existing 128K limit. That way we can catch these
stack overflows in testing and treat the excessive stack usage itself as a bug.
why would this be considered a bug (assuming it does not exceed the
2MB you usually get per regular stack)?

The motivation for enforcing a limit on the per-connection stack size is to help scale to lots of concurrent connections. However, if the requests are mostly object gets/puts, we'll be buffering up to 16M of object data per request and that's going to dwarf these stacks.

Another dimension to consider here is the zipper project, which will allow you to insert extra layers into the request processing path to add things like caching, logging, policies, etc. So even if we do set a strict limit on these stacks and verify that we don't hit them in testing, all bets are off when you start adding in layers at runtime.

I think the protected_fixedsize_stack gives us the most flexibility here. In addition to the overflow detection from mmap+mprotect, it lets us map a comfortable amount of address space without committing physical pages up front, so it essentially behaves like the pthread stacks we're used to.

Casey


[1]
https://www.boost.org/doc/libs/1_71_0/doc/html/boost_asio/reference/spawn.html

[2]
https://www.boost.org/doc/libs/1_71_0/libs/context/doc/html/context/stack.html

[3] https://github.com/boostorg/asio/pull/55
_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx


