Re: rgw: thoughts on the http client

reviving this old thread about http clients after reading through
https://github.com/RobertLeahy/ASIO-cURL and discovering the
"multi-socket flavor" of the libcurl-multi API documented in
https://curl.se/libcurl/c/libcurl-multi.html

rgw's existing RGWHTTPManager uses the older flavor of libcurl-multi,
which requires a background thread that polls libcurl for io and
completions. this new flavor allows us to do all of the polling and
timers asynchronously with asio, and only call into libcurl for
non-blocking io when the sockets are ready to read/write. getting rid
of the background thread makes it much easier to integrate with asio
applications, because it removes many complications around locking and
object lifetimes

i experimented with this multi-socket API by building my own asio
integration in https://github.com/cbodley/ceph/pull/6. there are two
main reasons i find this especially interesting:

1) we've been doing some prototyping for multisite sync with asio's
c++20 coroutines. RGWHTTPManager only supports the
optional_yield-style coroutines, so we were talking about using beast
for this initial prototype. however, i listed several of beast's
missing features earlier in this thread (mainly timeouts and
connection pooling), so this new curl client could be a much better
fit here

2) curl can be built with HTTP/3 support, and that's what we've been
using to test rgw's prototype frontend in
https://github.com/ceph/ceph/pull/48178. we need a multiplexing client
like libcurl-multi in order to test QUIC's stream multiplexing. and
because the QUIC library depends on BoringSSL, this HTTP/3-enabled
version of curl can't be linked against rgw (which requires OpenSSL)
for RGWHTTPManager

On Thu, Oct 28, 2021 at 12:24 PM Casey Bodley <cbodley@xxxxxxxxxx> wrote:
>
> On Thu, Oct 28, 2021 at 10:41 AM Yuval Lifshitz <ylifshit@xxxxxxxxxx> wrote:
> >
> > Hi Casey,
> > When it comes to "technical debt", the main question is what is the ongoing cost of not making this change?
> > Do we see memory allocation and copy into RGWHTTPArgs as a noticeable perf issue? Maybe there is a simpler way to resolve this specific issue?
>
> historically, we have seen very bad behavior from tcmalloc at high
> thread counts in rgw, and we've been making general efforts both to
> reduce allocations and the number of threads required. i don't think
> anyone has tried to measure the impact of RGWHTTPArgs itself, but i do
> see its use of map<string, string> as low-hanging fruit. and because
> this piece is on rgw's http server side, replacing this map wouldn't
> require any of the client stuff described above
>
> > It looks like the list of things to do to achieve feature parity with libcurl is substantial.
>
> i agree! i wanted to start by documenting where the gaps are, to help
> us understand the scope of a project here
>
> even without dropping libcurl, i think there's a lot of potential
> cleanup in the several layers (rgw_http_client, rgw_rest_client,
> rgw_rest_conn, rgw_cr_rest) between libcurl and multisite. for
> multisite in general, i would really like to see it adopt similar
> async primitives to the rest of the rgw codebase so that we can share
> more code
>
> > Is there a desire by the beast maintainers to add these capabilities?
>
> beast has generally positioned itself as a low-level http protocol
> library, to serve as building blocks for higher-level client and
> server libraries/applications. the http ecosystem is vast, so it makes
> sense to limit the scope of any individual library. libcurl is
> enormous, yet still only covers the client side
>
> though with the addition of the tcp_stream in boost 1.70
> (https://www.boost.org/doc/libs/1_70_0/libs/beast/doc/html/beast/release_notes.html),
> beast did take a step toward this higher level of abstraction. it's
> definitely worth discussing whether additional features like client
> connection pooling would be in scope for the project. it's also worth
> researching what other asio-compatible http client libraries are out
> there
>
>
> > Yuval
> >
> >
> > On Tue, Oct 26, 2021 at 9:34 PM Casey Bodley <cbodley@xxxxxxxxxx> wrote:
> >>
> >> dear Adam and list,
> >>
> >> aside from rgw's frontend, which is the server side of http, we also
> >> have plenty of http client code that sends http requests to other
> >> servers. the biggest user of the client is multisite sync, which uses
> >> http to read replication logs and fetch objects from other zones. all
> >> of this http client code is based on libcurl, and uses its 'multi api'
> >> to provide an async interface with a background thread that polls for
> >> completions
> >>
> >> it's hard to beat libcurl for stability and features, but there has
> >> also been interest in using asio+beast for the client ever since we
> >> added it to the frontend. benefits there would include a nicer c++
> >> interface, better integration with the asio async model (we do
> >> currently have wrappers for libcurl, but they're specific to
> >> coroutines), and the potential to use custom allocators to avoid most
> >> of the per-request allocations
> >>
> >>
> >> to help with a comparison against beast, these are the features of
> >> libcurl that we rely on:
> >>
> >> - asynchronous using the 'multi api' and a background thread
> >> (https://everything.curl.dev/libcurl/drive/multi)
> >> - connection pooling (see https://everything.curl.dev/libcurl/connectionreuse)
> >> - ssl context and optional certificate verification
> >> - connect/request timeouts
> >> - rate limits
> >>
> >> see RGWHTTPClient::init_request() in rgw_http_client.cc for all of the
> >> specific CURLOPT_ features we're using now
> >>
> >> also noteworthy is curl's support for http/1.1, http/2, and http/3
> >> (https://everything.curl.dev/libcurl-http/versions)
> >>
> >>
> >> asio does not have connection pooling or connect timeouts (though it
> >> has the components necessary to build them), and beast only supports
> >> http/1.1. i think everything else in the list is covered:
> >>
> >> ssl support comes from boost::asio::ssl and ssl_stream
> >>
> >> there's a tcp_stream class
> >> (https://www.boost.org/doc/libs/1_70_0/libs/beast/doc/html/beast/ref/boost__beast__tcp_stream.html)
> >> that wraps a tcp socket and adds rate limiting and timeouts. we use
> >> that in the frontend, though we're tracking a performance regression
> >> related to its timeouts in https://tracker.ceph.com/issues/52333
> >>
> >> there's a very nice http::fields class
> >> (https://www.boost.org/doc/libs/1_70_0/libs/beast/doc/html/beast/ref/boost__beast__http__fields.html)
> >> for headers that has custom allocator support. there's an
> >> 'http_server_fast' example at
> >> https://www.boost.org/doc/libs/1_70_0/libs/beast/example/http/server/fast/http_server_fast.cpp
> >> that uses the custom allocator in
> >> https://www.boost.org/doc/libs/1_70_0/libs/beast/example/http/server/fast/fields_alloc.hpp.
> >> i'd love to see something like that replace our use of map<string,
> >> string> for headers in RGWHTTPArgs during request processing
> >>
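to make the allocator idea above concrete, here is a minimal sketch. it does not use beast's http::fields or the fields_alloc.hpp example. as an assumption for illustration, it swaps in the standard library's std::pmr containers backed by a fixed per-request buffer. the names request_args and counting_resource are hypothetical, and the 4096-byte arena size is arbitrary.

```cpp
#include <array>
#include <cstddef>
#include <functional>
#include <map>
#include <memory_resource>
#include <optional>
#include <string>
#include <string_view>

// hypothetical helper: counts allocations that spill past the arena
class counting_resource : public std::pmr::memory_resource {
  std::pmr::memory_resource* upstream_ = std::pmr::new_delete_resource();
  std::size_t count_ = 0;

  void* do_allocate(std::size_t bytes, std::size_t align) override {
    ++count_;
    return upstream_->allocate(bytes, align);
  }
  void do_deallocate(void* p, std::size_t bytes, std::size_t align) override {
    upstream_->deallocate(p, bytes, align);
  }
  bool do_is_equal(const std::pmr::memory_resource& other) const noexcept override {
    return this == &other;
  }

 public:
  std::size_t count() const { return count_; }
};

// hypothetical stand-in for a map<string, string> of request arguments.
// map nodes and long strings are carved out of buffer_, and only fall
// back to the upstream resource once the buffer is exhausted
class request_args {
  std::array<std::byte, 4096> buffer_;  // arbitrary size, an assumption
  std::pmr::monotonic_buffer_resource arena_;
  std::pmr::map<std::pmr::string, std::pmr::string, std::less<>> args_;

 public:
  explicit request_args(
      std::pmr::memory_resource* upstream = std::pmr::get_default_resource())
    : arena_(buffer_.data(), buffer_.size(), upstream), args_(&arena_) {}

  request_args(const request_args&) = delete;
  request_args& operator=(const request_args&) = delete;

  // insert a new argument or overwrite an existing one
  void set(std::string_view name, std::string_view value) {
    args_.insert_or_assign(std::pmr::string(name, &arena_),
                           std::pmr::string(value, &arena_));
  }

  // look up by string_view without building a temporary string
  std::optional<std::string_view> get(std::string_view name) const {
    auto it = args_.find(name);
    if (it == args_.end()) {
      return std::nullopt;
    }
    return std::string_view(it->second);
  }

  std::size_t size() const { return args_.size(); }
};
```

a request handler would construct one request_args per request, call set() while parsing, and let the whole arena go away with the object. passing a counting_resource as the upstream is one way to check whether the arena size is large enough for typical requests. a monotonic arena never reuses freed memory, so this only suits short-lived, per-request state.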
> >>
> >> for connection pooling with asio, i did explore this for a while with
> >> Abhishek in https://github.com/cbodley/nexus/tree/wip-connection-pool/include/nexus/http/connection_pool.hpp.
> >> it had connect timeouts and some test coverage in
> >> https://github.com/cbodley/nexus/blob/wip-connection-pool/test/http/test_connection_pool.cc,
> >> but needs more work. for example, each connection_pool is constructed
> >> with one hostname/port. there also needs to be a map of these pools,
> >> keyed either on hostname/port or resolved address, so we can cache
> >> connections for any url the client requests
> >>
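the "map of these pools" mentioned above could look roughly like the sketch below. it is not the nexus code. connection, connection_pool, endpoint and pool_map are hypothetical names, the connection is just a placeholder struct with no socket, and the key is hostname/port (keying on the resolved address would be the other option).

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <tuple>
#include <vector>

// hypothetical placeholder for an idle, already-connected socket
struct connection {
  int id;
};

// hypothetical per-endpoint pool that caches a bounded number of
// idle connections
class connection_pool {
  std::vector<connection> idle_;
  std::size_t max_idle_;

 public:
  explicit connection_pool(std::size_t max_idle) : max_idle_(max_idle) {}

  // hand out the most recently returned idle connection, if any
  std::optional<connection> acquire() {
    if (idle_.empty()) {
      return std::nullopt;  // caller has to open a new connection
    }
    connection c = idle_.back();
    idle_.pop_back();
    return c;
  }

  // cache a connection for reuse. returns false if the pool is full,
  // in which case the caller should close it instead
  bool release(connection c) {
    if (idle_.size() >= max_idle_) {
      return false;
    }
    idle_.push_back(c);
    return true;
  }
};

// map key: one pool per hostname/port pair
struct endpoint {
  std::string host;
  std::uint16_t port;

  friend bool operator<(const endpoint& l, const endpoint& r) {
    return std::tie(l.host, l.port) < std::tie(r.host, r.port);
  }
};

// creates pools lazily, so connections can be cached for any url
// the client requests
class pool_map {
  std::map<endpoint, connection_pool> pools_;
  std::size_t max_idle_per_pool_;

 public:
  explicit pool_map(std::size_t max_idle_per_pool)
    : max_idle_per_pool_(max_idle_per_pool) {}

  // find the pool for this endpoint, creating it on first use
  connection_pool& get(const std::string& host, std::uint16_t port) {
    return pools_.try_emplace(endpoint{host, port}, max_idle_per_pool_)
        .first->second;
  }

  std::size_t size() const { return pools_.size(); }
};
```

a client would parse the url, call get(host, port), and try acquire() before connecting. std::map keeps references to its values stable across later insertions, so the returned connection_pool& stays valid while other pools are created. locking, connect timeouts and eviction of idle pools are all left out here.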
> >> i was also imagining higher-level interfaces like http::async_get()
> >> (and head/put/post/etc) that would hide the use of connection pooling
> >> entirely, and use beast's request/response concepts to write the
> >> request and read its response. this is also a good place to implement
> >> retries. i explored this idea in a separate repo here
> >> https://github.com/cbodley/requests/tree/master/include/requests
> >>
> >> with asio, we can attach a connection pooling service as an
> >> io_context::service that gets created automatically on first use, and
> >> saved over the lifetime of the io_context. the application would have
> >> the option to configure it, but doesn't have to know anything about it
> >> otherwise
> >>
> >> overloading those high-level interfaces could also provide a good
> >> abstraction to support http 2 and 3, where their connection pools
> >> would just have one connection per address, and each request would
> >> open its own stream
> >>
> >> _______________________________________________
> >> Dev mailing list -- dev@xxxxxxx
> >> To unsubscribe send an email to dev-leave@xxxxxxx
> >>