hey Matt,

On Tue, Feb 28, 2023 at 1:10 PM Matt Benjamin <mbenjami@xxxxxxxxxx> wrote:
>
> Removing the dependency on libcurl was one of the things I hoped to get out of the refactoring.

can you expand on your objections to libcurl? there's some messy http client code in rgw, but i wouldn't necessarily attribute that to libcurl. i feel like the only thing it's really missing, as a C library, is support for custom allocators. even so, i don't know that we've ever shown libcurl to be a bottleneck anywhere. i've also been contributing to its aws sigv4 support in https://github.com/curl/curl/pull/9995, which could allow us to remove our own custom client-side signing code

what would you replace it with? i haven't done a recent review of c++ libraries in this space, but i don't think beast will ever solve all of these problems for us. for one, the author never expressed an interest in supporting HTTP/2 or 3. during my last interaction in https://github.com/boostorg/beast/pull/2334#issuecomment-952122694, he suggested that beast was a first draft and that he would rather start over on a new library (now at https://github.com/CPPAlliance/http_proto, which still only covers HTTP/1.1)

>
> Matt
>
> On Tue, Feb 28, 2023 at 12:57 PM Casey Bodley <cbodley@xxxxxxxxxx> wrote:
>>
>> reviving this old thread about http clients after reading through https://github.com/RobertLeahy/ASIO-cURL and discovering the "multi-socket flavor" of the libcurl-multi API documented in https://curl.se/libcurl/c/libcurl-multi.html
>>
>> rgw's existing RGWHTTPManager uses the older flavor of libcurl-multi, which requires a background thread that polls libcurl for io and completions. this new flavor allows us to do all of the polling and timers asynchronously with asio, and only call into libcurl for non-blocking io when the sockets are ready to read/write. getting rid of the background thread makes it much easier to integrate with asio applications, because it removes many complications around locking and object lifetimes
>>
>> i experimented with this multi-socket API by building my own asio integration in https://github.com/cbodley/ceph/pull/6. there are two main reasons i find this especially interesting:
>>
>> 1) we've been doing some prototyping for multisite sync with asio's c++20 coroutines. RGWHTTPManager only supports the optional_yield-style coroutines, so we were talking about using beast for this initial prototype. however, i listed several of beast's missing features earlier in this thread (mainly timeouts and connection pooling), so this new curl client could be a much better fit here
>>
>> 2) curl can be built with HTTP/3 support, and that's what we've been using to test rgw's prototype frontend in https://github.com/ceph/ceph/pull/48178. we need a multiplexing client like libcurl-multi in order to test QUIC's stream multiplexing. and because the QUIC library depends on BoringSSL, this HTTP/3-enabled version of curl can't be linked against rgw (which requires OpenSSL) for RGWHTTPManager
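
for anyone who hasn't looked at the multi-socket flavor yet, the wiring is roughly the shape below. this is a stripped-down sketch with made-up names, not the actual code in https://github.com/cbodley/ceph/pull/6; error handling, re-arming the waits after each readiness event, and easy-handle cleanup are all left out:

// sketch only: libcurl tells us which fds and timeouts to watch, asio does
// the waiting. requires asio's posix descriptor support
#include <chrono>
#include <map>
#include <memory>
#include <boost/asio.hpp>
#include <curl/curl.h>

namespace asio = boost::asio;

struct curl_multi_service {
  asio::io_context& ioc;
  asio::steady_timer timer{ioc};
  CURLM* multi = curl_multi_init();
  // one descriptor per socket that libcurl asks us to watch
  std::map<curl_socket_t, std::unique_ptr<asio::posix::stream_descriptor>> sockets;

  explicit curl_multi_service(asio::io_context& ioc) : ioc(ioc) {
    curl_multi_setopt(multi, CURLMOPT_SOCKETFUNCTION, &curl_multi_service::on_socket);
    curl_multi_setopt(multi, CURLMOPT_SOCKETDATA, this);
    curl_multi_setopt(multi, CURLMOPT_TIMERFUNCTION, &curl_multi_service::on_timer);
    curl_multi_setopt(multi, CURLMOPT_TIMERDATA, this);
  }

  // CURLMOPT_SOCKETFUNCTION: libcurl wants read/write readiness for this fd
  static int on_socket(CURL*, curl_socket_t fd, int what, void* userp, void*) {
    auto self = static_cast<curl_multi_service*>(userp);
    if (what == CURL_POLL_REMOVE) {
      if (auto i = self->sockets.find(fd); i != self->sockets.end()) {
        i->second->release(); // libcurl still owns the fd, don't close it
        self->sockets.erase(i);
      }
      return 0;
    }
    auto& sock = self->sockets[fd];
    if (!sock) {
      sock = std::make_unique<asio::posix::stream_descriptor>(self->ioc, fd);
    }
    if (what == CURL_POLL_IN || what == CURL_POLL_INOUT) {
      sock->async_wait(asio::posix::descriptor_base::wait_read,
          [self, fd] (boost::system::error_code ec) {
            if (!ec) self->perform(fd, CURL_CSELECT_IN);
          });
    }
    if (what == CURL_POLL_OUT || what == CURL_POLL_INOUT) {
      sock->async_wait(asio::posix::descriptor_base::wait_write,
          [self, fd] (boost::system::error_code ec) {
            if (!ec) self->perform(fd, CURL_CSELECT_OUT);
          });
    }
    return 0;
  }

  // CURLMOPT_TIMERFUNCTION: libcurl wants to be called back after timeout_ms
  static int on_timer(CURLM*, long timeout_ms, void* userp) {
    auto self = static_cast<curl_multi_service*>(userp);
    if (timeout_ms < 0) { // -1 means cancel the timer
      self->timer.cancel();
      return 0;
    }
    self->timer.expires_after(std::chrono::milliseconds(timeout_ms));
    self->timer.async_wait([self] (boost::system::error_code ec) {
        if (!ec) self->perform(CURL_SOCKET_TIMEOUT, 0);
      });
    return 0;
  }

  // the only place we call into libcurl for io, and it never blocks
  void perform(curl_socket_t fd, int events) {
    int running = 0;
    curl_multi_socket_action(multi, fd, events, &running);
    int queued = 0;
    while (CURLMsg* msg = curl_multi_info_read(multi, &queued)) {
      if (msg->msg == CURLMSG_DONE) {
        curl_multi_remove_handle(multi, msg->easy_handle);
        // complete whatever handler/coroutine was waiting on this transfer
      }
    }
  }
};

the nice part is that all of this runs on the io_context like any other asio operation, which is where the "no background thread, fewer locking and lifetime headaches" claim comes from
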
>> On Thu, Oct 28, 2021 at 12:24 PM Casey Bodley <cbodley@xxxxxxxxxx> wrote:
>> >
>> > On Thu, Oct 28, 2021 at 10:41 AM Yuval Lifshitz <ylifshit@xxxxxxxxxx> wrote:
>> > >
>> > > Hi Casey,
>> > > When it comes to "technical debt", the main question is what is the ongoing cost of not making this change?
>> > > Do we see memory allocation and copy into RGWHTTPArgs as a noticeable perf issue? Maybe there is a simpler way to resolve this specific issue?
>> >
>> > historically, we have seen very bad behavior from tcmalloc at high thread counts in rgw, and we've been making general efforts both to reduce allocations and the number of threads required. i don't think anyone has tried to measure the impact of RGWHTTPArgs itself, but i do see its use of map<string, string> as low hanging fruit. and because this piece is on rgw's http server side, replacing this map wouldn't require any of the client stuff described above
>> >
>> > > It looks like the list of things to do to achieve feature parity with libcurl is substantial.
>> >
>> > i agree! i wanted to start by documenting where the gaps are, to help us understand the scope of a project here
>> >
>> > even without dropping libcurl, i think there's a lot of potential cleanup in the several layers (rgw_http_client, rgw_rest_client, rgw_rest_conn, rgw_cr_rest) between libcurl and multisite. for multisite in general, i would really like to see it adopt similar async primitives to the rest of the rgw codebase so that we can share more code
>> >
>> > > Is there a desire by the beast maintainers to add these capabilities?
>> >
>> > beast has generally positioned itself as a low-level http protocol library, to serve as building blocks for higher-level client and server libraries/applications. the http ecosystem is vast, so it makes sense to limit the scope of any individual library. libcurl is enormous, yet still only covers the client side
>> >
>> > though with the addition of the tcp_stream in boost 1.70 (https://www.boost.org/doc/libs/1_70_0/libs/beast/doc/html/beast/release_notes.html), beast did take a step toward this higher level of abstraction. it's definitely worth discussing whether additional features like client connection pooling would be in scope for the project. it's also worth researching what other asio-compatible http client libraries are out there
>> >
>> > > Yuval
>> > >
>> > > On Tue, Oct 26, 2021 at 9:34 PM Casey Bodley <cbodley@xxxxxxxxxx> wrote:
>> > >>
>> > >> dear Adam and list,
>> > >>
>> > >> aside from rgw's frontend, which is the server side of http, we also have plenty of http client code that sends http requests to other servers. the biggest user of the client is multisite sync, which uses http to read replication logs and fetch objects from other zones. all of this http client code is based on libcurl, and uses its 'multi api' to provide an async interface with a background thread that polls for completions
>> > >>
>> > >> it's hard to beat libcurl for stability and features, but there has also been interest in using asio+beast for the client ever since we added it to the frontend. benefits there would include a nicer c++ interface, better integration with the asio async model (we do currently have wrappers for libcurl, but they're specific to coroutines), and the potential to use custom allocators to avoid most of the per-request allocations
>> > >>
>> > >> to help with a comparison against beast, these are the features of libcurl that we rely on:
>> > >>
>> > >> - asynchronous using the 'multi api' and a background thread (https://everything.curl.dev/libcurl/drive/multi)
>> > >> - connection pooling (see https://everything.curl.dev/libcurl/connectionreuse)
>> > >> - ssl context and optional certificate verification
>> > >> - connect/request timeouts
>> > >> - rate limits
>> > >>
>> > >> see RGWHTTPClient::init_request() in rgw_http_client.cc for all of the specific CURLOPT_ features we're using now
>> > >>
>> > >> also noteworthy is curl's support for http/1.1, http/2, and http/3 (https://everything.curl.dev/libcurl-http/versions)
>> > >>
>> > >> asio does not have connection pooling or connect timeouts (though it has the components necessary to build them), and beast only supports http/1.1. i think everything else in the list is covered:
>> > >>
>> > >> ssl support comes from boost::asio::ssl and ssl_stream
>> > >>
>> > >> there's a tcp_stream class (https://www.boost.org/doc/libs/1_70_0/libs/beast/doc/html/beast/ref/boost__beast__tcp_stream.html) that wraps a tcp socket and adds rate limiting and timeouts. we use that in the frontend, though we're tracking a performance regression related to its timeouts in https://tracker.ceph.com/issues/52333
>> > >>
>> > >> there's a very nice http::fields class (https://www.boost.org/doc/libs/1_70_0/libs/beast/doc/html/beast/ref/boost__beast__http__fields.html) for headers that has custom allocator support. there's an 'http_server_fast' example at https://www.boost.org/doc/libs/1_70_0/libs/beast/example/http/server/fast/http_server_fast.cpp that uses the custom allocator in https://www.boost.org/doc/libs/1_70_0/libs/beast/example/http/server/fast/fields_alloc.hpp. i'd love to see something like that replace our use of map<string, string> for headers in RGWHTTPArgs during request processing
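
to make the allocator idea concrete, something along these lines is what i have in mind. this is a hypothetical sketch using std::pmr rather than the fields_alloc example linked above, and i haven't verified that basic_fields is happy with a polymorphic_allocator:

// hypothetical sketch: per-request headers backed by a small arena instead
// of individual heap allocations in a map<string, string>
#include <memory_resource>
#include <boost/beast/http.hpp>

namespace http = boost::beast::http;

using header_allocator = std::pmr::polymorphic_allocator<char>;
using request_headers = http::basic_fields<header_allocator>;

void handle_request()
{
  char storage[4096]; // enough for a typical request's headers
  std::pmr::monotonic_buffer_resource arena{storage, sizeof(storage)};

  request_headers headers{header_allocator{&arena}};
  headers.set(http::field::content_type, "application/json");
  headers.set("x-amz-request-id", "tx00000example"); // made-up value

  if (auto i = headers.find("x-amz-request-id"); i != headers.end()) {
    // i->value() is a string_view into the arena-backed storage
  }
} // the whole arena goes away at once, no per-header frees

since this is on the server side, RGWHTTPArgs could in principle adopt a per-request arena like this without touching any of the client code
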
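and going back to the tcp_stream point a couple of paragraphs up: if i remember the docs right, the rate limits come from the RatePolicy template parameter on basic_stream rather than from tcp_stream itself (which uses the unlimited policy). roughly, with made-up numbers:

#include <chrono>
#include <boost/asio.hpp>
#include <boost/beast/core.hpp>

namespace asio = boost::asio;
namespace beast = boost::beast;

// tcp_stream is basic_stream with an unlimited rate policy; using
// simple_rate_policy instead gives per-stream read/write limits
using throttled_stream = beast::basic_stream<asio::ip::tcp,
                                             asio::any_io_executor,
                                             beast::simple_rate_policy>;

void apply_limits(throttled_stream& stream)
{
  stream.rate_policy().read_limit(1024 * 1024);   // bytes per second
  stream.rate_policy().write_limit(1024 * 1024);
  stream.expires_after(std::chrono::seconds(30)); // covers the next async op
}
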
>> > >> for connection pooling with asio, i did explore this for a while with Abhishek in https://github.com/cbodley/nexus/tree/wip-connection-pool/include/nexus/http/connection_pool.hpp. it had connect timeouts and some test coverage in https://github.com/cbodley/nexus/blob/wip-connection-pool/test/http/test_connection_pool.cc, but needs more work. for example, each connection_pool is constructed with one hostname/port. there also needs to be a map of these pools, keyed either on hostname/port or resolved address, so we can cache connections for any url the client requests
>> > >>
>> > >> i was also imagining higher-level interfaces like http::async_get() (and head/put/post/etc) that would hide the use of connection pooling entirely, and use beast's request/response concepts to write the request and read its response. this is also a good place to implement retries. i explored this idea in a separate repo here https://github.com/cbodley/requests/tree/master/include/requests
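
to sketch how the pool-of-pools and the higher-level interface could fit together: hypothetical names and shapes, not the code in either repo above, with resolution caching, idle-connection expiry, tls, and retries all omitted:

#include <chrono>
#include <map>
#include <memory>
#include <string>
#include <vector>
#include <boost/asio.hpp>
#include <boost/beast.hpp>

namespace asio = boost::asio;
namespace beast = boost::beast;
namespace http = beast::http;

// one pool caches idle keep-alive connections to a single host:port
class connection_pool {
  asio::io_context& ioc;
  std::vector<std::unique_ptr<beast::tcp_stream>> idle;
 public:
  explicit connection_pool(asio::io_context& ioc) : ioc(ioc) {}
  std::unique_ptr<beast::tcp_stream> acquire() {
    if (!idle.empty()) {
      auto conn = std::move(idle.back());
      idle.pop_back();
      return conn;
    }
    return std::make_unique<beast::tcp_stream>(ioc);
  }
  void release(std::unique_ptr<beast::tcp_stream> conn) {
    idle.push_back(std::move(conn));
  }
};

// the client keeps one pool per endpoint so any url can be requested
class http_client {
  asio::io_context& ioc;
  std::map<std::pair<std::string, std::string>, connection_pool> pools;
 public:
  explicit http_client(asio::io_context& ioc) : ioc(ioc) {}
  connection_pool& pool_for(const std::string& host, const std::string& port) {
    return pools.try_emplace({host, port}, ioc).first->second;
  }
};

// the higher-level interface hides the pooling entirely; head/put/post/etc
// would follow the same pattern, and retries on stale connections belong here
asio::awaitable<http::response<http::string_body>>
async_get(http_client& client, std::string host, std::string port,
          std::string target)
{
  auto& pool = client.pool_for(host, port);
  auto conn = pool.acquire();
  if (!conn->socket().is_open()) { // new connection, resolve and connect
    auto ex = co_await asio::this_coro::executor;
    asio::ip::tcp::resolver resolver{ex};
    auto addrs = co_await resolver.async_resolve(host, port, asio::use_awaitable);
    conn->expires_after(std::chrono::seconds(10)); // connect timeout
    co_await conn->async_connect(addrs, asio::use_awaitable);
  }
  http::request<http::empty_body> req{http::verb::get, target, 11};
  req.set(http::field::host, host);

  conn->expires_after(std::chrono::seconds(30)); // request timeout
  co_await http::async_write(*conn, req, asio::use_awaitable);

  beast::flat_buffer buffer;
  http::response<http::string_body> res;
  co_await http::async_read(*conn, buffer, res, asio::use_awaitable);

  if (res.keep_alive()) {
    pool.release(std::move(conn)); // cache the connection for reuse
  }
  co_return res;
}

the same core could take a generic completion token instead of being hardcoded to use_awaitable, so callback and optional_yield flavors fall out of asio's completion token machinery
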
>> > >> with asio, we can attach a connection pooling service as an io_context::service that gets created automatically on first use, and saved over the lifetime of the io_context. the application would have the option to configure it, but doesn't have to know anything about it otherwise
>> > >>
>> > >> overloading those high-level interfaces could also provide a good abstraction to support http 2 and 3, where their connection pools would just have one connection per address, and each request would open its own stream

>
> --
>
> Matt Benjamin
> Red Hat, Inc.
> 315 West Huron Street, Suite 140A
> Ann Arbor, Michigan 48103
>
> http://www.redhat.com/en/technologies/storage
>
> tel. 734-821-5101
> fax. 734-769-8938
> cel. 734-216-5309

_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx