Hi Casey,
When it comes to "technical debt", the main question is: what is the ongoing cost of not making this change?
Do we see the memory allocation and copying into RGWHTTPArgs as a noticeable perf issue? Maybe there is a simpler way to resolve this specific issue?
It looks like the list of things to do to achieve feature parity with libcurl is substantial.
Is there a desire by the beast maintainers to add these capabilities?
Yuval
On Tue, Oct 26, 2021 at 9:34 PM Casey Bodley <cbodley@xxxxxxxxxx> wrote:
dear Adam and list,
aside from rgw's frontend, which is the server side of http, we also
have plenty of http client code that sends http requests to other
servers. the biggest user of the client is multisite sync, which uses
http to read replication logs and fetch objects from other zones. all
of this http client code is based on libcurl, and uses its 'multi api'
to provide an async interface with a background thread that polls for
completions
it's hard to beat libcurl for stability and features, but there has
also been interest in using asio+beast for the client ever since we
added it to the frontend. benefits there would include a nicer c++
interface, better integration with the asio async model (we do
currently have wrappers for libcurl, but they're specific to
coroutines), and the potential to use custom allocators to avoid most
of the per-request allocations
to help with a comparison against beast, these are the features of
libcurl that we rely on:
- asynchronous using the 'multi api' and a background thread
(https://everything.curl.dev/libcurl/drive/multi)
- connection pooling (see https://everything.curl.dev/libcurl/connectionreuse)
- ssl context and optional certificate verification
- connect/request timeouts
- rate limits
see RGWHTTPClient::init_request() in rgw_http_client.cc for all of the
specific CURLOPT_ features we're using now
also noteworthy is curl's support for http/1.1, http/2, and http/3
(https://everything.curl.dev/libcurl-http/versions)
asio does not have connection pooling or connect timeouts (though it
has the components necessary to build them), and beast only supports
http/1.1. i think everything else in the list is covered:
ssl support comes from boost::asio::ssl and ssl_stream
there's a tcp_stream class
(https://www.boost.org/doc/libs/1_70_0/libs/beast/doc/html/beast/ref/boost__beast__tcp_stream.html)
that wraps a tcp socket and adds rate limiting and timeouts. we use
that in the frontend, though we're tracking a performance regression
related to its timeouts in https://tracker.ceph.com/issues/52333
there's a very nice http::fields class
(https://www.boost.org/doc/libs/1_70_0/libs/beast/doc/html/beast/ref/boost__beast__http__fields.html)
for headers that has custom allocator support. there's an
'http_server_fast' example at
https://www.boost.org/doc/libs/1_70_0/libs/beast/example/http/server/fast/http_server_fast.cpp
that uses the custom allocator in
https://www.boost.org/doc/libs/1_70_0/libs/beast/example/http/server/fast/fields_alloc.hpp.
i'd love to see something like that replace our use of map<string,
string> for headers in RGWHTTPArgs during request processing
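
A rough sketch of the per-request allocation idea, using only the standard library. It is not Beast's http::fields or the fields_alloc.hpp allocator; it swaps in std::pmr with a monotonic buffer as a stand-in. The class name RequestHeaders and its methods are hypothetical, and lookups here are case-sensitive, unlike real HTTP header names.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <memory_resource>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical per-request header container (illustration only).
// Map nodes and string storage are carved out of a caller-supplied buffer,
// so filling it does not touch the global heap.
class RequestHeaders {
 public:
  // The null upstream resource makes an exhausted buffer throw
  // std::bad_alloc instead of silently falling back to the heap.
  RequestHeaders(void* buffer, std::size_t size)
      : arena_(buffer, size, std::pmr::null_memory_resource()),
        fields_(&arena_) {}

  // Insert a header, or overwrite the value if the name already exists.
  void set(std::string_view name, std::string_view value) {
    fields_.insert_or_assign(
        std::pmr::string(name.data(), name.size(), &arena_),
        std::pmr::string(value.data(), value.size(), &arena_));
  }

  // Look up a header by exact (case-sensitive) name.
  std::optional<std::string_view> get(std::string_view name) const {
    auto it = fields_.find(name);  // transparent compare, no temporary string
    if (it == fields_.end()) {
      return std::nullopt;
    }
    return std::string_view(it->second);
  }

  std::size_t size() const { return fields_.size(); }

 private:
  std::pmr::monotonic_buffer_resource arena_;  // must outlive fields_
  std::pmr::map<std::pmr::string, std::pmr::string, std::less<>> fields_;
};
```

The buffer could live on the stack or in a per-connection object and be reused across requests. A monotonic arena never frees individual entries, so overwriting a value leaves the old bytes unused until the whole arena is dropped.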
for connection pooling with asio, i did explore this for a while with
Abhishek in https://github.com/cbodley/nexus/tree/wip-connection-pool/include/nexus/http/connection_pool.hpp.
it had connect timeouts and some test coverage in
https://github.com/cbodley/nexus/blob/wip-connection-pool/test/http/test_connection_pool.cc,
but needs more work. for example, each connection_pool is constructed
with one hostname/port. there also needs to be a map of these pools,
keyed either on hostname/port or resolved address, so we can cache
connections for any url the client requests
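
The "map of pools" idea could look roughly like the following sketch, which uses only the standard library. It is not the nexus connection_pool code: Connection, ConnectionPool and PoolMap are hypothetical names, connecting is faked, and there is no locking, no timeouts and no limit on idle connections.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an established connection.
struct Connection {
  std::string host;
  std::uint16_t port;
};

// One pool per endpoint, holding idle connections for reuse.
class ConnectionPool {
 public:
  ConnectionPool(std::string host, std::uint16_t port)
      : host_(std::move(host)), port_(port) {}

  // Reuse an idle connection if one exists, otherwise "connect" a new one.
  std::unique_ptr<Connection> acquire() {
    if (!idle_.empty()) {
      auto conn = std::move(idle_.back());
      idle_.pop_back();
      return conn;
    }
    return std::make_unique<Connection>(Connection{host_, port_});
  }

  // Hand a connection back so a later request can reuse it.
  void release(std::unique_ptr<Connection> conn) {
    idle_.push_back(std::move(conn));
  }

  std::size_t idle_count() const { return idle_.size(); }

 private:
  std::string host_;
  std::uint16_t port_;
  std::vector<std::unique_ptr<Connection>> idle_;
};

// Pools keyed on hostname/port, created lazily on first use.
class PoolMap {
 public:
  using Key = std::pair<std::string, std::uint16_t>;

  ConnectionPool& get(const std::string& host, std::uint16_t port) {
    Key key{host, port};
    // try_emplace constructs the pool only if the key is not present yet;
    // std::map keeps references to existing pools valid across inserts.
    auto result = pools_.try_emplace(std::move(key), host, port);
    return result.first->second;
  }

  std::size_t size() const { return pools_.size(); }

 private:
  std::map<Key, ConnectionPool> pools_;
};
```

Keying on the resolved address instead would only change the Key type. The sketch uses hostname/port because that needs no resolver.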
i was also imagining higher-level interfaces like http::async_get()
(and head/put/post/etc) that would hide the use of connection pooling
entirely, and use beast's request/response concepts to write the
request and read its response. this is also a good place to implement
retries. i explored this idea in a separate repo here
https://github.com/cbodley/requests/tree/master/include/requests
with asio, we can attach a connection pooling service as an
io_context::service that gets created automatically on first use, and
saved over the lifetime of the io_context. the application would have
the option to configure it, but doesn't have to know anything about it
otherwise
overloading those high-level interfaces could also provide a good
abstraction to support http 2 and 3, where their connection pools
would just have one connection per address, and each request would
open its own stream
_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx