Happy New Year Ceph-Users!
With the holidays and people likely having been away, I am taking the
liberty of bluntly bumping my question below about protecting RGW from DoS:
On 22.12.23 10:24, Christian Rohmann wrote:
Hey Ceph-Users,
RGW does have options [1] to rate limit ops or bandwidth per bucket or
user.
But those only come into play when the request is authenticated.
I'd like to also protect the authentication subsystem from malicious
or invalid requests.
For example, if some EC2 credentials are no longer valid and clients
start hammering the RGW with them, I'd like those requests to be cheap
to handle. This matters especially when external authentication such
as OpenStack Keystone [2] is used: valid access tokens are cached
within the RGW, but requests with invalid credentials are sent at full
rate to the external API [3], as there is no negative caching. Even if
there were, it would only limit the external auth requests for the
same set of invalid credentials, but it would surely reduce the load
in that case.
Since the HTTP request is blocking ....
[...]
2023-12-18T15:25:55.861+0000 7fec91dbb640 20 sending request to https://keystone.example.com/v3/s3tokens
2023-12-18T15:25:55.861+0000 7fec91dbb640 20 register_request mgr=0x561a407ae0c0 req_data->id=778, curl_handle=0x7fedaccb36e0
2023-12-18T15:25:55.861+0000 7fec91dbb640 20 WARNING: blocking http request
2023-12-18T15:25:55.861+0000 7fede37fe640 20 link_request req_data=0x561a40a418b0 req_data->id=778, curl_handle=0x7fedaccb36e0
[...]
this not only stresses the external authentication API (Keystone in
this case), but also blocks RGW threads for the duration of the
external call.
I am currently looking into using the capabilities of HAProxy to rate
limit requests based on their resulting HTTP response [4]. In essence,
the idea is to rate-limit or tarpit clients that "produce" a high
number of 403 "InvalidAccessKeyId" responses. To reduce collateral
damage, it might make sense to limit based on the presented
credentials themselves. But this would require extracting and tracking
HTTP headers or URL parameters (presigned URLs) [5] and putting them
into tables.
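
To make the "track by presented credentials" idea concrete, here is a
minimal sketch in Python (not HAProxy configuration). It pulls the
access key ID out of a SigV4 or SigV2 Authorization header or out of
presigned URL parameters, and counts 403 responses per key in a sliding
window. The names extract_access_key and FailureTracker, the plain dict
of headers, and the thresholds are assumptions made up for illustration;
they are not part of RGW or HAProxy.

```python
import re
import time
from collections import defaultdict, deque
from urllib.parse import parse_qs, urlsplit

# SigV4 header: "AWS4-HMAC-SHA256 Credential=<key>/<date>/<region>/s3/aws4_request, ..."
_SIGV4_HEADER = re.compile(r"^AWS4-HMAC-SHA256\s.*?Credential=([^/,\s]+)/")
# SigV2 header: "AWS <key>:<signature>"
_SIGV2_HEADER = re.compile(r"^AWS\s+([^:\s]+):")


def extract_access_key(url, headers):
    """Return the access key ID presented by a request, or None.

    `headers` is assumed to be a plain dict with an "Authorization" key.
    """
    auth = headers.get("Authorization", "")
    for pattern in (_SIGV4_HEADER, _SIGV2_HEADER):
        match = pattern.match(auth)
        if match:
            return match.group(1)
    query = parse_qs(urlsplit(url).query)
    # Presigned SigV4 URL: X-Amz-Credential=<key>/<date>/<region>/s3/aws4_request
    if "X-Amz-Credential" in query:
        return query["X-Amz-Credential"][0].split("/", 1)[0]
    # Presigned SigV2 URL: AWSAccessKeyId=<key>
    if "AWSAccessKeyId" in query:
        return query["AWSAccessKeyId"][0]
    return None


class FailureTracker:
    """Count failed (403) responses per key within a sliding time window."""

    def __init__(self, max_failures, window_seconds):
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self._failures = defaultdict(deque)  # key -> timestamps of failures

    def record_failure(self, key, now=None):
        now = time.monotonic() if now is None else now
        self._failures[key].append(now)

    def is_blocked(self, key, now=None):
        now = time.monotonic() if now is None else now
        events = self._failures.get(key)
        if not events:
            return False
        # Drop failures that have aged out of the window.
        while events and now - events[0] > self.window_seconds:
            events.popleft()
        return len(events) >= self.max_failures
```

A proxy in front of RGW would call record_failure() with the extracted
key whenever the backend answers 403, and check is_blocked() before
forwarding the next request carrying the same key. Keying on the
credential instead of the source IP is what keeps other clients behind
the same NAT or load balancer unaffected.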
* What are your thoughts on the matter?
* What kind of measures did you put in place?
* Does it make sense to extend RGW's capabilities to deal with those
cases itself?
** adding negative caching
** rate limits on concurrent external authentication requests (or is
there a pool of connections for those requests?)
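
The two sub-items above could look roughly like the following Python
sketch. This is not RGW code: the class name, the validate callback
(standing in for the blocking call to the external API), the TTL and the
concurrency limit are all hypothetical. It also caches rejections by
access key only, which is a simplification: since the external check
covers the signature as well, a real implementation would have to make
sure that a bad signature sent with a valid key cannot lock out the
legitimate user.

```python
import threading
import time


class NegativeCachingAuthenticator:
    """Wrap an external credential check with a negative cache and a
    cap on concurrent external calls (illustrative sketch only)."""

    def __init__(self, validate, negative_ttl=30.0, max_concurrent=4,
                 clock=time.monotonic):
        self._validate = validate          # hypothetical blocking external check
        self._negative_ttl = negative_ttl  # seconds a rejection is remembered
        self._slots = threading.BoundedSemaphore(max_concurrent)
        self._clock = clock
        self._rejected = {}                # access key -> expiry timestamp
        self._lock = threading.Lock()

    def authenticate(self, access_key):
        now = self._clock()
        with self._lock:
            expiry = self._rejected.get(access_key)
            if expiry is not None:
                if now < expiry:
                    # Known-bad credential: answer without an external call.
                    return False
                del self._rejected[access_key]
        # Fail fast instead of queueing when all external slots are busy.
        if not self._slots.acquire(blocking=False):
            raise RuntimeError("too many concurrent external auth requests")
        try:
            valid = self._validate(access_key)
        finally:
            self._slots.release()
        if not valid:
            with self._lock:
                self._rejected[access_key] = self._clock() + self._negative_ttl
        return valid
```

Positive results are deliberately not cached here, since RGW already
caches valid tokens. Failing fast when no slot is free is one possible
choice; queueing with a timeout would be another, at the cost of still
tying up the calling thread.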
Regards
Christian
[1] https://docs.ceph.com/en/latest/radosgw/admin/#rate-limit-management
[2] https://docs.ceph.com/en/latest/radosgw/keystone/#integrating-with-openstack-keystone
[3] https://github.com/ceph/ceph/blob/86bb77eb9633bfd002e73b5e58b863bc2d0df594/src/rgw/rgw_auth_keystone.cc#L475
[4] https://www.haproxy.com/documentation/haproxy-configuration-manual/latest/#4.2-http-response%20track-sc0
[5] https://docs.aws.amazon.com/AmazonS3/latest/API/sig-v4-authenticating-requests.html#auth-methods-intro