Hey Ceph-Users,
RGW does have options [1] to rate limit ops or bandwidth per bucket or user.
But those only come into play when the request is authenticated.
I'd like to also protect the authentication subsystem from malicious or
invalid requests.
For example, if some EC2 credentials are no longer valid and clients
start hammering the RGW with those requests, I'd like to make it cheap
to deal with them. This matters most when external authentication such
as OpenStack Keystone [2] is used: valid access tokens are cached
within the RGW, but requests with invalid credentials are sent at full
rate to the external API [3], as there is no negative caching. Even if
there were, it would only limit the external auth requests for
repeated use of the same invalid credentials, though it would
certainly reduce the load in that case.
Since the HTTP request is blocking ....
[...]
2023-12-18T15:25:55.861+0000 7fec91dbb640 20 sending request to https://keystone.example.com/v3/s3tokens
2023-12-18T15:25:55.861+0000 7fec91dbb640 20 register_request mgr=0x561a407ae0c0 req_data->id=778, curl_handle=0x7fedaccb36e0
2023-12-18T15:25:55.861+0000 7fec91dbb640 20 WARNING: blocking http request
2023-12-18T15:25:55.861+0000 7fede37fe640 20 link_request req_data=0x561a40a418b0 req_data->id=778, curl_handle=0x7fedaccb36e0
[...]
this not only stresses the external authentication API (Keystone in
this case), but also blocks RGW threads for the duration of the external
call.
I am currently looking into using the capabilities of HAProxy to rate
limit requests based on the resulting HTTP response [4]. In essence,
the idea is to rate-limit or tarpit clients that "produce" a high number
of 403 "InvalidAccessKeyId" responses. To reduce collateral damage, it
might make sense to limit based on the presented credentials themselves.
But this would require extracting and tracking HTTP headers or URL
parameters (presigned URLs) [5] and putting them into tables.
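To make the per-credential idea a bit more concrete, here is a rough Python sketch of the logic (not HAProxy configuration). It pulls the access key ID out of a SigV4 or SigV2 Authorization header or out of presigned-URL query parameters, and keeps a sliding-window count of 403 responses per key. The names extract_access_key and FailureTracker, as well as the thresholds, are made up for illustration, and the header lookup assumes a plain dict with the canonical "Authorization" spelling.

```python
import re
import time
from collections import defaultdict, deque
from urllib.parse import urlsplit, parse_qs

# SigV4 header: "AWS4-HMAC-SHA256 Credential=<key>/<date>/<region>/s3/aws4_request, ..."
_V4_HEADER = re.compile(r"Credential=([^/,\s]+)/")
# SigV2 header: "AWS <key>:<signature>"
_V2_HEADER = re.compile(r"^AWS ([^:\s]+):")


def extract_access_key(url, headers):
    """Return the access key ID presented by a request, or None if absent."""
    auth = headers.get("Authorization", "")
    match = _V4_HEADER.search(auth) or _V2_HEADER.match(auth)
    if match:
        return match.group(1)
    query = parse_qs(urlsplit(url).query)  # also decodes %2F to "/"
    if "X-Amz-Credential" in query:  # presigned SigV4 URL
        return query["X-Amz-Credential"][0].split("/", 1)[0]
    if "AWSAccessKeyId" in query:  # presigned SigV2 URL
        return query["AWSAccessKeyId"][0]
    return None


class FailureTracker:
    """Sliding-window count of 403 responses per access key (hypothetical)."""

    def __init__(self, max_failures=10, window_seconds=60.0):
        self.max_failures = max_failures
        self.window = window_seconds
        self._events = defaultdict(deque)  # key -> timestamps of recent 403s

    def record_failure(self, key, now=None):
        now = time.monotonic() if now is None else now
        self._events[key].append(now)

    def is_blocked(self, key, now=None):
        now = time.monotonic() if now is None else now
        events = self._events.get(key)
        if not events:
            return False
        # Drop failures that have aged out of the window.
        while events and now - events[0] > self.window:
            events.popleft()
        return len(events) >= self.max_failures
```

A proxy would call record_failure() whenever the backend answers 403 and check is_blocked() before forwarding the next request carrying the same key. Anyone can present someone else's access key ID, so tracking on the key alone could also be abused to lock out a legitimate user; combining it with the source address might be safer.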
* What are your thoughts on the matter?
* What kind of measures did you put in place?
* Does it make sense to extend RGW's capabilities to deal with those
cases itself?
** adding negative caching
** rate limits on concurrent external authentication requests (or is
there a pool of connections for those requests?)
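To illustrate what I mean by the last two points, here is a minimal Python sketch. It is not how RGW works today; it is only the behaviour I have in mind. GuardedAuthenticator and its validate callable are hypothetical names, and the TTL and concurrency values are arbitrary. It also simplifies by caching on the access key alone, whereas the real Keystone call checks the signature as well, so a real implementation would have to decide what exactly a negative entry is keyed on.

```python
import threading
import time


class GuardedAuthenticator:
    """Wrap a blocking external auth call with a negative cache and a
    cap on concurrent external requests (illustrative only)."""

    def __init__(self, validate, negative_ttl=30.0, max_concurrent=4,
                 clock=time.monotonic):
        self._validate = validate  # hypothetical: access_key -> bool, blocking
        self._ttl = negative_ttl
        self._clock = clock
        self._slots = threading.BoundedSemaphore(max_concurrent)
        self._rejected = {}  # access_key -> time at which the entry expires
        self._lock = threading.Lock()

    def authenticate(self, access_key):
        """Return True/False for a verdict, or None if the call was shed."""
        now = self._clock()
        with self._lock:
            expiry = self._rejected.get(access_key)
            if expiry is not None:
                if now < expiry:
                    return False  # cheap rejection, no external call
                del self._rejected[access_key]
        # Refuse instead of queueing when too many external calls are in flight.
        if not self._slots.acquire(blocking=False):
            return None  # the caller could answer 503 Slow Down here
        try:
            ok = self._validate(access_key)
        finally:
            self._slots.release()
        if not ok:
            with self._lock:
                self._rejected[access_key] = self._clock() + self._ttl
        return ok
```

With something like this, a repeated invalid key costs one dictionary lookup per request until the entry expires, and at most max_concurrent threads can be stuck waiting on the external API at any time.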
Regards
Christian
[1] https://docs.ceph.com/en/latest/radosgw/admin/#rate-limit-management
[2] https://docs.ceph.com/en/latest/radosgw/keystone/#integrating-with-openstack-keystone
[3] https://github.com/ceph/ceph/blob/86bb77eb9633bfd002e73b5e58b863bc2d0df594/src/rgw/rgw_auth_keystone.cc#L475
[4] https://www.haproxy.com/documentation/haproxy-configuration-manual/latest/#4.2-http-response%20track-sc0
[5] https://docs.aws.amazon.com/AmazonS3/latest/API/sig-v4-authenticating-requests.html#auth-methods-intro