RGW rate-limiting or anti-hammering for (external) auth requests // Anti-DoS measures



Hey Ceph-Users,

RGW does have options [1] to rate limit ops or bandwidth per bucket or user.
But those only come into play when the request is authenticated.

I'd like to also protect the authentication subsystem from malicious or invalid requests. For example, if some EC2 credentials are no longer valid and clients start hammering the RGW with them, I'd like those requests to be cheap to handle. This matters especially when external authentication such as OpenStack Keystone [2] is used: valid access tokens are cached within the RGW, but requests with invalid credentials are sent at full rate to the external API [3] because there is no negative caching. Even if there were, it would only limit the external auth requests for a repeated set of invalid credentials, though it would certainly reduce the load in that case.

Since the HTTP request is blocking ...

2023-12-18T15:25:55.861+0000 7fec91dbb640 20 sending request to https://keystone.example.com/v3/s3tokens
2023-12-18T15:25:55.861+0000 7fec91dbb640 20 register_request mgr=0x561a407ae0c0 req_data->id=778, curl_handle=0x7fedaccb36e0
2023-12-18T15:25:55.861+0000 7fec91dbb640 20 WARNING: blocking http request
2023-12-18T15:25:55.861+0000 7fede37fe640 20 link_request req_data=0x561a40a418b0 req_data->id=778, curl_handle=0x7fedaccb36e0

... this not only stresses the external authentication API (Keystone in this case), but also blocks RGW threads for the duration of the external call.
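To make the negative-caching idea concrete, here is a minimal Python sketch. It is not RGW code: `NegativeAuthCache`, `authenticate` and `external_check` are hypothetical names, and the TTL and size limit are arbitrary. It only shows that a repeated invalid credential would reach the external API once per TTL instead of once per request. What to use as the cache key is an open question; keying on the access key alone could let bad requests lock out a valid user, so the key below is simply "whatever identifies the presented credentials".

```python
import time


class NegativeAuthCache:
    """Remember recently rejected credentials for `ttl` seconds."""

    def __init__(self, ttl=30.0, max_entries=10000, clock=time.monotonic):
        self.ttl = ttl
        self.max_entries = max_entries
        self.clock = clock
        self._rejected = {}  # credential key -> expiry timestamp

    def is_rejected(self, cred_key):
        expiry = self._rejected.get(cred_key)
        if expiry is None:
            return False
        if self.clock() >= expiry:
            # Entry expired: forget it so the next request is re-checked.
            del self._rejected[cred_key]
            return False
        return True

    def remember_rejection(self, cred_key):
        if cred_key not in self._rejected and len(self._rejected) >= self.max_entries:
            # Bound memory use: drop the oldest entry (dicts keep insertion order).
            self._rejected.pop(next(iter(self._rejected)))
        self._rejected[cred_key] = self.clock() + self.ttl


def authenticate(cred_key, cache, external_check):
    """Return True/False; only call the external API on a cache miss."""
    if cache.is_rejected(cred_key):
        return False  # cheap path, no external call
    ok = external_check(cred_key)  # hypothetical stand-in for the Keystone call
    if not ok:
        cache.remember_rejection(cred_key)
    return ok
```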

I am currently looking into using the capabilities of HAProxy to rate-limit requests based on the resulting HTTP response [4]. In essence, the idea is to rate-limit or tarpit clients that "produce" a high number of 403 "InvalidAccessKeyId" responses. To cause less collateral damage, it might make sense to limit based on the presented credentials themselves. But this would require extracting and tracking HTTP headers or URL parameters (presigned URLs) [5] and putting them into tables.
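As an illustration of per-credential tracking (independent of HAProxy), the following Python sketch pulls the access key out of a request and counts 403s per key in a sliding window. The function and class names, the limit and the window are hypothetical. It assumes the SigV4 `Credential=` and SigV2 `AWS key:signature` forms of the `Authorization` header, the `X-Amz-Credential` and `AWSAccessKeyId` query parameters for presigned URLs, and a plain dict of headers with the exact key `Authorization`.

```python
import re
import time
from collections import deque
from urllib.parse import parse_qs, urlsplit

_V4_HEADER = re.compile(r"Credential=([^/,\s]+)/")
_V2_HEADER = re.compile(r"^AWS ([^:\s]+):")


def extract_access_key(url, headers):
    """Return the presented access key, or None if none is found."""
    auth = headers.get("Authorization", "")
    match = _V4_HEADER.search(auth) or _V2_HEADER.match(auth)
    if match:
        return match.group(1)
    query = parse_qs(urlsplit(url).query)  # also decodes %2F to "/"
    if "X-Amz-Credential" in query:
        return query["X-Amz-Credential"][0].split("/", 1)[0]
    if "AWSAccessKeyId" in query:
        return query["AWSAccessKeyId"][0]
    return None


class FailureTracker:
    """Count recent 403 responses per access key."""

    def __init__(self, limit=10, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self._failures = {}  # access key -> deque of failure timestamps

    def record_403(self, key):
        self._failures.setdefault(key, deque()).append(self.clock())

    def should_throttle(self, key):
        times = self._failures.get(key)
        if not times:
            return False
        cutoff = self.clock() - self.window
        while times and times[0] <= cutoff:
            times.popleft()  # forget failures outside the window
        return len(times) >= self.limit
```

A proxy-side implementation would keep the same two pieces: a key extracted from the request, and a table of recent failures indexed by that key.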

* What are your thoughts on the matter?
* What kind of measures did you put in place?
* Does it make sense to extend RGW's capabilities to deal with those cases itself?
** adding negative caching
** rate limits on concurrent external authentication requests (or is there a pool of connections for those requests?)
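
To sketch what a limit on concurrent external authentication requests could look like, here is a small Python example. It is an illustration only, not how RGW handles these requests: `ExternalAuthLimiter`, `AuthBackendBusy` and the default values are hypothetical. A bounded semaphore caps the number of in-flight external calls, and callers that cannot get a slot within a short timeout fail fast instead of tying up another worker thread.

```python
import threading


class AuthBackendBusy(Exception):
    """Raised when all slots for external auth calls are in use."""


class ExternalAuthLimiter:
    def __init__(self, max_concurrent=8, wait_timeout=0.5):
        self._slots = threading.BoundedSemaphore(max_concurrent)
        self._wait_timeout = wait_timeout

    def call(self, external_check, *args):
        # Wait briefly for a free slot; give up rather than block indefinitely.
        if not self._slots.acquire(timeout=self._wait_timeout):
            raise AuthBackendBusy("too many concurrent external auth requests")
        try:
            return external_check(*args)  # hypothetical blocking Keystone call
        finally:
            self._slots.release()
```

A caller could map `AuthBackendBusy` to a 503 response so that well-behaved clients back off and retry.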



[1] https://docs.ceph.com/en/latest/radosgw/admin/#rate-limit-management
[2] https://docs.ceph.com/en/latest/radosgw/keystone/#integrating-with-openstack-keystone
[3] https://github.com/ceph/ceph/blob/86bb77eb9633bfd002e73b5e58b863bc2d0df594/src/rgw/rgw_auth_keystone.cc#L475
[4] https://www.haproxy.com/documentation/haproxy-configuration-manual/latest/#4.2-http-response%20track-sc0
[5] https://docs.aws.amazon.com/AmazonS3/latest/API/sig-v4-authenticating-requests.html#auth-methods-intro