Re: mclock priority queue in radosgw

Kyle Bader <kyle.bader@xxxxxxxxx> · Thu, 22 Mar 2018 16:40:08 -0700



From a capacity planning perspective, it would be fantastic to be able
to limit the request volume per bucket. Amazon S3 provides roughly 300
PUT/LIST/DELETE requests per second or 800 GET requests per second.
Translating those values into sensible default weights seems like a
good start. The ability to scale the limits as the bucket is sharded
would further enhance fidelity with Amazon's behavior. When you exceed
the allowed number of requests per second in Amazon, you get a 503
"Slow Down" error; we should probably do something similar. All of
these things go a long way toward protecting the system from being
abused as a k/v store, so that misguided tenants can't sap the seeks
from folks who are using the system for appropriately sized objects.
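
A minimal sketch of what such a per-bucket limit could look like, to
make the idea concrete. This is not radosgw code: the names
(TokenBucket, BucketLimiter, admit) are hypothetical, the token bucket
is just one simple way to enforce a requests/sec limit (it is not
mclock), and multiplying the rate by the shard count is an assumed
scaling policy. The 300 and 800 defaults are the figures quoted above.

```python
import time

# Assumed defaults, taken from the per-bucket figures quoted above.
WRITE_OPS = {"PUT", "LIST", "DELETE"}
DEFAULT_RATES = {"write": 300.0, "read": 800.0}  # requests/sec per shard


class TokenBucket:
    """Token bucket refilled at `rate` tokens/sec, holding at most `rate`."""

    def __init__(self, rate, now):
        self.rate = rate
        self.tokens = rate  # start full: allows a one-second burst
        self.stamp = now

    def try_take(self, now, cost=1.0):
        # Refill for the elapsed time, capped at one second's worth.
        elapsed = now - self.stamp
        self.tokens = min(self.rate, self.tokens + elapsed * self.rate)
        self.stamp = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class BucketLimiter:
    """Hypothetical per-bucket limiter; limits scale with the shard count."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.buckets = {}  # (bucket name, "read" or "write") -> TokenBucket

    def admit(self, bucket, op, num_shards=1):
        kind = "write" if op in WRITE_OPS else "read"
        rate = DEFAULT_RATES[kind] * max(1, num_shards)
        now = self.clock()
        tb = self.buckets.get((bucket, kind))
        if tb is None or tb.rate != rate:  # new bucket, or it was resharded
            tb = self.buckets[(bucket, kind)] = TokenBucket(rate, now)
        if tb.try_take(now):
            return 200, "OK"
        return 503, "SlowDown"  # mirrors the S3 error described above
```

Every request costs one token here, so the units stay in requests/sec.
A caller would do something like limiter.admit("mybucket", "PUT",
num_shards=4) and reply with the returned status when it is 503.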

https://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html
https://docs.aws.amazon.com/AmazonS3/latest/dev/ErrorBestPractices.html

On Thu, Mar 22, 2018 at 3:09 PM, Abhishek <abhishek@xxxxxxxx> wrote:
> On 2018-03-22 22:17, Yehuda Sadeh-Weinraub wrote:
>>
>> On Thu, Mar 22, 2018 at 12:09 PM, Casey Bodley <cbodley@xxxxxxxxxx> wrote:
>>>
>>> One of the benefits of the asynchronous beast frontend in radosgw is
>>> that it allows us to do things like request throttling and priority
>>> queuing that would otherwise block frontend threads - which are a
>>> scarce resource in civetweb's thread-per-connection model.
>>>
>>> The primary goal of this project is to prevent large object data
>>> workloads from starving out cheaper requests. After some discussion
>>> in the Ann Arbor office, our resident dmclock expert Eric Ivancich
>>> convinced us that mclock was a good fit. I've spent the week
>>> exploring a design for this, and wanted to get some early feedback:
>>>
>>> Each HTTP request would be assigned a request class (dmclock calls them
>>> clients) and a cost.
>>>
>>> The four initial request classes:
>>> - auth: requests for swift auth tokens, and eventually sts
>>> - admin: admin APIs for use by the dashboard and multisite sync
>>> - data: object io
>>> - metadata: everything else, such as bucket operations, object stat, etc.
>>>
>>> Calculating a cost is difficult, especially for the two major cases
>>> where we'd want it: object GET requests (because we have to check
>>> with RADOS before we know its actual size), and object PUT requests
>>> that use chunked transfer-encoding. I'd love to hear ideas for this,
>>> but for now I think it's good enough to assign everything a cost of
>>> 1 so that all of the units are in requests/sec. I believe this is
>>> what the osd is doing now as well?
>>>
>>
>> That does sound like the simpler solution, and it should be a good
>> enough starting point. What if we could integrate it in a much lower
>> layer, e.g., into librados?
>>
>>> New virtual functions in class RGWOp seem like a good way for the
>>> derived Ops to return their request class and cost. Once we know
>>> those, we can add ourselves to the mclock priority queue and do an
>>> async wait until it's our turn to run.
>>>
>>> But where exactly does this step fit into the request processing
>>> pipeline? Does it happen before or after
>>> authentication/authorization? I'm leaning towards after, so that
>>> auth failures get filtered out before they enter the queue.
>>
>>
>> What about the situation where you have a bad actor flooding with
>> badly authenticated requests?
>
>
> For non-admin requests, maybe we could use the user parameter to start
> increasing the cost associated with the user as more requests start to
> pile up (though this isn't strictly affected by before/after
> authentication, as we populate the user info before that anyway).
>
>>>
>>> The priority queue can use perf counters for introspection, and a config
>>> observer to apply changes to the per-client mclock options.
>>>
>>> As future work, we could add some load balancer integration to:
>>> - enable custom scripts that look at incoming requests and assign
>>>   their own request class/cost
>>> - track distributed client stats across gateways, and feed that info
>>>   back into radosgw with each request (this is the d in dmclock)
>>>
>>> Thanks,
>>> Casey
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html