Re: mclock priority queue in radosgw

> On Thu, Mar 22, 2018 at 7:40 PM, Kyle Bader <kyle.bader@xxxxxxxxx> wrote:
>> From a capacity planning perspective, it would be fantastic to be able
>> to limit the request volume per bucket. In Amazon S3, they provide
>> roughly 300 PUT/LIST/DELETE per second or 800 GET per second. Taking
>> those values and translating them into sensible default weights seems
>> like a good start. The ability to scale the limits as the bucket is
>> sharded would further enhance fidelity with Amazon's behavior. When
>
> Okay, I could see that working with two request classes for each
> bucket instead of just data+metadata. I'm not sure how well the
> priority queue itself will handle a large number of different clients,
> but I could do some microbenchmarks to see.
>
> Aside from the ability to set limits, dmclock also supports
> reservations and weighting for fairness. Do you think those features
> are as interesting as the limits, on a per-bucket dimension?

If you mean:

* 10% s3://foo
* 10% s3://bar
* 10% s3://baz
* 20% s3://boo
* 50% s3://far

No, I don't think that is very useful. I could see weighting the
underlying pools that map to different storage policies as being
useful. For example:

* 90% default.buckets.data.standard
* 10% default.buckets.data.glacier
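
Roughly what I'm picturing, as a made-up sketch (the struct and names
are hypothetical; it just borrows the reservation/weight/limit shape
that dmclock already uses):

#include <map>
#include <string>

// per-storage-policy QoS, keyed by data pool; the weights below give
// the standard pool 90% of the capacity once reservations are met
struct QosParams {
  double reservation;  // minimum guaranteed share
  double weight;       // proportional share
  double limit;        // hard cap, 0 = unlimited
};

static const std::map<std::string, QosParams> policy_qos = {
  {"default.buckets.data.standard", {0.0, 90.0, 0.0}},
  {"default.buckets.data.glacier",  {0.0, 10.0, 0.0}},
};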

> If not, maybe per-bucket limits (and per-user, as Robin points out
> later in the thread) would work better as separate steps underneath
> the dmclock priority queue. Done separately, it would be easier to
> optimize for a large number of buckets/users, and could support
> configuring different limits for specific buckets/users.
>
> On the other hand, if reservations and weighting between buckets is
> important, should those values scale with shards as well? As the
> number of bucket shards in the cluster grows, the other non-bucket
> request classes (admin and auth) would get a smaller proportion and
> need adjusting.

I think a bucket with more shards and a higher limit will also see
increased authentication traffic. For example, there might be a stampede
of Spark tasks authenticating to access a common bucket.
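
On the scaling question, the simplest thing would be per-shard defaults
multiplied by the shard count, using the S3 numbers above as the
baseline. A hypothetical sketch (none of these names exist in rgw
today):

#include <algorithm>

struct BucketQosLimits {
  double write_ops;  // PUT/LIST/DELETE per second
  double read_ops;   // GET per second
};

// scale the per-shard defaults (borrowed from the S3 numbers quoted
// above) by the bucket's shard count
BucketQosLimits limits_for_bucket(unsigned num_shards) {
  const double base_write = 300.0;
  const double base_read  = 800.0;
  unsigned n = std::max(1u, num_shards);
  return {base_write * n, base_read * n};
}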

>> you exceed the number of requests per second in Amazon, you get a 503:
>> "Slow down" error, we should probably do similar. All these things go
>
> Agreed! It makes sense to return 503 once you reach the limit, instead
> of queuing. The dmclock priority queue doesn't support this now, but
> I'm guessing that it could be made to.
>
> That would mean civetweb could take advantage of this too, which would
> be wonderful.

The AWS SDKs implement exponential backoff, so throwing a 503 is way
better than waiting on a timeout.
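
For reference, the client-side retry loop looks roughly like this (a
generic sketch, not actual SDK code), which is why an immediate 503 is
so much cheaper than letting the request sit queued until the client
times out:

#include <chrono>
#include <random>
#include <thread>

// retry with capped, jittered exponential backoff; req() is any
// callable that sends the request and returns the HTTP status code
template <typename Request>
bool send_with_backoff(Request&& req, int max_retries = 5) {
  std::mt19937 rng{std::random_device{}()};
  for (int attempt = 0; attempt <= max_retries; ++attempt) {
    int status = req();
    if (status < 500)        // success or a non-retryable client error
      return status < 400;
    // a 503 comes back immediately, so each retry costs only this
    // short, jittered sleep instead of a full request timeout
    long cap_ms = 100L * (1L << attempt);
    std::uniform_int_distribution<long> jitter(0, cap_ms);
    std::this_thread::sleep_for(std::chrono::milliseconds(jitter(rng)));
  }
  return false;
}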