mclock priority queue in radosgw

One of the benefits of the asynchronous beast frontend in radosgw is that it allows us to do things like request throttling and priority queuing that would otherwise block frontend threads - which are a scarce resource in civetweb's thread-per-connection model.

The primary goal of this project is to prevent large object data workloads from starving out cheaper requests. After some discussion in the Ann Arbor office, our resident dmclock expert Eric Ivancich convinced us that mclock was a good fit. I've spent the week exploring a design for this, and wanted to get some early feedback:

Each HTTP request would be assigned a request class (dmclock calls them clients) and a cost.

The four initial request classes:
- auth: requests for swift auth tokens, and eventually sts
- admin: admin APIs for use by the dashboard and multisite sync
- data: object io
- metadata: everything else, such as bucket operations, object stat, etc.
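
To make that concrete, here's a minimal sketch of how a request's scheduling parameters might be represented before it's enqueued. These names are hypothetical, not existing radosgw code:

#include <cstdint>

// hypothetical names, not existing radosgw code
enum class request_class {
  auth,      // swift auth tokens, and eventually sts
  admin,     // admin APIs for the dashboard and multisite sync
  data,      // object io
  metadata,  // everything else: bucket ops, object stat, ...
};

struct request_params {
  request_class klass;  // which mclock "client" this request belongs to
  uint64_t cost;        // flat cost of 1 for now, so units are requests/sec
};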

Calculating a cost is difficult, especially for the two major cases where we'd want it: object GET requests (because we have to check with RADOS before we know its actual size), and object PUT requests that use chunked transfer-encoding. I'd love to hear ideas for this, but for now I think it's good enough to assign everything a cost of 1 so that all of the units are in requests/sec. I believe this is what the osd is doing now as well?
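
Continuing the sketch above, the cost hook could start out trivially and grow smarter later; the size-based refinement in the comment is only an idea, since the size isn't known up front for GETs or chunked PUTs:

// hypothetical helper, continuing the sketch above
uint64_t calculate_cost(const request_params&) {
  // future idea: for data requests, scale the cost by object size once it's
  // known (e.g. 1 + size_bytes / rgw_max_chunk_size) so large object io
  // weighs more than a stat; until then everything costs 1
  return 1;
}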

New virtual functions in class RGWOp seem like a good way for the derived Ops to return their request class and cost. Once we know those, we can add ourselves to the mclock priority queue and do an async wait until it's our turn to run.
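
A rough sketch of what those virtuals might look like (hypothetical signatures, neither exists in RGWOp today), with an object GET identifying itself as object io:

// hypothetical additions to RGWOp; these virtuals don't exist today
class RGWOp {
 public:
  virtual ~RGWOp() = default;
  // defaults let existing ops work without changes
  virtual request_class dmclock_client() const { return request_class::metadata; }
  virtual uint64_t dmclock_cost() const { return 1; }
  // ... existing RGWOp interface ...
};

// e.g. an object GET identifies itself as object io
class RGWGetObj : public RGWOp {
 public:
  request_class dmclock_client() const override { return request_class::data; }
};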

But where exactly does this step fit into the request processing pipeline? Does it happen before or after authentication/authorization? I'm leaning towards after, so that auth failures get filtered out before they enter the queue.
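
To make the ordering concrete, here's a simplified sketch of the "after auth" option (the real entry point is process_request() in rgw_process.cc; these placeholder names are made up for illustration):

// placeholder declarations for illustration only
struct req_state;                                 // radosgw request state
int do_auth(req_state* s);                        // authentication/authorization
int mclock_wait(request_class c, uint64_t cost);  // async wait in the priority queue
int dispatch(req_state* s);                       // run the op

int handle_request(req_state* s, request_class c, uint64_t cost) {
  if (int r = do_auth(s); r < 0) {
    return r;           // auth failures are filtered out before the queue
  }
  if (int r = mclock_wait(c, cost); r < 0) {
    return r;           // e.g. the queue rejected or timed out the request
  }
  return dispatch(s);   // it's our turn to run
}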

The priority queue can use perf counters for introspection, and a config observer to apply changes to the per-client mclock options.
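
For example, the per-client options could be a reservation/weight/limit triple per request class, picked up at runtime through a config observer. The option names and observer shape below are approximations to show the idea, not a final interface:

#include <set>
#include <string>

// per-class mclock parameters, applied at runtime via a config observer
struct client_config {
  double reservation;  // requests/sec guaranteed to this class
  double weight;       // share of spare capacity relative to other classes
  double limit;        // requests/sec cap for this class
};

// the base class and option names approximate ceph's config observer machinery
class QueueConfigObserver /* : public md_config_obs_t */ {
 public:
  const char** get_tracked_conf_keys() const {
    static const char* keys[] = {
      "rgw_dmclock_data_res", "rgw_dmclock_data_wgt", "rgw_dmclock_data_lim",
      // ... one res/wgt/lim triple per request class ...
      nullptr
    };
    return keys;
  }
  void handle_conf_change(const std::set<std::string>& changed) {
    // reread the changed triples and hand updated client_config values to the
    // mclock queue, so tuning doesn't require a gateway restart
    (void)changed;
  }
};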

As future work, we could add some load balancer integration to:
- enable custom scripts that look at incoming requests and assign their own request class/cost
- track distributed client stats across gateways, and feed that info back into radosgw with each request (this is the d in dmclock)

Thanks,
Casey


