Restricting scope to RADOS ops doesn't appear to address the broader
motivations for the scheduler, I think. cf Kyle's mail.

Matt

On Thu, Mar 22, 2018 at 7:26 PM, Casey Bodley <cbodley@xxxxxxxxxx> wrote:
> On Thu, Mar 22, 2018 at 5:17 PM, Yehuda Sadeh-Weinraub
> <ysadehwe@xxxxxxxxxx> wrote:
>> On Thu, Mar 22, 2018 at 12:09 PM, Casey Bodley <cbodley@xxxxxxxxxx> wrote:
>>> One of the benefits of the asynchronous beast frontend in radosgw is
>>> that it allows us to do things like request throttling and priority
>>> queuing that would otherwise block frontend threads - which are a
>>> scarce resource in civetweb's thread-per-connection model.
>>>
>>> The primary goal of this project is to prevent large object data
>>> workloads from starving out cheaper requests. After some discussion
>>> in the Ann Arbor office, our resident dmclock expert Eric Ivancich
>>> convinced us that mclock was a good fit. I've spent the week
>>> exploring a design for this, and wanted to get some early feedback:
>>>
>>> Each HTTP request would be assigned a request class (dmclock calls
>>> them clients) and a cost.
>>>
>>> The four initial request classes:
>>> - auth: requests for swift auth tokens, and eventually sts
>>> - admin: admin APIs for use by the dashboard and multisite sync
>>> - data: object io
>>> - metadata: everything else, such as bucket operations, object stat, etc.
>>>
>>> Calculating a cost is difficult, especially for the two major cases
>>> where we'd want it: object GET requests (because we have to check
>>> with RADOS before we know the actual size), and object PUT requests
>>> that use chunked transfer-encoding. I'd love to hear ideas for this,
>>> but for now I think it's good enough to assign everything a cost of 1
>>> so that all of the units are in requests/sec. I believe this is what
>>> the osd is doing now as well?
>>>
>>
>> That does sound like the simpler solution, and it should be a good
>> enough starting point.
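[For readers following along, the class/cost mapping proposed above could
be sketched roughly as follows. This is illustrative only: the names
(`RequestClass`, `classify`, the op strings) are hypothetical, not actual
radosgw identifiers. Everything gets a flat cost of 1 so the mclock units
come out in requests/sec, as proposed.]

```cpp
#include <cstdint>
#include <string>

// Hypothetical sketch of the four proposed request classes -- these
// names are illustrative, not actual radosgw identifiers.
enum class RequestClass { Auth, Admin, Data, Metadata };

struct ClassifiedRequest {
  RequestClass klass;
  uint32_t cost;  // flat cost of 1 => mclock units are requests/sec
};

// Assign one of the four proposed classes; anything unrecognized
// falls through to metadata (bucket operations, object stat, etc.).
ClassifiedRequest classify(const std::string& op) {
  if (op == "swift_auth" || op == "sts_assume_role")
    return {RequestClass::Auth, 1};
  if (op == "admin_get_usage")
    return {RequestClass::Admin, 1};
  if (op == "get_obj" || op == "put_obj")
    return {RequestClass::Data, 1};
  return {RequestClass::Metadata, 1};
}
```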
>> What if we could integrate it at a much lower layer, e.g., into
>> librados?
>
> So a queue for outgoing osd ops instead of http requests? That could
> be interesting. It would certainly better capture the cost for reads
> and writes. cls stuff might be harder to model. I worry about putting
> a queue so close to Objecter's throttles, though - maybe this would
> work best inside the Objecter as a replacement for the throttles?
>
> I think we'd still need something at a higher level though, to prevent
> us from reading in a ton of data from PUT requests before blocking to
> write it out to rados.
>
>>
>>> New virtual functions in class RGWOp seem like a good way for the
>>> derived Ops to return their request class and cost. Once we know
>>> those, we can add ourselves to the mclock priority queue and do an
>>> async wait until it's our turn to run.
>>>
>>> But where exactly does this step fit into the request processing
>>> pipeline? Does it happen before or after
>>> authentication/authorization? I'm leaning towards after, so that
>>> auth failures get filtered out before they enter the queue.
>>
>> What about the situation where you have a bad actor flooding with
>> badly authenticated requests?
>
> Yeah, good point. Filtering anything out just means that mclock can't
> do its job of providing fairness for the remaining requests.
>
>>
>>> The priority queue can use perf counters for introspection, and a
>>> config observer to apply changes to the per-client mclock options.
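[The "new virtual functions in class RGWOp" idea above might look
something like this sketch. `RGWOp` is real radosgw code, but the
virtuals, the `request_class` enum, and `RGWOpSketch` itself are
assumptions about the proposed design, not existing interfaces.]

```cpp
#include <cstdint>

// Assumed sketch: class RGWOp exists in radosgw, but these virtuals
// and their names are hypothetical, per the proposal above.
enum class request_class { auth, admin, data, metadata };

struct RGWOpSketch {  // stand-in for the real RGWOp base class
  virtual ~RGWOpSketch() = default;

  // Derived ops override these; the frontend would read them before
  // enqueueing the request into the mclock priority queue and doing
  // an async wait for its turn to run.
  virtual request_class get_request_class() const {
    return request_class::metadata;  // default: cheap metadata request
  }
  virtual uint32_t get_request_cost() const {
    return 1;  // flat cost for now, per the discussion above
  }
};

struct RGWGetObjSketch : RGWOpSketch {
  request_class get_request_class() const override {
    return request_class::data;  // object io
  }
};
```

[The point of the virtuals is that the queue only ever sees the base
class, so each derived op decides its own class/cost without the
frontend needing per-op knowledge.]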
>>>
>>> As future work, we could add some load balancer integration to:
>>> - enable custom scripts that look at incoming requests and assign
>>>   their own request class/cost
>>> - track distributed client stats across gateways, and feed that info
>>>   back into radosgw with each request (this is the d in dmclock)
>>>
>>> Thanks,
>>> Casey

-- 
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html