>>> On 14.03.12 at 07:32, Justin Gibbs <justing@xxxxxxxxxxxxxxxx> wrote:
> There's another problem here that I brought up during the Xen
> Hack-a-thon. The ring macros require that the ring element count
> be a power of two. This doesn't mean that the ring will be a power
> of 2 pages in size. To illustrate this point, I modified the FreeBSD
> blkback driver to provide negotiated ring stats via sysctl.
>
> Here's a connection to a Windows VM running the Citrix PV drivers:
>
> dev.xbbd.2.max_requests: 128
> dev.xbbd.2.max_request_segments: 11
> dev.xbbd.2.max_request_size: 45056
> dev.xbbd.2.ring_elem_size: 108 <= 32bit ABI
> dev.xbbd.2.ring_pages: 4
> dev.xbbd.2.ring_elements: 128
> dev.xbbd.2.ring_waste: 2496
>
> Over half a page is wasted when ring-page-order is 2. I'm sure you
> can see where this is going. :-)
>
> Here are the limits published by our backend to the XenStore:
>
> max-ring-pages = "113"
> max-ring-page-order = "7"
> max-requests = "256"
> max-request-segments = "129"
> max-request-size = "524288"
>
> Because we allow so many concurrent, large requests in our product,
> the ring wastage really adds up if the front end doesn't support
> the "ring-pages" variant of the extension. However, you only need
> a ring-page-order of 3 with this protocol to start seeing pages of
> wasted ring space.
>
> You don't really want to negotiate "ring-pages" either. The backends
> often need to support multiple ABIs. I can easily construct a set
> of limits for the FreeBSD blkback driver which will cause the ring
> limits to vary by a page between the 32bit and 64bit ABIs.
>
> With all this in mind, the backend must do a dance of rounding up,
> taking the max of the ring sizes for the different ABIs, and then
> validating the front-end published limits taking its ABI into
> account. The front-end does some of this too. Its way too messy
> and error prone because we don't communicate the ring element limit
> directly.
>
> "max-ring-element-order" anyone?
> :-)

Interesting observation - yes, I think deprecating both pre-existing
methods in favor of something along those lines would be desirable.
(But I'd favor not using the term "order" here, as it is - at least in
Linux - usually implied to apply to pages. "max-ringent-log2" perhaps?)

What you say also implies that all Linux backend patches currently
floating around are flawed in the way they calculate the number of ring
entries, as this number really depends on the protocol the frontend
advertises.

Further, if you're concerned about wasting ring space (and particularly
in the context of your request number/size/segments extension),
shouldn't we bother to define pairs (or larger groups) of struct
blkif_request_segment (as currently a quarter of the space is mere
padding)? Or split grefs from {first,last}_sect altogether?

Finally, while looking at all this again, I stumbled across the use of
blkif_vdev_t in the ring structures: At least Linux's blkback
completely ignores this field - {xen_,}vbd_translate() simply overwrites
what dispatch_rw_block_io() put there (and with this, struct phys_req's
dev and bdev members seem rather pointless too). Does anyone recall what
the original intention with this request field was? Allowing I/O on
multiple devices over a single ring?

Bottom line - shouldn't we define a blkif2 interface to cleanly
accommodate all the various extensions (and do away with the protocol
variations)?

Jan
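
For anyone wanting to check the ring figures quoted above, here is a rough sketch of the arithmetic in Python. It is not taken from any driver. The 4 KiB page size and the 64-byte ring header are assumptions; the header value was picked because it reproduces the reported ring_waste of 2496. The 112-byte element size for the 64bit ABI is also an assumption (only the 108-byte 32bit size appears in the thread), and the function names are made up for illustration.

```python
PAGE_SIZE = 4096        # assumption: 4 KiB pages
RING_HEADER_SIZE = 64   # assumption: shared indices plus padding before the first element


def round_down_pow2(n):
    """Largest power of two that is <= n (n must be >= 1)."""
    return 1 << (n.bit_length() - 1)


def ring_elements(pages, elem_size):
    """Element count after rounding down to a power of two."""
    usable = pages * PAGE_SIZE - RING_HEADER_SIZE
    return round_down_pow2(usable // elem_size)


def ring_waste(pages, elem_size):
    """Bytes after the header that no ring element can occupy."""
    usable = pages * PAGE_SIZE - RING_HEADER_SIZE
    return usable - ring_elements(pages, elem_size) * elem_size


def pages_needed(requests, elem_size):
    """Pages required to hold the header plus `requests` elements."""
    total = RING_HEADER_SIZE + requests * elem_size
    return -(-total // PAGE_SIZE)  # ceiling division


# ring-page-order 2 with the 108-byte (32bit ABI) element
print(ring_elements(4, 108), ring_waste(4, 108))
# ring-page-order 3: the waste now exceeds one page
print(ring_waste(8, 108))
# hypothetical 256-request ring: page count differs between the two ABIs
print(pages_needed(256, 108), pages_needed(256, 112))
```

Under these assumptions the first call gives 128 elements and 2496 wasted bytes, matching the sysctl output. Order 3 wastes 5056 bytes, and a 256-request ring would need 7 pages with 108-byte elements but 8 pages with 112-byte ones. That last case is one (hypothetical) way the limits can differ by a page between ABIs.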
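
The "quarter of the space is mere padding" remark can be illustrated the same way, using Python's ctypes to lay out the segment descriptor with native alignment. This is only a sketch: the Segment class assumes struct blkif_request_segment consists of a 32-bit gref followed by two 8-bit sector fields, and SegmentPair is a purely hypothetical grouped layout, not a proposed ABI. Sizes depend on the platform's alignment rules, though 4-byte alignment for a 32-bit integer is the usual case.

```python
import ctypes


class Segment(ctypes.Structure):
    # Assumed field layout of struct blkif_request_segment.
    _fields_ = [
        ("gref", ctypes.c_uint32),
        ("first_sect", ctypes.c_uint8),
        ("last_sect", ctypes.c_uint8),
    ]


class SegmentPair(ctypes.Structure):
    # Hypothetical layout grouping two segments into one descriptor.
    _fields_ = [
        ("gref", ctypes.c_uint32 * 2),
        ("first_sect", ctypes.c_uint8 * 2),
        ("last_sect", ctypes.c_uint8 * 2),
    ]


def payload_bytes(struct_type):
    """Sum of the field sizes, ignoring any alignment padding."""
    return sum(ctypes.sizeof(ftype) for _name, ftype in struct_type._fields_)


def padding_fraction(struct_type):
    """Share of the structure's size that is alignment padding."""
    size = ctypes.sizeof(struct_type)
    return (size - payload_bytes(struct_type)) / size


print(ctypes.sizeof(Segment), padding_fraction(Segment))
print(ctypes.sizeof(SegmentPair), padding_fraction(SegmentPair))
```

With typical alignment the single segment occupies 8 bytes for 6 bytes of payload, while the paired layout packs two segments into 12 bytes with no padding at all.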