On Thu, Feb 09, 2017 at 05:11:05PM +0100, Cornelia Huck wrote:
> > >>> * Non power-of-2 ring sizes
> > >>>
> > >>> As the ring simply wraps around, there's no reason to
> > >>> require ring size to be power of two.
> > >>> It can be made a separate feature though.
> > >>
> > >> Power of 2 ring sizes are required in order to ignore the high bits of
> > >> the indices. With non-power-of-2 sizes you are forced to keep the
> > >> indices less than the ring size.
> > >
> > > Right. So
> > >
> > >     if (unlikely(++idx >= size))
> > >         idx = 0;
> > >
> > > OTOH ring size that's twice larger than necessary
> > > because of power of two requirements wastes cache.
> >
> > I don't know. Power of 2 ring size is pretty standard, I'd rather avoid
> > the complication and the gratuitous difference with 1.0.
>
> I agree. I don't think dropping the power of 2 requirement buys us so
> much that it makes up for the added complexity.

I recalled why I came up with this. The issue is cache associativity.

Recall that besides the ring we have event suppression structures. If
we are lucky and both sides run at the same speed, everything can work
by polling with events kept disabled. The event suppression structures
are then never written to; they are read-only.

However, if the ring and the event suppression structures compete for
the same cache location, ring accesses have a chance to push the event
suppression structures out of cache, causing misses on read. This can
happen if they are at the same offset in the set. E.g. with L1 cache,
4 Kbyte sets are common, so that means the same offset within a 4K
page.

We can fix this by making event suppression adjacent to the ring in
memory, e.g.:

    [interrupt suppress] [descriptor ring] [kick suppress]

If this whole structure fits in a single set, ring accesses will not
push kick or interrupt suppress out of cache.

The specific layout can be left to drivers, but as the set size is a
power of two, this might require a non-power-of-two ring size.

I conclude that this is an optimization that needs to be benchmarked.
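To put rough numbers on the layout above, here is a minimal sketch. It
assumes a 4096-byte set, one 64-byte cache line per suppression
structure and 16-byte descriptors. These sizes, and the helper names
max_ring_entries() and round_down_pow2(), are made up for illustration
and are not taken from the spec.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Largest ring (in entries) that fits into one cache set together with
 * the two event suppression structures:
 *
 *   [interrupt suppress] [descriptor ring] [kick suppress]
 *
 * All parameters are illustrative assumptions, not values from the spec.
 */
static size_t max_ring_entries(size_t set_size, size_t suppress_size,
                               size_t desc_size)
{
    assert(desc_size > 0);
    assert(set_size >= 2 * suppress_size);
    return (set_size - 2 * suppress_size) / desc_size;
}

/* Largest power of two that is <= n (n must be non-zero). */
static size_t round_down_pow2(size_t n)
{
    size_t p = 1;

    assert(n > 0);
    while (p <= n / 2)
        p *= 2;
    return p;
}
```

Under these assumptions max_ring_entries(4096, 64, 16) gives 248
entries, while a power-of-two restriction would cap the ring at
round_down_pow2(248) = 128 entries if the whole structure is to stay
within one set. Whether the difference matters in practice is exactly
what would need benchmarking.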

I also note that the generic description does not have to force powers
of two *even if devices actually require it*.
I would be inclined to word the text in a way that makes relaxing the
restriction easier.

For example, we can say "free running 16 bit index" and this forces a
power of two, but we can also say "free running index wrapping to 0
after (N*queue-size - 1) with N chosen such that the value fits in
16 bits", and this is exactly the same if queue size is a power of 2.

So we can add text saying "ring size MUST be a power of two", and
later it will be easy to relax just by adding a feature bit.

--
MST
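
P.S. A rough sketch of the "wrapping to 0 after (N*queue-size - 1)"
wording, in case it helps to see it spelled out. The helper names
index_wrap(), index_next() and index_to_slot() are hypothetical, and N
is assumed to be the largest multiplier for which every index value
still fits in 16 bits.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Wrap point N * queue_size, with N the largest multiplier such that
 * every index value 0 .. N*queue_size - 1 fits in 16 bits.
 */
static uint32_t index_wrap(uint32_t queue_size)
{
    assert(queue_size > 0 && queue_size <= 65536);
    return (65536 / queue_size) * queue_size;
}

/* Advance the free-running index by one, wrapping to 0 at the wrap point. */
static uint16_t index_next(uint16_t idx, uint32_t queue_size)
{
    uint32_t next = (uint32_t)idx + 1;

    return next == index_wrap(queue_size) ? 0 : (uint16_t)next;
}

/* Ring slot addressed by an index; reduces to a mask when queue_size is 2^k. */
static uint32_t index_to_slot(uint16_t idx, uint32_t queue_size)
{
    return idx % queue_size;
}
```

For a power-of-two queue size such as 256 the wrap point is 65536, so
index_next() behaves exactly like a plain 16-bit increment. For a size
such as 248 the wrap point is 264 * 248 = 65472, so the index runs from
0 to 65471 and the last index maps to slot 247 before wrapping.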