Hello virtio folks,
I noticed a mismatch between the way the specification defines device-specific virtqueue indexes and the way device and driver implementers have interpreted the specification. As a practical example, consider the traditional memory balloon device [1]. The first two queues (indexes 0 and 1) are available as part of the baseline device, but the rest of the queues are tied to feature bits.
Section 5.5.2, "Virtqueues", gives a list that appears to be a mapping from queue index to queue name/function, defining queue index 3 as free_page_vq and index 4 as reporting_vq, and declaring that "free_page_vq only exists if VIRTIO_BALLOON_F_FREE_PAGE_HINT is set" and "reporting_vq only exists if VIRTIO_BALLOON_F_PAGE_REPORTING is set." This wording is a bit vague, but I assume "is set" means "is negotiated" (not just "advertised by the device"). Also presumably "exists" means something like "may only be used by the driver if the feature bit is negotiated" and "should be ignored by the device if the feature bit is not negotiated", although it would be nice to have a proper definition in the spec somewhere.
Section 5.5.3, "Feature bits", gives definitions of the feature bits, with similar descriptions of the relationship between the feature bits and virtqueue availability, although the wording is slightly different ("present" rather than "exists"). No dependency between feature bits is defined, so it seems like it should be valid for a device or driver to support or accept one of the higher-numbered features while not supporting a lower-numbered one.
Notably, there is no mention of queue index assignments changing based on negotiated features in either of these sections. Hence a reader can only assume that the queue index assignments are fixed (i.e. stats_vq will always be vq index 4 if F_STATS_VQ is negotiated, regardless of any other feature bits).
Now consider a scenario where VIRTIO_BALLOON_F_STATS_VQ and VIRTIO_BALLOON_F_PAGE_REPORTING are negotiated but VIRTIO_BALLOON_F_FREE_PAGE_HINT is not (perhaps the device supports all of the defined features but the driver only wants to use reporting_vq, not free_page_vq). In this case, what queue index should be used by the driver when enabling reporting_vq? My reading of the specification is that the reporting_vq is always queue index 4, independent of whether VIRTIO_BALLOON_F_STATS_VQ or VIRTIO_BALLOON_F_FREE_PAGE_HINT are negotiated, but this contradicts existing device and driver implementations, which will use queue index 3 (the next one after stats_vq = 2) as reporting_vq in this case.
The qemu virtio-ballon device [2] assigns the next-highest unused queue index when calling virtio_add_queue(), and in the scenario presented above, free_page_vq will not be added since F_STATS_VQ is not negotiated, so reporting_vq will be assigned queue index 3, rather than 4. (Additionally, qemu always adds the stats_vq regardless of negotiated features, but that's irrelevant in this case since we are assuming the STATS_VQ feature is negotiated.)
The Linux virtio driver code originally seemed to use the correct (by my reading) indexes, but it was changed to match the layout used by qemu in a 2019 commit ("virtio_pci: use queue idx instead of array idx to set up the vq") [3] - in other words, it will now also expect queue index 3 to be reporting_vq in the scenario laid out above.
I'm not sure how to resolve the mismatch between the specification and actual implementation behavior. The simplest change would probably be to rewrite the specification to drop the explicit queue indexes in section 5.5.2 and add some wording about how queues are numbered based on negotiated feature bits (this would need to be applied to other device types that have specified queue indexes as well). However, this would also technically be an incompatible change of the specification. On the other hand, changing the device and driver implementations to match the specification would be even more challenging, since it would be an incompatible change in actual practice, not just a change of the spec to match consensus implementation behavior.
Perhaps drivers could add a quirk to detect old versions of the qemu device and use the old behavior, while enabling the correct behavior only for other device vendors and newer qemu device revisions, and the qemu device could add an opt-in feature to enable the correct behavior that users would need to enable only when they know they have a sufficiently new driver with the fix.
Or maybe there could be a new feature bit that would opt into following the spec-defined queue indexes (VIRTIO_F_VERSION_2?) and some new wording to require devices to use the old behavior when that bit is not negotiated, but that also feels less than ideal to me.
Any thoughts on how to proceed with this situation? Is my reading of the specification just wrong?
Thanks,
-- Daniel
[1]: https://docs.oasis-open.org/virtio/virtio/v1.2/csd01/virtio-v1.2-csd01.html#x1-3160002
[3]: https://github.com/torvalds/linux/commit/ddbeac07a39a81d82331a312d0578fab94fccbf1
_______________________________________________ Virtualization mailing list Virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx https://lists.linuxfoundation.org/mailman/listinfo/virtualization