Re: [virtio-comment] virtio queue numbering and optional queues

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 22.08.23 15:40, Stefan Hajnoczi wrote:
On Mon, Aug 21, 2023 at 03:18:50PM -0700, Daniel Verkamp wrote:
Hello virtio folks,

Hi Daniel,
I have CCed those involved in the free page hint and page reporting
features.

Stefan


I noticed a mismatch between the way the specification defines
device-specific virtqueue indexes and the way device and driver
implementers have interpreted the specification. As a practical example,
consider the traditional memory balloon device [1]. The first two queues
(indexes 0 and 1) are available as part of the baseline device, but the
rest of the queues are tied to feature bits.

Section 5.5.2, "Virtqueues", gives a list that appears to be a mapping from
queue index to queue name/function, defining queue index 3 as free_page_vq
and index 4 as reporting_vq, and declaring that "free_page_vq only exists
if VIRTIO_BALLOON_F_FREE_PAGE_HINT is set" and "reporting_vq only exists if
VIRTIO_BALLOON_F_PAGE_REPORTING is set." This wording is a bit vague, but I
assume "is set" means "is negotiated" (not just "advertised by the
device").

Staring at QEMU: the queues are added when the virtio-balloon device is
*created*. Queues are added based on feature configuration, if they are
part of the device feature set.

That should translate to "is advertised", not "is negotiated".

The queue ordering is as follows:

* inflate queue, baseline device
* deflate queue, baseline device
* driver memory statistics, VIRTIO_BALLOON_F_STATS_VQ
* free page hinting, VIRTIO_BALLOON_F_FREE_PAGE_HINT
* free page reporting, VIRTIO_BALLOON_F_REPORTING

QEMU always supports the first 3, so they use number 0-2. The other two
can be configured for the device.

So the queue indices vary based on actual feature presence.

Also presumably "exists" means something like "may only be used
by the driver if the feature bit is negotiated" and "should be ignored by
the device if the feature bit is not negotiated", although it would be nice
to have a proper definition in the spec somewhere.

Section 5.5.3, "Feature bits", gives definitions of the feature bits, with
similar descriptions of the relationship between the feature bits and
virtqueue availability, although the wording is slightly different
("present" rather than "exists"). No dependency between feature bits is
defined, so it seems like it should be valid for a device or driver to
support or accept one of the higher-numbered features while not supporting
a lower-numbered one.

Yes, that's my understanding.



Notably, there is no mention of queue index assignments changing based on
negotiated features in either of these sections. Hence a reader can only
assume that the queue index assignments are fixed (i.e. stats_vq will
always be vq index 4 if F_STATS_VQ is negotiated, regardless of any other
feature bits).

And that does not seem to be the case. At least QEMU assigns them sequentially,
based on actually configured features for the device.

If I read the kernel code correctly (drivers/virtio/virtio_balloon.c:init_vqs)
it also behaves that way: if the device has a certain feature
"virtio_has_feature", it gets the next index. Otherwise, the next index goes
to another feature:

	/*
	 * Inflateq and deflateq are used unconditionally. The names[]
	 * will be NULL if the related feature is not enabled, which will
	 * cause no allocation for the corresponding virtqueue in find_vqs.
	 */
	callbacks[VIRTIO_BALLOON_VQ_INFLATE] = balloon_ack;
	names[VIRTIO_BALLOON_VQ_INFLATE] = "inflate";
	callbacks[VIRTIO_BALLOON_VQ_DEFLATE] = balloon_ack;
	names[VIRTIO_BALLOON_VQ_DEFLATE] = "deflate";
	callbacks[VIRTIO_BALLOON_VQ_STATS] = NULL;
	names[VIRTIO_BALLOON_VQ_STATS] = NULL;
	callbacks[VIRTIO_BALLOON_VQ_FREE_PAGE] = NULL;
	names[VIRTIO_BALLOON_VQ_FREE_PAGE] = NULL;
	names[VIRTIO_BALLOON_VQ_REPORTING] = NULL;


	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ)) {
		names[VIRTIO_BALLOON_VQ_STATS] = "stats";
		callbacks[VIRTIO_BALLOON_VQ_STATS] = stats_request;
	}

	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
		names[VIRTIO_BALLOON_VQ_FREE_PAGE] = "free_page_vq";
		callbacks[VIRTIO_BALLOON_VQ_FREE_PAGE] = NULL;
	}

	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_REPORTING)) {
		names[VIRTIO_BALLOON_VQ_REPORTING] = "reporting_vq";
		callbacks[VIRTIO_BALLOON_VQ_REPORTING] = balloon_ack;
	}

	err = virtio_find_vqs(vb->vdev, VIRTIO_BALLOON_VQ_MAX, vqs,
			      callbacks, names, NULL);
	if (err)
		return err;


Now consider a scenario where VIRTIO_BALLOON_F_STATS_VQ and
VIRTIO_BALLOON_F_PAGE_REPORTING are negotiated but
VIRTIO_BALLOON_F_FREE_PAGE_HINT is not (perhaps the device supports all of
the defined features but the driver only wants to use reporting_vq, not
free_page_vq). In this case, what queue index should be used by the driver
when enabling reporting_vq? My reading of the specification is that the
reporting_vq is always queue index 4, independent of whether
VIRTIO_BALLOON_F_STATS_VQ or VIRTIO_BALLOON_F_FREE_PAGE_HINT are
negotiated, but this contradicts existing device and driver
implementations, which will use queue index 3 (the next one after stats_vq
= 2) as reporting_vq in this case.

Then the specification really needs updating :)


The qemu virtio-ballon device [2] assigns the next-highest unused queue
index when calling virtio_add_queue(), and in the scenario presented above,
free_page_vq will not be added since F_STATS_VQ is not negotiated, so
reporting_vq will be assigned queue index 3, rather than 4. (Additionally,
qemu always adds the stats_vq regardless of negotiated features, but that's
irrelevant in this case since we are assuming the STATS_VQ feature is
negotiated.)

The Linux virtio driver code originally seemed to use the correct (by my
reading) indexes, but it was changed to match the layout used by qemu in a
2019 commit ("virtio_pci: use queue idx instead of array idx to set up the
vq") [3] - in other words, it will now also expect queue index 3 to be
reporting_vq in the scenario laid out above.

Note that at the time of this commit, there was no support for "free page reporting".

        callbacks[VIRTIO_BALLOON_VQ_INFLATE] = balloon_ack;
        names[VIRTIO_BALLOON_VQ_INFLATE] = "inflate";
        callbacks[VIRTIO_BALLOON_VQ_DEFLATE] = balloon_ack;
        names[VIRTIO_BALLOON_VQ_DEFLATE] = "deflate";
        names[VIRTIO_BALLOON_VQ_STATS] = NULL;
        names[VIRTIO_BALLOON_VQ_FREE_PAGE] = NULL;

        if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ)) {
                names[VIRTIO_BALLOON_VQ_STATS] = "stats";
                callbacks[VIRTIO_BALLOON_VQ_STATS] = stats_request;
        }

        if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
                names[VIRTIO_BALLOON_VQ_FREE_PAGE] = "free_page_vq";
                callbacks[VIRTIO_BALLOON_VQ_FREE_PAGE] = NULL;
        }

And as QEMU always sets VIRTIO_BALLOON_F_STATS_VQ, that one always gets id=2.

Consequently, VIRTIO_BALLOON_F_FREE_PAGE_HINT, if around, gets id=3.

As we didn't support VIRTIO_BALLOON_F_REPORTING, it doesn't matter which id it gets.

But as you note, once we have different implementations and more feature variability,
it's a mess.

A device that implements VIRTIO_BALLOON_F_FREE_PAGE_HINT but not VIRTIO_BALLOON_F_STATS_VQ
might not work correctly with either old or new QEMU.

Maybe it needs to be documented that any device that implements either
VIRTIO_BALLOON_F_FREE_PAGE_HINT or VIRTIO_BALLOON_F_REPORTING *must* also implement
VIRTIO_BALLOON_F_STATS_VQ, so old+new Linux drivers would continue working.


I'm not sure how to resolve the mismatch between the specification and
actual implementation behavior. The simplest change would probably be to
rewrite the specification to drop the explicit queue indexes in section
5.5.2 and add some wording about how queues are numbered based on
negotiated feature bits (this would need to be applied to other device

Yes.

types that have specified queue indexes as well). However, this would also
technically be an incompatible change of the specification. On the other
hand, changing the device and driver implementations to match the
specification would be even more challenging, since it would be an
incompatible change in actual practice, not just a change of the spec to
match consensus implementation behavior.

Changing drivers/devices is pretty much impossible.

So we should document the queue assignment better, and maybe the implication
of requiring VIRTIO_BALLOON_F_STATS_VQ when any new features are implemented.

Does that make sense?



Perhaps drivers could add a quirk to detect old versions of the qemu device
and use the old behavior, while enabling the correct behavior only for
other device vendors and newer qemu device revisions, and the qemu device
could add an opt-in feature to enable the correct behavior that users would
need to enable only when they know they have a sufficiently new driver with
the fix.


Or maybe there could be a new feature bit that would opt into following the
spec-defined queue indexes (VIRTIO_F_VERSION_2?) and some new wording to
require devices to use the old behavior when that bit is not negotiated,
but that also feels less than ideal to me.

Any thoughts on how to proceed with this situation? Is my reading of the
specification just wrong?

I think you raised an important point. We should try documenting reality in
the specification.

--
Cheers,

David / dhildenb

_______________________________________________
Virtualization mailing list
Virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linuxfoundation.org/mailman/listinfo/virtualization



[Index of Archives]     [KVM Development]     [Libvirt Development]     [Libvirt Users]     [CentOS Virtualization]     [Netdev]     [Ethernet Bridging]     [Linux Wireless]     [Kernel Newbies]     [Security]     [Linux for Hams]     [Netfilter]     [Bugtraq]     [Yosemite Forum]     [MIPS Linux]     [ARM Linux]     [Linux RAID]     [Linux Admin]     [Samba]

  Powered by Linux