On Mon, Mar 25, 2024 at 2:11 AM David Hildenbrand <david@xxxxxxxxxx> wrote:
>
> On 25.03.24 07:08, Xuan Zhuo wrote:
> > On Fri, 22 Mar 2024 22:02:27 +0100, David Hildenbrand <david@xxxxxxxxxx> wrote:
> >> On 22.03.24 20:16, Daniel Verkamp wrote:
> >>> On Thu, Mar 21, 2024 at 3:16 AM Xuan Zhuo <xuanzhuo@xxxxxxxxxxxxxxxxx> wrote:
> >>>>
> >>>> Currently, the init_vqs function within the virtio_balloon driver relies
> >>>> on the condition that certain names array entries are null in order to
> >>>> skip the initialization of some virtual queues (vqs). This behavior is
> >>>> unique to this part of the codebase. In an upcoming commit, we plan to
> >>>> eliminate this dependency by removing the function entirely. Therefore,
> >>>> with this change, we are ensuring that the virtio_balloon no longer
> >>>> depends on the aforementioned function.
> >>>
> >>> This is a behavior change, and I believe means that the driver no
> >>> longer follows the spec [1].
> >>>
> >>> For example, the spec says that virtqueue 4 is reporting_vq, and
> >>> reporting_vq only exists if VIRTIO_BALLOON_F_PAGE_REPORTING is set,
> >>> but there is no mention of its virtqueue number changing if other
> >>> features are not set. If a device/driver combination negotiates
> >>> VIRTIO_BALLOON_F_PAGE_REPORTING but not VIRTIO_BALLOON_F_STATS_VQ or
> >>> VIRTIO_BALLOON_F_FREE_PAGE_HINT, my reading of the specification is
> >>> that reporting_vq should still be vq number 4, and vq 2 and 3 should
> >>> be unused. This patch would make the reporting_vq use vq 2 instead in
> >>> this case.
> >>>
> >>> If the new behavior is truly intended, then the spec does not match
> >>> reality, and it would need to be changed first (IMO); however,
> >>> changing the spec would mean that any devices implemented correctly
> >>> per the previous spec would now be wrong, so some kind of mechanism
> >>> for detecting the new behavior would be warranted, e.g. a new
> >>> non-device-specific virtio feature flag.
> >>>
> >>> I have brought this up previously on the virtio-comment list [2], but
> >>> it did not receive any satisfying answers at that time.
> >>
> >> Rings a bell, but staring at this patch, I thought that there would be
> >> no behavioral change. Maybe I missed it :/
> >>
> >> I stared at virtio_ccw_find_vqs(), and it contains:
> >>
> >> for (i = 0; i < nvqs; ++i) {
> >>         if (!names[i]) {
> >>                 vqs[i] = NULL;
> >>                 continue;
> >>         }
> >>
> >>         vqs[i] = virtio_ccw_setup_vq(vdev, queue_idx++, callbacks[i],
> >>                                      names[i], ctx ? ctx[i] : false,
> >>                                      ccw);
> >>         if (IS_ERR(vqs[i])) {
> >>                 ret = PTR_ERR(vqs[i]);
> >>                 vqs[i] = NULL;
> >>                 goto out;
> >>         }
> >> }
> >>
> >> We increment queue_idx only if an entry was not NULL. SO I thought no
> >> behavioral change? (at least on s390x :) )
> >>
> >> It's late here in Germany, so maybe I'm missing something.
> >
> > I think we've encountered a tricky issue. Currently, all transports handle queue
> > id by incrementing them in order, without skipping any queue id. So, I'm quite
> > surprised that my changes would affect the spec. The fact that the
> > 'names' value is null is just a small trick in the Linux kernel implementation
> > and should not have an impact on the queue id.
> >
> > I believe that my recent modification will not affect the spec. So, let's
> > consider the issues with this patch set separately for now. Regarding the Memory
> > Balloon Device, it has been operational for many years, and perhaps we should
> > add to the spec that if a certain vq does not exist, then subsequent vqs will
> > take over its id.
>
> Right, if I am not missing something your patch should have no
> functional change in that regard (that the current
> behavior/implementation might not match the spec is a different discussion).
>
> @Daniel, if I'm missing something, please shout.

Thanks for digging into that - I think you're correct in that the patch
does not change the behavior, due to changes elsewhere in the generic
virtio and virtio-pci code. So in that sense, I guess this should not
block this particular patch.

It would be good to have the spec situation cleared up, though - I
guess in practice, all relevant drivers and device implementations are
already following the model where there are no gaps in the queue
numbering, rather than what the spec seems to indicate.

Thanks,
-- Daniel
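
For reference, here is a minimal standalone sketch of the two numbering
models discussed in this thread. It is not kernel code: the enum and both
helper functions are hypothetical names made up for illustration, and the
slot order (inflate, deflate, stats, free page, reporting) is assumed
from the vq numbers mentioned above.

```c
#include <stdbool.h>

/* Hypothetical slot names; order assumed from the vq numbers in the thread. */
enum { VQ_INFLATE, VQ_DEFLATE, VQ_STATS, VQ_FREE_PAGE, VQ_REPORTING, VQ_MAX };

/*
 * Fixed model (one reading of the spec): a vq keeps its slot number even
 * when earlier optional vqs are absent. Returns -1 if the vq is absent.
 */
static int fixed_index(const bool present[VQ_MAX], int slot)
{
	return present[slot] ? slot : -1;
}

/*
 * Compact model: only the vqs that exist are counted, so later vqs move
 * down to fill the gaps. This mirrors the effect of the queue_idx++ loop
 * quoted above. Returns -1 if the vq is absent.
 */
static int compact_index(const bool present[VQ_MAX], int slot)
{
	int idx = 0;

	if (!present[slot])
		return -1;

	for (int i = 0; i < slot; i++) {
		if (present[i])
			idx++;
	}
	return idx;
}
```

With present = { true, true, false, false, true }, i.e. PAGE_REPORTING
negotiated but neither STATS_VQ nor FREE_PAGE_HINT, fixed_index() puts
reporting_vq at 4 while compact_index() puts it at 2.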