[cc:qemu-devel] On Sat, Oct 02 2021, "Michael S. Tsirkin" <mst@xxxxxxxxxx> wrote: > On Fri, Oct 01, 2021 at 09:21:25AM +0200, Halil Pasic wrote: >> On Thu, 30 Sep 2021 07:12:21 -0400 >> "Michael S. Tsirkin" <mst@xxxxxxxxxx> wrote: >> >> > On Thu, Sep 30, 2021 at 03:20:49AM +0200, Halil Pasic wrote: >> > > This patch fixes a regression introduced by commit 82e89ea077b9 >> > > ("virtio-blk: Add validation for block size in config space") and >> > > enables similar checks in verify() on big endian platforms. >> > > >> > > The problem with checking multi-byte config fields in the verify >> > > callback, on big endian platforms, and with a possibly transitional >> > > device is the following. The verify() callback is called between >> > > config->get_features() and virtio_finalize_features(). That we have a >> > > device that offered F_VERSION_1 then we have the following options >> > > either the device is transitional, and then it has to present the legacy >> > > interface, i.e. a big endian config space until F_VERSION_1 is >> > > negotiated, or we have a non-transitional device, which makes >> > > F_VERSION_1 mandatory, and only implements the non-legacy interface and >> > > thus presents a little endian config space. Because at this point we >> > > can't know if the device is transitional or non-transitional, we can't >> > > know do we need to byte swap or not. >> > >> > Hmm which transport does this refer to? >> >> It is the same with virtio-ccw and virtio-pci. I see the same problem >> with both on s390x. I didn't try with virtio-blk-pci-non-transitional >> yet (have to figure out how to do that with libvirt) for pci I used >> virtio-blk-pci. >> >> > Distinguishing between legacy and modern drivers is transport >> > specific. PCI presents >> > legacy and modern at separate addresses so distinguishing >> > between these two should be no trouble. >> >> You mean the device id? Yes that is bolted down in the spec, but >> currently we don't exploit that information. Furthermore there >> is a fat chance that with QEMU even the allegedly non-transitional >> devices only present a little endian config space after VERSION_1 >> was negotiated. Namely get_config for virtio-blk is implemented in >> virtio_blk_update_config() which does virtio_stl_p(vdev, >> &blkcfg.blk_size, blk_size) and in there we don't care >> about transitional or not: >> >> static inline bool virtio_access_is_big_endian(VirtIODevice *vdev) >> { >> #if defined(LEGACY_VIRTIO_IS_BIENDIAN) >> return virtio_is_big_endian(vdev); >> #elif defined(TARGET_WORDS_BIGENDIAN) >> if (virtio_vdev_has_feature(vdev, VIRTIO_F_VERSION_1)) { >> /* Devices conforming to VIRTIO 1.0 or later are always LE. */ >> return false; >> } >> return true; >> #else >> return false; >> #endif >> } >> > > ok so that's a QEMU bug. Any virtio 1.0 and up > compatible device must use LE. > It can also present a legacy config space where the > endian depends on the guest. So, how is the virtio core supposed to determine this? A transport-specific callback? > >> > Channel i/o has versioning so same thing? >> > >> >> Don't think so. Both a transitional and a non-transitional device >> would have to accept revisions higher than 0 if the driver tried to >> negotiate those (and we do in our case). > > Yes, the modern driver does. And that one is known to be LE. > legacy driver doesn't. > >> > > The virtio spec explicitly states that the driver MAY read config >> > > between reading and writing the features so saying that first accessing >> > > the config before feature negotiation is done is not an option. The >> > > specification ain't clear about setting the features multiple times >> > > before FEATURES_OK, so I guess that should be fine. >> > > >> > > I don't consider this patch super clean, but frankly I don't think we >> > > have a ton of options. Another option that may or man not be cleaner, >> > > but is also IMHO much uglier is to figure out whether the device is >> > > transitional by rejecting _F_VERSION_1, then resetting it and proceeding >> > > according tho what we have figured out, hoping that the characteristics >> > > of the device didn't change. >> > >> > I am confused here. So is the problem at the device or at the driver level? >> >> We have a driver regression. Since the 82e89ea077b9 ("virtio-blk: Add >> validation for block size in config space") virtio-blk is broken on >> s390. > > Because of a qemu bug. I agree. It's worth working around in the driver > since the qemu bug has been around for a very long time. Yes, since we introduced virtio 1 support, I guess... > > >> The deeper problem is in the spec. We stated that the driver may read >> config space before the feature negotiation is finalized, but we didn't >> think enough about what happens when native endiannes is not little >> endian in the different cases. > > Because the spec is very clear that endian-ness is LE. > I don't see a spec issue yet here, just an implementation issue. Maybe not really a bug in the spec, but probably an issue, as this seems to have been unclear to most people so far. > >> I believe, for non-transitional devices we have a problem in the host as >> well (i.e. in QEMU). > > Because QEMU ignores the spec and instead relies on the feature > negotiation. > >> >> > I suspect it's actually the host that has the issue, not >> > the guest? >> >> I tend to say we have a problem both in the host and in the guest. I'm >> more concerned about the problem in the guest, because that is a really >> nasty regression. > > The problem is in the guest. The bug is in the host ;) > >> For the host. I think for legacy we don't have a >> problem, because both sides would operate on the assumption no >> _F_VERSION_1, IMHO the implementation for the transitional devices is >> correct. > > Well no, the point of transitional is really to be 1.0 compliant > *and* also expose a legacy interface. Worth noting that PCI and CCW are a tad different here: PCI exposes an additional interface, while CCW uses a revision negotiation mechanism (for CCW, legacy and standard-compliant are much closer on the transport side as for PCI.) MMIO does not do transitional, if I'm not wrong. > >> For non-transitional flavor, it depends on the device. For >> example virtio-net and virtio-blk is broken, because we use primitives >> like virtio_stl_p() and those don't do the right thing before feature >> negotiation is completed. On the other hand virtio-crypto.c as a truly >> non-transitional device uses stl_le_p() and IMHO does the right thing. >> >> Thanks for your comments! I hope I managed to answer your questions. I >> need some guidance on how do we want to move forward on this. >> >> Regards, >> Halil > > OK so. I don't have a problem with the patch itself, > assuming it's enough to work around all buggy hosts. > I am especially worried about things like vhost/vhost-user, > I suspect they might have a bug like this too, and > I am not sure whether your work around is enough for these. > Can you check please? > > If not we'll have to move all validate code to after FEATURES_OK > is set. What is supposed to happen for validate after FEATURES_OK? The driver cannot change any features at that point in time, it can only fail to use the device. > > We do however want to document that this API can be called > multiple times since that was not the case > previously. > > Also, I would limit this to when > - the validate callback exists > - the guest endian-ness is not LE > > We also want to document the QEMU bug in a comment here, > e.g. > > /* > * QEMU before version 6.2 incorrectly uses driver features with guest > * endian-ness to set endian-ness for config space instead of just using > * LE for the modern interface as per spec. > * This breaks reading config in the validate callback. > * To work around that, when device is 1.0 (so supposed to be LE) > * but guest is not LE, then send the features to device one extra > * time before validation. > */ Do we need to consider migration, or do we not need to be bug-compatible in this case? > > Finally I'd like to see the QEMU bug fix before I merge this one, > since it will be harder to test with a fix. > > > > >> > >> > >> > > Signed-off-by: Halil Pasic <pasic@xxxxxxxxxxxxx> >> > > Fixes: 82e89ea077b9 ("virtio-blk: Add validation for block size in config space") >> > > Reported-by: markver@xxxxxxxxxx >> > > --- >> > > drivers/virtio/virtio.c | 4 ++++ >> > > 1 file changed, 4 insertions(+) >> > > >> > > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c >> > > index 0a5b54034d4b..9dc3cfa17b1c 100644 >> > > --- a/drivers/virtio/virtio.c >> > > +++ b/drivers/virtio/virtio.c >> > > @@ -249,6 +249,10 @@ static int virtio_dev_probe(struct device *_d) >> > > if (device_features & (1ULL << i)) >> > > __virtio_set_bit(dev, i); >> > > >> > > + /* Write back features before validate to know endianness */ >> > > + if (device_features & (1ULL << VIRTIO_F_VERSION_1)) >> > > + dev->config->finalize_features(dev); >> > > + >> > > if (drv->validate) { >> > > err = drv->validate(dev); >> > > if (err) >> > > >> > > base-commit: 02d5e016800d082058b3d3b7c3ede136cdc6ddcb >> > > -- >> > > 2.25.1 >> >