On Fri, Dec 17, 2021 at 9:08 AM Si-Wei Liu <si-wei.liu@xxxxxxxxxx> wrote:
>
>
>
> On 12/15/2021 7:43 PM, Jason Wang wrote:
> > On Thu, Dec 16, 2021 at 4:52 AM Si-Wei Liu <si-wei.liu@xxxxxxxxxx> wrote:
> >>
> >>
> >> On 12/14/2021 6:06 PM, Jason Wang wrote:
> >>> On Wed, Dec 15, 2021 at 9:05 AM Si-Wei Liu <si-wei.liu@xxxxxxxxxx> wrote:
> >>>>
> >>>> On 12/13/2021 9:06 PM, Michael S. Tsirkin wrote:
> >>>>> On Mon, Dec 13, 2021 at 05:59:45PM -0800, Si-Wei Liu wrote:
> >>>>>> On 12/12/2021 1:26 AM, Michael S. Tsirkin wrote:
> >>>>>>> On Fri, Dec 10, 2021 at 05:44:15PM -0800, Si-Wei Liu wrote:
> >>>>>>>> Sorry for reviving this ancient thread. I was kinda lost as to the
> >>>>>>>> conclusion it ended up with. I have the following questions:
> >>>>>>>>
> >>>>>>>> 1. legacy guest support: from the past conversations it doesn't seem
> >>>>>>>> the support will be completely dropped from the table, is my
> >>>>>>>> understanding correct? Actually we're interested in supporting virtio
> >>>>>>>> v0.95 guests for x86, which is backed by the spec at
> >>>>>>>> https://ozlabs.org/~rusty/virtio-spec/virtio-0.9.5.pdf. Though I'm not
> >>>>>>>> sure if there's a request/need to support older legacy virtio versions
> >>>>>>>> beyond that.
> >>>>>>> I personally feel it's less work to add in kernel than try to
> >>>>>>> work around it in userspace. Jason feels differently.
> >>>>>>> Maybe post the patches and this will prove to Jason it's not
> >>>>>>> too terrible?
> >>>>>> I suppose if the vdpa vendor does support 0.95 at the datapath and ring
> >>>>>> layout level and is limited to x86 only, there should be an easy way out.
> >>>>> Note a subtle difference: what matters is that the guest, not the host,
> >>>>> is x86. It matters for emulators which might reorder memory accesses.
> >>>>> I guess this enforcement belongs in QEMU then?
> >>>> Right, I mean to get started, the initial guest driver support and the
> >>>> corresponding QEMU support for a transitional vdpa backend can be limited
> >>>> to x86 guest/host only. Since the config space is emulated in QEMU, I
> >>>> suppose it's not hard to enforce in QEMU.
> >>> It's more than just config space; most devices have headers before the buffer.
> >> The ordering in the datapath (data VQs) would have to rely on the vendor's
> >> support. Since ORDER_PLATFORM is pretty new (v1.1), I guess vdpa h/w
> >> vendors nowadays can/should well support the case when ORDER_PLATFORM is
> >> not acked by the driver (actually this feature is filtered out by the
> >> QEMU vhost-vdpa driver today), even with a v1.0 spec conforming, modern
> >> only vDPA device.
> > That's a bug that needs to be fixed.
> >
> >> The control VQ is implemented in software in the
> >> kernel, which can be easily accommodated/fixed when needed.
> >>
> >>>> QEMU can drive GET_LEGACY,
> >>>> GET_ENDIAN et al. ioctls in advance to get the capability from the
> >>>> individual vendor driver. For that, we need another negotiation protocol
> >>>> similar to vhost_user's protocol_features between the vdpa kernel and
> >>>> QEMU, way before the guest driver is ever probed and its feature
> >>>> negotiation kicks in. Not sure we need a GET_MEMORY_ORDER ioctl call
> >>>> from the device, but we can assume weak ordering for legacy at this
> >>>> point (x86 only)?
> >>> I'm lost here, we have get_features() so:
> >> I assume here you refer to get_device_features() that Eli just renamed.
> >>> 1) VERSION_1 means the device uses LE if provided, otherwise native
> >>> 2) ORDER_PLATFORM means the device requires platform ordering
> >>>
> >>> Any reason for having a new API for this?
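To make 1) and 2) concrete, here's a minimal userspace sketch of deriving
both from the existing features call (illustrative only: VHOST_GET_FEATURES
and the two feature bits are real UAPI, but the helper name and the
out-parameters are made up for this example):

	#include <errno.h>
	#include <stdbool.h>
	#include <sys/ioctl.h>
	#include <linux/vhost.h>
	#include <linux/virtio_config.h>

	/* Illustrative sketch, not an existing API. */
	static int probe_device_layout(int vdpa_fd, bool *is_le,
				       bool *weak_barriers)
	{
		__u64 features;

		if (ioctl(vdpa_fd, VHOST_GET_FEATURES, &features) < 0)
			return -errno;

		/* 1) VERSION_1 => little endian, otherwise native endian */
		*is_le = !!(features & (1ULL << VIRTIO_F_VERSION_1));

		/* 2) ORDER_PLATFORM => the device requires platform ordering,
		 * so weak (SMP-style) barriers are not sufficient */
		*weak_barriers = !(features & (1ULL << VIRTIO_F_ORDER_PLATFORM));

		return 0;
	}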
> >> Are you going to enforce all vDPA hardware vendors to support the
> >> transitional model for legacy guests? Meaning a guest not acknowledging
> >> VERSION_1 would use the legacy interfaces captured in spec section
> >> 7.4 (regarding ring layout, native endianness, message framing, vq
> >> alignment of 4096, 32-bit features, no features_ok bit in status, IO port
> >> interface, i.e. all the things) instead? Note we don't yet have a
> >> set_device_features() that allows the vdpa device to tell whether it is
> >> operating in transitional or modern-only mode. For software virtio, all
> >> support for the legacy part of the transitional model has been built up
> >> there already; however, it's not easy for vDPA vendors to implement all
> >> the requirements of all-or-nothing legacy guest support (big endian
> >> guests, for example). To these vendors, legacy support within a
> >> transitional model is more of a feature, and it's best to leave them
> >> some flexibility to implement partial support for legacy. That
> >> in turn calls out the need for a vhost-user protocol-feature-like
> >> negotiation API that can prohibit those unsupported guest setups as
> >> early as backend_init, before launching the VM.
> >>
> >>
> >>>>>> I
> >>>>>> checked with Eli and other Mellanox/NVIDIA folks for hardware/firmware
> >>>>>> level 0.95 support; it seems all the ingredients had been there already,
> >>>>>> dating back to the DPDK days. The only major limitation is in the vDPA
> >>>>>> software: the current vdpa core has an assumption around
> >>>>>> VIRTIO_F_ACCESS_PLATFORM for a few DMA setup ops, which is virtio 1.0
> >>>>>> only.
> >>>>>>
> >>>>>>>> 2. suppose some form of legacy guest support needs to be there, how
> >>>>>>>> do we deal with the bogus assumption below in vdpa_get_config() in
> >>>>>>>> the short term? It looks like one intuitive fix is to move the
> >>>>>>>> vdpa_set_features call out of vdpa_get_config() to vdpa_set_config().
> >>>>>>>>
> >>>>>>>>          /*
> >>>>>>>>           * Config accesses aren't supposed to trigger before features
> >>>>>>>>           * are set.
> >>>>>>>>           * If it does happen we assume a legacy guest.
> >>>>>>>>           */
> >>>>>>>>          if (!vdev->features_valid)
> >>>>>>>>                  vdpa_set_features(vdev, 0);
> >>>>>>>>          ops->get_config(vdev, offset, buf, len);
> >>>>>>>>
> >>>>>>>> I can post a patch to fix 2) if there's consensus already reached.
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> -Siwei
> >>>>>>> I'm not sure how important it is to change that.
> >>>>>>> In any case it only affects transitional devices, right?
> >>>>>>> Legacy only should not care ...
> >>>>>> Yes, I'd like to distinguish a legacy driver (suppose it is 0.95) from
> >>>>>> the modern one in a transitional device model rather than being legacy
> >>>>>> only. That way a v0.95 and v1.0 supporting vdpa parent can support both
> >>>>>> types of guests without having to reconfigure. Or are you suggesting
> >>>>>> that limiting to legacy only at the time of vdpa creation would simplify
> >>>>>> the implementation a lot?
> >>>>>>
> >>>>>> Thanks,
> >>>>>> -Siwei
> >>>>> I don't know for sure. Take a look at the work Halil was doing
> >>>>> to try and support transitional devices with BE guests.
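Whatever final shape those endianness ioctls take, the useful property is
that a mismatch can be refused at backend_init time, before the VM ever
launches. A rough sketch of that flow (purely hypothetical:
VHOST_VDPA_GET_ENDIAN does not exist today, and the fallback policy is just
one possible choice):

	#include <errno.h>
	#include <stdbool.h>
	#include <sys/ioctl.h>

	/* Hypothetical: probe device endianness early and fail fast. */
	static int vhost_vdpa_check_endian(int fd, bool guest_is_le)
	{
		int dev_le;

		if (ioctl(fd, VHOST_VDPA_GET_ENDIAN, &dev_le) < 0)
			/* Kernel without the ioctl: assume the device only
			 * does little endian, i.e. x86-only to start. */
			return guest_is_le ? 0 : -ENOTSUP;

		/* Refuse unsupported guest setups before launching the VM. */
		return (bool)dev_le == guest_is_le ? 0 : -ENOTSUP;
	}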
> >>>> Hmmm, we can have those endianness ioctls defined, but the initial QEMU
> >>>> implementation can start with supporting x86 guest/host with little
> >>>> endian and weak memory ordering first. The real trick is to detect a
> >>>> legacy guest - I am not sure if it's feasible to shift all the legacy
> >>>> detection work to QEMU, or whether the kernel has to be part of the
> >>>> detection as well (e.g. for the kick-before-DRIVER_OK thing we would
> >>>> have to duplicate the tracking effort in QEMU). Let me take a further
> >>>> look and get back.
> >>> Michael may think differently but I think doing this in Qemu is much easier.
> >> I think the key is whether we position emulating legacy interfaces as
> >> QEMU doing translation on top of a v1.0 modern-only device in the
> >> kernel, or we allow the vdpa core (or you can say vhost-vdpa) and vendor
> >> driver to support a transitional model in the kernel that is able to
> >> work for both v0.95 and v1.0 drivers, with some slight aid from QEMU for
> >> detection/emulation/shadowing (e.g. for CVQ, I/O port relay). I guess for
> >> the former we still rely on the vendor for a performant data VQ
> >> implementation, leaving the question of what it may eventually end up
> >> being in the kernel - effectively the latter.
> > I think we can do the legacy interface emulation on top of the shadow
> > VQ. And we know it works for sure. But I agree, it would be much
> > easier if we depend on the vendor to implement a transitional device.
> First, I am not sure if there's a convincing case for users to deploy
> vDPA with shadow (data) VQ against a pure software based backend.
> Please enlighten me if there is.

The problem is that shadow VQ is the only solution that works for all
the cases.

>
> For us, the point of deploying vDPA for legacy guests is the acceleration
> (what the "A" stands for in "vDPA") part of it, so that we can leverage
> the hardware potential if at all possible. Not sure how the shadow VQ
> implementation can easily deal with datapath acceleration without losing
> too much performance?

It's not easy, shadow VQ will lose performance for sure.

>
> > So assuming we depend on the vendor, I don't see anything that is
> > strictly needed in the kernel, the kick or config access before
> > DRIVER_OK can all be handled easily in Qemu unless I miss something.
> Right, that's what I think too: it's not a lot of work in the
> kernel if the vendor device offers the aid/support for transitional. The
> kernel only provides the abstraction of the device model (transitional or
> modern-only), while the vendor driver may implement early platform feature
> discovery and apply legacy specific quirks (unsupported endianness,
> mismatched page size, unsupported host memory ordering model) for whatever
> the device can't adapt to. I don't say we have to depend on the vendor,
> but the point is that we must assume fully spec compliant transitional
> support (the datapath in particular) from the vendor to get started, as
> I guess that's probably the main motivation for users to deploy it -
> acceleration of legacy guest workloads without exhausting host computing
> resources. Even if we get started with shadow VQ to mediate and translate
> the datapath, eventually it may evolve towards leveraging datapath
> offload to hardware if acceleration is the only convincing use case for
> legacy support.

Yes, so as discussed, I don't object to the idea; kernel patches are more
than welcome.
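For reference, for the vdpa_get_config() assumption quoted earlier in the
thread, the patch could be as small as dropping the implicit legacy
fallback from the config-read path (a rough, untested sketch of the idea
only, not a final patch):

	static inline void vdpa_get_config(struct vdpa_device *vdev,
					   unsigned int offset, void *buf,
					   unsigned int len)
	{
		const struct vdpa_config_ops *ops = vdev->config;

		/*
		 * No implicit vdpa_set_features(vdev, 0) here anymore:
		 * a config access before features are set no longer
		 * forces legacy (feature-less) mode on the parent driver;
		 * the transport (vhost-vdpa/QEMU) decides instead.
		 */
		ops->get_config(vdev, offset, buf, len);
	}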
Thanks

>
> Thanks,
> -Siwei
> > The only value of doing that in the kernel is that it can work for
> > virtio-vdpa, but modern-only virtio-vdpa is sufficient; we don't need
> > any legacy stuff for that.
> >
> > Thanks
> >
> >> Thanks,
> >> -Siwei
> >>
> >>> Thanks
> >>>
> >>>
> >>>
> >>>> Meanwhile, I'll check internally to see if a legacy only model would
> >>>> work. Thanks.
> >>>>
> >>>> Thanks,
> >>>> -Siwei
> >>>>
> >>>>
> >>>>>>>> On 3/2/2021 2:53 AM, Jason Wang wrote:
> >>>>>>>>> On 2021/3/2 5:47 PM, Michael S. Tsirkin wrote:
> >>>>>>>>>> On Mon, Mar 01, 2021 at 11:56:50AM +0800, Jason Wang wrote:
> >>>>>>>>>>> On 2021/3/1 5:34 AM, Michael S. Tsirkin wrote:
> >>>>>>>>>>>> On Wed, Feb 24, 2021 at 10:24:41AM -0800, Si-Wei Liu wrote:
> >>>>>>>>>>>>>> Detecting it isn't enough though, we will need a new ioctl to
> >>>>>>>>>>>>>> notify the kernel that it's a legacy guest. Ugh :(
> >>>>>>>>>>>>> Well, although I think adding an ioctl is doable, may I know
> >>>>>>>>>>>>> what the use case there will be for the kernel to leverage such
> >>>>>>>>>>>>> info directly? Is there a case QEMU can't handle with dedicated
> >>>>>>>>>>>>> ioctls later if there's indeed differentiation (legacy vs.
> >>>>>>>>>>>>> modern) needed?
> >>>>>>>>>>>> BTW a good API could be
> >>>>>>>>>>>>
> >>>>>>>>>>>> #define VHOST_SET_ENDIAN _IOW(VHOST_VIRTIO, ?, int)
> >>>>>>>>>>>> #define VHOST_GET_ENDIAN _IOW(VHOST_VIRTIO, ?, int)
> >>>>>>>>>>>>
> >>>>>>>>>>>> we did it per vring but maybe that was a mistake ...
> >>>>>>>>>>> Actually, I wonder whether it's a good time to just not support
> >>>>>>>>>>> legacy drivers for vDPA. Consider:
> >>>>>>>>>>>
> >>>>>>>>>>> 1) Its definition is non-normative
> >>>>>>>>>>> 2) A lot of code burden
> >>>>>>>>>>>
> >>>>>>>>>>> So qemu can still present the legacy device, since the config
> >>>>>>>>>>> space or other stuff presented by vhost-vDPA is not expected to be
> >>>>>>>>>>> accessed by the guest directly. Qemu can do the endian conversion
> >>>>>>>>>>> when necessary in this case?
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks
> >>>>>>>>>>>
> >>>>>>>>>> Overall I would be fine with this approach but we need to avoid
> >>>>>>>>>> breaking working userspace; qemu releases with vdpa support are out
> >>>>>>>>>> there and seem to work for people. Any changes need to take that
> >>>>>>>>>> into account and document compatibility concerns.
> >>>>>>>>> Agree, let me check.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>> I note that any hardware
> >>>>>>>>>> implementation is already broken for legacy except on platforms
> >>>>>>>>>> with strong ordering, which might be helpful in reducing the scope.
> >>>>>>>>> Yes.
> >>>>>>>>>
> >>>>>>>>> Thanks
> >>>>>>>>>
> >>>>>>>>>
_______________________________________________
Virtualization mailing list
Virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linuxfoundation.org/mailman/listinfo/virtualization