On 12/15/2021 6:53 PM, Jason Wang wrote:
On Thu, Dec 16, 2021 at 10:02 AM Si-Wei Liu <si-wei.liu@xxxxxxxxxx> wrote:
On 12/15/2021 1:33 PM, Michael S. Tsirkin wrote:
On Wed, Dec 15, 2021 at 12:52:20PM -0800, Si-Wei Liu wrote:
On 12/14/2021 6:06 PM, Jason Wang wrote:
On Wed, Dec 15, 2021 at 9:05 AM Si-Wei Liu <si-wei.liu@xxxxxxxxxx> wrote:
On 12/13/2021 9:06 PM, Michael S. Tsirkin wrote:
On Mon, Dec 13, 2021 at 05:59:45PM -0800, Si-Wei Liu wrote:
On 12/12/2021 1:26 AM, Michael S. Tsirkin wrote:
On Fri, Dec 10, 2021 at 05:44:15PM -0800, Si-Wei Liu wrote:
Sorry for reviving this ancient thread. I was kinda lost for the conclusion
it ended up with. I have the following questions,
1. legacy guest support: from the past conversations it doesn't seem the
support will be completely dropped from the table, is my understanding
correct? Actually we're interested in supporting virtio v0.95 guest for x86,
which is backed by the spec at
https://urldefense.com/v3/__https://ozlabs.org/*rusty/virtio-spec/virtio-0.9.5.pdf__;fg!!ACWV5N9M2RV99hQ!dTKmzJwwRsFM7BtSuTDu1cNly5n4XCotH0WYmidzGqHSXt40i7ZU43UcNg7GYxZg$ . Though I'm not sure
if there's request/need to support wilder legacy virtio versions earlier
beyond.
I personally feel it's less work to add in kernel than try to
work around it in userspace. Jason feels differently.
Maybe post the patches and this will prove to Jason it's not
too terrible?
I suppose if the vdpa vendor does support 0.95 in the datapath and ring
layout level and is limited to x86 only, there should be easy way out.
Note a subtle difference: what matters is that guest, not host is x86.
Matters for emulators which might reorder memory accesses.
I guess this enforcement belongs in QEMU then?
Right, I mean to get started, the initial guest driver support and the
corresponding QEMU support for transitional vdpa backend can be limited
to x86 guest/host only. Since the config space is emulated in QEMU, I
suppose it's not hard to enforce in QEMU.
It's more than just config space, most devices have headers before the buffer.
The ordering in datapath (data VQs) would have to rely on vendor's support.
Since ORDER_PLATFORM is pretty new (v1.1), I guess vdpa h/w vendor nowadays
can/should well support the case when ORDER_PLATFORM is not acked by the
driver (actually this feature is filtered out by the QEMU vhost-vdpa driver
today), even with v1.0 spec conforming and modern only vDPA device. The
control VQ is implemented in software in the kernel, which can be easily
accommodated/fixed when needed.
QEMU can drive GET_LEGACY,
GET_ENDIAN et al ioctls in advance to get the capability from the
individual vendor driver. For that, we need another negotiation protocol
similar to vhost_user's protocol_features between the vdpa kernel and
QEMU, way before the guest driver is ever probed and its feature
negotiation kicks in. Not sure we need a GET_MEMORY_ORDER ioctl call
from the device, but we can assume weak ordering for legacy at this
point (x86 only)?
I'm lost here, we have get_features() so:
I assume here you refer to get_device_features() that Eli just changed the
name.
1) VERSION_1 means the device uses LE if provided, otherwise natvie
2) ORDER_PLATFORM means device requires platform ordering
Any reason for having a new API for this?
Are you going to enforce all vDPA hardware vendors to support the
transitional model for legacy guest?
Do we really have other choices?
I suspect the legacy device is never implemented by any vendor:
1) no virtio way to detect host endian
This is even true for transitional device that is conforming to the
spec, right? The transport specific way to detect host endian is still
being discussed and the spec revision is not finalized yet so far as I
see. Why this suddenly becomes a requirement/blocker for h/w vendors to
implement the transitional model? Even if the spec is out, this is
pretty new and I suspect not all vendor would follow right away. I hope
the software framework can be tolerant with h/w vendors not supporting
host endianess (BE specifically) or not detecting it if they would like
to support a transitional device for legacy.
2) bypass IOMMU with translated requests
3) PIO port
Yes we have enp_vdpa, but it's more like a "transitional device" for
legacy only guests.
meaning guest not acknowledging
VERSION_1 would use the legacy interfaces captured in the spec section 7.4
(regarding ring layout, native endianness, message framing, vq alignment of
4096, 32bit feature, no features_ok bit in status, IO port interface i.e.
all the things) instead?
Note that we only care about the datapath, control path is mediated anyhow.
So feature_ok and IO port isn't an issue. The rest looks like a must
for the hardware.
H/W vendors can opt out not implementing transitional interfaces at all
which limits itself a modern only device. Set endianess detection (via
transport specific means) aside, for vendors that wishes to support
transitional device with legacy interface, is it a hard stop to drop
supporting BE host if everything else is there? The spec today doesn't
define virtio specific means to detect host memory ordering or device
memory coherency, will it yet become a stopper another day for h/w
vendor to support more platforms?
Noted we don't yet have a set_device_features()
that allows the vdpa device to tell whether it is operating in transitional
or modern-only mode.
So the device feature should be provisioned via the netlink protocol.
Such netlink interface will only be used to limit feature exposure,
right? i.e. you can limit a transitional supporting vendor driver to
offering modern-only interface, but you never want to make a modern-only
vendor driver to support transitional (I'm not sure if it's a good idea
to support all the translation in software, esp. for datapath).
And what we want is not "set_device_feature()" but
"set_device_mandatory_feautre()", then the parent can choose to fail
the negotiation when VERSION_1 is not negotiated.
This assumes the transport specific detection of BE host is in place,
right? I am not clear who initiates the set_device_mandatory_feautre()
call, QEMU during guest feature negotiation, or admin user setting it
ahead via netlink?
Thanks,
-Siwei
Qemu then knows for
sure it talks to a transitional device or modern only device.
Thanks
For software virtio, all support for the legacy part in
a transitional model has been built up there already, however, it's not easy
for vDPA vendors to implement all the requirements for an all-or-nothing
legacy guest support (big endian guest for example). To these vendors, the
legacy support within a transitional model is more of feature to them and
it's best to leave some flexibility for them to implement partial support
for legacy. That in turn calls out the need for a vhost-user protocol
feature like negotiation API that can prohibit those unsupported guest
setups to as early as backend_init before launching the VM.
Right. Of note is the fact that it's a spec bug which I
hope yet to fix, though due to existing guest code the
fix won't be complete.
I thought at one point you pointed out to me that the spec does allow
config space read before claiming features_ok, and only config write
before features_ok is prohibited. I haven't read up the full thread of
Halil's VERSION_1 for transitional big endian device yet, but what is
the spec bug you hope to fix?
WRT ioctls, One thing we can do though is abuse set_features
where it's called by QEMU early on with just the VERSION_1
bit set, to distinguish between legacy and modern
interface. This before config space accesses and FEATURES_OK.
Halil has been working on this, pls take a look and maybe help him out.
Interesting thread, am reading now and see how I may leverage or help there.
I
checked with Eli and other Mellanox/NVDIA folks for hardware/firmware level
0.95 support, it seems all the ingredient had been there already dated back
to the DPDK days. The only major thing limiting is in the vDPA software that
the current vdpa core has the assumption around VIRTIO_F_ACCESS_PLATFORM for
a few DMA setup ops, which is virtio 1.0 only.
2. suppose some form of legacy guest support needs to be there, how do we
deal with the bogus assumption below in vdpa_get_config() in the short term?
It looks one of the intuitive fix is to move the vdpa_set_features call out
of vdpa_get_config() to vdpa_set_config().
/*
* Config accesses aren't supposed to trigger before features are
set.
* If it does happen we assume a legacy guest.
*/
if (!vdev->features_valid)
vdpa_set_features(vdev, 0);
ops->get_config(vdev, offset, buf, len);
I can post a patch to fix 2) if there's consensus already reached.
Thanks,
-Siwei
I'm not sure how important it is to change that.
In any case it only affects transitional devices, right?
Legacy only should not care ...
Yes I'd like to distinguish legacy driver (suppose it is 0.95) against the
modern one in a transitional device model rather than being legacy only.
That way a v0.95 and v1.0 supporting vdpa parent can support both types of
guests without having to reconfigure. Or are you suggesting limit to legacy
only at the time of vdpa creation would simplify the implementation a lot?
Thanks,
-Siwei
I don't know for sure. Take a look at the work Halil was doing
to try and support transitional devices with BE guests.
Hmmm, we can have those endianness ioctls defined but the initial QEMU
implementation can be started to support x86 guest/host with little
endian and weak memory ordering first. The real trick is to detect
legacy guest - I am not sure if it's feasible to shift all the legacy
detection work to QEMU, or the kernel has to be part of the detection
(e.g. the kick before DRIVER_OK thing we have to duplicate the tracking
effort in QEMU) as well. Let me take a further look and get back.
Michael may think differently but I think doing this in Qemu is much easier.
I think the key is whether we position emulating legacy interfaces in QEMU
doing translation on top of a v1.0 modern-only device in the kernel, or we
allow vdpa core (or you can say vhost-vdpa) and vendor driver to support a
transitional model in the kernel that is able to work for both v0.95 and
v1.0 drivers, with some slight aid from QEMU for
detecting/emulation/shadowing (for e.g CVQ, I/O port relay). I guess for the
former we still rely on vendor for a performant data vqs implementation,
leaving the question to what it may end up eventually in the kernel is
effectively the latter).
Thanks,
-Siwei
My suggestion is post the kernel patches, and we can evaluate
how much work they are.
Thanks for the feedback. I will take some read then get back, probably
after the winter break. Stay tuned.
Thanks,
-Siwei
Thanks
Meanwhile, I'll check internally to see if a legacy only model would
work. Thanks.
Thanks,
-Siwei
On 3/2/2021 2:53 AM, Jason Wang wrote:
On 2021/3/2 5:47 下午, Michael S. Tsirkin wrote:
On Mon, Mar 01, 2021 at 11:56:50AM +0800, Jason Wang wrote:
On 2021/3/1 5:34 上午, Michael S. Tsirkin wrote:
On Wed, Feb 24, 2021 at 10:24:41AM -0800, Si-Wei Liu wrote:
Detecting it isn't enough though, we will need a new ioctl to notify
the kernel that it's a legacy guest. Ugh :(
Well, although I think adding an ioctl is doable, may I
know what the use
case there will be for kernel to leverage such info
directly? Is there a
case QEMU can't do with dedicate ioctls later if there's indeed
differentiation (legacy v.s. modern) needed?
BTW a good API could be
#define VHOST_SET_ENDIAN _IOW(VHOST_VIRTIO, ?, int)
#define VHOST_GET_ENDIAN _IOW(VHOST_VIRTIO, ?, int)
we did it per vring but maybe that was a mistake ...
Actually, I wonder whether it's good time to just not support
legacy driver
for vDPA. Consider:
1) It's definition is no-normative
2) A lot of budren of codes
So qemu can still present the legacy device since the config
space or other
stuffs that is presented by vhost-vDPA is not expected to be
accessed by
guest directly. Qemu can do the endian conversion when necessary
in this
case?
Thanks
Overall I would be fine with this approach but we need to avoid breaking
working userspace, qemu releases with vdpa support are out there and
seem to work for people. Any changes need to take that into account
and document compatibility concerns.
Agree, let me check.
I note that any hardware
implementation is already broken for legacy except on platforms with
strong ordering which might be helpful in reducing the scope.
Yes.
Thanks
_______________________________________________
Virtualization mailing list
Virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linuxfoundation.org/mailman/listinfo/virtualization