On Fri, Apr 21, 2023 at 10:28 PM Maxime Coquelin <maxime.coquelin@xxxxxxxxxx> wrote: > > > > On 4/21/23 07:51, Jason Wang wrote: > > On Thu, Apr 20, 2023 at 10:16 PM Maxime Coquelin > > <maxime.coquelin@xxxxxxxxxx> wrote: > >> > >> > >> > >> On 4/20/23 06:34, Jason Wang wrote: > >>> On Wed, Apr 19, 2023 at 9:43 PM Maxime Coquelin > >>> <maxime.coquelin@xxxxxxxxxx> wrote: > >>>> > >>>> This small series enables virtio-net device type in VDUSE. > >>>> With it, basic operation have been tested, both with > >>>> virtio-vdpa and vhost-vdpa using DPDK Vhost library series > >>>> adding VDUSE support [0] using split rings layout. > >>>> > >>>> Control queue support (and so multiqueue) has also been > >>>> tested, but require a Kernel series from Jason Wang > >>>> relaxing control queue polling [1] to function reliably. > >>>> > >>>> Other than that, we have identified a few gaps: > >>>> > >>>> 1. Reconnection: > >>>> a. VDUSE_VQ_GET_INFO ioctl() returns always 0 for avail > >>>> index, even after the virtqueue has already been > >>>> processed. Is that expected? I have tried instead to > >>>> get the driver's avail index directly from the avail > >>>> ring, but it does not seem reliable as I sometimes get > >>>> "id %u is not a head!\n" warnings. Also such solution > >>>> would not be possible with packed ring, as we need to > >>>> know the wrap counters values. > >>> > >>> Looking at the codes, it only returns the value that is set via > >>> set_vq_state(). I think it is expected to be called before the > >>> datapath runs. > >>> > >>> So when bound to virtio-vdpa, it is expected to return 0. But we need > >>> to fix the packed virtqueue case, I wonder if we need to call > >>> set_vq_state() explicitly in virtio-vdpa before starting the device. > >>> > >>> When bound to vhost-vdpa, Qemu will call VHOST_SET_VRING_BASE which > >>> will end up a call to set_vq_state(). Unfortunately, it doesn't > >>> support packed ring which needs some extension. > >>> > >>>> > >>>> b. Missing IOCTLs: it would be handy to have new IOCTLs to > >>>> query Virtio device status, > >>> > >>> What's the use case of this ioctl? It looks to me userspace is > >>> notified on each status change now: > >>> > >>> static int vduse_dev_set_status(struct vduse_dev *dev, u8 status) > >>> { > >>> struct vduse_dev_msg msg = { 0 }; > >>> > >>> msg.req.type = VDUSE_SET_STATUS; > >>> msg.req.s.status = status; > >>> > >>> return vduse_dev_msg_sync(dev, &msg); > >>> } > >> > >> The idea was to be able to query the status at reconnect time, and > >> neither having to assume its value nor having to store its value in a > >> file (the status could change while the VDUSE application is stopped, > >> but maybe it would receive the notification at reconnect). > > > > I see. > > > >> > >> I will prototype using a tmpfs file to save needed information, and see > >> if it works. > > > > It might work but then the API is not self contained. Maybe it's > > better to have a dedicated ioctl. > > > >> > >>>> and retrieve the config > >>>> space set at VDUSE_CREATE_DEV time. > >>> > >>> In order to be safe, VDUSE avoids writable config space. Otherwise > >>> drivers could block on config writing forever. That's why we don't do > >>> it now. > >> > >> The idea was not to make the config space writable, but just to be able > >> to fetch what was filled at VDUSE_CREATE_DEV time. > >> > >> With the tmpfs file, we can avoid doing that and just save the config > >> space there. > > > > Same as the case for status. > > I have cooked a DPDK patch to support reconnect with a tmpfs file as > suggested by Yongji: > > https://gitlab.com/mcoquelin/dpdk-next-virtio/-/commit/53913f2b1155b02c44d5d3d298aafd357e7a8c48 This seems tricky, for example for status: dev->log->status = dev->status; What if we crash here? > > That's still rough around the edges, but it seems to work reliably > for the testing I have done so far. We'll certainly want to use the > tmpfs memory to directly store available indexes and wrap counters to > avoid introducing overhead in the datapath. That's fine, we probably need a chapter in the kernel doc to describe the reliable reconnection but it is not limited to tmpfs. > The tricky part will be to > manage NUMA affinity. This part is not clear to me, what affinity should we care about? There's a sysfs that was invented by YongJi for virtqueue affinity management recently. Thanks > > Regards, > Maxime > > > > > Thanks > > > >> > >>> We need to harden the config write before we can proceed to this I think. > >>> > >>>> > >>>> 2. VDUSE application as non-root: > >>>> We need to run the VDUSE application as non-root. There > >>>> is some race between the time the UDEV rule is applied > >>>> and the time the device starts being used. Discussing > >>>> with Jason, he suggested we may have a VDUSE daemon run > >>>> as root that would create the VDUSE device, manages its > >>>> rights and then pass its file descriptor to the VDUSE > >>>> app. However, with current IOCTLs, it means the VDUSE > >>>> daemon would need to know several information that > >>>> belongs to the VDUSE app implementing the device such > >>>> as supported Virtio features, config space, etc... > >>>> If we go that route, maybe we should have a control > >>>> IOCTL to create the device which would just pass the > >>>> device type. Then another device IOCTL to perform the > >>>> initialization. Would that make sense? > >>> > >>> I think so. We can hear from others. > >>> > >>>> > >>>> 3. Coredump: > >>>> In order to be able to perform post-mortem analysis, DPDK > >>>> Vhost library marks pages used for vrings and descriptors > >>>> buffers as MADV_DODUMP using madvise(). However with > >>>> VDUSE it fails with -EINVAL. My understanding is that we > >>>> set VM_DONTEXPAND flag to the VMAs and madvise's > >>>> MADV_DODUMP fails if it is present. I'm not sure to > >>>> understand why madvise would prevent MADV_DODUMP if > >>>> VM_DONTEXPAND is set. Any thoughts? > >>> > >>> Adding Peter who may know the answer. > >> > >> Thanks! > >> Maxime > >> > >>> Thanks > >>> > >>>> > >>>> [0]: https://patchwork.dpdk.org/project/dpdk/list/?series=27594&state=%2A&archive=both > >>>> [1]: https://lore.kernel.org/lkml/CACGkMEtgrxN3PPwsDo4oOsnsSLJfEmBEZ0WvjGRr3whU+QasUg@xxxxxxxxxxxxxx/T/ > >>>> > >>>> Maxime Coquelin (2): > >>>> vduse: validate block features only with block devices > >>>> vduse: enable Virtio-net device type > >>>> > >>>> drivers/vdpa/vdpa_user/vduse_dev.c | 11 +++++++---- > >>>> 1 file changed, 7 insertions(+), 4 deletions(-) > >>>> > >>>> -- > >>>> 2.39.2 > >>>> > >>> > >> > > > _______________________________________________ Virtualization mailing list Virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx https://lists.linuxfoundation.org/mailman/listinfo/virtualization