Hi Michael, Thank you for your feedback! On Tue, Sep 3, 2019 at 2:25 AM Michael S. Tsirkin <mst@xxxxxxxxxx> wrote: > On Fri, Aug 23, 2019 at 02:31:45PM -0700, Haotian Wang wrote: > > This endpoint function enables the PCI endpoint to establish a virtual > > ethernet link with the PCI host. The main features are: > > > > - Zero modification of PCI host kernel. The only requirement for the > > PCI host is to enable virtio, virtio_pci, virtio_pci_legacy and > > virito_net. > > Do we need to support legacy? Why not just the modern interface? > Even if yes, limiting device > to only legacy support is not a good idea. I absolutely agree with you on modern interfaces being better. The issue here is that I did not support legacy because of compatibility reasons but because I was forced to choose legacy. In the summer, I asked the hardware team whether I had read-write access to the capabilities registers from the endpoint but did not receive a response back then. Now I can write the code using modern virtio but I cannot easily verify the epf will actually function on the hardware. Reading and writing of capabilities list registers requires patches to the pci endpoint framework and the designware endpoint controller as well. I will probably work on that after I resolve these other issues. > > + if (!atomic_xchg(pending, 0)) > > + usleep_range(check_queues_usec_min, > > + check_queues_usec_max); > > What's the usleep hackery doing? Set it too low and you > waste cycles. Set it too high and your latency suffers. > It would be nicer to just use a completion or something like this. If the pending bit is set, the kthread will go directly into another round. The usleep is here because, in case the pending bit is not set, the kthread waits a certain while and then checks for buffers anyway as a sort of "fallback" check. Problem with completion is that there is no condition to complete on. I can change the usleep_range() to schedule() if that is a more sensible thing to do. If you mean wait until the pending bit is set, I can do that. Back when I wrote this module, the reason for not doing that was the endpoint might fail to catch notification from the host. If you are interested, here is a more detailed expanation. > > +static int pci_epf_virtio_catch_notif(void *data) > > +{ > > + u16 changed; > > +#ifdef CONFIG_PCI_EPF_VIRTIO_SUPPRESS_NOTIFICATION > > + void __iomem *avail_idx; > > + u16 event; > > +#endif > > + > > + register const __virtio16 default_notify = epf_cpu_to_virtio16(2); > > + > > + struct pci_epf_virtio *const epf_virtio = data; > > + atomic_t *const pending = epf_virtio->pending; > > + > > + while (!kthread_should_stop()) { > > + changed = epf_virtio16_to_cpu(epf_virtio->legacy_cfg->q_notify); > > + if (changed != 2) { > > + epf_virtio->legacy_cfg->q_notify = default_notify; > > + /* The pci host has made changes to virtqueues */ > > + if (changed) > > + atomic_cmpxchg(pending, 0, 1); > > +#ifdef CONFIG_PCI_EPF_VIRTIO_SUPPRESS_NOTIFICATION > > + avail_idx = IO_MEMBER_PTR(epf_virtio->avail[changed], > > + struct vring_avail, > > + idx); > > + event = epf_ioread16(avail_idx) + event_suppression; > > + write_avail_event(epf_virtio->used[changed], event); > > +#endif > > + } > > + usleep_range(notif_poll_usec_min, > > + notif_poll_usec_max); > > + } > > + return 0; > > +} The pending bit is set if the notification polling thread sees a value in legacy_cfg->q_notify that is not 2, because the PCI host virtio_pci will write either 0 when its rx queue consumes something or 1 if its tx queue has offered a new buffer. My endpoing function will then set that value back to 2. In this process there are numerous things that can go wrong. The host may write multiple 0 or 1's and the endpoint can only detect one of them in an notif_poll usleep interval. The host may write some non-2 value as the endpoint code just finishes detecting the last non-2 value and reverting that value back to 2, effectively nullifying the new non-2 value. The host may decide to write a non-2 value immediately after the endpoint revert that value back to 2 but before the endpoint code finishes the current loop of execution, effectively making the value not reverted back to 2. All these and other problems are made worse by the fact that the PCI host Linux usually runs on much faster cores than the one on PCI endpoint. This is why relying completely on pending bits is not always safe. Hence the "fallback" check using usleep hackery exists. Nevertheless I welcome any suggestion, because I do not like this treatment myself either. > > + net_cfg->max_virtqueue_pairs = (__force __u16)epf_cpu_to_virtio16(1); > > You don't need this without VIRTIO_NET_F_MQ. Noted. Best, Haotian