On Tue, Sep 03, 2019 at 04:39:38PM -0400, Haotian Wang wrote:
> Hi Michael,
> 
> Thank you for your feedback!
> 
> On Tue, Sep 3, 2019 at 2:25 AM Michael S. Tsirkin <mst@xxxxxxxxxx> wrote:
> > On Fri, Aug 23, 2019 at 02:31:45PM -0700, Haotian Wang wrote:
> > > This endpoint function enables the PCI endpoint to establish a
> > > virtual ethernet link with the PCI host. The main features are:
> > >
> > > - Zero modification of PCI host kernel. The only requirement for
> > >   the PCI host is to enable virtio, virtio_pci, virtio_pci_legacy
> > >   and virtio_net.
> >
> > Do we need to support legacy? Why not just the modern interface?
> > Even if yes, limiting the device to legacy-only support is not a
> > good idea.
> 
> I absolutely agree with you that modern interfaces are better. The
> issue is that I did not choose legacy for compatibility reasons; I
> was forced to choose it.
> 
> In the summer, I asked the hardware team whether I had read-write
> access to the capability registers from the endpoint, but I did not
> receive a response back then.
> 
> Now I can write the code using modern virtio, but I cannot easily
> verify that the epf will actually function on the hardware.
> 
> Reading and writing the capabilities list registers requires patches
> to the PCI endpoint framework and the DesignWare endpoint controller
> as well. I will probably work on that after I resolve these other
> issues.
> 
> > > +		if (!atomic_xchg(pending, 0))
> > > +			usleep_range(check_queues_usec_min,
> > > +				     check_queues_usec_max);
> >
> > What's the usleep hackery doing? Set it too low and you waste
> > cycles. Set it too high and your latency suffers. It would be nicer
> > to just use a completion or something like this.
> 
> If the pending bit is set, the kthread goes directly into another
> round. The usleep is here because, in case the pending bit is not
> set, the kthread waits a while and then checks for buffers anyway,
> as a sort of "fallback" check.
> 
> Problem with completion is that there is no condition to complete
> on. I can change the usleep_range() to schedule() if that is a more
> sensible thing to do.
> 
> If you mean wait until the pending bit is set, I can do that. Back
> when I wrote this module, the reason for not doing that was that the
> endpoint might fail to catch a notification from the host.
> 
> If you are interested, here is a more detailed explanation.

So the below basically means the communication is racy. Yes, using
timers helps recover from that, but in a way that is very expensive
and will also lead to latency spikes.

> > > +static int pci_epf_virtio_catch_notif(void *data)
> > > +{
> > > +	u16 changed;
> > > +#ifdef CONFIG_PCI_EPF_VIRTIO_SUPPRESS_NOTIFICATION
> > > +	void __iomem *avail_idx;
> > > +	u16 event;
> > > +#endif
> > > +
> > > +	register const __virtio16 default_notify = epf_cpu_to_virtio16(2);
> > > +
> > > +	struct pci_epf_virtio *const epf_virtio = data;
> > > +	atomic_t *const pending = epf_virtio->pending;
> > > +
> > > +	while (!kthread_should_stop()) {
> > > +		changed = epf_virtio16_to_cpu(epf_virtio->legacy_cfg->q_notify);
> > > +		if (changed != 2) {
> > > +			epf_virtio->legacy_cfg->q_notify = default_notify;
> > > +			/* The pci host has made changes to virtqueues */
> > > +			if (changed)
> > > +				atomic_cmpxchg(pending, 0, 1);
> > > +#ifdef CONFIG_PCI_EPF_VIRTIO_SUPPRESS_NOTIFICATION
> > > +			avail_idx = IO_MEMBER_PTR(epf_virtio->avail[changed],
> > > +						  struct vring_avail,
> > > +						  idx);
> > > +			event = epf_ioread16(avail_idx) + event_suppression;
> > > +			write_avail_event(epf_virtio->used[changed], event);
> > > +#endif
> > > +		}
> > > +		usleep_range(notif_poll_usec_min,
> > > +			     notif_poll_usec_max);
> > > +	}
> > > +	return 0;
> > > +}
> 
> The pending bit is set if the notification polling thread sees a
> value in legacy_cfg->q_notify that is not 2, because the PCI host
> virtio_pci will write either 0 when its rx queue consumes something
> or 1 if its tx queue has offered a new
> buffer. My endpoint function will then set that value back to 2. In
> this process there are numerous things that can go wrong.
> 
> The host may write multiple 0s or 1s and the endpoint can only
> detect one of them in a notif_poll usleep interval.

Right. Notifications weren't designed to be implemented on top of RW
memory like this: the assumption was that all notifications are
buffered. So if you implement modern instead, different queues can
use different addresses.

> The host may write some non-2 value just as the endpoint code
> finishes detecting the last non-2 value and reverting that value
> back to 2, effectively nullifying the new non-2 value.
> 
> The host may decide to write a non-2 value immediately after the
> endpoint reverts that value back to 2 but before the endpoint code
> finishes the current loop of execution, effectively leaving the
> value not reverted back to 2.
> 
> All these and other problems are made worse by the fact that the PCI
> host Linux usually runs on much faster cores than the one on the PCI
> endpoint. This is why relying completely on pending bits is not
> always safe. Hence the "fallback" check using usleep hackery exists.
> Nevertheless, I welcome any suggestion, because I do not like this
> treatment myself either.

As long as you have a small number of queues, you can poll both of
them. And to resolve racing with the host, re-check the rings after
you write 2 into the selector (btw you also need a bunch of memory
barriers; atomics don't imply them automatically).

> > > +	net_cfg->max_virtqueue_pairs = (__force __u16)epf_cpu_to_virtio16(1);
> >
> > You don't need this without VIRTIO_NET_F_MQ.
> 
> Noted.
> 
> Best,
> Haotian