Re: [PATCH] pci: endpoint: functions: Add a virtnet EP function

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Sep 03, 2019 at 04:39:38PM -0400, Haotian Wang wrote:
> Hi Michael,
> 
> Thank you for your feedback!
> 
> On Tue, Sep 3, 2019 at 2:25 AM Michael S. Tsirkin <mst@xxxxxxxxxx> wrote:
> > On Fri, Aug 23, 2019 at 02:31:45PM -0700, Haotian Wang wrote:
> > > This endpoint function enables the PCI endpoint to establish a virtual
> > > ethernet link with the PCI host. The main features are:
> > > 
> > > - Zero modification of PCI host kernel. The only requirement for the
> > >   PCI host is to enable virtio, virtio_pci, virtio_pci_legacy and
> > >   virito_net.
> > 
> > Do we need to support legacy? Why not just the modern interface?
> > Even if yes, limiting device
> > to only legacy support is not a good idea.
> 
> I absolutely agree with you on modern interfaces being better. The issue
> here is that I did not support legacy because of compatibility reasons
> but because I was forced to choose legacy.
> 
> In the summer, I asked the hardware team whether I had read-write access
> to the capabilities registers from the endpoint but did not receive a
> response back then.
> 
> Now I can write the code using modern virtio but I cannot easily verify
> the epf will actually function on the hardware.
> 
> Reading and writing of capabilities list registers requires patches to
> the pci endpoint framework and the designware endpoint controller as
> well. I will probably work on that after I resolve these other issues.
> 
> > > +	if (!atomic_xchg(pending, 0))
> > > +		usleep_range(check_queues_usec_min,
> > > +			     check_queues_usec_max);
> > 
> > What's the usleep hackery doing? Set it too low and you
> > waste cycles. Set it too high and your latency suffers.
> > It would be nicer to just use a completion or something like this.
> 
> If the pending bit is set, the kthread will go directly into another
> round. The usleep is here because, in case the pending bit is not set,
> the kthread waits a certain while and then checks for buffers anyway as
> a sort of "fallback" check.
> 
> Problem with completion is that there is no condition to complete on. I
> can change the usleep_range() to schedule() if that is a more sensible
> thing to do.
> 
> If you mean wait until the pending bit is set, I can do that. Back when
> I wrote this module, the reason for not doing that was the endpoint
> might fail to catch notification from the host.
> 
> If you are interested, here is a more detailed expanation.


So the below basically means the communication is racy.
Yes using timers help recover from that, but in
a way that is very expensive, and will also lead
to latency spikes.

> > > +static int pci_epf_virtio_catch_notif(void *data)
> > > +{
> > > +	u16 changed;
> > > +#ifdef CONFIG_PCI_EPF_VIRTIO_SUPPRESS_NOTIFICATION
> > > +	void __iomem *avail_idx;
> > > +	u16 event;
> > > +#endif
> > > +
> > > +	register const __virtio16 default_notify = epf_cpu_to_virtio16(2);
> > > +
> > > +	struct pci_epf_virtio *const epf_virtio = data;
> > > +	atomic_t *const pending = epf_virtio->pending;
> > > +
> > > +	while (!kthread_should_stop()) {
> > > +		changed = epf_virtio16_to_cpu(epf_virtio->legacy_cfg->q_notify);
> > > +		if (changed != 2) {
> > > +			epf_virtio->legacy_cfg->q_notify = default_notify;
> > > +			/* The pci host has made changes to virtqueues */
> > > +			if (changed)
> > > +				atomic_cmpxchg(pending, 0, 1);
> > > +#ifdef CONFIG_PCI_EPF_VIRTIO_SUPPRESS_NOTIFICATION
> > > +			avail_idx = IO_MEMBER_PTR(epf_virtio->avail[changed],
> > > +						  struct vring_avail,
> > > +						  idx);
> > > +			event = epf_ioread16(avail_idx) + event_suppression;
> > > +			write_avail_event(epf_virtio->used[changed], event);
> > > +#endif
> > > +		}
> > > +		usleep_range(notif_poll_usec_min,
> > > +			     notif_poll_usec_max);
> > > +	}
> > > +	return 0;
> > > +}
> 
> The pending bit is set if the notification polling thread sees a value
> in legacy_cfg->q_notify that is not 2, because the PCI host virtio_pci
> will write either 0 when its rx queue consumes something or 1 if its tx
> queue has offered a new buffer. My endpoing function will then set that
> value back to 2. In this process there are numerous things that can go
> wrong.
> 
> The host may write multiple 0 or 1's and the endpoint can only
> detect one of them in an notif_poll usleep interval.

Right. Notifications weren't designed to be implemented on top of RW
memory like this: the assumption was all notifications are buffered.
So if you implement modern instead, different queues can use
different addresses.

> 
> The host may write
> some non-2 value as the endpoint code just finishes detecting the last
> non-2 value and reverting that value back to 2, effectively nullifying
> the new non-2 value.
> 
> The host may decide to write a non-2 value
> immediately after the endpoint revert that value back to 2 but before
> the endpoint code finishes the current loop of execution, effectively
> making the value not reverted back to 2.
> 
> All these and other problems are made worse by the fact that the PCI
> host Linux usually runs on much faster cores than the one on PCI
> endpoint. This is why relying completely on pending bits is not always
> safe. Hence the "fallback" check using usleep hackery exists.
> Nevertheless I welcome any suggestion, because I do not like this
> treatment myself either.

As long as you have a small number of queues, you can poll both
of them. And to resolve racing with host, re-check
rings after you write 2 into the selector
(btw you also need a bunch of memory barriers, atomics don't
imply them automatically).


> > > +	net_cfg->max_virtqueue_pairs = (__force __u16)epf_cpu_to_virtio16(1);
> > 
> > You don't need this without VIRTIO_NET_F_MQ.
> 
> Noted.
> 
> Best,
> Haotian



[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux