On Thu, Sep 5, 2019 at 3:07 AM Michael S. Tsirkin <mst@xxxxxxxxxx> wrote:
> > The host may write multiple 0 or 1's and the endpoint can only
> > detect one of them in an notif_poll usleep interval.
>
> Right. Notifications weren't designed to be implemented on top of RW
> memory like this: the assumption was all notifications are buffered.

I can implement notification as a counter instead of a pending bit to
simulate a buffer. However, that introduces troublesome cases of its
own, illustrated by the following example. The host sends a
notification about available buffers 0-3. Because notifications are
polled with a delay, the endpoint may already have consumed buffers
0-5 by the time it sees that notification. For some subsequent
notifications, the endpoint will then find no corresponding available
buffers to consume; those useless function calls waste cycles.

> So if you implement modern instead, different queues can use
> different addresses.

I will start working on this after switching the endpoint to vringh.c.

> > The host may write
> > some non-2 value as the endpoint code just finishes detecting the last
> > non-2 value and reverting that value back to 2, effectively nullifying
> > the new non-2 value.
> >
> > The host may decide to write a non-2 value
> > immediately after the endpoint revert that value back to 2 but before
> > the endpoint code finishes the current loop of execution, effectively
> > making the value not reverted back to 2.
> >
> > All these and other problems are made worse by the fact that the PCI
> > host Linux usually runs on much faster cores than the one on PCI
> > endpoint. This is why relying completely on pending bits is not always
> > safe. Hence the "fallback" check using usleep hackery exists.
> > Nevertheless I welcome any suggestion, because I do not like this
> > treatment myself either.
>
> As long as you have a small number of queues, you can poll both
> of them. And to resolve racing with host, re-check
> rings after you write 2 into the selector

I assume your suggestion is based on modern virtio, since vrings in
legacy virtio share a common notification read-write area.

> (btw you also need a bunch of memory barriers, atomics don't
> imply them automatically).

Thank you for the reminder. In this doc,
https://www.kernel.org/doc/html/latest/core-api/atomic_ops.html, it
says "atomic_cmpxchg must provide explicit memory barriers around the
operation, although if the comparison fails then no memory ordering
guarantees are required". My understanding of this sentence is that
the arch-specific implementation of atomic_cmpxchg already surrounds
the operation with barriers in an efficient way, and that the second
part of the sentence implies the doc's intended audience is the
implementer of atomic_cmpxchg. Please correct me if I have
misunderstood the doc.

Thank you for your feedback.

Best,
Haotian
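
P.S. For concreteness, below is a rough sketch of the counter-based
notification polling I had in mind. It is only a sketch: the names
(epf_vq_notify, notify_count, consume_avail_bufs, NOTIF_POLL_US) are
illustrative and not from the actual patch. The idea is that the host
increments a counter for every kick, and the endpoint only calls the
consume path when the counter has moved since its last poll, so a
burst of kicks collapses into one wake-up instead of being lost.

    #include <linux/types.h>
    #include <linux/atomic.h>
    #include <linux/delay.h>

    #define NOTIF_POLL_US	50	/* poll interval, illustrative value */

    static void consume_avail_bufs(void);	/* hypothetical consume path */

    /* Illustrative only: notification area shared with the host. */
    struct epf_vq_notify {
    	atomic_t notify_count;	/* host increments this on every kick */
    };

    /* One iteration of the endpoint poll loop: act only when the
     * counter has moved since the last iteration.
     */
    static void epf_poll_notify(struct epf_vq_notify *n, u32 *last_seen)
    {
    	u32 cur = (u32)atomic_read(&n->notify_count);

    	if (cur != *last_seen) {
    		/* Order the counter read before reading ring contents;
    		 * pairs with the host's write ordering.
    		 */
    		smp_rmb();
    		*last_seen = cur;
    		consume_avail_bufs();
    	}

    	usleep_range(NOTIF_POLL_US, NOTIF_POLL_US * 2);
    }

This does not remove the "no new buffers to consume" case I described
above (the endpoint can still race ahead of a stale kick), but it does
avoid losing notifications when the host kicks several times within
one poll interval.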