Re: [PATCH] pci: endpoint: functions: Add a virtnet EP function

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 2019/11/27 上午6:01, Alan Mikhak wrote:
On Tue, Nov 26, 2019 at 1:55 PM Alan Mikhak <alan.mikhak@xxxxxxxxxx> wrote:
On Tue, Nov 26, 2019 at 4:36 AM Kishon Vijay Abraham I <kishon@xxxxxx> wrote:
Hi Jason,

On 26/11/19 3:28 PM, Jason Wang wrote:
On 2019/11/25 下午8:49, Kishon Vijay Abraham I wrote:
+Alan, Jon

Hi Jason, Haotian, Alan,

On 05/09/19 8:26 AM, Jason Wang wrote:
On 2019/9/5 上午5:58, Haotian Wang wrote:
Hi Jason,

I have an additional comment regarding using vring.

On Tue, Sep 3, 2019 at 6:42 AM Jason Wang <jasowang@xxxxxxxxxx> wrote:
Kind of, in order to address the above limitation, you probably want to
implement a vringh based netdevice and driver. It will work like,
instead of trying to represent a virtio-net device to endpoint,
represent a new type of network device, it uses two vringh ring instead
virtio ring. The vringh ring is usually used to implement the
counterpart of virtio driver. The advantages are obvious:

- no need to deal with two sets of features, config space etc.
- network specific, from the point of endpoint linux, it's not a virtio
device, no need to care about transport stuffs or embedding internal
virtio-net specific data structures
- reuse the exist codes (vringh) to avoid duplicated bugs, implementing
a virtqueue is kind of challenge
With vringh.c, there is no easy way to interface with virtio_net.c.

vringh.c is linked with vhost/net.c nicely
Let me clarify, vhost_net doesn't use vringh at all (though there's a
plan to switch to use vringh).


but again it's not easy to
interface vhost/net.c with the network stack of endpoint kernel. The
vhost drivers are not designed with the purpose of creating another
suite of virtual devices in the host kernel in the first place. If I try
to manually write code for this interfacing, it seems that I will do
duplicate work that virtio_net.c does.
Let me explain:

- I'm not suggesting to use vhost_net since it can only deal with
userspace virtio rings.
- I suggest to introduce netdev that has vringh vring assoticated.
Vringh was designed to deal with virtio ring located at different types
of memory. It supports userspace vring and kernel vring currently, but
it should not be too hard to add support for e.g endpoint device that
requires DMA or whatever other method to access the vring. So it was by
design to talk directly with e.g kernel virtio device.
- In your case, you can read vring address from virtio config space
through endpoint framework and then create vringh. It's as simple as:
creating a netdev, read vring address, and initialize vringh. Then you
can use vringh helper to get iov and build skb etc (similar to caif_virtio).
  From the discussions above and from looking at Jason's mdev patches [1], I've
created the block diagram below.

While this patch (from Haotian) deals with RC<->EP connection, I'd also like
this to be extended for NTB (using multiple EP instances. RC<->EP<->EP<->RC)
[2][3].

+-----------------------------------+   +-------------------------------------+
|                                   |   |                                     |
|  +------------+  +--------------+ |   | +------------+  +--------------+    |
|  | vringh_net |  | vringh_rpmsg | |   | | virtio_net |  | virtio_rpmsg |    |
|  +------------+  +--------------+ |   | +------------+  +--------------+    |
|                                   |   |                                     |
|          +---------------+        |   |          +---------------+          |
|          |  vringh_mdev  |        |   |          |  virtio_mdev  |          |
|          +---------------+        |   |          +---------------+          |
|                                   |   |                                     |
|  +------------+   +------------+  |   | +-------------------+ +------------+|
|  | vringh_epf |   | vringh_ntb |  |   | | virtio_pci_common | | virtio_ntb ||
|  +------------+   +------------+  |   | +-------------------+ +------------+|
| (PCI EP Device)   (NTB Secondary  |   |        (PCI RC)       (NTB Primary  |
|                       Device)     |   |                          Device)    |
|                                   |   |                                     |
|                                   |   |                                     |
|             (A)                   |   |              (B)                    |
+-----------------------------------+   +-------------------------------------+

GUEST SIDE (B):
===============
In the virtualization terminology, the side labeled (B) will be the guest side.
Here it will be the place where PCIe host (RC) side SW will execute (Ignore NTB
for this discussion since PCIe host side SW will execute on both ends of the
link in the case of NTB. However I've included in the block diagram since the
design we adopt should be able to be extended for NTB as well).

Most of the pieces in (B) already exists.
1) virtio_net and virtio_rpmsg: No modifications needed and can be used as it
     is.
2) virtio_mdev: Jason has sent this [1]. This could be used as it is for EP
     usecases as well. Jason has created mvnet based on virtio_mdev, but for EP
     usecases virtio_pci_common and virtio_ntb should use it.

Can we implement NTB as a transport for virtio, then there's no need for
virtio_mdev?
Yes, we could have NTB specific virtio_config_ops. Where exactly should
virtio_mdev be used?


It would be used when you want to consider a userspace driver which is not support by current virito_config_ops.



3) virtio_pci_common: This should be used when a PCIe EPF is connected. This
     should be modified to create virtio_mdev instead of directly creating virtio
     device.
4) virtio_ntb: This is used for NTB where one end of the link should use
     virtio_ntb. This should create virtio_mdev.

With this virtio_mdev can abstract virtio_pci_common and virtio_ntb and ideally
any virtio drivers can be used for EP or NTB (In the block diagram above
virtio_net and virtio_rpmsg can be used).

HOST SIDE (A):
===============
In the virtualization terminology, the side labeled (A) will be the host side.
Here it will be the place where PCIe device (Endpoint) side SW will execute.

Bits and pieces of (A) should exist but there should be considerable work in
this.
1) vringh_net: There should be vringh drivers corresponding to
     the virtio drivers on the guest side (B). vringh_net should register with
     the net core. The vringh_net device should be created by vringh_mdev. This
     should be new development.
2) vringh_rpmsg: vringh_rpmsg should register with the rpmsg core. The
     vringh_rpmsg device should be created by vringh_mdev.
3) vringh_mdev: This layer should define ops specific to vringh (e.g
     get_desc_addr() should give vring descriptor address and will depend on
     either EP device or NTB device). I haven't looked further on what other ops
     will be needed. IMO this layer should also decide whether _kern() or _user()
     vringh helpers should be invoked.

Right, but probably not necessary called "mdev", it could just some abstraction
as a set of callbacks.
Yeah, we could have something like vringh_config_ops. Once we start to
implement, this might get more clear.


Yes, it is.



4) vringh_epf: This will be used for PCIe endpoint. This will implement ops to
     get the vring descriptor address.
5) vringh_ntb: Similar to vringh_epf but will interface with NTB device instead
     of EPF device.

Jason,

Can you give your comments on the above design? Do you see any flaws/issues
with the above approach?

Looks good overall, see questions above.
Thanks for your comments Jason.

Haotian, Alan, Me or whoever gets to implement this first, should try to follow
the above discussed approach.
Kishon,

Thank you, and Jason Wang, for comments and suggestions re: NTB.

My preference is to see Haotian continue his work on this
patch, if and when possible. As for expanding the scope to
support NTB, I personally find it very interesting. I will
keep an eye open for a suitable hardware platform in house
before figuring out if and when it would be possible to do such
work. From your slides, you may get there first since you
seem to have a suitable hardware platform already.
- haotian.wang@xxxxxxxxxx

other: haotian.wang@xxxxxxxx

Regards,
Alan

Thanks
Kishon

Thanks


Thanks
Kishon

[1] -> https://lkml.org/lkml/2019/11/18/261
[2] -> https://lkml.org/lkml/2019/9/26/291
[3] ->
https://www.linuxplumbersconf.org/event/4/contributions/395/attachments/284/481/Implementing_NTB_Controller_Using_PCIe_Endpoint_-_final.pdf

There will be two more main disadvantages probably.

Firstly, there will be two layers of overheads. vhost/net.c uses
vringh.c to channel data buffers into some struct sockets. This is the
first layer of overhead. That the virtual network device will have to
use these sockets somehow adds another layer of overhead.
As I said, it doesn't work like vhost and no socket is needed at all.


Secondly, probing, intialization and de-initialization of the virtual
network_device are already non-trivial. I'll likely copy this part
almost verbatim from virtio_net.c in the end. So in the end, there will
be more duplicate code.
It will be a new type of network device instead of virtio, you don't
need to care any virtio stuffs but vringh in your codes. So it looks to
me it would be much simpler and compact.

But I'm not saying your method is no way to go, but you should deal with
lots of other issues like I've replied in the previous mail. What you
want to achieve is

1) Host (virtio-pci) <-> virtio ring <-> virtual eth device <-> virtio
ring <-> Endpoint (virtio with customized config_ops).

But I suggest is

2) Host (virtio-pci) <-> virtio ring <-> virtual eth device <-> vringh
vring (virtio ring in the Host) <-> network device

The differences is.
- Complexity: In your proposal, there will be two virtio devices and 4
virtqueues. It means you need to prepare two sets of features, config
ops etc. And dealing with inconsistent feature will be a pain. It may
work for simple case like a virtio-net device with only _F_MAC, but it
would be hard to be expanded. If we decide to go for vringh, there will
be a single virtio device and 2 virtqueues. In the endpoint part, it
will be 2 vringh vring (which is actually point the same virtqueue from
Host side) and a normal network device. There's no need for dealing with
inconsistency, since vringh basically sever as a a device
implementation, the feature negotiation is just between device (network
device with vringh) and driver (virtito-pci) from the view of Linux
running on the PCI Host.
- Maintainability: A third path for dealing virtio ring. We've already
had vhost and vringh, a third path will add a lot of overhead when
trying to maintaining them. My proposal will try to reuse vringh,
there's no need a new path.
- Layer violation: We want to hide the transport details from the device
and make virito-net device can be used without modification. But your
codes try to poke information like virtnet_info. My proposal is to just
have a new networking device that won't need to care virtio at all. It's
not that hard as you imagine to have a new type of netdev, I suggest to
take a look at how caif_virtio is done, it would be helpful.

If you still decide to go with two two virtio device model, you need
probably:
- Proving two sets of config and features, and deal with inconsistency
- Try to reuse the vringh codes
- Do not refer internal structures from virtio-net.c

But I recommend to take a step of trying vringh method which should be
much simpler.

Thanks


Thank you for your patience!

Best,
Haotian





[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux