Re: [PATCH] pci: endpoint: functions: Add a virtnet EP function

Alan Mikhak <alan.mikhak@xxxxxxxxxx> · Tue, 26 Nov 2019 14:01:59 -0800



On Tue, Nov 26, 2019 at 1:55 PM Alan Mikhak <alan.mikhak@xxxxxxxxxx> wrote:
>
> On Tue, Nov 26, 2019 at 4:36 AM Kishon Vijay Abraham I <kishon@xxxxxx> wrote:
> >
> > Hi Jason,
> >
> > On 26/11/19 3:28 PM, Jason Wang wrote:
> > >
> > > On 2019/11/25 下午8:49, Kishon Vijay Abraham I wrote:
> > >> +Alan, Jon
> > >>
> > >> Hi Jason, Haotian, Alan,
> > >>
> > >> On 05/09/19 8:26 AM, Jason Wang wrote:
> > >>> On 2019/9/5 上午5:58, Haotian Wang wrote:
> > >>>> Hi Jason,
> > >>>>
> > >>>> I have an additional comment regarding using vring.
> > >>>>
> > >>>> On Tue, Sep 3, 2019 at 6:42 AM Jason Wang <jasowang@xxxxxxxxxx> wrote:
> > >>>>> Kind of, in order to address the above limitation, you probably want to
> > >>>>> implement a vringh based netdevice and driver. It will work like,
> > >>>>> instead of trying to represent a virtio-net device to endpoint,
> > >>>>> represent a new type of network device, it uses two vringh ring instead
> > >>>>> virtio ring. The vringh ring is usually used to implement the
> > >>>>> counterpart of virtio driver. The advantages are obvious:
> > >>>>>
> > >>>>> - no need to deal with two sets of features, config space etc.
> > >>>>> - network specific, from the point of endpoint linux, it's not a virtio
> > >>>>> device, no need to care about transport stuffs or embedding internal
> > >>>>> virtio-net specific data structures
> > >>>>> - reuse the exist codes (vringh) to avoid duplicated bugs, implementing
> > >>>>> a virtqueue is kind of challenge
> > >>>> With vringh.c, there is no easy way to interface with virtio_net.c.
> > >>>>
> > >>>> vringh.c is linked with vhost/net.c nicely
> > >>>
> > >>> Let me clarify, vhost_net doesn't use vringh at all (though there's a
> > >>> plan to switch to use vringh).
> > >>>
> > >>>
> > >>>> but again it's not easy to
> > >>>> interface vhost/net.c with the network stack of endpoint kernel. The
> > >>>> vhost drivers are not designed with the purpose of creating another
> > >>>> suite of virtual devices in the host kernel in the first place. If I try
> > >>>> to manually write code for this interfacing, it seems that I will do
> > >>>> duplicate work that virtio_net.c does.
> > >>>
> > >>> Let me explain:
> > >>>
> > >>> - I'm not suggesting to use vhost_net since it can only deal with
> > >>> userspace virtio rings.
> > >>> - I suggest to introduce netdev that has vringh vring assoticated.
> > >>> Vringh was designed to deal with virtio ring located at different types
> > >>> of memory. It supports userspace vring and kernel vring currently, but
> > >>> it should not be too hard to add support for e.g endpoint device that
> > >>> requires DMA or whatever other method to access the vring. So it was by
> > >>> design to talk directly with e.g kernel virtio device.
> > >>> - In your case, you can read vring address from virtio config space
> > >>> through endpoint framework and then create vringh. It's as simple as:
> > >>> creating a netdev, read vring address, and initialize vringh. Then you
> > >>> can use vringh helper to get iov and build skb etc (similar to caif_virtio).
> > >>  From the discussions above and from looking at Jason's mdev patches [1], I've
> > >> created the block diagram below.
> > >>
> > >> While this patch (from Haotian) deals with RC<->EP connection, I'd also like
> > >> this to be extended for NTB (using multiple EP instances. RC<->EP<->EP<->RC)
> > >> [2][3].
> > >>
> > >> +-----------------------------------+   +-------------------------------------+
> > >> |                                   |   |                                     |
> > >> |  +------------+  +--------------+ |   | +------------+  +--------------+    |
> > >> |  | vringh_net |  | vringh_rpmsg | |   | | virtio_net |  | virtio_rpmsg |    |
> > >> |  +------------+  +--------------+ |   | +------------+  +--------------+    |
> > >> |                                   |   |                                     |
> > >> |          +---------------+        |   |          +---------------+          |
> > >> |          |  vringh_mdev  |        |   |          |  virtio_mdev  |          |
> > >> |          +---------------+        |   |          +---------------+          |
> > >> |                                   |   |                                     |
> > >> |  +------------+   +------------+  |   | +-------------------+ +------------+|
> > >> |  | vringh_epf |   | vringh_ntb |  |   | | virtio_pci_common | | virtio_ntb ||
> > >> |  +------------+   +------------+  |   | +-------------------+ +------------+|
> > >> | (PCI EP Device)   (NTB Secondary  |   |        (PCI RC)       (NTB Primary  |
> > >> |                       Device)     |   |                          Device)    |
> > >> |                                   |   |                                     |
> > >> |                                   |   |                                     |
> > >> |             (A)                   |   |              (B)                    |
> > >> +-----------------------------------+   +-------------------------------------+
> > >>
> > >> GUEST SIDE (B):
> > >> ===============
> > >> In the virtualization terminology, the side labeled (B) will be the guest side.
> > >> Here it will be the place where PCIe host (RC) side SW will execute (Ignore NTB
> > >> for this discussion since PCIe host side SW will execute on both ends of the
> > >> link in the case of NTB. However I've included in the block diagram since the
> > >> design we adopt should be able to be extended for NTB as well).
> > >>
> > >> Most of the pieces in (B) already exists.
> > >> 1) virtio_net and virtio_rpmsg: No modifications needed and can be used as it
> > >>     is.
> > >> 2) virtio_mdev: Jason has sent this [1]. This could be used as it is for EP
> > >>     usecases as well. Jason has created mvnet based on virtio_mdev, but for EP
> > >>     usecases virtio_pci_common and virtio_ntb should use it.
> > >
> > >
> > > Can we implement NTB as a transport for virtio, then there's no need for
> > > virtio_mdev?
> >
> > Yes, we could have NTB specific virtio_config_ops. Where exactly should
> > virtio_mdev be used?
> > >
> > >
> > >> 3) virtio_pci_common: This should be used when a PCIe EPF is connected. This
> > >>     should be modified to create virtio_mdev instead of directly creating virtio
> > >>     device.
> > >> 4) virtio_ntb: This is used for NTB where one end of the link should use
> > >>     virtio_ntb. This should create virtio_mdev.
> > >>
> > >> With this virtio_mdev can abstract virtio_pci_common and virtio_ntb and ideally
> > >> any virtio drivers can be used for EP or NTB (In the block diagram above
> > >> virtio_net and virtio_rpmsg can be used).
> > >>
> > >> HOST SIDE (A):
> > >> ===============
> > >> In the virtualization terminology, the side labeled (A) will be the host side.
> > >> Here it will be the place where PCIe device (Endpoint) side SW will execute.
> > >>
> > >> Bits and pieces of (A) should exist but there should be considerable work in
> > >> this.
> > >> 1) vringh_net: There should be vringh drivers corresponding to
> > >>     the virtio drivers on the guest side (B). vringh_net should register with
> > >>     the net core. The vringh_net device should be created by vringh_mdev. This
> > >>     should be new development.
> > >> 2) vringh_rpmsg: vringh_rpmsg should register with the rpmsg core. The
> > >>     vringh_rpmsg device should be created by vringh_mdev.
> > >> 3) vringh_mdev: This layer should define ops specific to vringh (e.g
> > >>     get_desc_addr() should give vring descriptor address and will depend on
> > >>     either EP device or NTB device). I haven't looked further on what other ops
> > >>     will be needed. IMO this layer should also decide whether _kern() or _user()
> > >>     vringh helpers should be invoked.
> > >
> > >
> > > Right, but probably not necessary called "mdev", it could just some abstraction
> > > as a set of callbacks.
> >
> > Yeah, we could have something like vringh_config_ops. Once we start to
> > implement, this might get more clear.
> > >
> > >
> > >> 4) vringh_epf: This will be used for PCIe endpoint. This will implement ops to
> > >>     get the vring descriptor address.
> > >> 5) vringh_ntb: Similar to vringh_epf but will interface with NTB device instead
> > >>     of EPF device.
> > >>
> > >> Jason,
> > >>
> > >> Can you give your comments on the above design? Do you see any flaws/issues
> > >> with the above approach?
> > >
> > >
> > > Looks good overall, see questions above.
> >
> > Thanks for your comments Jason.
> >
> > Haotian, Alan, Me or whoever gets to implement this first, should try to follow
> > the above discussed approach.
>
> Kishon,
>
> Thank you, and Jason Wang, for comments and suggestions re: NTB.
>
> My preference is to see Haotian continue his work on this
> patch, if and when possible. As for expanding the scope to
> support NTB, I personally find it very interesting. I will
> keep an eye open for a suitable hardware platform in house
> before figuring out if and when it would be possible to do such
> work. From your slides, you may get there first since you
> seem to have a suitable hardware platform already.

- haotian.wang@xxxxxxxxxx

other: haotian.wang@xxxxxxxx

>
> Regards,
> Alan
>
> >
> > Thanks
> > Kishon
> >
> > >
> > > Thanks
> > >
> > >
> > >>
> > >> Thanks
> > >> Kishon
> > >>
> > >> [1] -> https://lkml.org/lkml/2019/11/18/261
> > >> [2] -> https://lkml.org/lkml/2019/9/26/291
> > >> [3] ->
> > >> https://www.linuxplumbersconf.org/event/4/contributions/395/attachments/284/481/Implementing_NTB_Controller_Using_PCIe_Endpoint_-_final.pdf
> > >>
> > >>>
> > >>>> There will be two more main disadvantages probably.
> > >>>>
> > >>>> Firstly, there will be two layers of overheads. vhost/net.c uses
> > >>>> vringh.c to channel data buffers into some struct sockets. This is the
> > >>>> first layer of overhead. That the virtual network device will have to
> > >>>> use these sockets somehow adds another layer of overhead.
> > >>>
> > >>> As I said, it doesn't work like vhost and no socket is needed at all.
> > >>>
> > >>>
> > >>>> Secondly, probing, intialization and de-initialization of the virtual
> > >>>> network_device are already non-trivial. I'll likely copy this part
> > >>>> almost verbatim from virtio_net.c in the end. So in the end, there will
> > >>>> be more duplicate code.
> > >>>
> > >>> It will be a new type of network device instead of virtio, you don't
> > >>> need to care any virtio stuffs but vringh in your codes. So it looks to
> > >>> me it would be much simpler and compact.
> > >>>
> > >>> But I'm not saying your method is no way to go, but you should deal with
> > >>> lots of other issues like I've replied in the previous mail. What you
> > >>> want to achieve is
> > >>>
> > >>> 1) Host (virtio-pci) <-> virtio ring <-> virtual eth device <-> virtio
> > >>> ring <-> Endpoint (virtio with customized config_ops).
> > >>>
> > >>> But I suggest is
> > >>>
> > >>> 2) Host (virtio-pci) <-> virtio ring <-> virtual eth device <-> vringh
> > >>> vring (virtio ring in the Host) <-> network device
> > >>>
> > >>> The differences is.
> > >>> - Complexity: In your proposal, there will be two virtio devices and 4
> > >>> virtqueues. It means you need to prepare two sets of features, config
> > >>> ops etc. And dealing with inconsistent feature will be a pain. It may
> > >>> work for simple case like a virtio-net device with only _F_MAC, but it
> > >>> would be hard to be expanded. If we decide to go for vringh, there will
> > >>> be a single virtio device and 2 virtqueues. In the endpoint part, it
> > >>> will be 2 vringh vring (which is actually point the same virtqueue from
> > >>> Host side) and a normal network device. There's no need for dealing with
> > >>> inconsistency, since vringh basically sever as a a device
> > >>> implementation, the feature negotiation is just between device (network
> > >>> device with vringh) and driver (virtito-pci) from the view of Linux
> > >>> running on the PCI Host.
> > >>> - Maintainability: A third path for dealing virtio ring. We've already
> > >>> had vhost and vringh, a third path will add a lot of overhead when
> > >>> trying to maintaining them. My proposal will try to reuse vringh,
> > >>> there's no need a new path.
> > >>> - Layer violation: We want to hide the transport details from the device
> > >>> and make virito-net device can be used without modification. But your
> > >>> codes try to poke information like virtnet_info. My proposal is to just
> > >>> have a new networking device that won't need to care virtio at all. It's
> > >>> not that hard as you imagine to have a new type of netdev, I suggest to
> > >>> take a look at how caif_virtio is done, it would be helpful.
> > >>>
> > >>> If you still decide to go with two two virtio device model, you need
> > >>> probably:
> > >>> - Proving two sets of config and features, and deal with inconsistency
> > >>> - Try to reuse the vringh codes
> > >>> - Do not refer internal structures from virtio-net.c
> > >>>
> > >>> But I recommend to take a step of trying vringh method which should be
> > >>> much simpler.
> > >>>
> > >>> Thanks
> > >>>
> > >>>
> > >>>> Thank you for your patience!
> > >>>>
> > >>>> Best,
> > >>>> Haotian
> > >