Re: [PATCH] pci: endpoint: functions: Add a virtnet EP function

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Nov 26, 2019 at 4:36 AM Kishon Vijay Abraham I <kishon@xxxxxx> wrote:
>
> Hi Jason,
>
> On 26/11/19 3:28 PM, Jason Wang wrote:
> >
> > On 2019/11/25 下午8:49, Kishon Vijay Abraham I wrote:
> >> +Alan, Jon
> >>
> >> Hi Jason, Haotian, Alan,
> >>
> >> On 05/09/19 8:26 AM, Jason Wang wrote:
> >>> On 2019/9/5 上午5:58, Haotian Wang wrote:
> >>>> Hi Jason,
> >>>>
> >>>> I have an additional comment regarding using vring.
> >>>>
> >>>> On Tue, Sep 3, 2019 at 6:42 AM Jason Wang <jasowang@xxxxxxxxxx> wrote:
> >>>>> Kind of, in order to address the above limitation, you probably want to
> >>>>> implement a vringh based netdevice and driver. It will work like,
> >>>>> instead of trying to represent a virtio-net device to endpoint,
> >>>>> represent a new type of network device, it uses two vringh ring instead
> >>>>> virtio ring. The vringh ring is usually used to implement the
> >>>>> counterpart of virtio driver. The advantages are obvious:
> >>>>>
> >>>>> - no need to deal with two sets of features, config space etc.
> >>>>> - network specific, from the point of endpoint linux, it's not a virtio
> >>>>> device, no need to care about transport stuffs or embedding internal
> >>>>> virtio-net specific data structures
> >>>>> - reuse the exist codes (vringh) to avoid duplicated bugs, implementing
> >>>>> a virtqueue is kind of challenge
> >>>> With vringh.c, there is no easy way to interface with virtio_net.c.
> >>>>
> >>>> vringh.c is linked with vhost/net.c nicely
> >>>
> >>> Let me clarify, vhost_net doesn't use vringh at all (though there's a
> >>> plan to switch to use vringh).
> >>>
> >>>
> >>>> but again it's not easy to
> >>>> interface vhost/net.c with the network stack of endpoint kernel. The
> >>>> vhost drivers are not designed with the purpose of creating another
> >>>> suite of virtual devices in the host kernel in the first place. If I try
> >>>> to manually write code for this interfacing, it seems that I will do
> >>>> duplicate work that virtio_net.c does.
> >>>
> >>> Let me explain:
> >>>
> >>> - I'm not suggesting to use vhost_net since it can only deal with
> >>> userspace virtio rings.
> >>> - I suggest to introduce netdev that has vringh vring assoticated.
> >>> Vringh was designed to deal with virtio ring located at different types
> >>> of memory. It supports userspace vring and kernel vring currently, but
> >>> it should not be too hard to add support for e.g endpoint device that
> >>> requires DMA or whatever other method to access the vring. So it was by
> >>> design to talk directly with e.g kernel virtio device.
> >>> - In your case, you can read vring address from virtio config space
> >>> through endpoint framework and then create vringh. It's as simple as:
> >>> creating a netdev, read vring address, and initialize vringh. Then you
> >>> can use vringh helper to get iov and build skb etc (similar to caif_virtio).
> >>  From the discussions above and from looking at Jason's mdev patches [1], I've
> >> created the block diagram below.
> >>
> >> While this patch (from Haotian) deals with RC<->EP connection, I'd also like
> >> this to be extended for NTB (using multiple EP instances. RC<->EP<->EP<->RC)
> >> [2][3].
> >>
> >> +-----------------------------------+   +-------------------------------------+
> >> |                                   |   |                                     |
> >> |  +------------+  +--------------+ |   | +------------+  +--------------+    |
> >> |  | vringh_net |  | vringh_rpmsg | |   | | virtio_net |  | virtio_rpmsg |    |
> >> |  +------------+  +--------------+ |   | +------------+  +--------------+    |
> >> |                                   |   |                                     |
> >> |          +---------------+        |   |          +---------------+          |
> >> |          |  vringh_mdev  |        |   |          |  virtio_mdev  |          |
> >> |          +---------------+        |   |          +---------------+          |
> >> |                                   |   |                                     |
> >> |  +------------+   +------------+  |   | +-------------------+ +------------+|
> >> |  | vringh_epf |   | vringh_ntb |  |   | | virtio_pci_common | | virtio_ntb ||
> >> |  +------------+   +------------+  |   | +-------------------+ +------------+|
> >> | (PCI EP Device)   (NTB Secondary  |   |        (PCI RC)       (NTB Primary  |
> >> |                       Device)     |   |                          Device)    |
> >> |                                   |   |                                     |
> >> |                                   |   |                                     |
> >> |             (A)                   |   |              (B)                    |
> >> +-----------------------------------+   +-------------------------------------+
> >>
> >> GUEST SIDE (B):
> >> ===============
> >> In the virtualization terminology, the side labeled (B) will be the guest side.
> >> Here it will be the place where PCIe host (RC) side SW will execute (Ignore NTB
> >> for this discussion since PCIe host side SW will execute on both ends of the
> >> link in the case of NTB. However I've included in the block diagram since the
> >> design we adopt should be able to be extended for NTB as well).
> >>
> >> Most of the pieces in (B) already exists.
> >> 1) virtio_net and virtio_rpmsg: No modifications needed and can be used as it
> >>     is.
> >> 2) virtio_mdev: Jason has sent this [1]. This could be used as it is for EP
> >>     usecases as well. Jason has created mvnet based on virtio_mdev, but for EP
> >>     usecases virtio_pci_common and virtio_ntb should use it.
> >
> >
> > Can we implement NTB as a transport for virtio, then there's no need for
> > virtio_mdev?
>
> Yes, we could have NTB specific virtio_config_ops. Where exactly should
> virtio_mdev be used?
> >
> >
> >> 3) virtio_pci_common: This should be used when a PCIe EPF is connected. This
> >>     should be modified to create virtio_mdev instead of directly creating virtio
> >>     device.
> >> 4) virtio_ntb: This is used for NTB where one end of the link should use
> >>     virtio_ntb. This should create virtio_mdev.
> >>
> >> With this virtio_mdev can abstract virtio_pci_common and virtio_ntb and ideally
> >> any virtio drivers can be used for EP or NTB (In the block diagram above
> >> virtio_net and virtio_rpmsg can be used).
> >>
> >> HOST SIDE (A):
> >> ===============
> >> In the virtualization terminology, the side labeled (A) will be the host side.
> >> Here it will be the place where PCIe device (Endpoint) side SW will execute.
> >>
> >> Bits and pieces of (A) should exist but there should be considerable work in
> >> this.
> >> 1) vringh_net: There should be vringh drivers corresponding to
> >>     the virtio drivers on the guest side (B). vringh_net should register with
> >>     the net core. The vringh_net device should be created by vringh_mdev. This
> >>     should be new development.
> >> 2) vringh_rpmsg: vringh_rpmsg should register with the rpmsg core. The
> >>     vringh_rpmsg device should be created by vringh_mdev.
> >> 3) vringh_mdev: This layer should define ops specific to vringh (e.g
> >>     get_desc_addr() should give vring descriptor address and will depend on
> >>     either EP device or NTB device). I haven't looked further on what other ops
> >>     will be needed. IMO this layer should also decide whether _kern() or _user()
> >>     vringh helpers should be invoked.
> >
> >
> > Right, but probably not necessary called "mdev", it could just some abstraction
> > as a set of callbacks.
>
> Yeah, we could have something like vringh_config_ops. Once we start to
> implement, this might get more clear.
> >
> >
> >> 4) vringh_epf: This will be used for PCIe endpoint. This will implement ops to
> >>     get the vring descriptor address.
> >> 5) vringh_ntb: Similar to vringh_epf but will interface with NTB device instead
> >>     of EPF device.
> >>
> >> Jason,
> >>
> >> Can you give your comments on the above design? Do you see any flaws/issues
> >> with the above approach?
> >
> >
> > Looks good overall, see questions above.
>
> Thanks for your comments Jason.
>
> Haotian, Alan, Me or whoever gets to implement this first, should try to follow
> the above discussed approach.

Kishon,

Thank you, and Jason Wang, for comments and suggestions re: NTB.

My preference is to see Haotian continue his work on this
patch, if and when possible. As for expanding the scope to
support NTB, I personally find it very interesting. I will
keep an eye open for a suitable hardware platform in house
before figuring out if and when it would be possible to do such
work. From your slides, you may get there first since you
seem to have a suitable hardware platform already.

Regards,
Alan

>
> Thanks
> Kishon
>
> >
> > Thanks
> >
> >
> >>
> >> Thanks
> >> Kishon
> >>
> >> [1] -> https://lkml.org/lkml/2019/11/18/261
> >> [2] -> https://lkml.org/lkml/2019/9/26/291
> >> [3] ->
> >> https://www.linuxplumbersconf.org/event/4/contributions/395/attachments/284/481/Implementing_NTB_Controller_Using_PCIe_Endpoint_-_final.pdf
> >>
> >>>
> >>>> There will be two more main disadvantages probably.
> >>>>
> >>>> Firstly, there will be two layers of overheads. vhost/net.c uses
> >>>> vringh.c to channel data buffers into some struct sockets. This is the
> >>>> first layer of overhead. That the virtual network device will have to
> >>>> use these sockets somehow adds another layer of overhead.
> >>>
> >>> As I said, it doesn't work like vhost and no socket is needed at all.
> >>>
> >>>
> >>>> Secondly, probing, intialization and de-initialization of the virtual
> >>>> network_device are already non-trivial. I'll likely copy this part
> >>>> almost verbatim from virtio_net.c in the end. So in the end, there will
> >>>> be more duplicate code.
> >>>
> >>> It will be a new type of network device instead of virtio, you don't
> >>> need to care any virtio stuffs but vringh in your codes. So it looks to
> >>> me it would be much simpler and compact.
> >>>
> >>> But I'm not saying your method is no way to go, but you should deal with
> >>> lots of other issues like I've replied in the previous mail. What you
> >>> want to achieve is
> >>>
> >>> 1) Host (virtio-pci) <-> virtio ring <-> virtual eth device <-> virtio
> >>> ring <-> Endpoint (virtio with customized config_ops).
> >>>
> >>> But I suggest is
> >>>
> >>> 2) Host (virtio-pci) <-> virtio ring <-> virtual eth device <-> vringh
> >>> vring (virtio ring in the Host) <-> network device
> >>>
> >>> The differences is.
> >>> - Complexity: In your proposal, there will be two virtio devices and 4
> >>> virtqueues. It means you need to prepare two sets of features, config
> >>> ops etc. And dealing with inconsistent feature will be a pain. It may
> >>> work for simple case like a virtio-net device with only _F_MAC, but it
> >>> would be hard to be expanded. If we decide to go for vringh, there will
> >>> be a single virtio device and 2 virtqueues. In the endpoint part, it
> >>> will be 2 vringh vring (which is actually point the same virtqueue from
> >>> Host side) and a normal network device. There's no need for dealing with
> >>> inconsistency, since vringh basically sever as a a device
> >>> implementation, the feature negotiation is just between device (network
> >>> device with vringh) and driver (virtito-pci) from the view of Linux
> >>> running on the PCI Host.
> >>> - Maintainability: A third path for dealing virtio ring. We've already
> >>> had vhost and vringh, a third path will add a lot of overhead when
> >>> trying to maintaining them. My proposal will try to reuse vringh,
> >>> there's no need a new path.
> >>> - Layer violation: We want to hide the transport details from the device
> >>> and make virito-net device can be used without modification. But your
> >>> codes try to poke information like virtnet_info. My proposal is to just
> >>> have a new networking device that won't need to care virtio at all. It's
> >>> not that hard as you imagine to have a new type of netdev, I suggest to
> >>> take a look at how caif_virtio is done, it would be helpful.
> >>>
> >>> If you still decide to go with two two virtio device model, you need
> >>> probably:
> >>> - Proving two sets of config and features, and deal with inconsistency
> >>> - Try to reuse the vringh codes
> >>> - Do not refer internal structures from virtio-net.c
> >>>
> >>> But I recommend to take a step of trying vringh method which should be
> >>> much simpler.
> >>>
> >>> Thanks
> >>>
> >>>
> >>>> Thank you for your patience!
> >>>>
> >>>> Best,
> >>>> Haotian
> >




[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux