On Tue, Nov 26, 2019 at 1:55 PM Alan Mikhak <alan.mikhak@xxxxxxxxxx> wrote:
>
> On Tue, Nov 26, 2019 at 4:36 AM Kishon Vijay Abraham I <kishon@xxxxxx> wrote:
> >
> > Hi Jason,
> >
> > On 26/11/19 3:28 PM, Jason Wang wrote:
> > >
> > > On 2019/11/25 8:49 PM, Kishon Vijay Abraham I wrote:
> > >> +Alan, Jon
> > >>
> > >> Hi Jason, Haotian, Alan,
> > >>
> > >> On 05/09/19 8:26 AM, Jason Wang wrote:
> > >>> On 2019/9/5 5:58 AM, Haotian Wang wrote:
> > >>>> Hi Jason,
> > >>>>
> > >>>> I have an additional comment regarding using vring.
> > >>>>
> > >>>> On Tue, Sep 3, 2019 at 6:42 AM Jason Wang <jasowang@xxxxxxxxxx> wrote:
> > >>>>> Kind of. In order to address the above limitation, you probably want
> > >>>>> to implement a vringh-based netdevice and driver. It would work like
> > >>>>> this: instead of trying to present a virtio-net device to the
> > >>>>> endpoint, present a new type of network device that uses two vringh
> > >>>>> rings instead of virtio rings. A vringh ring is normally used to
> > >>>>> implement the counterpart of a virtio driver. The advantages are
> > >>>>> obvious:
> > >>>>>
> > >>>>> - no need to deal with two sets of features, config space, etc.
> > >>>>> - network specific: from the point of view of endpoint Linux, it's
> > >>>>> not a virtio device, so there is no need to care about transport
> > >>>>> details or to embed internal virtio-net specific data structures
> > >>>>> - reuse of existing code (vringh) avoids duplicated bugs;
> > >>>>> implementing a virtqueue is something of a challenge
> > >>>> With vringh.c, there is no easy way to interface with virtio_net.c.
> > >>>>
> > >>>> vringh.c is linked with vhost/net.c nicely
> > >>>
> > >>> Let me clarify: vhost_net doesn't use vringh at all (though there is a
> > >>> plan to switch to using vringh).
> > >>>
> > >>>
> > >>>> but again it's not easy to
> > >>>> interface vhost/net.c with the network stack of the endpoint kernel.
> > >>>> The vhost drivers were not designed for creating another suite of
> > >>>> virtual devices in the host kernel in the first place. If I write
> > >>>> this interfacing code by hand, it seems I will duplicate the work
> > >>>> that virtio_net.c already does.
> > >>>
> > >>> Let me explain:
> > >>>
> > >>> - I'm not suggesting using vhost_net, since it can only deal with
> > >>> userspace virtio rings.
> > >>> - I suggest introducing a netdev that has vringh vrings associated
> > >>> with it. Vringh was designed to deal with virtio rings located in
> > >>> different types of memory. It currently supports userspace and kernel
> > >>> vrings, but it should not be too hard to add support for, e.g., an
> > >>> endpoint device that requires DMA or some other method to access the
> > >>> vring. So it was designed to talk directly to, e.g., a kernel virtio
> > >>> device.
> > >>> - In your case, you can read the vring address from the virtio config
> > >>> space through the endpoint framework and then create the vringh. It's
> > >>> as simple as: create a netdev, read the vring address, and initialize
> > >>> the vringh. Then you can use the vringh helpers to get the iov, build
> > >>> the skb, etc. (similar to caif_virtio).
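> > >>>
> > >>> As a rough, untested sketch of that receive path on the endpoint side
> > >>> (assuming a kernel-accessible vring that was already mapped through
> > >>> the endpoint framework; epf_net_rx_one() and the fixed kvec size are
> > >>> made up for illustration, and error paths are trimmed):
> > >>>
> > >>> #include <linux/etherdevice.h>
> > >>> #include <linux/netdevice.h>
> > >>> #include <linux/skbuff.h>
> > >>> #include <linux/vringh.h>
> > >>>
> > >>> /* Pull one frame the PCI host queued on its TX vring and feed it
> > >>>  * to the local network stack. 'vrh' was set up earlier with
> > >>>  * vringh_init_kern(), using the vring addresses read from the
> > >>>  * virtio config space via the endpoint framework.
> > >>>  */
> > >>> static int epf_net_rx_one(struct net_device *ndev, struct vringh *vrh)
> > >>> {
> > >>>         struct vringh_kiov riov;
> > >>>         struct kvec kvec[8];
> > >>>         struct sk_buff *skb;
> > >>>         size_t len = 0;
> > >>>         unsigned int i;
> > >>>         u16 head;
> > >>>         int ret;
> > >>>
> > >>>         vringh_kiov_init(&riov, kvec, ARRAY_SIZE(kvec));
> > >>>
> > >>>         /* 1 == got a descriptor, 0 == ring empty, <0 == error */
> > >>>         ret = vringh_getdesc_kern(vrh, &riov, NULL, &head, GFP_ATOMIC);
> > >>>         if (ret <= 0)
> > >>>                 return ret;
> > >>>
> > >>>         /* Total byte length of the descriptor chain we fetched */
> > >>>         for (i = riov.i; i < riov.used; i++)
> > >>>                 len += riov.iov[i].iov_len;
> > >>>
> > >>>         skb = netdev_alloc_skb(ndev, len);
> > >>>         if (!skb)
> > >>>                 return -ENOMEM;
> > >>>
> > >>>         vringh_iov_pull_kern(&riov, skb_put(skb, len), len);
> > >>>         skb->protocol = eth_type_trans(skb, ndev);
> > >>>         netif_rx(skb);
> > >>>
> > >>>         /* Return the buffer to the used ring so the host reclaims it */
> > >>>         return vringh_complete_kern(vrh, head, len);
> > >>> }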
> > >> > > >> +-----------------------------------+ +-------------------------------------+ > > >> | | | | > > >> | +------------+ +--------------+ | | +------------+ +--------------+ | > > >> | | vringh_net | | vringh_rpmsg | | | | virtio_net | | virtio_rpmsg | | > > >> | +------------+ +--------------+ | | +------------+ +--------------+ | > > >> | | | | > > >> | +---------------+ | | +---------------+ | > > >> | | vringh_mdev | | | | virtio_mdev | | > > >> | +---------------+ | | +---------------+ | > > >> | | | | > > >> | +------------+ +------------+ | | +-------------------+ +------------+| > > >> | | vringh_epf | | vringh_ntb | | | | virtio_pci_common | | virtio_ntb || > > >> | +------------+ +------------+ | | +-------------------+ +------------+| > > >> | (PCI EP Device) (NTB Secondary | | (PCI RC) (NTB Primary | > > >> | Device) | | Device) | > > >> | | | | > > >> | | | | > > >> | (A) | | (B) | > > >> +-----------------------------------+ +-------------------------------------+ > > >> > > >> GUEST SIDE (B): > > >> =============== > > >> In the virtualization terminology, the side labeled (B) will be the guest side. > > >> Here it will be the place where PCIe host (RC) side SW will execute (Ignore NTB > > >> for this discussion since PCIe host side SW will execute on both ends of the > > >> link in the case of NTB. However I've included in the block diagram since the > > >> design we adopt should be able to be extended for NTB as well). > > >> > > >> Most of the pieces in (B) already exists. > > >> 1) virtio_net and virtio_rpmsg: No modifications needed and can be used as it > > >> is. > > >> 2) virtio_mdev: Jason has sent this [1]. This could be used as it is for EP > > >> usecases as well. Jason has created mvnet based on virtio_mdev, but for EP > > >> usecases virtio_pci_common and virtio_ntb should use it. > > > > > > > > > Can we implement NTB as a transport for virtio, then there's no need for > > > virtio_mdev? > > > > Yes, we could have NTB specific virtio_config_ops. Where exactly should > > virtio_mdev be used? > > > > > > > > >> 3) virtio_pci_common: This should be used when a PCIe EPF is connected. This > > >> should be modified to create virtio_mdev instead of directly creating virtio > > >> device. > > >> 4) virtio_ntb: This is used for NTB where one end of the link should use > > >> virtio_ntb. This should create virtio_mdev. > > >> > > >> With this virtio_mdev can abstract virtio_pci_common and virtio_ntb and ideally > > >> any virtio drivers can be used for EP or NTB (In the block diagram above > > >> virtio_net and virtio_rpmsg can be used). > > >> > > >> HOST SIDE (A): > > >> =============== > > >> In the virtualization terminology, the side labeled (A) will be the host side. > > >> Here it will be the place where PCIe device (Endpoint) side SW will execute. > > >> > > >> Bits and pieces of (A) should exist but there should be considerable work in > > >> this. > > >> 1) vringh_net: There should be vringh drivers corresponding to > > >> the virtio drivers on the guest side (B). vringh_net should register with > > >> the net core. The vringh_net device should be created by vringh_mdev. This > > >> should be new development. > > >> 2) vringh_rpmsg: vringh_rpmsg should register with the rpmsg core. The > > >> vringh_rpmsg device should be created by vringh_mdev. > > >> 3) vringh_mdev: This layer should define ops specific to vringh (e.g > > >> get_desc_addr() should give vring descriptor address and will depend on > > >> either EP device or NTB device). 
> > >> 4) vringh_epf: This will be used for the PCIe endpoint. It will
> > >> implement the ops to get the vring descriptor address.
> > >> 5) vringh_ntb: Similar to vringh_epf, but it will interface with an
> > >> NTB device instead of an EPF device.
> > >>
> > >> Jason,
> > >>
> > >> Can you give your comments on the above design? Do you see any
> > >> flaws/issues with this approach?
> > >
> > >
> > > Looks good overall, see questions above.
> >
> > Thanks for your comments, Jason.
> >
> > Haotian, Alan, me, or whoever gets to implement this first should try
> > to follow the approach discussed above.
>
> Kishon,
>
> Thank you, and Jason Wang, for the comments and suggestions re: NTB.
>
> My preference is to see Haotian continue his work on this patch, if and
> when possible. As for expanding the scope to support NTB, I personally
> find it very interesting. I will keep an eye open for a suitable
> hardware platform in house before figuring out if and when it would be
> possible to do such work. From your slides, you may get there first,
> since you seem to have a suitable hardware platform already.

- haotian.wang@xxxxxxxxxx other: haotian.wang@xxxxxxxx

>
> Regards,
> Alan
>
> >
> > Thanks
> > Kishon
> >
> > >
> > > Thanks
> > >
> > >
> > >>
> > >> Thanks
> > >> Kishon
> > >>
> > >> [1] -> https://lkml.org/lkml/2019/11/18/261
> > >> [2] -> https://lkml.org/lkml/2019/9/26/291
> > >> [3] ->
> > >> https://www.linuxplumbersconf.org/event/4/contributions/395/attachments/284/481/Implementing_NTB_Controller_Using_PCIe_Endpoint_-_final.pdf
> > >>
> > >>>
> > >>>> There will probably be two more main disadvantages.
> > >>>>
> > >>>> Firstly, there will be two layers of overhead. vhost/net.c uses
> > >>>> vringh.c to channel data buffers into struct sockets; this is the
> > >>>> first layer of overhead. That the virtual network device has to use
> > >>>> these sockets somehow adds another layer of overhead.
> > >>>
> > >>> As I said, it doesn't work like vhost, and no socket is needed at
> > >>> all.
> > >>>
> > >>>
> > >>>> Secondly, probing, initialization and de-initialization of the
> > >>>> virtual network_device are already non-trivial. I'll likely copy
> > >>>> this part almost verbatim from virtio_net.c in the end, so there
> > >>>> will be more duplicate code.
> > >>>
> > >>> It will be a new type of network device instead of virtio; you don't
> > >>> need to care about any virtio internals, only vringh, in your code.
> > >>> So it looks to me like it would be much simpler and more compact.
> > >>>
> > >>> I'm not saying your method is a no-go, but you would have to deal
> > >>> with lots of other issues, as I replied in the previous mail. What
> > >>> you want to achieve is
> > >>>
> > >>> 1) Host (virtio-pci) <-> virtio ring <-> virtual eth device <->
> > >>> virtio ring <-> Endpoint (virtio with customized config_ops).
> > >>>
> > >>> What I suggest is
> > >>>
> > >>> 2) Host (virtio-pci) <-> virtio ring <-> virtual eth device <->
> > >>> vringh vring (virtio ring in the Host) <-> network device
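> > >>>
> > >>> The network-device end of 2) is mostly standard netdev boilerplate.
> > >>> An untested sketch of the transmit hook (epf_net_priv and rx_vrh are
> > >>> made-up names, and buffer-full handling plus host notification are
> > >>> only hinted at):
> > >>>
> > >>> static netdev_tx_t epf_net_xmit(struct sk_buff *skb,
> > >>>                                 struct net_device *ndev)
> > >>> {
> > >>>         struct epf_net_priv *priv = netdev_priv(ndev);
> > >>>         struct vringh_kiov wiov;
> > >>>         struct kvec kvec[8];
> > >>>         u16 head;
> > >>>         int ret;
> > >>>
> > >>>         vringh_kiov_init(&wiov, kvec, ARRAY_SIZE(kvec));
> > >>>
> > >>>         /* Fetch a receive buffer the host made available (the
> > >>>          * write side of the descriptor chain). */
> > >>>         ret = vringh_getdesc_kern(&priv->rx_vrh, NULL, &wiov,
> > >>>                                   &head, GFP_ATOMIC);
> > >>>         if (ret <= 0) {
> > >>>                 netif_stop_queue(ndev);
> > >>>                 return NETDEV_TX_BUSY;
> > >>>         }
> > >>>
> > >>>         vringh_iov_push_kern(&wiov, skb->data, skb->len);
> > >>>         vringh_complete_kern(&priv->rx_vrh, head, skb->len);
> > >>>         /* ...notify the host here, e.g. raise an MSI via the EP
> > >>>          * framework... */
> > >>>
> > >>>         dev_consume_skb_any(skb);
> > >>>         return NETDEV_TX_OK;
> > >>> }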
> > >>>
> > >>> The differences are:
> > >>> - Complexity: In your proposal there will be two virtio devices and
> > >>> 4 virtqueues. That means you need to prepare two sets of features,
> > >>> config ops, etc., and dealing with inconsistent features will be a
> > >>> pain. It may work for a simple case like a virtio-net device with
> > >>> only _F_MAC, but it would be hard to extend. If we decide to go with
> > >>> vringh, there will be a single virtio device and 2 virtqueues. On
> > >>> the endpoint side there will be 2 vringh vrings (which actually
> > >>> point at the same virtqueues as the Host side) and a normal network
> > >>> device. There is no inconsistency to deal with: since vringh
> > >>> basically serves as the device implementation, feature negotiation
> > >>> happens directly between the device (network device with vringh) and
> > >>> the driver (virtio-pci), from the point of view of the Linux running
> > >>> on the PCI Host.
> > >>> - Maintainability: A third path for dealing with virtio rings. We
> > >>> already have vhost and vringh; a third path will add a lot of
> > >>> overhead when trying to maintain them all. My proposal tries to
> > >>> reuse vringh; there is no need for a new path.
> > >>> - Layer violation: We want to hide the transport details from the
> > >>> device and make the virtio-net device usable without modification,
> > >>> but your code pokes at information like virtnet_info. My proposal is
> > >>> to have a new networking device that doesn't need to care about
> > >>> virtio at all. Having a new type of netdev is not as hard as you
> > >>> imagine; I suggest taking a look at how caif_virtio is done, it
> > >>> would be helpful.
> > >>>
> > >>> If you still decide to go with the two-virtio-device model, you
> > >>> probably need to:
> > >>> - Provide two sets of config and features, and deal with
> > >>> inconsistency
> > >>> - Try to reuse the vringh code
> > >>> - Not refer to internal structures from virtio-net.c
> > >>>
> > >>> But I recommend first taking a stab at the vringh method, which
> > >>> should be much simpler.
> > >>>
> > >>> Thanks
> > >>>
> > >>>
> > >>>> Thank you for your patience!
> > >>>>
> > >>>> Best,
> > >>>> Haotian
> > >