On Thu, 9 Jul 2020 14:26:53 +0800 Jason Wang <jasowang@xxxxxxxxxx> wrote:

[Let me note right at the beginning that I first noted this while
listening to Kishon's talk at LPC on Wednesday. I might be very confused
about the background here, so let me apologize beforehand for any
confusion I might spread.]

> On 2020/7/8 9:13 PM, Kishon Vijay Abraham I wrote:
> > Hi Jason,
> >
> > On 7/8/2020 4:52 PM, Jason Wang wrote:
> >> On 2020/7/7 10:45 PM, Kishon Vijay Abraham I wrote:
> >>> Hi Jason,
> >>>
> >>> On 7/7/2020 3:17 PM, Jason Wang wrote:
> >>>> On 2020/7/6 5:32 PM, Kishon Vijay Abraham I wrote:
> >>>>> Hi Jason,
> >>>>>
> >>>>> On 7/3/2020 12:46 PM, Jason Wang wrote:
> >>>>>> On 2020/7/2 9:35 PM, Kishon Vijay Abraham I wrote:
> >>>>>>> Hi Jason,
> >>>>>>>
> >>>>>>> On 7/2/2020 3:40 PM, Jason Wang wrote:
> >>>>>>>> On 2020/7/2 5:51 PM, Michael S. Tsirkin wrote:
> >>>>>>>>> On Thu, Jul 02, 2020 at 01:51:21PM +0530, Kishon Vijay Abraham I wrote:
> >>>>>>>>>> This series enhances Linux Vhost support to enable SoC-to-SoC
> >>>>>>>>>> communication over MMIO. This series enables rpmsg communication
> >>>>>>>>>> between two SoCs using both PCIe RC<->EP and HOST1-NTB-HOST2.
> >>>>>>>>>>
> >>>>>>>>>> 1) Modify vhost to use standard Linux driver model
> >>>>>>>>>> 2) Add support in vring to access virtqueue over MMIO
> >>>>>>>>>> 3) Add vhost client driver for rpmsg
> >>>>>>>>>> 4) Add PCIe RC driver (uses virtio) and PCIe EP driver (uses vhost)
> >>>>>>>>>>    for rpmsg communication between two SoCs connected to each other
> >>>>>>>>>> 5) Add NTB Virtio driver and NTB Vhost driver for rpmsg communication
> >>>>>>>>>>    between two SoCs connected via NTB
> >>>>>>>>>> 6) Add configfs to configure the components
> >>>>>>>>>>
> >>>>>>>>>> UseCase 1:
> >>>>>>>>>>
> >>>>>>>>>>  VHOST RPMSG                    VIRTIO RPMSG
> >>>>>>>>>>       +                               +
> >>>>>>>>>>       |                               |
> >>>>>>>>>>       |                               |
> >>>>>>>>>>       |                               |
> >>>>>>>>>>       |                               |
> >>>>>>>>>> +-----v------+                 +------v-------+
> >>>>>>>>>> |   Linux    |                 |    Linux     |
> >>>>>>>>>> |  Endpoint  |                 | Root Complex |
> >>>>>>>>>> |            <----------------->              |
> >>>>>>>>>> |            |                 |              |
> >>>>>>>>>> |    SOC1    |                 |     SOC2     |
> >>>>>>>>>> +------------+                 +--------------+
> >>>>>>>>>>
> >>>>>>>>>> UseCase 2:
> >>>>>>>>>>
> >>>>>>>>>>      VHOST RPMSG                                      VIRTIO RPMSG
> >>>>>>>>>>           +                                                 +
> >>>>>>>>>>           |                                                 |
> >>>>>>>>>>           |                                                 |
> >>>>>>>>>>           |                                                 |
> >>>>>>>>>>           |                                                 |
> >>>>>>>>>>    +------v------+                                   +------v------+
> >>>>>>>>>>    |             |                                   |             |
> >>>>>>>>>>    |    HOST1    |                                   |    HOST2    |
> >>>>>>>>>>    |             |                                   |             |
> >>>>>>>>>>    +------^------+                                   +------^------+
> >>>>>>>>>>           |                                                 |
> >>>>>>>>>>           |                                                 |
> >>>>>>>>>> +---------------------------------------------------------------------+
> >>>>>>>>>> |  +------v------+                                   +------v------+  |
> >>>>>>>>>> |  |             |                                   |             |  |
> >>>>>>>>>> |  |     EP      |                                   |     EP      |  |
> >>>>>>>>>> |  | CONTROLLER1 |                                   | CONTROLLER2 |  |
> >>>>>>>>>> |  |             <----------------------------------->             |  |
> >>>>>>>>>> |  |             |                                   |             |  |
> >>>>>>>>>> |  |             |                                   |             |  |
> >>>>>>>>>> |  |             |  SoC With Multiple EP Instances   |             |  |
> >>>>>>>>>> |  |             |  (Configured using NTB Function)  |             |  |
> >>>>>>>>>> |  +-------------+                                   +-------------+  |
> >>>>>>>>>> +---------------------------------------------------------------------+

First of all, to clarify the terminology: Is "vhost rpmsg" acting as what
the virtio standard calls the 'device', and "virtio rpmsg" as the
'driver'? Or is the "vhost" part mostly just virtqueues + the existing
vhost interfaces?

> >>>>>>>>>>
> >>>>>>>>>> Software Layering:
> >>>>>>>>>>
> >>>>>>>>>> The high-level SW layering should look something like below.
> >>>>>>>>>> This series adds support only for RPMSG VHOST; however, something
> >>>>>>>>>> similar should be done for net and scsi. With that, any vhost device
> >>>>>>>>>> (PCI, NTB, platform device, user) can use any of the vhost client
> >>>>>>>>>> drivers.
> >>>>>>>>>>
> >>>>>>>>>>     +----------------+  +-----------+  +------------+  +----------+
> >>>>>>>>>>     |  RPMSG VHOST   |  | NET VHOST |  | SCSI VHOST |  |    X     |
> >>>>>>>>>>     +-------^--------+  +-----^-----+  +-----^------+  +----^-----+
> >>>>>>>>>>             |                 |              |              |
> >>>>>>>>>>             |                 |              |              |
> >>>>>>>>>>             |                 |              |              |
> >>>>>>>>>> +-----------v-----------------v--------------v--------------v----------+
> >>>>>>>>>> |                              VHOST CORE                              |
> >>>>>>>>>> +--------^---------------^--------------------^------------------^-----+
> >>>>>>>>>>          |               |                    |                  |
> >>>>>>>>>>          |               |                    |                  |
> >>>>>>>>>>          |               |                    |                  |
> >>>>>>>>>>  +--------v-------+ +----v------+  +----------v----------+  +----v-----+
> >>>>>>>>>>  |  PCI EPF VHOST | | NTB VHOST |  |PLATFORM DEVICE VHOST|  |    X     |
> >>>>>>>>>>  +----------------+ +-----------+  +---------------------+  +----------+

So, the upper half is basically various functionality types, e.g. a net
device. What is the lower half, a hardware interface? Would it be
equivalent to e.g. a normal PCI device?

> >>>>>>>>>>
> >>>>>>>>>> This was initially proposed here [1]
> >>>>>>>>>>
> >>>>>>>>>> [1] -> https://lore.kernel.org/r/2cf00ec4-1ed6-f66e-6897-006d1a5b6390@xxxxxx
> >>>>>>>>> I find this very interesting. It's a huge patchset, so it will take a
> >>>>>>>>> bit to review, but I certainly plan to do that. Thanks!
> >>>>>>>> Yes, it would be better if there's a git branch for us to have a look.
> >>>>>>> I've pushed the branch
> >>>>>>> https://github.com/kishon/linux-wip.git vhost_rpmsg_pci_ntb_rfc
> >>>>>> Thanks
> >>>>>>
> >>>>>>
> >>>>>>>> Btw, I'm not sure I get the big picture, but I vaguely feel some of
> >>>>>>>> the work is duplicated with vDPA (e.g. the epf transport or vhost bus).
> >>>>>>> This is about connecting two different HW systems both running Linux
> >>>>>>> and doesn't necessarily involve virtualization.
> >>>>>> Right, this is something similar to VOP
> >>>>>> (Documentation/misc-devices/mic/mic_overview.rst). The difference is
> >>>>>> the hardware, I guess, and VOP uses a userspace application to
> >>>>>> implement the device.
> >>>>> I'd also like to point out that this series tries to have communication
> >>>>> between two SoCs in a vendor-agnostic way. Since this series solves for
> >>>>> two use cases (PCIe RC<->EP and NTB), for the NTB case it directly plugs
> >>>>> into the NTB framework and any of the HW in NTB below should be able to
> >>>>> use virtio-vhost communication
> >>>>>
> >>>>> # ls drivers/ntb/hw/
> >>>>> amd  epf  idt  intel  mscc
> >>>>>
> >>>>> And similarly for the PCIe RC<->EP communication, this adds a generic
> >>>>> endpoint function driver and hence any SoC that supports a configurable
> >>>>> PCIe endpoint can use virtio-vhost communication
> >>>>>
> >>>>> # ls drivers/pci/controller/dwc/*ep*
> >>>>> drivers/pci/controller/dwc/pcie-designware-ep.c
> >>>>> drivers/pci/controller/dwc/pcie-uniphier-ep.c
> >>>>> drivers/pci/controller/dwc/pci-layerscape-ep.c
> >>>> Thanks for that background.
> >>>>
> >>>>
> >>>>>>> So there is no guest or host as in virtualization but two entirely
> >>>>>>> different systems connected via a PCIe cable, one acting as guest and
> >>>>>>> one as host.
> >>>>>>> So one system will provide virtio functionality, reserving memory for
> >>>>>>> virtqueues, and the other provides vhost functionality, providing a
> >>>>>>> way to access the virtqueues in virtio memory. One is source and the
> >>>>>>> other is sink and there is no intermediate entity. (vhost was probably
> >>>>>>> an intermediate entity in virtualization?)
> >>>>>> (Not a native English speaker) but "vhost" could introduce some
> >>>>>> confusion for me since it was used for implementing a virtio backend
> >>>>>> for userspace drivers. I guess "vringh" could be better.
> >>>>> Initially I had named this vringh but later decided to choose vhost
> >>>>> instead of vringh. vhost is still a virtio backend (not necessarily
> >>>>> userspace) though it now resides in an entirely different system.
> >>>>> Whatever virtio is for a frontend system, vhost can be that for a
> >>>>> backend system. vring can be for accessing a virtqueue and can be used
> >>>>> either in the frontend or the backend.

I guess that clears up at least some of my questions from above...

> >>>> Ok.
> >>>>
> >>>>
> >>>>>>>> Have you considered implementing these through vDPA?
> >>>>>>> IIUC vDPA only provides an interface to userspace, and an in-kernel
> >>>>>>> rpmsg driver or vhost net driver is not provided.
> >>>>>>>
> >>>>>>> The HW connection looks something like
> >>>>>>> https://pasteboard.co/JfMVVHC.jpg (use case 2 above),
> >>>>>> I see.
> >>>>>>
> >>>>>>
> >>>>>>> all the boards run Linux. The middle board provides NTB functionality
> >>>>>>> and the boards on either side provide virtio/vhost functionality and
> >>>>>>> transfer data using rpmsg.

This setup looks really interesting (sometimes, it's really hard to
imagine this in the abstract).

> >>>>>> So I wonder whether it's worthwhile for a new bus. Can we use the
> >>>>>> existing virtio bus/drivers? It might work as, except for the epf
> >>>>>> transport, we can introduce an epf "vhost" transport driver.
> >>>>> IMHO we'll need two buses, one for the frontend and the other for the
> >>>>> backend, because the two components can then co-operate/interact with
> >>>>> each other to provide a functionality. Though both will seemingly
> >>>>> provide similar callbacks, they provide symmetrical or complementary
> >>>>> functionality and need not be the same or identical.
> >>>>>
> >>>>> Having the same bus can also create sequencing issues.
> >>>>>
> >>>>> If you look at virtio_dev_probe() of virtio_bus:
> >>>>>
> >>>>> device_features = dev->config->get_features(dev);
> >>>>>
> >>>>> Now if we use the same bus for both front-end and back-end, both will
> >>>>> try to get_features when there has been no set_features. Ideally the
> >>>>> vhost device should be initialized first with the set of features it
> >>>>> supports. Vhost and virtio should use "status" and "features"
> >>>>> complementarily and not identically.
> >>>> Yes, but there's no need for doing status/features passthrough in epf
> >>>> vhost drivers.
> >>>>
> >>>>> The virtio device (or frontend) cannot be initialized before the vhost
> >>>>> device (or backend) gets initialized with data such as features.
> >>>>> Similarly, vhost (backend) cannot access virtqueues or buffers before
> >>>>> virtio (frontend) sets VIRTIO_CONFIG_S_DRIVER_OK, whereas that
> >>>>> requirement is not there for virtio, as the physical memory for the
> >>>>> virtqueues is created by virtio (frontend).
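
Trying to make that ordering concrete for myself, I picture the backend
(vhost/vringh) side of such a split roughly as in the sketch below. This
is hypothetical pseudo-driver code I made up purely for illustration; the
structure and function names do not exist anywhere, and only
VIRTIO_CONFIG_S_DRIVER_OK and VIRTIO_F_VERSION_1 are the real constants
from linux/virtio_config.h.

#include <linux/bits.h>
#include <linux/types.h>
#include <linux/virtio_config.h>  /* VIRTIO_CONFIG_S_DRIVER_OK, VIRTIO_F_VERSION_1 */

/* Hypothetical backend-side (vhost/vringh) state; all names are made up. */
struct backend_dev {
	u64 features;  /* what the backend offers; set before the frontend probes */
	u8  status;    /* mirror of the status byte written by the frontend */
};

/* Step 1: the backend publishes the features it supports first ... */
static void backend_init(struct backend_dev *bdev)
{
	bdev->features = BIT_ULL(VIRTIO_F_VERSION_1);  /* placeholder feature set */
}

/* Step 2: ... and only touches the virtqueues once the frontend has set DRIVER_OK. */
static bool backend_may_access_vqs(struct backend_dev *bdev)
{
	return bdev->status & VIRTIO_CONFIG_S_DRIVER_OK;
}

If that is roughly the intended split, then status and features really
are used complementarily on the two sides rather than identically.
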
> >>>> epf vhost drivers need to implement two devices: a vhost (vringh)
> >>>> device and a virtio device (which is a mediated device). The vhost
> >>>> (vringh) device is doing feature negotiation with the virtio device via
> >>>> RC/EP or NTB. The virtio device is doing feature negotiation with local
> >>>> virtio drivers. If there's a feature mismatch, the epf vhost driver can
> >>>> do mediation between them.
> >>> Here epf vhost should be initialized with a set of features for it to
> >>> negotiate either as vhost device or virtio device, no? Where should the
> >>> initial feature set for epf vhost come from?
> >>
> >> I think it can work as:
> >>
> >> 1) Having an initial feature set X (hard coded in the code) in epf vhost
> >> 2) Using this X for both the virtio device and the vhost (vringh) device
> >> 3) The local virtio driver will negotiate with the virtio device with
> >>    feature set Y
> >> 4) The remote virtio driver will negotiate with the vringh device with
> >>    feature set Z
> >> 5) Mediate between feature set Y and feature set Z, since both Y and Z
> >>    are a subset of X
> >>
> > okay. I'm also thinking if we could have configfs for configuring this.
> > Anyway, we could find different approaches to configuring this.
>
> Yes, and I think some management API is needed even in the design of
> your "Software Layering". In that figure, rpmsg vhost needs some
> pre-set or hard-coded features.

When I saw the Plumbers talk, my first idea was "this needs to be a new
transport". You have some hard-coded or pre-configured features, and then
features are negotiated via a transport-specific means in the usual way.
There's basically an extra/extended layer for this (and status, and
whatever). Does that make any sense?

> >>>>>> It will have virtqueues but only used for the communication between
> >>>>>> itself and the upper virtio driver. And it will have vringh queues
> >>>>>> which will be probed by virtio epf transport drivers. And it needs to
> >>>>>> do data copy between the virtqueue and the vringh queues.
> >>>>>>
> >>>>>> It works like:
> >>>>>>
> >>>>>> virtio drivers <- virtqueue/virtio-bus -> epf vhost drivers <- vringh
> >>>>>> queue/epf
> >>>>>>
> >>>>>> The advantage is that there's no need for writing new buses and
> >>>>>> drivers.
> >>>>> I think this will work; however, there is an additional copy between
> >>>>> the vringh queue and the virtqueue,
> >>>> I think not? E.g. in use case 1), if we stick to the virtio bus, we
> >>>> will have:
> >>>>
> >>>> virtio-rpmsg (EP) <- virtio ring(1) -> epf vhost driver (EP) <- virtio
> >>>> ring(2) -> virtio pci (RC) <-> virtio rpmsg (RC)
> >>> IIUC the epf vhost driver (EP) will access virtio ring(2) using vringh?
> >>
> >> Yes.
> >>
> >>
> >>> And virtio ring(2) is created by virtio pci (RC).
> >>
> >> Yes.
> >>
> >>
> >>>> What the epf vhost driver does is read the buffer len and addr from
> >>>> virtio ring(1) and then DMA to Linux (RC)?
> >>> okay, I made some optimization here where vhost-rpmsg, using a helper,
> >>> writes a buffer from rpmsg's upper layer directly to the remote Linux
> >>> (RC), as against having it first written to virtio ring (1).
> >>>
> >>> Thinking about how this would look for NTB:
> >>> virtio-rpmsg (HOST1) <- virtio ring(1) -> NTB(HOST1) <-> NTB(HOST2) <-
> >>> virtio ring(2) -> virtio-rpmsg (HOST2)
> >>>
> >>> Here the NTB(HOST1) will access the virtio ring(2) using vringh?
> >>
> >> Yes, I think so. It needs to use vring to access virtio ring (1) as well.
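
Just to check that I follow the vringh part of this discussion: I imagine
the EP-side access to a ring that lives in the peer's (RC's) memory would
look roughly like the sketch below. All structure and function names here
are made up for illustration; only the vringh_* calls are the existing
in-kernel API, and the second copy into the local virtqueue is left out.

#include <linux/gfp.h>
#include <linux/kernel.h>
#include <linux/uio.h>
#include <linux/vringh.h>

/* Hypothetical per-queue state of an epf vhost driver (made-up names). */
struct epf_vhost_queue {
	struct vringh vrh;  /* accessor for the ring set up by the peer (RC) */
	void *bounce_buf;   /* local buffer for the extra copy */
	size_t buf_len;
};

/* Consume one buffer from the peer's ring and copy it locally. */
static int epf_vhost_recv_one(struct epf_vhost_queue *evq)
{
	struct kvec kv[8];  /* scratch iovec for the descriptor chain */
	struct vringh_kiov riov;
	ssize_t len;
	u16 head;
	int ret;

	vringh_kiov_init(&riov, kv, ARRAY_SIZE(kv));

	/* Fetch the next available descriptor from the peer's ring. */
	ret = vringh_getdesc_kern(&evq->vrh, &riov, NULL, &head, GFP_KERNEL);
	if (ret <= 0)  /* 0: nothing pending, < 0: error */
		return ret;

	/* First copy: pull the data out of the peer's buffer. */
	len = vringh_iov_pull_kern(&riov, evq->bounce_buf, evq->buf_len);
	if (len < 0)
		return len;

	/* Mark the descriptor used; nothing was written back into it. */
	vringh_complete_kern(&evq->vrh, head, 0);

	/*
	 * A second copy into the local virtqueue (towards the local
	 * virtio-rpmsg driver) would go here; elided in this sketch.
	 */
	return 0;
}

If that is roughly what the epf vhost driver does, then the extra copy
mentioned above sits exactly between this pull and the push into the
local virtqueue.
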
> > NTB(HOST1) and virtio ring(1) will be in the same system. So it doesn't
> > have to use vring. virtio ring(1) is created by the virtio device that
> > NTB(HOST1) creates.
>
> Right.
>
>
> >>> Do you also think this will work seamlessly with virtio_net.c,
> >>> virtio_blk.c?
> >>
> >> Yes.
> > okay, I haven't looked at this but the backend of virtio_blk should
> > access an actual storage device, no?
>
> Good point, for a non-peer device like storage. There's probably no need
> for it to be registered on the virtio bus and it might be better to
> behave as you proposed.

I might be missing something; but if you expose something as a block
device, it should have something it can access with block reads/writes,
shouldn't it? Of course, that can be a variety of things.

> Just to make sure I understand the design, how is VHOST SCSI expected to
> work in your proposal, does it have a device or file as a backend?
>
> >>> I'd like to get clarity on two things in the approach you suggested,
> >>> one is features (since epf vhost should ideally be transparent to any
> >>> virtio driver)
> >>
> >> We can have an array of pre-defined features indexed by virtio device
> >> id in the code.
> >>
> >>> and the other is how certain inputs to the virtio device, such as the
> >>> number of buffers, are determined.
> >>
> >> We can start from a hard-coded value like 256, or introduce some API
> >> for the user to change the value.
> >>
> >>> Thanks again for your suggestions!
> >>
> >> You're welcome.
> >>
> >> Note that I just want to check whether or not we can reuse the virtio
> >> bus/driver. It's something similar to what you proposed in Software
> >> Layering, but we just replace "vhost core" with "virtio bus" and move
> >> the vhost core below the epf/ntb/platform transport.
> > Got it. My initial design was based on my understanding of your
> > comments [1].
>
> Yes, but that's just for a networking device. If we want something more
> generic, it may require more thought (bus etc).

I believe that we indeed need something bus-like to be able to support a
variety of devices.

> > I'll try to create something based on your proposed design here.
>
> Sure, but for coding, we'd better wait for others' opinions here.

Please tell me if my thoughts above make any sense... I have just started
looking at that, so I might be completely off.
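
Coming back to the array of pre-defined features indexed by virtio device
ID that was mentioned above: I picture something as simple as the sketch
below. The table name and the feature bits chosen here are placeholders I
made up, not anything taken from the series; only the VIRTIO_ID_*,
VIRTIO_F_VERSION_1 and VIRTIO_NET_F_MAC constants are the real uapi ones.

#include <linux/bits.h>
#include <linux/kernel.h>
#include <linux/virtio_config.h>  /* VIRTIO_F_VERSION_1 */
#include <linux/virtio_ids.h>     /* VIRTIO_ID_NET, VIRTIO_ID_RPMSG */
#include <linux/virtio_net.h>     /* VIRTIO_NET_F_MAC */

/* Sketch: per-device-ID table of pre-defined feature sets ("X"). */
static const u64 epf_vhost_features[] = {
	[VIRTIO_ID_NET]   = BIT_ULL(VIRTIO_F_VERSION_1) | BIT_ULL(VIRTIO_NET_F_MAC),
	[VIRTIO_ID_RPMSG] = BIT_ULL(VIRTIO_F_VERSION_1),
};

/* Look up the initial feature set for a given virtio device ID. */
static u64 epf_vhost_get_features(u32 device_id)
{
	if (device_id >= ARRAY_SIZE(epf_vhost_features) ||
	    !epf_vhost_features[device_id])
		return 0;  /* unknown/unsupported device type */

	return epf_vhost_features[device_id];
}

Whether that table lives in the code, in configfs or behind some
management API is then exactly the open question discussed above.
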