On Thu, 9 Jul 2020 14:26:53 +0800 Jason Wang <jasowang@xxxxxxxxxx> wrote:

[Let me note right at the beginning that I first noted this while
listening to Kishon's talk at LPC on Wednesday. I might be very confused
about the background here, so let me apologize beforehand for any
confusion I might spread.]

> On 2020/7/8 9:13 PM, Kishon Vijay Abraham I wrote:
> > Hi Jason,
> >
> > On 7/8/2020 4:52 PM, Jason Wang wrote:
> >> On 2020/7/7 10:45 PM, Kishon Vijay Abraham I wrote:
> >>> Hi Jason,
> >>>
> >>> On 7/7/2020 3:17 PM, Jason Wang wrote:
> >>>> On 2020/7/6 5:32 PM, Kishon Vijay Abraham I wrote:
> >>>>> Hi Jason,
> >>>>>
> >>>>> On 7/3/2020 12:46 PM, Jason Wang wrote:
> >>>>>> On 2020/7/2 9:35 PM, Kishon Vijay Abraham I wrote:
> >>>>>>> Hi Jason,
> >>>>>>>
> >>>>>>> On 7/2/2020 3:40 PM, Jason Wang wrote:
> >>>>>>>> On 2020/7/2 5:51 PM, Michael S. Tsirkin wrote:
> >>>>>>>>> On Thu, Jul 02, 2020 at 01:51:21PM +0530, Kishon Vijay Abraham I wrote:
> >>>>>>>>>> This series enhances Linux Vhost support to enable SoC-to-SoC
> >>>>>>>>>> communication over MMIO. This series enables rpmsg communication
> >>>>>>>>>> between two SoCs using both PCIe RC<->EP and HOST1-NTB-HOST2.
> >>>>>>>>>>
> >>>>>>>>>> 1) Modify vhost to use standard Linux driver model
> >>>>>>>>>> 2) Add support in vring to access virtqueue over MMIO
> >>>>>>>>>> 3) Add vhost client driver for rpmsg
> >>>>>>>>>> 4) Add PCIe RC driver (uses virtio) and PCIe EP driver (uses vhost)
> >>>>>>>>>>    for rpmsg communication between two SoCs connected to each other
> >>>>>>>>>> 5) Add NTB Virtio driver and NTB Vhost driver for rpmsg communication
> >>>>>>>>>>    between two SoCs connected via NTB
> >>>>>>>>>> 6) Add configfs to configure the components
> >>>>>>>>>>
> >>>>>>>>>> UseCase 1:
> >>>>>>>>>>
> >>>>>>>>>>  VHOST RPMSG                    VIRTIO RPMSG
> >>>>>>>>>>       +                               +
> >>>>>>>>>>       |                               |
> >>>>>>>>>>       |                               |
> >>>>>>>>>>       |                               |
> >>>>>>>>>>       |                               |
> >>>>>>>>>> +-----v------+                 +------v-------+
> >>>>>>>>>> |   Linux    |                 |    Linux     |
> >>>>>>>>>> |  Endpoint  |                 | Root Complex |
> >>>>>>>>>> |            <----------------->              |
> >>>>>>>>>> |            |                 |              |
> >>>>>>>>>> |    SOC1    |                 |     SOC2     |
> >>>>>>>>>> +------------+                 +--------------+
> >>>>>>>>>>
> >>>>>>>>>> UseCase 2:
> >>>>>>>>>>
> >>>>>>>>>>      VHOST RPMSG                                      VIRTIO RPMSG
> >>>>>>>>>>           +                                                 +
> >>>>>>>>>>           |                                                 |
> >>>>>>>>>>           |                                                 |
> >>>>>>>>>>           |                                                 |
> >>>>>>>>>>           |                                                 |
> >>>>>>>>>>    +------v------+                                   +------v------+
> >>>>>>>>>>    |             |                                   |             |
> >>>>>>>>>>    |    HOST1    |                                   |    HOST2    |
> >>>>>>>>>>    |             |                                   |             |
> >>>>>>>>>>    +------^------+                                   +------^------+
> >>>>>>>>>>           |                                                 |
> >>>>>>>>>>           |                                                 |
> >>>>>>>>>> +---------------------------------------------------------------------+
> >>>>>>>>>> |  +------v------+                                   +------v------+  |
> >>>>>>>>>> |  |             |                                   |             |  |
> >>>>>>>>>> |  |     EP      |                                   |     EP      |  |
> >>>>>>>>>> |  | CONTROLLER1 |                                   | CONTROLLER2 |  |
> >>>>>>>>>> |  |             <----------------------------------->             |  |
> >>>>>>>>>> |  |             |                                   |             |  |
> >>>>>>>>>> |  |             |                                   |             |  |
> >>>>>>>>>> |  |             |  SoC With Multiple EP Instances   |             |  |
> >>>>>>>>>> |  |             |  (Configured using NTB Function)  |             |  |
> >>>>>>>>>> |  +-------------+                                   +-------------+  |
> >>>>>>>>>> +---------------------------------------------------------------------+

First of all, to clarify the terminology: Is "vhost rpmsg" acting as what
the virtio standard calls the 'device', and "virtio rpmsg" as the
'driver'? Or is the "vhost" part mostly just virtqueues + the existing
vhost interfaces?

> >>>>>>>>>>
> >>>>>>>>>> Software Layering:
> >>>>>>>>>>
> >>>>>>>>>> The high-level SW layering should look something like below.
> >>>>>>>>>> This series adds support only for RPMSG VHOST; however, something
> >>>>>>>>>> similar should be done for net and scsi. With that, any vhost device
> >>>>>>>>>> (PCI, NTB, platform device, user) can use any of the vhost client
> >>>>>>>>>> drivers.
> >>>>>>>>>>
> >>>>>>>>>>     +----------------+  +-----------+  +------------+  +----------+
> >>>>>>>>>>     |  RPMSG VHOST   |  | NET VHOST |  | SCSI VHOST |  |    X     |
> >>>>>>>>>>     +-------^--------+  +-----^-----+  +-----^------+  +----^-----+
> >>>>>>>>>>             |                 |              |              |
> >>>>>>>>>>             |                 |              |              |
> >>>>>>>>>>             |                 |              |              |
> >>>>>>>>>> +-----------v-----------------v--------------v--------------v----------+
> >>>>>>>>>> |                              VHOST CORE                              |
> >>>>>>>>>> +--------^---------------^--------------------^------------------^-----+
> >>>>>>>>>>          |               |                    |                  |
> >>>>>>>>>>          |               |                    |                  |
> >>>>>>>>>>          |               |                    |                  |
> >>>>>>>>>>  +--------v-------+ +----v------+  +----------v----------+  +----v-----+
> >>>>>>>>>>  |  PCI EPF VHOST | | NTB VHOST |  |PLATFORM DEVICE VHOST|  |    X     |
> >>>>>>>>>>  +----------------+ +-----------+  +---------------------+  +----------+

So, the upper half is basically various functionality types, e.g. a net
device. What is the lower half, a hardware interface? Would it be
equivalent to e.g. a normal PCI device?

> >>>>>>>>>>
> >>>>>>>>>> This was initially proposed here [1]
> >>>>>>>>>>
> >>>>>>>>>> [1] -> https://lore.kernel.org/r/2cf00ec4-1ed6-f66e-6897-006d1a5b6390@xxxxxx
> >>>>>>>>> I find this very interesting. It's a huge patchset, so it will take a
> >>>>>>>>> bit to review, but I certainly plan to do that. Thanks!
> >>>>>>>> Yes, it would be better if there's a git branch for us to have a look.
> >>>>>>> I've pushed the branch
> >>>>>>> https://github.com/kishon/linux-wip.git vhost_rpmsg_pci_ntb_rfc
> >>>>>> Thanks
> >>>>>>
> >>>>>>
> >>>>>>>> Btw, I'm not sure I get the big picture, but I vaguely feel some of
> >>>>>>>> the work is duplicated with vDPA (e.g. the epf transport or vhost bus).
> >>>>>>> This is about connecting two different HW systems both running Linux
> >>>>>>> and doesn't necessarily involve virtualization.
> >>>>>> Right, this is something similar to VOP
> >>>>>> (Documentation/misc-devices/mic/mic_overview.rst). The difference is
> >>>>>> the hardware, I guess, and VOP uses a userspace application to
> >>>>>> implement the device.
> >>>>> I'd also like to point out that this series tries to have communication
> >>>>> between two SoCs in a vendor-agnostic way. Since this series solves for
> >>>>> two use cases (PCIe RC<->EP and NTB), for the NTB case it directly plugs
> >>>>> into the NTB framework and any of the HW in NTB below should be able to
> >>>>> use virtio-vhost communication
> >>>>>
> >>>>> # ls drivers/ntb/hw/
> >>>>> amd  epf  idt  intel  mscc
> >>>>>
> >>>>> And similarly for the PCIe RC<->EP communication, this adds a generic
> >>>>> endpoint function driver and hence any SoC that supports a configurable
> >>>>> PCIe endpoint can use virtio-vhost communication
> >>>>>
> >>>>> # ls drivers/pci/controller/dwc/*ep*
> >>>>> drivers/pci/controller/dwc/pcie-designware-ep.c
> >>>>> drivers/pci/controller/dwc/pcie-uniphier-ep.c
> >>>>> drivers/pci/controller/dwc/pci-layerscape-ep.c
> >>>> Thanks for that background.
> >>>>
> >>>>
> >>>>>>> So there is no guest or host as in virtualization but two entirely
> >>>>>>> different systems connected via a PCIe cable, one acting as guest and
> >>>>>>> one as host.
> >>>>>>> So one system will provide virtio functionality, reserving memory for
> >>>>>>> virtqueues, and the other provides vhost functionality, providing a
> >>>>>>> way to access the virtqueues in virtio memory. One is source and the
> >>>>>>> other is sink and there is no intermediate entity. (vhost was probably
> >>>>>>> an intermediate entity in virtualization?)
> >>>>>> (Not a native English speaker) but "vhost" could introduce some
> >>>>>> confusion for me since it was used for implementing a virtio backend
> >>>>>> for userspace drivers. I guess "vringh" could be better.
> >>>>> Initially I had named this vringh but later decided to choose vhost
> >>>>> instead of vringh. vhost is still a virtio backend (not necessarily
> >>>>> userspace) though it now resides in an entirely different system.
> >>>>> Whatever virtio is for a frontend system, vhost can be that for a
> >>>>> backend system. vring can be for accessing a virtqueue and can be used
> >>>>> either in the frontend or the backend.

I guess that clears up at least some of my questions from above...

> >>>> Ok.
> >>>>
> >>>>
> >>>>>>>> Have you considered implementing these through vDPA?
> >>>>>>> IIUC vDPA only provides an interface to userspace, and an in-kernel
> >>>>>>> rpmsg driver or vhost net driver is not provided.
> >>>>>>>
> >>>>>>> The HW connection looks something like
> >>>>>>> https://pasteboard.co/JfMVVHC.jpg (use case 2 above),
> >>>>>> I see.
> >>>>>>
> >>>>>>
> >>>>>>> all the boards run Linux. The middle board provides NTB functionality
> >>>>>>> and the boards on either side provide virtio/vhost functionality and
> >>>>>>> transfer data using rpmsg.

This setup looks really interesting (sometimes, it's really hard to
imagine this in the abstract).

> >>>>>> So I wonder whether it's worthwhile for a new bus. Can we use the
> >>>>>> existing virtio bus/drivers? It might work as, except for the epf
> >>>>>> transport, we can introduce an epf "vhost" transport driver.
> >>>>> IMHO we'll need two buses, one for the frontend and the other for the
> >>>>> backend, because the two components can then co-operate/interact with
> >>>>> each other to provide a functionality. Though both will seemingly
> >>>>> provide similar callbacks, they provide symmetrical or complementary
> >>>>> functionality and need not be the same or identical.
> >>>>>
> >>>>> Having the same bus can also create sequencing issues.
> >>>>>
> >>>>> If you look at virtio_dev_probe() of virtio_bus:
> >>>>>
> >>>>> device_features = dev->config->get_features(dev);
> >>>>>
> >>>>> Now if we use the same bus for both front-end and back-end, both will
> >>>>> try to get_features when there has been no set_features. Ideally the
> >>>>> vhost device should be initialized first with the set of features it
> >>>>> supports. Vhost and virtio should use "status" and "features"
> >>>>> complementarily and not identically.
> >>>> Yes, but there's no need for doing status/features passthrough in epf
> >>>> vhost drivers.
> >>>>
> >>>>> The virtio device (or frontend) cannot be initialized before the vhost
> >>>>> device (or backend) gets initialized with data such as features.
> >>>>> Similarly, vhost (backend) cannot access virtqueues or buffers before
> >>>>> virtio (frontend) sets VIRTIO_CONFIG_S_DRIVER_OK, whereas that
> >>>>> requirement is not there for virtio, as the physical memory for the
> >>>>> virtqueues is created by virtio (frontend).
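
Trying to make that ordering concrete for myself, I picture the backend
(vhost/vringh) side of such a split roughly as in the sketch below. This
is hypothetical pseudo-driver code I made up purely for illustration; the
structure and function names do not exist anywhere, and only
VIRTIO_CONFIG_S_DRIVER_OK and VIRTIO_F_VERSION_1 are the real constants
from linux/virtio_config.h.

#include <linux/bits.h>
#include <linux/types.h>
#include <linux/virtio_config.h>  /* VIRTIO_CONFIG_S_DRIVER_OK, VIRTIO_F_VERSION_1 */

/* Hypothetical backend-side (vhost/vringh) state; all names are made up. */
struct backend_dev {
	u64 features;  /* what the backend offers; set before the frontend probes */
	u8  status;    /* mirror of the status byte written by the frontend */
};

/* Step 1: the backend publishes the features it supports first ... */
static void backend_init(struct backend_dev *bdev)
{
	bdev->features = BIT_ULL(VIRTIO_F_VERSION_1);  /* placeholder feature set */
}

/* Step 2: ... and only touches the virtqueues once the frontend has set DRIVER_OK. */
static bool backend_may_access_vqs(struct backend_dev *bdev)
{
	return bdev->status & VIRTIO_CONFIG_S_DRIVER_OK;
}

If that is roughly the intended split, then status and features really
are used complementarily on the two sides rather than identically.
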
> >>>> epf vhost drivers need to implement two devices: a vhost (vringh)
> >>>> device and a virtio device (which is a mediated device). The vhost
> >>>> (vringh) device is doing feature negotiation with the virtio device via
> >>>> RC/EP or NTB. The virtio device is doing feature negotiation with local
> >>>> virtio drivers. If there's a feature mismatch, the epf vhost driver can
> >>>> do mediation between them.
> >>> Here epf vhost should be initialized with a set of features for it to
> >>> negotiate either as vhost device or virtio device, no? Where should the
> >>> initial feature set for epf vhost come from?
> >>
> >> I think it can work as:
> >>
> >> 1) Having an initial feature set X (hard coded in the code) in epf vhost
> >> 2) Using this X for both the virtio device and the vhost (vringh) device
> >> 3) The local virtio driver will negotiate with the virtio device with
> >>    feature set Y
> >> 4) The remote virtio driver will negotiate with the vringh device with
> >>    feature set Z
> >> 5) Mediate between feature set Y and feature set Z, since both Y and Z
> >>    are a subset of X
> >>
> > okay. I'm also thinking if we could have configfs for configuring this.
> > Anyway, we could find different approaches to configuring this.
>
> Yes, and I think some management API is needed even in the design of
> your "Software Layering". In that figure, rpmsg vhost needs some
> pre-set or hard-coded features.

When I saw the Plumbers talk, my first idea was "this needs to be a new
transport". You have some hard-coded or pre-configured features, and then
features are negotiated via a transport-specific means in the usual way.
There's basically an extra/extended layer for this (and status, and
whatever). Does that make any sense?

> >>>>>> It will have virtqueues but only used for the communication between
> >>>>>> itself and the upper virtio driver. And it will have vringh queues
> >>>>>> which will be probed by virtio epf transport drivers. And it needs to
> >>>>>> do data copy between the virtqueue and the vringh queues.
> >>>>>>
> >>>>>> It works like:
> >>>>>>
> >>>>>> virtio drivers <- virtqueue/virtio-bus -> epf vhost drivers <- vringh
> >>>>>> queue/epf
> >>>>>>
> >>>>>> The advantage is that there's no need for writing new buses and
> >>>>>> drivers.
> >>>>> I think this will work; however, there is an additional copy between
> >>>>> the vringh queue and the virtqueue,
> >>>> I think not? E.g. in use case 1), if we stick to the virtio bus, we
> >>>> will have:
> >>>>
> >>>> virtio-rpmsg (EP) <- virtio ring(1) -> epf vhost driver (EP) <- virtio
> >>>> ring(2) -> virtio pci (RC) <-> virtio rpmsg (RC)
> >>> IIUC the epf vhost driver (EP) will access virtio ring(2) using vringh?
> >>
> >> Yes.
> >>
> >>
> >>> And virtio ring(2) is created by virtio pci (RC).
> >>
> >> Yes.
> >>
> >>
> >>>> What the epf vhost driver does is read the buffer len and addr from
> >>>> virtio ring(1) and then DMA to Linux (RC)?
> >>> okay, I made some optimization here where vhost-rpmsg, using a helper,
> >>> writes a buffer from rpmsg's upper layer directly to the remote Linux
> >>> (RC), as against having it first written to virtio ring (1).
> >>>
> >>> Thinking about how this would look for NTB:
> >>> virtio-rpmsg (HOST1) <- virtio ring(1) -> NTB(HOST1) <-> NTB(HOST2) <-
> >>> virtio ring(2) -> virtio-rpmsg (HOST2)
> >>>
> >>> Here the NTB(HOST1) will access the virtio ring(2) using vringh?
> >>
> >> Yes, I think so. It needs to use vring to access virtio ring (1) as well.
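
Just to check that I follow the vringh part of this discussion: I imagine
the EP-side access to a ring that lives in the peer's (RC's) memory would
look roughly like the sketch below. All structure and function names here
are made up for illustration; only the vringh_* calls are the existing
in-kernel API, and the second copy into the local virtqueue is left out.

#include <linux/gfp.h>
#include <linux/kernel.h>
#include <linux/uio.h>
#include <linux/vringh.h>

/* Hypothetical per-queue state of an epf vhost driver (made-up names). */
struct epf_vhost_queue {
	struct vringh vrh;  /* accessor for the ring set up by the peer (RC) */
	void *bounce_buf;   /* local buffer for the extra copy */
	size_t buf_len;
};

/* Consume one buffer from the peer's ring and copy it locally. */
static int epf_vhost_recv_one(struct epf_vhost_queue *evq)
{
	struct kvec kv[8];  /* scratch iovec for the descriptor chain */
	struct vringh_kiov riov;
	ssize_t len;
	u16 head;
	int ret;

	vringh_kiov_init(&riov, kv, ARRAY_SIZE(kv));

	/* Fetch the next available descriptor from the peer's ring. */
	ret = vringh_getdesc_kern(&evq->vrh, &riov, NULL, &head, GFP_KERNEL);
	if (ret <= 0)  /* 0: nothing pending, < 0: error */
		return ret;

	/* First copy: pull the data out of the peer's buffer. */
	len = vringh_iov_pull_kern(&riov, evq->bounce_buf, evq->buf_len);
	if (len < 0)
		return len;

	/* Mark the descriptor used; nothing was written back into it. */
	vringh_complete_kern(&evq->vrh, head, 0);

	/*
	 * A second copy into the local virtqueue (towards the local
	 * virtio-rpmsg driver) would go here; elided in this sketch.
	 */
	return 0;
}

If that is roughly what the epf vhost driver does, then the extra copy
mentioned above sits exactly between this pull and the push into the
local virtqueue.
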
> > NTB(HOST1) and virtio ring(1) will be in the same system. So it doesn't
> > have to use vring. virtio ring(1) is created by the virtio device that
> > NTB(HOST1) creates.
>
> Right.
>
>
> >>> Do you also think this will work seamlessly with virtio_net.c,
> >>> virtio_blk.c?
> >>
> >> Yes.
> > okay, I haven't looked at this but the backend of virtio_blk should
> > access an actual storage device, no?
>
> Good point, for a non-peer device like storage. There's probably no need
> for it to be registered on the virtio bus and it might be better to
> behave as you proposed.

I might be missing something; but if you expose something as a block
device, it should have something it can access with block reads/writes,
shouldn't it? Of course, that can be a variety of things.

> Just to make sure I understand the design, how is VHOST SCSI expected to
> work in your proposal, does it have a device or file as a backend?
>
> >>> I'd like to get clarity on two things in the approach you suggested,
> >>> one is features (since epf vhost should ideally be transparent to any
> >>> virtio driver)
> >>
> >> We can have an array of pre-defined features indexed by virtio device
> >> id in the code.
> >>
> >>> and the other is how certain inputs to the virtio device, such as the
> >>> number of buffers, are determined.
> >>
> >> We can start from a hard-coded value like 256, or introduce some API
> >> for the user to change the value.
> >>
> >>> Thanks again for your suggestions!
> >>
> >> You're welcome.
> >>
> >> Note that I just want to check whether or not we can reuse the virtio
> >> bus/driver. It's something similar to what you proposed in Software
> >> Layering, but we just replace "vhost core" with "virtio bus" and move
> >> the vhost core below the epf/ntb/platform transport.
> > Got it. My initial design was based on my understanding of your
> > comments [1].
>
> Yes, but that's just for a networking device. If we want something more
> generic, it may require more thought (bus etc).

I believe that we indeed need something bus-like to be able to support a
variety of devices.

> > I'll try to create something based on your proposed design here.
>
> Sure, but for coding, we'd better wait for others' opinions here.

Please tell me if my thoughts above make any sense... I have just started
looking at that, so I might be completely off.
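
Coming back to the array of pre-defined features indexed by virtio device
ID that was mentioned above: I picture something as simple as the sketch
below. The table name and the feature bits chosen here are placeholders I
made up, not anything taken from the series; only the VIRTIO_ID_*,
VIRTIO_F_VERSION_1 and VIRTIO_NET_F_MAC constants are the real uapi ones.

#include <linux/bits.h>
#include <linux/kernel.h>
#include <linux/virtio_config.h>  /* VIRTIO_F_VERSION_1 */
#include <linux/virtio_ids.h>     /* VIRTIO_ID_NET, VIRTIO_ID_RPMSG */
#include <linux/virtio_net.h>     /* VIRTIO_NET_F_MAC */

/* Sketch: per-device-ID table of pre-defined feature sets ("X"). */
static const u64 epf_vhost_features[] = {
	[VIRTIO_ID_NET]   = BIT_ULL(VIRTIO_F_VERSION_1) | BIT_ULL(VIRTIO_NET_F_MAC),
	[VIRTIO_ID_RPMSG] = BIT_ULL(VIRTIO_F_VERSION_1),
};

/* Look up the initial feature set for a given virtio device ID. */
static u64 epf_vhost_get_features(u32 device_id)
{
	if (device_id >= ARRAY_SIZE(epf_vhost_features) ||
	    !epf_vhost_features[device_id])
		return 0;  /* unknown/unsupported device type */

	return epf_vhost_features[device_id];
}

Whether that table lives in the code, in configfs or behind some
management API is then exactly the open question discussed above.
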