Hi Jason,

On 15/09/20 1:48 pm, Jason Wang wrote:
> Hi Kishon:
>
> On 2020/9/14 3:23 pm, Kishon Vijay Abraham I wrote:
>>> Then you need something that is functionally equivalent to virtio PCI,
>>> which is actually the concept of vDPA (e.g. vDPA provides alternatives
>>> if the queue_sel is hard in the EP implementation).
>>
>> Okay, I just tried to compare 'struct vdpa_config_ops' and 'struct
>> vhost_config_ops' (introduced in [RFC PATCH 03/22] vhost: Add ops for
>> the VHOST driver to configure VHOST device).
>>
>> struct vdpa_config_ops {
>>         /* Virtqueue ops */
>>         int (*set_vq_address)(struct vdpa_device *vdev,
>>                               u16 idx, u64 desc_area, u64 driver_area,
>>                               u64 device_area);
>>         void (*set_vq_num)(struct vdpa_device *vdev, u16 idx, u32 num);
>>         void (*kick_vq)(struct vdpa_device *vdev, u16 idx);
>>         void (*set_vq_cb)(struct vdpa_device *vdev, u16 idx,
>>                           struct vdpa_callback *cb);
>>         void (*set_vq_ready)(struct vdpa_device *vdev, u16 idx, bool ready);
>>         bool (*get_vq_ready)(struct vdpa_device *vdev, u16 idx);
>>         int (*set_vq_state)(struct vdpa_device *vdev, u16 idx,
>>                             const struct vdpa_vq_state *state);
>>         int (*get_vq_state)(struct vdpa_device *vdev, u16 idx,
>>                             struct vdpa_vq_state *state);
>>         struct vdpa_notification_area
>>         (*get_vq_notification)(struct vdpa_device *vdev, u16 idx);
>>         /* vq irq is not expected to be changed once DRIVER_OK is set */
>>         int (*get_vq_irq)(struct vdpa_device *vdev, u16 idx);
>>
>>         /* Device ops */
>>         u32 (*get_vq_align)(struct vdpa_device *vdev);
>>         u64 (*get_features)(struct vdpa_device *vdev);
>>         int (*set_features)(struct vdpa_device *vdev, u64 features);
>>         void (*set_config_cb)(struct vdpa_device *vdev,
>>                               struct vdpa_callback *cb);
>>         u16 (*get_vq_num_max)(struct vdpa_device *vdev);
>>         u32 (*get_device_id)(struct vdpa_device *vdev);
>>         u32 (*get_vendor_id)(struct vdpa_device *vdev);
>>         u8 (*get_status)(struct vdpa_device *vdev);
>>         void (*set_status)(struct vdpa_device *vdev, u8 status);
>>         void (*get_config)(struct vdpa_device *vdev, unsigned int offset,
>>                            void *buf, unsigned int len);
>>         void (*set_config)(struct vdpa_device *vdev, unsigned int offset,
>>                            const void *buf, unsigned int len);
>>         u32 (*get_generation)(struct vdpa_device *vdev);
>>
>>         /* DMA ops */
>>         int (*set_map)(struct vdpa_device *vdev, struct vhost_iotlb *iotlb);
>>         int (*dma_map)(struct vdpa_device *vdev, u64 iova, u64 size,
>>                        u64 pa, u32 perm);
>>         int (*dma_unmap)(struct vdpa_device *vdev, u64 iova, u64 size);
>>
>>         /* Free device resources */
>>         void (*free)(struct vdpa_device *vdev);
>> };
>>
>> +struct vhost_config_ops {
>> +        int (*create_vqs)(struct vhost_dev *vdev, unsigned int nvqs,
>> +                          unsigned int num_bufs,
>> +                          struct vhost_virtqueue *vqs[],
>> +                          vhost_vq_callback_t *callbacks[],
>> +                          const char * const names[]);
>> +        void (*del_vqs)(struct vhost_dev *vdev);
>> +        int (*write)(struct vhost_dev *vdev, u64 vhost_dst, void *src,
>> +                     int len);
>> +        int (*read)(struct vhost_dev *vdev, void *dst, u64 vhost_src,
>> +                    int len);
>> +        int (*set_features)(struct vhost_dev *vdev, u64 device_features);
>> +        int (*set_status)(struct vhost_dev *vdev, u8 status);
>> +        u8 (*get_status)(struct vhost_dev *vdev);
>> +};
>> +
>> struct virtio_config_ops
>>
>> I think there's some overlap here, and some of the ops try to do the
>> same thing.
>>
>> I think it differs in (*set_vq_address)() and (*create_vqs)().
>> [create_vqs() introduced in struct vhost_config_ops provides
>> complementary functionality to (*find_vqs)() in struct
>> virtio_config_ops. It seemingly encapsulates the functionality of
>> (*set_vq_address)(), (*set_vq_num)(), (*set_vq_cb)(), ...]
>>
>> Back to the difference between (*set_vq_address)() and (*create_vqs)():
>> set_vq_address() directly provides the virtqueue address to the vDPA
>> device, while create_vqs() only provides the parameters of the
>> virtqueue (like the number of virtqueues and the number of buffers) and
>> does not directly provide the address. IMO the backend client drivers
>> (like net or vhost) shouldn't/cannot by themselves know how to access
>> the vring created on the virtio front-end; the vDPA device/vhost device
>> should have the logic for that. That will let the client drivers work
>> with different types of vDPA/vhost devices and access the vring created
>> by virtio irrespective of whether the vring is reached via MMIO, kernel
>> space or user space.
>>
>> I think vDPA always works with client drivers in userspace, providing a
>> userspace address for the vring.
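To make the (*set_vq_address)()/(*create_vqs)() difference above more
concrete, here is a rough sketch of how a backend client driver might
use (*create_vqs)(). This is not code from the RFC: the vdev->ops access
path, the callback typedef and all example_* names are assumptions for
illustration only.

/* Assumed by analogy with virtio's vq_callback_t; the RFC's actual
 * typedef is not shown in this mail.
 */
typedef void vhost_vq_callback_t(struct vhost_virtqueue *vq);

static void example_rx_cb(struct vhost_virtqueue *vq)
{
        /* consume buffers made available by the virtio front-end */
}

static void example_tx_cb(struct vhost_virtqueue *vq)
{
        /* reclaim buffers already used by the virtio front-end */
}

static int example_client_setup(struct vhost_dev *vdev)
{
        struct vhost_virtqueue *vqs[2];
        vhost_vq_callback_t *cbs[] = { example_rx_cb, example_tx_cb };
        static const char * const names[] = { "rx", "tx" };

        /* Two virtqueues with 64 buffers each. The client never passes
         * a ring address: where the vring lives (MMIO, kernel or user
         * space) remains the vhost device's business.
         */
        return vdev->ops->create_vqs(vdev, 2, 64, vqs, cbs, names);
}

The point of the sketch is that, unlike set_vq_address(), nothing here
ties the client driver to a particular vring location or access method.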
> Sorry for being unclear. What I meant is not replacing vDPA with the
> vhost (bus) you proposed, but the possibility of replacing
> virtio-pci-epf with vDPA in:

Okay, so the virtio back-end still uses vhost and the front-end should
use vDPA. I see. So the host-side PCI driver for the EPF should populate
vdpa_config_ops and invoke vdpa_register_device().

> My question is basically for the part of virtio_pci_epf_send_command();
> it looks to me you have a vendor-specific API to replace the virtio-pci
> layout of the BAR:

Even when we use vDPA, we have to use some sort of
virtio_pci_epf_send_command() to communicate with the virtio backend,
right?

Right, the layout is slightly different from the standard layout. This
is the layout:

struct epf_vhost_reg_queue {
        u8 cmd;
        u8 cmd_status;
        u16 status;
        u16 num_buffers;
        u16 msix_vector;
        u64 queue_addr;
} __packed;

struct epf_vhost_reg {
        u64 host_features;
        u64 guest_features;
        u16 msix_config;
        u16 num_queues;
        u8 device_status;
        u8 config_generation;
        u32 isr;
        u8 cmd;
        u8 cmd_status;
        struct epf_vhost_reg_queue vq[MAX_VQS];
} __packed;

> +static int virtio_pci_epf_send_command(struct virtio_pci_device *vp_dev,
> +                                       u32 command)
> +{
> +        struct virtio_pci_epf *pci_epf;
> +        void __iomem *ioaddr;
> +        ktime_t timeout;
> +        bool timedout;
> +        int ret = 0;
> +        u8 status;
> +
> +        pci_epf = to_virtio_pci_epf(vp_dev);
> +        ioaddr = vp_dev->ioaddr;
> +
> +        mutex_lock(&pci_epf->lock);
> +        writeb(command, ioaddr + HOST_CMD);
> +        timeout = ktime_add_ms(ktime_get(), COMMAND_TIMEOUT);
> +        while (1) {
> +                timedout = ktime_after(ktime_get(), timeout);
> +                status = readb(ioaddr + HOST_CMD_STATUS);
>
> Several questions:
>
> - It's not clear to me how the synchronization is done between the RC
> and EP, e.g. how and when the value of HOST_CMD_STATUS can be changed.

HOST_CMD (commands sent to the EP) is serialized using a mutex. Once the
EP reads a command, it resets the value in HOST_CMD, so HOST_CMD itself
is less likely to be an issue. A sufficiently large time (1 second) is
given for the EP to complete its operation and report the result in
HOST_CMD_STATUS. After that timeout expires, HOST_CMD_STATUS_NONE is
written to HOST_CMD_STATUS. There could be a case where the EP updates
HOST_CMD_STATUS after the RC has written HOST_CMD_STATUS_NONE, but by
then the host has already treated the command as failed and errored out.
A sketch of how the rest of that polling loop could look is below.
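This is only an illustration of the handshake described above, not the
actual patch code: HOST_CMD_STATUS_OKAY, HOST_CMD_STATUS_ERROR and the
poll interval are assumed names/values (only HOST_CMD_STATUS_NONE
appears in this thread).

/* Hypothetical RC-side wait for command completion, reconstructed from
 * the description above. Called with pci_epf->lock held, right after
 * the command has been written to HOST_CMD.
 */
static int virtio_pci_epf_poll_cmd_status(void __iomem *ioaddr)
{
        ktime_t timeout = ktime_add_ms(ktime_get(), COMMAND_TIMEOUT);
        u8 status;

        while (1) {
                status = readb(ioaddr + HOST_CMD_STATUS);

                /* EP completed the command */
                if (status == HOST_CMD_STATUS_OKAY)
                        return 0;

                /* EP saw the command but failed to handle it */
                if (status == HOST_CMD_STATUS_ERROR)
                        return -EINVAL;

                /* 1s budget exhausted: invalidate the status so a late
                 * EP update is ignored, and report the failure.
                 */
                if (ktime_after(ktime_get(), timeout)) {
                        writeb(HOST_CMD_STATUS_NONE,
                               ioaddr + HOST_CMD_STATUS);
                        return -ETIMEDOUT;
                }

                usleep_range(100, 200);
        }
}

The design point is that the RC, not the EP, invalidates HOST_CMD_STATUS
on timeout, so an EP completion that arrives after the deadline is
simply ignored.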
> If you still want to introduce a new transport, a virtio spec patch
> would be helpful for us to understand the device API.

Okay, that should be on https://github.com/oasis-tcs/virtio-spec.git?

> - You have your vendor-specific layout (according to
> virtio_pci_epf_table()), so I guess it's better to have a
> vendor-specific vDPA driver instead.

Okay, with vDPA we are free to define our own layouts.

> - The advantage of a vendor-specific vDPA driver is that it can 1) have
> less code and 2) support userspace drivers through vhost-vDPA (instead
> of inventing new APIs, since we can't use vfio-pci here).

I see there's an additional level of indirection from virtio to vDPA,
and probably no need for a spec update, but I don't exactly see how it
will reduce code.

For 2), isn't vhost-vdpa supposed to run on the virtio backend?

From a high level, I think I should be able to use vDPA for
virtio_pci_epf.c. Would you also suggest using vDPA for ntb_virtio.c
([RFC PATCH 20/22] NTB: Add a new NTB client driver to implement VIRTIO
functionality)?

Thanks,
Kishon