Hi Haotian Wang,

On 27/08/19 3:29 AM, Haotian Wang wrote:
> Hi Kishon,
>
> Thank you so much for the reply!
>
> On Mon, Aug 26, 2019 at 6:51 AM Kishon Vijay Abraham I <kishon@xxxxxx> wrote:
>>> This function driver is tested on the following pair of systems. The PCI
>>> endpoint is a Xilinx VCU118 board programmed with a SiFive Linux-capable
>>> core running Linux 5.2. The PCI host is an x86_64 Intel(R) Core(TM)
>>> i3-6100 running unmodified Linux 5.2. The virtual link achieved a
>>> stable throughput of ~180KB/s during scp sessions of a 50M file. The
>>
>
> I haven't taken a close look at vhost. Using virtio_device was mainly
> because I did not change the code on the PCI host side, so using the
> same structs as virtio_pci and virtio_ring made it easy to access
> data structures on the PCI host from the endpoint. Another reason is
> that in this endpoint function, the use case of virtio_device was not
> entirely the same as that of kvm/qemu. Instead, this was probably closer
> to what veth does, in that it establishes a connection between a pair of
> virtio_devices. So far virtio_device has served the purpose well and I
> could reuse a lot of code from virtio.
>
>> Please add the Documentation as a separate patch.
>
> Should I submit that as a different patch in the same patch series or a
> totally different patch? Thanks!

You could add that as a different patch in the same patch series.

>
>>> +	CONFIG_VIRTIO
>>> +	CONFIG_VIRTIO
>>
>> ^^redundant line.
>
> Will fix.
>
>>> +CONFIG_PCI_HOST_LITTLE_ENDIAN must be set at COMPILE TIME. Toggle it on to build
>>> +the module with the PCI host being in little endianness.
>>
>> It would be better if we could get the endianness of the host at runtime. That
>> way irrespective of the host endianness we could use the same kernel image in
>> endpoint.
>
> There are two ways I can think of to achieve this. The first is to
> change the whole endpoint function into using modern virtio interfaces,
> because those specify little endianness to be used in all of __virtio16,
> __virtio32 etc. I didn't take that path because the development platform
> did not allow me to access some PCI configuration space registers, such
> as the vendor-specific capabilities. These were required to configure a
> virtio_device representing the PCI host.

I would prefer this approach. Do you need any vendor specific capabilities
for virtio_device?
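
If the function moves to the modern interface, the byte order question
largely goes away, since VIRTIO_F_VERSION_1 mandates little-endian fields.
Even for the legacy case, the existing helpers in
include/linux/virtio_byteorder.h let the byte order be selected at runtime
instead of with CONFIG_PCI_HOST_LITTLE_ENDIAN. Only a rough sketch of what
I mean; the helper name is made up, and how 'host_is_le' is learned from
the host for the legacy interface (handshake register, module parameter,
...) is still the open part:

#include <linux/bits.h>
#include <linux/virtio_byteorder.h>
#include <linux/virtio_config.h>

/*
 * Sketch only: convert a 16-bit field shared with the PCI host.
 * 'features' are the feature bits negotiated with the host; 'host_is_le'
 * is only consulted for the legacy interface and would have to be
 * communicated by the host at runtime.
 */
static inline u16 epf_virtio16_to_cpu(u64 features, bool host_is_le,
				      __virtio16 val)
{
	/* VIRTIO_F_VERSION_1 mandates little endian regardless of the host */
	bool le = (features & BIT_ULL(VIRTIO_F_VERSION_1)) || host_is_le;

	return __virtio16_to_cpu(le, val);
}

The cpu-to-virtio direction would be symmetric. With something like that,
the same endpoint image should work against both big and little endian
hosts.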

>
> The second way is to add a module parameter for host endianness. The
> user has to make sure that module parameter is set up correctly before
> this endpoint function calls linkup() though.
>
>>> +Enable PCI_ENDPOINT_DMAENGINE if your endpoint controller has an implementation
>>
>> Presence of dma engine could come from epc_features. Or try to get dma channel
>> always and use mem_copy if that fails. A config option for dmaengine looks
>> unnecessary.
>
> This ties back to the previous point of the unmerged dma patch. The
> correct way to implement dma depends on that patch.
>
>>> +config PCI_EPF_VIRTIO_SUPPRESS_NOTIFICATION
>>> +	bool "PCI Virtio Endpoint Function Notification Suppression"
>>> +	default n
>>> +	depends on PCI_EPF_VIRTIO
>>> +	help
>>> +	  Enable this configuration option to allow virtio queues to suppress
>>> +	  some notifications and interrupts. Normally the host and the endpoint
>>> +	  send a notification/interrupt to each other after each packet has been
>>> +	  provided/consumed. Notifications/Interrupts can be generally expensive
>>> +	  across the PCI bus. If this config is enabled, both sides will only
>>> +	  signal the other end after a batch of packets has been consumed/
>>> +	  provided. However, in reality, this option does not offer significant
>>> +	  performance gain so far.
>>
>> Would be good to profile and document the bottleneck so that this could be
>> improved upon.
>
> I have a theory for this. The only real "interrupt" is from the
> endpoint to the host. The "notification" from the host to the endpoint is
> actually implemented by the endpoint continuously polling for a value in
> BAR 0. When the host wants to notify the endpoint, it writes the index of
> the virtqueue where an event just occurred to an offset in BAR 0. The
> endpoint has a dedicated loop that monitors that value. Because of this
> setup, making the host send fewer notifications does not help, because
> the bottleneck is probably the expensive polling on the endpoint. As a
> consequence, suppressing notifications and interrupts does not seem to
> offer a performance gain.
>
>>> +/* Default bar sizes */
>>> +static size_t bar_size[] = { 512, 512, 1024, 16384, 131072, 1048576 };
>>
>> Only use the BARs actually required by the function.
>
> Will do.
>
>>> +/*
>>> + * Clear mapped memory of a map. If there is memory allocated using the
>>> + * pci-ep framework, that memory will be released.
>>> + *
>>> + * @map: a map struct pointer that will be unmapped
>>> + */
>>> +static void pci_epf_unmap(struct pci_epf_map *map)
>>> +{
>>> +	if (map->iobase) {
>>
>> how about this instead..
>> 	if (!map->iobase)
>> 		return;
>
> Sure.
>
>>> +	align = map->align;
>>> +	iosize = (align > PAGE_SIZE && size < align) ? align : size;
>>
>> The align parameter should already be configured correctly by epc_features and
>> the size should be already handled by pci_epc_mem_alloc_addr().
>
> This "align" is exactly the same as the align from epc_features. This
> line of code actually proved necessary on my development platform. The
> epc mem allocator only makes sure the memory allocated is aligned, but it
> fails to operate on PCI host memory that is not properly aligned. The
> endpoint device I developed on had a disastrous 64K page size. When
> reading from a physical memory address on the PCI host, the lower 16
> bits of the memory address were all zeroed out. For example, when the
> endpoint tried to read the byte at 0x12345 (a phys_addr_t) on the PCI
> host, what it actually read was the byte at 0x10000. Because of this, I
> had to potentially allocate a much larger space than asked for. If the
> endpoint wanted to access 0x12345, then after mapping, map->phys_iobase
> would be 0x10000, map->phys_ioaddr would be 0x12345, and a whole 64K
> memory region would be allocated.

All right. This is for aligning the host address.
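
Just so I'm sure I follow, the end result is something along these lines
(only a sketch; the helper name is made up, names are local to the example
and error handling is trimmed):

#include <linux/pci-epc.h>
#include <linux/pci-epf.h>

/*
 * Map 'size' bytes at 'host_addr' (a PCI address on the host) when the
 * outbound window can only be placed at 'align' granularity, e.g. 64K
 * on your platform.
 */
static void __iomem *epf_virtio_map_host_addr(struct pci_epf *epf,
					      u64 host_addr, size_t size,
					      size_t align)
{
	struct pci_epc *epc = epf->epc;
	u64 aligned_addr = host_addr & ~(u64)(align - 1); /* 0x12345 -> 0x10000 */
	size_t iosize = (align > PAGE_SIZE && size < align) ? align : size;
	void __iomem *base;
	phys_addr_t phys;
	int ret;

	base = pci_epc_mem_alloc_addr(epc, &phys, iosize);
	if (!base)
		return NULL;

	/* the window is programmed with the *aligned* host address ... */
	ret = pci_epc_map_addr(epc, epf->func_no, phys, aligned_addr, iosize);
	if (ret) {
		pci_epc_mem_free_addr(epc, phys, base, iosize);
		return NULL;
	}

	/* ... and the caller gets a pointer to the byte it actually asked for */
	return base + (host_addr - aligned_addr);
}

If that is what the code ends up doing, please capture it in a comment near
this code; the oversized allocation is surprising at first sight.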

>
>> This looks unnecessary.
>
> See above.
>
>>> +/*
>>> + * Get value from the virtio network config of the local virtio device.
>>> + *
>>> + * @vdev: local virtio device
>>> + * @offset: offset of starting memory address from the start of local
>>> + *	virtio network config in bytes
>>> + * @buf: virtual memory address to store the value
>>> + * @len: size of requested data in bytes
>>> + */
>>> +static inline void epf_virtio_local_get(struct virtio_device *vdev,
>>> +					unsigned int offset,
>>> +					void *buf,
>>> +					unsigned int len)
>>> +{
>>> +	memcpy(buf,
>>> +	       (void *)&vdev_to_epf_vdev(vdev)->local_net_cfg + offset,
>>> +	       len);
>>> +}
>>
>> Have all these network specific parts in a separate file. Use the layering
>> structure similar to vhost.
>
> Will try to do.
>
>>> +/*
>>> + * Initializes the virtio_pci and virtio_net config space that will be exposed
>>> + * to the remote virtio_pci and virtio_net modules on the PCI host. This
>>> + * includes setting up feature negotiation and default config setup etc.
>>> + *
>>> + * @epf_virtio: epf_virtio handler
>>> + */
>>> +static void pci_epf_virtio_init_cfg_legacy(struct pci_epf_virtio *epf_virtio)
>>> +{
>>> +	const u32 dev_feature =
>>> +		generate_dev_feature32(features, ARRAY_SIZE(features));
>>> +	struct virtio_legacy_cfg *const legacy_cfg = epf_virtio->reg[BAR_0];
>>
>> virtio_reg_bar instead of BAR_0
>
> The dilemma was that the virtio_pci driver on the PCI host will only write
> to BAR 0. I may need to throw an error if the first free BAR is not BAR 0.

hmm.. We need a better way to handle it. Just having
PCI_VENDOR_ID_REDHAT_QUMRANET in virtio_pci may not be sufficient then.

>
>>> +	pci_epc_stop(epc);
>>
>> You should never have pci_epc_stop() in a function driver as that will break
>> multi-function endpoint devices. I'll fix this in pci-epf-test.c.
>
> Look forward to your progress on this.
>
>>> +	for (bar = BAR_0; bar <= BAR_5; bar += add) {
>>
>> Are you using all these BARs? It's best to allocate and initialize the BARs we use.
>
> Will only use BAR 0 instead.
>
>> Please add a description for each of these structures.
>
> I had to copy these structures exactly as they were from virtio_ring.c,
> unfortunately, because they were not exposed via any header file. If
> virtio_ring.c has some struct changes, this endpoint function will have
> to change accordingly.

Some of the structures are exposed in virtio_ring.h. We probably need to use
that instead of using the structures from virtio_ring.c.

>
> Thank you so much for taking time to review this patch. Now that I have
> come back to university and continued my undergrad studies, my kernel
> development work will probably slow down a lot. The heavy-lifting work
> such as creating more layers to allow more virtio devices will take a
> much longer time.

Agreed. IMHO we should adapt vhost as a generic backend driver so that it
could be used behind PCI.

Cheers
Kishon