On Sun, Jan 14, 2018 at 07:45:50AM +0000, Ilya Lesokhin wrote:
> Hi,
> I have a concern about the portability of offloading the new VIRTIO packed ring format to hardware.
>
> According to the PCIe rev 2.0, paragraph 2.4.2. "Update Ordering and Granularity Observed by a Read Transaction"
> "if a host CPU writes a QWORD to host memory, a Requester reading that QWORD from host memory may observe a portion of the QWORD updated and another portion of it containing the old value."
>
> This means that after the device reads a 16-byte descriptor, it cannot know that all the values in the descriptor are up to date even if the VIRTQ_DESC_F_AVAIL bit is set.
> This is true even if the driver uses the appropriate memory barriers.
>
> We encountered this behavior in practice on x86 servers. Our solution was to add an index to the latest valid descriptor.
>
> Note that in practice the update granularity in x86 seems to be a cacheline, but this is not guaranteed by the spec.
> The spec only makes the following recommendation:
> "While not required by this specification, it is strongly recommended that host platforms guarantee that when a host CPU writes aligned DWORDs or aligned QWORDs to host memory, the update granularity observed by a PCI Express read will not be smaller than a DWORD."
>
> Thanks,
> Ilya

This is a very good point. This consideration is one of the reasons I included the last valid descriptor in the driver notification.

My guess would be that such hardware should never use driver event suppression. As a result, the driver will always send a notification after each batch of descriptors. The device can use that to figure out which descriptors to fetch. Luckily, with pass-through, device memory can be mapped directly into the VM, so the notification will not trigger a VM exit.

It would be interesting to find out whether specific host systems give a stronger guarantee than what is required by the PCIe spec. If so we could add e.g.
a feature bit to let the device know it's safe to read beyond the index supplied in the kick notification. Drivers would detect this and use it to reduce the overhead.

Conversely, this is also why I selected:

#define VIRTQ_DESC_F_USED 15

This way we don't have the same issue in the reverse direction: the last byte is used to mark the buffer as used, and that write actually seems to be guaranteed to happen last from the software's point of view, in a portable way.

--
MST
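
The point about VIRTQ_DESC_F_USED living in the last byte can be checked with a small C sketch. It assumes the 16-byte descriptor is laid out as le64 addr, le32 len, le16 id, le16 flags, and that VIRTQ_DESC_F_AVAIL is bit 7; neither is stated in this mail, and the helper names are hypothetical, so treat it as an illustration rather than the actual driver or device code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit numbers within the 16-bit flags field.
 * USED = 15 is from the mail; AVAIL = 7 is an assumption. */
#define VIRTQ_DESC_F_AVAIL 7
#define VIRTQ_DESC_F_USED  15

#define DESC_SIZE      16 /* descriptor size discussed in the thread */
#define DESC_FLAGS_OFF 14 /* assumed layout: le64 addr, le32 len, le16 id, le16 flags */

/* Hypothetical helper: store a 16-bit value little-endian into raw bytes. */
static void put_le16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)(v >> 8);
}

/* Hypothetical helper: load a little-endian 16-bit value from raw bytes. */
static uint16_t get_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | ((uint16_t)p[1] << 8));
}

/* Byte offset within the descriptor that holds flag bit `bit`. */
static size_t flag_byte_offset(unsigned bit)
{
    assert(bit < 16);
    return DESC_FLAGS_OFF + bit / 8;
}

/* Set VIRTQ_DESC_F_USED in a raw descriptor image.
 * Under the assumed layout this only changes byte 15, the final byte. */
static void desc_mark_used(uint8_t desc[DESC_SIZE])
{
    uint16_t flags = get_le16(desc + DESC_FLAGS_OFF);
    flags |= (uint16_t)(1u << VIRTQ_DESC_F_USED);
    put_le16(desc + DESC_FLAGS_OFF, flags);
}
```

Under these assumptions, flag_byte_offset(VIRTQ_DESC_F_USED) is 15 and flag_byte_offset(VIRTQ_DESC_F_AVAIL) is 14, so the used marker sits in the final byte of the descriptor while the avail marker does not.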