Re: PCIe ordering and new VIRTIO packed ring format.

On Sun, Jan 14, 2018 at 07:45:50AM +0000, Ilya Lesokhin wrote:
> Hi,
> I have a concern about the portability of offloading the new VIRTIO packed ring format to hardware.
> 
> According to PCIe rev 2.0, paragraph 2.4.2, "Update Ordering and Granularity Observed by a Read Transaction":
> "if a host CPU writes a QWORD to host memory, a Requester reading that QWORD from host memory may observe a portion of the QWORD updated and another portion of it containing the old value."
> 
> This means that after the device reads a 16-byte descriptor, it cannot know that all the values in the descriptor are up to date even if the VIRTQ_DESC_F_AVAIL bit is set.
> This is true even if the driver uses the appropriate memory barriers.
> 
> We encountered this behavior in practice on x86 servers. Our solution was to add an index to the latest valid descriptor.
> 
> Note that in practice the update granularity on x86 seems to be a cacheline, but this is not guaranteed by the spec.
> The spec only makes the following recommendation:
> "While not required by this specification, it is strongly recommended that host platforms guarantee that when a host CPU writes aligned DWORDs or aligned QWORDs to host memory, the update granularity observed by a PCI Express read will not be smaller than a DWORD."
> 
> Thanks,
> Ilya

This is a very good point.  This consideration is one of the reasons I
included the last valid descriptor in the driver notification.  My guess
would be that such hardware should never use driver event suppression.
As a result, the driver will always send a notification after each batch
of descriptors, and the device can use that to figure out which
descriptors to fetch. Luckily, with pass-through, device memory can be
mapped directly into the VM, so the notification will not trigger
a VM exit.
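
To make the idea above concrete, here is a rough device-side sketch of
using the index carried in the kick to decide which descriptors to fetch.
Everything in it is an assumption for illustration: the struct, the
function names, and the encoding of the notification as a plain ring index
of the last valid descriptor are hypothetical and not taken from the spec.
It also assumes each kick announces at least one new descriptor.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical device-side state; names are illustrative only. */
struct vq_dev_state {
    uint16_t ring_size;   /* number of descriptors in the ring */
    uint16_t next_fetch;  /* ring index of the next descriptor to fetch */
};

/*
 * Number of descriptors announced by a kick that carries last_valid,
 * the ring index of the last valid descriptor. Assumes each kick
 * announces at least one new descriptor, so the result is in
 * 1..ring_size and the range may wrap around the end of the ring.
 */
static uint16_t descs_announced(const struct vq_dev_state *s,
                                uint16_t last_valid)
{
    assert(s->ring_size > 0);
    assert(s->next_fetch < s->ring_size && last_valid < s->ring_size);
    return (uint16_t)((last_valid + s->ring_size - s->next_fetch)
                      % s->ring_size + 1);
}

/*
 * Record that everything up to and including last_valid will be
 * fetched. The device never reads past last_valid, so it never relies
 * on a possibly torn read of a descriptor that is still being written.
 */
static uint16_t consume_kick(struct vq_dev_state *s, uint16_t last_valid)
{
    uint16_t n = descs_announced(s, last_valid);

    s->next_fetch = (uint16_t)((last_valid + 1) % s->ring_size);
    return n;
}
```

With an 8-entry ring and next_fetch at 6, a kick carrying index 1 covers
entries 6, 7, 0 and 1, and the device would then wait for the next kick
before reading entry 2.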

It would be interesting to find out whether specific host systems
give a stronger guarantee than what is required by the PCIe spec.
If so we could add e.g. a feature bit to let the device
know it's safe to read beyond the index supplied in the kick
notification. Drivers would detect this and use it to reduce
the overhead.

Conversely, this is also why I selected:
#define VIRTQ_DESC_F_USED      15
This way we don't have the same issue in the reverse direction:
the last byte is used to mark the buffer as used,
which actually seems to be guaranteed to happen last from the
software point of view, in a portable way.
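
As a rough illustration of why bit 15 lands in the last byte, here is a
small sketch. It assumes a 16-byte descriptor whose final two bytes are a
little-endian 16-bit flags field (the layout later published for the
packed ring); the struct and helper names are made up for this example,
and wrap counter handling is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VIRTQ_DESC_F_USED      15

/* Assumed 16-byte layout; all fields are little-endian in memory. */
struct packed_desc {
    uint64_t addr;
    uint32_t len;
    uint16_t id;
    uint16_t flags;
};

static_assert(sizeof(struct packed_desc) == 16, "descriptor is 16 bytes");
static_assert(offsetof(struct packed_desc, flags) == 14,
              "flags occupy bytes 14 and 15");

/* Byte offset, within the descriptor, that holds the given flags bit. */
static size_t flag_byte_offset(unsigned bit)
{
    return offsetof(struct packed_desc, flags) + bit / 8;
}

/*
 * Check the USED bit by looking at the raw descriptor bytes only, so
 * the result does not depend on host endianness.
 */
static int used_bit_is_set(const unsigned char *raw)
{
    size_t off = flag_byte_offset(VIRTQ_DESC_F_USED);

    return (raw[off] >> (VIRTQ_DESC_F_USED % 8)) & 1;
}
```

Here flag_byte_offset(VIRTQ_DESC_F_USED) evaluates to 15, i.e. the final
byte of the descriptor, so only that one byte needs to be examined to see
the bit.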

-- 
MST
_______________________________________________
Virtualization mailing list
Virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


