> -----Original Message-----
> From: Alex Williamson <alex.williamson@xxxxxxxxxx>
> Sent: Tuesday, March 4, 2025 7:24 PM
> To: Wathsala Wathawana Vithanage <wathsala.vithanage@xxxxxxx>
> Cc: Jason Gunthorpe <jgg@xxxxxxxx>; linux-kernel@xxxxxxxxxxxxxxx; nd
> <nd@xxxxxxx>; Kevin Tian <kevin.tian@xxxxxxxxx>; Philipp Stanner
> <pstanner@xxxxxxxxxx>; Yunxiang Li <Yunxiang.Li@xxxxxxx>; Dr. David Alan
> Gilbert <linux@xxxxxxxxxxx>; Ankit Agrawal <ankita@xxxxxxxxxx>; open list:VFIO
> DRIVER <kvm@xxxxxxxxxxxxxxx>
> Subject: Re: [RFC PATCH] vfio/pci: add PCIe TPH to device feature ioctl
>
> On Tue, 4 Mar 2025 22:38:16 +0000
> Wathsala Wathawana Vithanage <wathsala.vithanage@xxxxxxx> wrote:
> >
> > > > Linux v6.13 introduced the PCIe TLP Processing Hints (TPH) feature for
> > > > direct cache injection. As described in the relevant patch set [1],
> > > > direct cache injection in supported hardware allows optimal platform
> > > > resource utilization for specific requests on the PCIe bus. This feature
> > > > is currently available only for kernel device drivers. However,
> > > > user space applications, especially those whose performance is sensitive
> > > > to the latency of inbound writes as seen by a CPU core, may benefit from
> > > > using this information (E.g., DPDK cache stashing RFC [2] or an HPC
> > > > application running in a VM).
> > > >
> > > > This patch enables configuring of TPH from the user space via
> > > > VFIO_DEVICE_FEATURE IOCLT. It provides an interface to user space
> > > > drivers and VMMs to enable/disable the TPH feature on PCIe devices and
> > > > set steering tags in MSI-X or steering-tag table entries using
> > > > VFIO_DEVICE_FEATURE_SET flag or read steering tags from the kernel using
> > > > VFIO_DEVICE_FEATURE_GET to operate in device-specific mode.
> > >
> > > What level of protection do we expect to have here? Is it OK for
> > > userspace to make up any old tag value or is there some security
> > > concern with that?
> > >
> > Shouldn't be allowed from within a container.
> > A hypervisor should have its own STs and map them to platform STs for
> > the cores the VM is pinned to and verify any old ST is not written to the
> > device MSI-X, ST table or device specific locations.
>
> And how exactly are we mediating device specific steering tags when we
> don't know where/how they're written to the device. An API that
> returns a valid ST to userspace doesn't provide any guarantees relative
> to what userspace later writes. MSI-X tables are also writable by
> userspace.

By not enabling TPH in device-specific mode, a hypervisor can ensure that
an ST written to a device-specific location (such as a queue context) has
no effect. VMs should also not be allowed to enable TPH themselves. I
believe this could be enforced by trapping (causing VM exits) on
MSI-X/ST table writes.

Having said that, regardless of this proposal or the availability of
kernel TPH support, a VFIO driver could already enable TPH and set an
arbitrary ST in the MSI-X/ST table or in a device-specific location on
supported platforms. If the driver doesn't have a list of valid STs, it
can enumerate the 8- or 16-bit ST space and measure access latencies to
determine which ones are valid.

> I could have missed it, but I also didn't note any pinning
> requirement in this proposal. Thanks,
>

Sorry, I failed to mention pinning earlier.

Let's say we don't pin VMs to CPUs. Now, say VM_A sets an ST on a NIC to
get packet data into the L2D of CPU_N, the CPU to which its vCPU_0 is
currently bound. Then, after a while, VM_B gets scheduled on CPU_N.
CPU_N, regardless of which process/thread is scheduled on it, will keep
receiving data from VM_A's NIC into its L2D. Consequently, the
performance of any VM other than VM_A that is scheduled on CPU_N would
degrade due to capacity misses and invalidations. This is where the
pinning requirement comes from.

--wathsala