Re: [PATCH] vfio/pci: Virtualize Maximum Read Request Size

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Alex,

On 21/09/2017 21:05, Alex Williamson wrote:
> MRRS defines the maximum read request size a device is allowed to
> make.  Drivers will often increase this to allow more data transfer
> with a single request.  Completions to this request are bound by the
> MPS setting for the bus.  Aside from device quirks (none known), it
> doesn't seem to make sense to set an MRRS value less than MPS, yet
> this is a likely scenario given that user drivers do not have a
> system-wide view of the PCI topology.  Virtualize MRRS such that the
> user can set MRRS >= MPS, but use MPS as the floor value that we'll
> write to hardware.
> 
> Signed-off-by: Alex Williamson <alex.williamson@xxxxxxxxxx>
Reviewed-by: Eric Auger <eric.auger@xxxxxxxxxx>

Thanks

Eric
> ---
> 
> This patch seems like a bit of a gamble that we'll get more good out
> of fixing up a potentially common inefficient configuration than
> we'll see headache if there exist devices which require lower MRRS
> settings.  However, since the MPS setting should never be higher than
> the MPSS allows, it seems outlandish that a device might require an
> MRRS setting lower than MPSS.
> 
> Also noteworthy, pcie_write_mrrs() has a back-off algorithm which
> expects the device to not accept unsupported MRRS values.  This
> code doesn't actually work because pcie_set_readrq() won't fail as
> a result of this, but it also doesn't match how hardware works.  In
> the case of an Intel X540, I can use setpci on bare metal to write
> whatever I want into MPS and MRRS and it sticks, just like it would
> for the user with these registers virtualized.  If there is hardware
> that would reject values, it won't be reflected to the user unless we
> write back the physical MRRS value to vconfig, which gets shakey
> when we're masking the actual value we're writing in certain cases
> anyway.
> 
>  drivers/vfio/pci/vfio_pci_config.c |   29 ++++++++++++++++++++++++++---
>  1 file changed, 26 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
> index 91335e6de88a..115a36f6f403 100644
> --- a/drivers/vfio/pci/vfio_pci_config.c
> +++ b/drivers/vfio/pci/vfio_pci_config.c
> @@ -808,6 +808,7 @@ static int vfio_exp_config_write(struct vfio_pci_device *vdev, int pos,
>  {
>  	__le16 *ctrl = (__le16 *)(vdev->vconfig + pos -
>  				  offset + PCI_EXP_DEVCTL);
> +	int readrq = le16_to_cpu(*ctrl) & PCI_EXP_DEVCTL_READRQ;
>  
>  	count = vfio_default_config_write(vdev, pos, count, perm, offset, val);
>  	if (count < 0)
> @@ -833,6 +834,27 @@ static int vfio_exp_config_write(struct vfio_pci_device *vdev, int pos,
>  			pci_try_reset_function(vdev->pdev);
>  	}
>  
> +	/*
> +	 * MPS is virtualized to the user, writes do not change the physical
> +	 * register since determining a proper MPS value requires a system wide
> +	 * device view.  The MRRS is largely independent of MPS, but since the
> +	 * user does not have that system-wide view, they might set a safe, but
> +	 * inefficiently low value.  Here we allow writes through to hardware,
> +	 * but we set the floor to the physical device MPS setting, so that
> +	 * we can at least use full TLPs, as defined by the MPS value.
> +	 *
> +	 * NB, if any devices actually depend on an artificially low MRRS
> +	 * setting, this will need to be revisited, perhaps with a quirk
> +	 * though pcie_set_readrq().
> +	 */
> +	if (readrq != (le16_to_cpu(*ctrl) & PCI_EXP_DEVCTL_READRQ)) {
> +		readrq = 128 <<
> +			((le16_to_cpu(*ctrl) & PCI_EXP_DEVCTL_READRQ) >> 12);
> +		readrq = max(readrq, pcie_get_mps(vdev->pdev));
> +
> +		pcie_set_readrq(vdev->pdev, readrq);
> +	}
> +
>  	return count;
>  }
>  
> @@ -851,11 +873,12 @@ static int __init init_pci_cap_exp_perm(struct perm_bits *perm)
>  	 * Allow writes to device control fields, except devctl_phantom,
>  	 * which could confuse IOMMU, MPS, which can break communication
>  	 * with other physical devices, and the ARI bit in devctl2, which
> -	 * is set at probe time.  FLR gets virtualized via our writefn.
> +	 * is set at probe time.  FLR and MRRS get virtualized via our
> +	 * writefn.
>  	 */
>  	p_setw(perm, PCI_EXP_DEVCTL,
> -	       PCI_EXP_DEVCTL_BCR_FLR | PCI_EXP_DEVCTL_PAYLOAD,
> -	       ~PCI_EXP_DEVCTL_PHANTOM);
> +	       PCI_EXP_DEVCTL_BCR_FLR | PCI_EXP_DEVCTL_PAYLOAD |
> +	       PCI_EXP_DEVCTL_READRQ, ~PCI_EXP_DEVCTL_PHANTOM);
>  	p_setw(perm, PCI_EXP_DEVCTL2, NO_VIRT, ~PCI_EXP_DEVCTL2_ARI);
>  	return 0;
>  }
> 



[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux