Re: [PATCH v2] vfio/pci: Properly hide first-in-list PCIe extended capability

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, 25 Nov 2024 10:31:38 +0800
Yi Liu <yi.l.liu@xxxxxxxxx> wrote:

> On 2024/11/23 00:48, Alex Williamson wrote:
> > On Fri, 22 Nov 2024 20:45:08 +0800
> > Yi Liu <yi.l.liu@xxxxxxxxx> wrote:
> >   
> >> On 2024/11/21 22:00, Avihai Horon wrote:  
> >>> There are cases where a PCIe extended capability should be hidden from
> >>> the user. For example, an unknown capability (i.e., capability with ID
> >>> greater than PCI_EXT_CAP_ID_MAX) or a capability that is intentionally
> >>> chosen to be hidden from the user.
> >>>
> >>> Hiding a capability is done by virtualizing and modifying the 'Next
> >>> Capability Offset' field of the previous capability so it points to the
> >>> capability after the one that should be hidden.
> >>>
> >>> The special case where the first capability in the list should be hidden
> >>> is handled differently because there is no previous capability that can
> >>> be modified. In this case, the capability ID and version are zeroed
> >>> while leaving the next pointer intact. This hides the capability and
> >>> leaves an anchor for the rest of the capability list.
> >>>
> >>> However, today, hiding the first capability in the list is not done
> >>> properly if the capability is unknown, as struct
> >>> vfio_pci_core_device->pci_config_map is set to the capability ID during
> >>> initialization but the capability ID is not properly checked later when
> >>> used in vfio_config_do_rw(). This leads to the following warning [1] and
> >>> to an out-of-bounds access to ecap_perms array.
> >>>
> >>> Fix it by checking cap_id in vfio_config_do_rw(), and if it is greater
> >>> than PCI_EXT_CAP_ID_MAX, use an alternative struct perm_bits for direct
> >>> read only access instead of the ecap_perms array.
> >>>
> >>> Note that this is safe since the above is the only case where cap_id can
> >>> exceed PCI_EXT_CAP_ID_MAX (except for the special capabilities, which
> >>> are already checked before).
> >>>
> >>> [1]
> >>>
> >>> WARNING: CPU: 118 PID: 5329 at drivers/vfio/pci/vfio_pci_config.c:1900 vfio_pci_config_rw+0x395/0x430 [vfio_pci_core]  
> >>
> >> strange, it is not in the vfio_config_do_rw(). But never mind.
> >>  
> >>> CPU: 118 UID: 0 PID: 5329 Comm: simx-qemu-syste Not tainted 6.12.0+ #1
> >>> (snip)
> >>> Call Trace:
> >>>    <TASK>
> >>>    ? show_regs+0x69/0x80
> >>>    ? __warn+0x8d/0x140
> >>>    ? vfio_pci_config_rw+0x395/0x430 [vfio_pci_core]
> >>>    ? report_bug+0x18f/0x1a0
> >>>    ? handle_bug+0x63/0xa0
> >>>    ? exc_invalid_op+0x19/0x70
> >>>    ? asm_exc_invalid_op+0x1b/0x20
> >>>    ? vfio_pci_config_rw+0x395/0x430 [vfio_pci_core]
> >>>    ? vfio_pci_config_rw+0x244/0x430 [vfio_pci_core]
> >>>    vfio_pci_rw+0x101/0x1b0 [vfio_pci_core]
> >>>    vfio_pci_core_read+0x1d/0x30 [vfio_pci_core]
> >>>    vfio_device_fops_read+0x27/0x40 [vfio]
> >>>    vfs_read+0xbd/0x340
> >>>    ? vfio_device_fops_unl_ioctl+0xbb/0x740 [vfio]
> >>>    ? __rseq_handle_notify_resume+0xa4/0x4b0
> >>>    __x64_sys_pread64+0x96/0xc0
> >>>    x64_sys_call+0x1c3d/0x20d0
> >>>    do_syscall_64+0x4d/0x120
> >>>    entry_SYSCALL_64_after_hwframe+0x76/0x7e
> >>>
> >>> Fixes: 89e1f7d4c66d ("vfio: Add PCI device driver")
> >>> Signed-off-by: Avihai Horon <avihaih@xxxxxxxxxx>
> >>> ---
> >>> Changes from v1:
> >>> * Use Alex's suggestion to fix the bug and adapt the commit message.
> >>> ---
> >>>    drivers/vfio/pci/vfio_pci_config.c | 20 ++++++++++++++++----
> >>>    1 file changed, 16 insertions(+), 4 deletions(-)
> >>>
> >>> diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
> >>> index 97422aafaa7b..b2a1ba66e5f1 100644
> >>> --- a/drivers/vfio/pci/vfio_pci_config.c
> >>> +++ b/drivers/vfio/pci/vfio_pci_config.c
> >>> @@ -313,12 +313,16 @@ static int vfio_virt_config_read(struct vfio_pci_core_device *vdev, int pos,
> >>>    	return count;
> >>>    }
> >>>    
> >>> +static const struct perm_bits direct_ro_perms = {
> >>> +	.readfn = vfio_direct_config_read,
> >>> +};
> >>> +
> >>>    /* Default capability regions to read-only, no-virtualization */
> >>>    static struct perm_bits cap_perms[PCI_CAP_ID_MAX + 1] = {
> >>> -	[0 ... PCI_CAP_ID_MAX] = { .readfn = vfio_direct_config_read }
> >>> +	[0 ... PCI_CAP_ID_MAX] = direct_ro_perms
> >>>    };
> >>>    static struct perm_bits ecap_perms[PCI_EXT_CAP_ID_MAX + 1] = {
> >>> -	[0 ... PCI_EXT_CAP_ID_MAX] = { .readfn = vfio_direct_config_read }
> >>> +	[0 ... PCI_EXT_CAP_ID_MAX] = direct_ro_perms
> >>>    };
> >>>    /*
> >>>     * Default unassigned regions to raw read-write access.  Some devices
> >>> @@ -1897,9 +1901,17 @@ static ssize_t vfio_config_do_rw(struct vfio_pci_core_device *vdev, char __user
> >>>    		cap_start = *ppos;
> >>>    	} else {
> >>>    		if (*ppos >= PCI_CFG_SPACE_SIZE) {
> >>> -			WARN_ON(cap_id > PCI_EXT_CAP_ID_MAX);
> >>> +			/*
> >>> +			 * We can get a cap_id that exceeds PCI_EXT_CAP_ID_MAX
> >>> +			 * if we're hiding an unknown capability at the start
> >>> +			 * of the extended capability list.  Use default, ro
> >>> +			 * access, which will virtualize the id and next values.
> >>> +			 */
> >>> +			if (cap_id > PCI_EXT_CAP_ID_MAX)
> >>> +				perm = (struct perm_bits *)&direct_ro_perms;
> >>> +			else
> >>> +				perm = &ecap_perms[cap_id];
> >>>    
> >>> -			perm = &ecap_perms[cap_id];
> >>>    			cap_start = vfio_find_cap_start(vdev, *ppos);
> >>>    		} else {
> >>>    			WARN_ON(cap_id > PCI_CAP_ID_MAX);  
> >>
> >> Looks good to me. :) I'm able to trigger this warning by hide the first
> >> ecap on my system with the below hack.
> >>
> >> diff --git a/drivers/vfio/pci/vfio_pci_config.c
> >> b/drivers/vfio/pci/vfio_pci_config.c
> >> index b2a1ba66e5f1..db91e19a48b3 100644
> >> --- a/drivers/vfio/pci/vfio_pci_config.c
> >> +++ b/drivers/vfio/pci/vfio_pci_config.c
> >> @@ -1617,6 +1617,7 @@ static int vfio_ecap_init(struct vfio_pci_core_device
> >> *vdev)
> >>    	u16 epos;
> >>    	__le32 *prev = NULL;
> >>    	int loops, ret, ecaps = 0;
> >> +	int iii =0;
> >>
> >>    	if (!vdev->extended_caps)
> >>    		return 0;
> >> @@ -1635,7 +1636,11 @@ static int vfio_ecap_init(struct
> >> vfio_pci_core_device *vdev)
> >>    		if (ret)
> >>    			return ret;
> >>
> >> -		ecap = PCI_EXT_CAP_ID(header);
> >> +		if (iii == 0) {
> >> +			ecap = 0x61;
> >> +			iii++;
> >> +		} else
> >> +			ecap = PCI_EXT_CAP_ID(header);
> >>
> >>    		if (ecap <= PCI_EXT_CAP_ID_MAX) {
> >>    			len = pci_ext_cap_length[ecap];
> >> @@ -1664,6 +1669,7 @@ static int vfio_ecap_init(struct vfio_pci_core_device
> >> *vdev)
> >>    			 */
> >>    			len = PCI_CAP_SIZEOF;
> >>    			hidden = true;
> >> +			printk("%s set hide\n", __func__);
> >>    		}
> >>
> >>    		for (i = 0; i < len; i++) {
> >> @@ -1893,6 +1899,7 @@ static ssize_t vfio_config_do_rw(struct
> >> vfio_pci_core_device *vdev, char __user
> >>
> >>    	cap_id = vdev->pci_config_map[*ppos];
> >>
> >> +	printk("%s cap_id: %x\n", __func__, cap_id);
> >>    	if (cap_id == PCI_CAP_ID_INVALID) {
> >>    		perm = &unassigned_perms;
> >>    		cap_start = *ppos;
> >>
> >> And then this warning is gone after applying this patch. Hence,
> >>
> >> Reviewed-by: Yi Liu <yi.l.liu@xxxxxxxxx>
> >> Tested-by: Yi Liu <yi.l.liu@xxxxxxxxx>  
> > 
> > Thanks, good testing!
> >     
> >> But I can still see a valid next pointer. Like the below log, I hide
> >> the first ecap at offset 0x100, its ID is zeroed. The second ecap locates
> >> at offset==0x150, its cap_id is 0x0018. I can see the next pointer in the
> >> guest. Is it expected?  
> > 
> > This is what makes hiding the first ecap unique, the ecap chain always
> > starts at 0x100, the next pointer must be valid for the rest of the
> > chain to remain.  For standard capabilities we can change the register
> > pointing at the head of the list.  This therefore looks like expected
> > behavior, unless I'm missing something more subtle in your example.  
> 
> Got you. I was a little bit misled by the below comment. I thought the
> cap_id, version and next would be zeroed. But the code actually only zeros
> the cap_id and version. :)
> 
> 		/*
> 		 * If we're just using this capability to anchor the list,
> 		 * hide the real ID.  Only count real ecaps.  XXX PCI spec
> 		 * indicates to use cap id = 0, version = 0, next = 0 if
> 		 * ecaps are absent, hope users check all the way to next.
> 		 */

This is just expressing concern that a sloppy guest might decide the
list ends at the zero capability ID w/o strictly following the spec.
We've never seen evidence of such a guest.  Thanks,

Alex





[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux