On Mon, 25 Nov 2024 10:31:38 +0800 Yi Liu <yi.l.liu@xxxxxxxxx> wrote: > On 2024/11/23 00:48, Alex Williamson wrote: > > On Fri, 22 Nov 2024 20:45:08 +0800 > > Yi Liu <yi.l.liu@xxxxxxxxx> wrote: > > > >> On 2024/11/21 22:00, Avihai Horon wrote: > >>> There are cases where a PCIe extended capability should be hidden from > >>> the user. For example, an unknown capability (i.e., capability with ID > >>> greater than PCI_EXT_CAP_ID_MAX) or a capability that is intentionally > >>> chosen to be hidden from the user. > >>> > >>> Hiding a capability is done by virtualizing and modifying the 'Next > >>> Capability Offset' field of the previous capability so it points to the > >>> capability after the one that should be hidden. > >>> > >>> The special case where the first capability in the list should be hidden > >>> is handled differently because there is no previous capability that can > >>> be modified. In this case, the capability ID and version are zeroed > >>> while leaving the next pointer intact. This hides the capability and > >>> leaves an anchor for the rest of the capability list. > >>> > >>> However, today, hiding the first capability in the list is not done > >>> properly if the capability is unknown, as struct > >>> vfio_pci_core_device->pci_config_map is set to the capability ID during > >>> initialization but the capability ID is not properly checked later when > >>> used in vfio_config_do_rw(). This leads to the following warning [1] and > >>> to an out-of-bounds access to ecap_perms array. > >>> > >>> Fix it by checking cap_id in vfio_config_do_rw(), and if it is greater > >>> than PCI_EXT_CAP_ID_MAX, use an alternative struct perm_bits for direct > >>> read only access instead of the ecap_perms array. > >>> > >>> Note that this is safe since the above is the only case where cap_id can > >>> exceed PCI_EXT_CAP_ID_MAX (except for the special capabilities, which > >>> are already checked before). > >>> > >>> [1] > >>> > >>> WARNING: CPU: 118 PID: 5329 at drivers/vfio/pci/vfio_pci_config.c:1900 vfio_pci_config_rw+0x395/0x430 [vfio_pci_core] > >> > >> strange, it is not in the vfio_config_do_rw(). But never mind. > >> > >>> CPU: 118 UID: 0 PID: 5329 Comm: simx-qemu-syste Not tainted 6.12.0+ #1 > >>> (snip) > >>> Call Trace: > >>> <TASK> > >>> ? show_regs+0x69/0x80 > >>> ? __warn+0x8d/0x140 > >>> ? vfio_pci_config_rw+0x395/0x430 [vfio_pci_core] > >>> ? report_bug+0x18f/0x1a0 > >>> ? handle_bug+0x63/0xa0 > >>> ? exc_invalid_op+0x19/0x70 > >>> ? asm_exc_invalid_op+0x1b/0x20 > >>> ? vfio_pci_config_rw+0x395/0x430 [vfio_pci_core] > >>> ? vfio_pci_config_rw+0x244/0x430 [vfio_pci_core] > >>> vfio_pci_rw+0x101/0x1b0 [vfio_pci_core] > >>> vfio_pci_core_read+0x1d/0x30 [vfio_pci_core] > >>> vfio_device_fops_read+0x27/0x40 [vfio] > >>> vfs_read+0xbd/0x340 > >>> ? vfio_device_fops_unl_ioctl+0xbb/0x740 [vfio] > >>> ? __rseq_handle_notify_resume+0xa4/0x4b0 > >>> __x64_sys_pread64+0x96/0xc0 > >>> x64_sys_call+0x1c3d/0x20d0 > >>> do_syscall_64+0x4d/0x120 > >>> entry_SYSCALL_64_after_hwframe+0x76/0x7e > >>> > >>> Fixes: 89e1f7d4c66d ("vfio: Add PCI device driver") > >>> Signed-off-by: Avihai Horon <avihaih@xxxxxxxxxx> > >>> --- > >>> Changes from v1: > >>> * Use Alex's suggestion to fix the bug and adapt the commit message. > >>> --- > >>> drivers/vfio/pci/vfio_pci_config.c | 20 ++++++++++++++++---- > >>> 1 file changed, 16 insertions(+), 4 deletions(-) > >>> > >>> diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c > >>> index 97422aafaa7b..b2a1ba66e5f1 100644 > >>> --- a/drivers/vfio/pci/vfio_pci_config.c > >>> +++ b/drivers/vfio/pci/vfio_pci_config.c > >>> @@ -313,12 +313,16 @@ static int vfio_virt_config_read(struct vfio_pci_core_device *vdev, int pos, > >>> return count; > >>> } > >>> > >>> +static const struct perm_bits direct_ro_perms = { > >>> + .readfn = vfio_direct_config_read, > >>> +}; > >>> + > >>> /* Default capability regions to read-only, no-virtualization */ > >>> static struct perm_bits cap_perms[PCI_CAP_ID_MAX + 1] = { > >>> - [0 ... PCI_CAP_ID_MAX] = { .readfn = vfio_direct_config_read } > >>> + [0 ... PCI_CAP_ID_MAX] = direct_ro_perms > >>> }; > >>> static struct perm_bits ecap_perms[PCI_EXT_CAP_ID_MAX + 1] = { > >>> - [0 ... PCI_EXT_CAP_ID_MAX] = { .readfn = vfio_direct_config_read } > >>> + [0 ... PCI_EXT_CAP_ID_MAX] = direct_ro_perms > >>> }; > >>> /* > >>> * Default unassigned regions to raw read-write access. Some devices > >>> @@ -1897,9 +1901,17 @@ static ssize_t vfio_config_do_rw(struct vfio_pci_core_device *vdev, char __user > >>> cap_start = *ppos; > >>> } else { > >>> if (*ppos >= PCI_CFG_SPACE_SIZE) { > >>> - WARN_ON(cap_id > PCI_EXT_CAP_ID_MAX); > >>> + /* > >>> + * We can get a cap_id that exceeds PCI_EXT_CAP_ID_MAX > >>> + * if we're hiding an unknown capability at the start > >>> + * of the extended capability list. Use default, ro > >>> + * access, which will virtualize the id and next values. > >>> + */ > >>> + if (cap_id > PCI_EXT_CAP_ID_MAX) > >>> + perm = (struct perm_bits *)&direct_ro_perms; > >>> + else > >>> + perm = &ecap_perms[cap_id]; > >>> > >>> - perm = &ecap_perms[cap_id]; > >>> cap_start = vfio_find_cap_start(vdev, *ppos); > >>> } else { > >>> WARN_ON(cap_id > PCI_CAP_ID_MAX); > >> > >> Looks good to me. :) I'm able to trigger this warning by hide the first > >> ecap on my system with the below hack. > >> > >> diff --git a/drivers/vfio/pci/vfio_pci_config.c > >> b/drivers/vfio/pci/vfio_pci_config.c > >> index b2a1ba66e5f1..db91e19a48b3 100644 > >> --- a/drivers/vfio/pci/vfio_pci_config.c > >> +++ b/drivers/vfio/pci/vfio_pci_config.c > >> @@ -1617,6 +1617,7 @@ static int vfio_ecap_init(struct vfio_pci_core_device > >> *vdev) > >> u16 epos; > >> __le32 *prev = NULL; > >> int loops, ret, ecaps = 0; > >> + int iii =0; > >> > >> if (!vdev->extended_caps) > >> return 0; > >> @@ -1635,7 +1636,11 @@ static int vfio_ecap_init(struct > >> vfio_pci_core_device *vdev) > >> if (ret) > >> return ret; > >> > >> - ecap = PCI_EXT_CAP_ID(header); > >> + if (iii == 0) { > >> + ecap = 0x61; > >> + iii++; > >> + } else > >> + ecap = PCI_EXT_CAP_ID(header); > >> > >> if (ecap <= PCI_EXT_CAP_ID_MAX) { > >> len = pci_ext_cap_length[ecap]; > >> @@ -1664,6 +1669,7 @@ static int vfio_ecap_init(struct vfio_pci_core_device > >> *vdev) > >> */ > >> len = PCI_CAP_SIZEOF; > >> hidden = true; > >> + printk("%s set hide\n", __func__); > >> } > >> > >> for (i = 0; i < len; i++) { > >> @@ -1893,6 +1899,7 @@ static ssize_t vfio_config_do_rw(struct > >> vfio_pci_core_device *vdev, char __user > >> > >> cap_id = vdev->pci_config_map[*ppos]; > >> > >> + printk("%s cap_id: %x\n", __func__, cap_id); > >> if (cap_id == PCI_CAP_ID_INVALID) { > >> perm = &unassigned_perms; > >> cap_start = *ppos; > >> > >> And then this warning is gone after applying this patch. Hence, > >> > >> Reviewed-by: Yi Liu <yi.l.liu@xxxxxxxxx> > >> Tested-by: Yi Liu <yi.l.liu@xxxxxxxxx> > > > > Thanks, good testing! > > > >> But I can still see a valid next pointer. Like the below log, I hide > >> the first ecap at offset 0x100, its ID is zeroed. The second ecap locates > >> at offset==0x150, its cap_id is 0x0018. I can see the next pointer in the > >> guest. Is it expected? > > > > This is what makes hiding the first ecap unique, the ecap chain always > > starts at 0x100, the next pointer must be valid for the rest of the > > chain to remain. For standard capabilities we can change the register > > pointing at the head of the list. This therefore looks like expected > > behavior, unless I'm missing something more subtle in your example. > > Got you. I was a little bit misled by the below comment. I thought the > cap_id, version and next would be zeroed. But the code actually only zeros > the cap_id and version. :) > > /* > * If we're just using this capability to anchor the list, > * hide the real ID. Only count real ecaps. XXX PCI spec > * indicates to use cap id = 0, version = 0, next = 0 if > * ecaps are absent, hope users check all the way to next. > */ This is just expressing concern that a sloppy guest might decide the list ends at the zero capability ID w/o strictly following the spec. We've never seen evidence of such a guest. Thanks, Alex