On 2024/11/21 22:00, Avihai Horon wrote:
There are cases where a PCIe extended capability should be hidden from
the user. For example, an unknown capability (i.e., capability with ID
greater than PCI_EXT_CAP_ID_MAX) or a capability that is intentionally
chosen to be hidden from the user.
Hiding a capability is done by virtualizing and modifying the 'Next
Capability Offset' field of the previous capability so it points to the
capability after the one that should be hidden.
The special case where the first capability in the list should be hidden
is handled differently because there is no previous capability that can
be modified. In this case, the capability ID and version are zeroed
while leaving the next pointer intact. This hides the capability and
leaves an anchor for the rest of the capability list.
However, today, hiding the first capability in the list is not done
properly if the capability is unknown, as struct
vfio_pci_core_device->pci_config_map is set to the capability ID during
initialization but the capability ID is not properly checked later when
used in vfio_config_do_rw(). This leads to the following warning [1] and
to an out-of-bounds access to ecap_perms array.
Fix it by checking cap_id in vfio_config_do_rw(), and if it is greater
than PCI_EXT_CAP_ID_MAX, use an alternative struct perm_bits for direct
read only access instead of the ecap_perms array.
Note that this is safe since the above is the only case where cap_id can
exceed PCI_EXT_CAP_ID_MAX (except for the special capabilities, which
are already checked before).
[1]
WARNING: CPU: 118 PID: 5329 at drivers/vfio/pci/vfio_pci_config.c:1900 vfio_pci_config_rw+0x395/0x430 [vfio_pci_core]
strange, it is not in the vfio_config_do_rw(). But never mind.
CPU: 118 UID: 0 PID: 5329 Comm: simx-qemu-syste Not tainted 6.12.0+ #1
(snip)
Call Trace:
<TASK>
? show_regs+0x69/0x80
? __warn+0x8d/0x140
? vfio_pci_config_rw+0x395/0x430 [vfio_pci_core]
? report_bug+0x18f/0x1a0
? handle_bug+0x63/0xa0
? exc_invalid_op+0x19/0x70
? asm_exc_invalid_op+0x1b/0x20
? vfio_pci_config_rw+0x395/0x430 [vfio_pci_core]
? vfio_pci_config_rw+0x244/0x430 [vfio_pci_core]
vfio_pci_rw+0x101/0x1b0 [vfio_pci_core]
vfio_pci_core_read+0x1d/0x30 [vfio_pci_core]
vfio_device_fops_read+0x27/0x40 [vfio]
vfs_read+0xbd/0x340
? vfio_device_fops_unl_ioctl+0xbb/0x740 [vfio]
? __rseq_handle_notify_resume+0xa4/0x4b0
__x64_sys_pread64+0x96/0xc0
x64_sys_call+0x1c3d/0x20d0
do_syscall_64+0x4d/0x120
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Fixes: 89e1f7d4c66d ("vfio: Add PCI device driver")
Signed-off-by: Avihai Horon <avihaih@xxxxxxxxxx>
---
Changes from v1:
* Use Alex's suggestion to fix the bug and adapt the commit message.
---
drivers/vfio/pci/vfio_pci_config.c | 20 ++++++++++++++++----
1 file changed, 16 insertions(+), 4 deletions(-)
diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
index 97422aafaa7b..b2a1ba66e5f1 100644
--- a/drivers/vfio/pci/vfio_pci_config.c
+++ b/drivers/vfio/pci/vfio_pci_config.c
@@ -313,12 +313,16 @@ static int vfio_virt_config_read(struct vfio_pci_core_device *vdev, int pos,
return count;
}
+static const struct perm_bits direct_ro_perms = {
+ .readfn = vfio_direct_config_read,
+};
+
/* Default capability regions to read-only, no-virtualization */
static struct perm_bits cap_perms[PCI_CAP_ID_MAX + 1] = {
- [0 ... PCI_CAP_ID_MAX] = { .readfn = vfio_direct_config_read }
+ [0 ... PCI_CAP_ID_MAX] = direct_ro_perms
};
static struct perm_bits ecap_perms[PCI_EXT_CAP_ID_MAX + 1] = {
- [0 ... PCI_EXT_CAP_ID_MAX] = { .readfn = vfio_direct_config_read }
+ [0 ... PCI_EXT_CAP_ID_MAX] = direct_ro_perms
};
/*
* Default unassigned regions to raw read-write access. Some devices
@@ -1897,9 +1901,17 @@ static ssize_t vfio_config_do_rw(struct vfio_pci_core_device *vdev, char __user
cap_start = *ppos;
} else {
if (*ppos >= PCI_CFG_SPACE_SIZE) {
- WARN_ON(cap_id > PCI_EXT_CAP_ID_MAX);
+ /*
+ * We can get a cap_id that exceeds PCI_EXT_CAP_ID_MAX
+ * if we're hiding an unknown capability at the start
+ * of the extended capability list. Use default, ro
+ * access, which will virtualize the id and next values.
+ */
+ if (cap_id > PCI_EXT_CAP_ID_MAX)
+ perm = (struct perm_bits *)&direct_ro_perms;
+ else
+ perm = &ecap_perms[cap_id];
- perm = &ecap_perms[cap_id];
cap_start = vfio_find_cap_start(vdev, *ppos);
} else {
WARN_ON(cap_id > PCI_CAP_ID_MAX);
Looks good to me. :) I'm able to trigger this warning by hide the first
ecap on my system with the below hack.
diff --git a/drivers/vfio/pci/vfio_pci_config.c
b/drivers/vfio/pci/vfio_pci_config.c
index b2a1ba66e5f1..db91e19a48b3 100644
--- a/drivers/vfio/pci/vfio_pci_config.c
+++ b/drivers/vfio/pci/vfio_pci_config.c
@@ -1617,6 +1617,7 @@ static int vfio_ecap_init(struct vfio_pci_core_device
*vdev)
u16 epos;
__le32 *prev = NULL;
int loops, ret, ecaps = 0;
+ int iii =0;
if (!vdev->extended_caps)
return 0;
@@ -1635,7 +1636,11 @@ static int vfio_ecap_init(struct
vfio_pci_core_device *vdev)
if (ret)
return ret;
- ecap = PCI_EXT_CAP_ID(header);
+ if (iii == 0) {
+ ecap = 0x61;
+ iii++;
+ } else
+ ecap = PCI_EXT_CAP_ID(header);
if (ecap <= PCI_EXT_CAP_ID_MAX) {
len = pci_ext_cap_length[ecap];
@@ -1664,6 +1669,7 @@ static int vfio_ecap_init(struct vfio_pci_core_device
*vdev)
*/
len = PCI_CAP_SIZEOF;
hidden = true;
+ printk("%s set hide\n", __func__);
}
for (i = 0; i < len; i++) {
@@ -1893,6 +1899,7 @@ static ssize_t vfio_config_do_rw(struct
vfio_pci_core_device *vdev, char __user
cap_id = vdev->pci_config_map[*ppos];
+ printk("%s cap_id: %x\n", __func__, cap_id);
if (cap_id == PCI_CAP_ID_INVALID) {
perm = &unassigned_perms;
cap_start = *ppos;
And then this warning is gone after applying this patch. Hence,
Reviewed-by: Yi Liu <yi.l.liu@xxxxxxxxx>
Tested-by: Yi Liu <yi.l.liu@xxxxxxxxx>
But I can still see a valid next pointer. Like the below log, I hide
the first ecap at offset 0x100, its ID is zeroed. The second ecap locates
at offset==0x150, its cap_id is 0x0018. I can see the next pointer in the
guest. Is it expected?
Guest:
100: 00 00 00 15 00 00 00 00 00 00 10 00 00 00 04 00
110: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
120: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
130: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
140: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
150: 18 00 01 16 00 00 00 00 00 00 00 00 00 00 00 00
160: 17 00 01 17 05 02 01 00 00 00 00 00 00 00 00 00
Host:
100: 01 00 02 15 00 00 00 00 00 00 10 00 00 00 04 00
110: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
120: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
130: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
140: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
150: 18 00 01 16 00 00 00 00 00 00 00 00 00 00 00 00
160: 17 00 01 17 05 02 01 00 00 00 00 00 00 00 00 00
BTW. If the first PCI cap is a unknown cap, will it have a problem? The
vfio_pci_core_device->pci_config_map is kept to be PCI_CAP_ID_INVALID,
hence it would use the unassigned_perms. But it makes more sense to use the
direct_ro_perms introduced here. is it?
--
Regards,
Yi Liu