Sorry for directly looping Andrew in. I think Andrew may be familiar with the code in question. Some backgrounds: Mis-aligned addresses from ACPI table can pass along pci_remap_iospace() -> vmap_page_range() -> vmap_pte_range() path, leading to a loop overrun in vmap_pte_range(). Bjorn and I wonder why all those vmap_*_range() functions don't validate the alignment, assuming the addresses page-aligned. We want to know the best place to do this check. > 2024年8月15日 12:28,Miao Wang <shankerwangmiao@xxxxxxxxx> 写道: > > Hi, > >> 2024年8月15日 00:37,Bjorn Helgaas <helgaas@xxxxxxxxxx> 写道: >> >> [+cc linux-mm for vmap page alignment checking question] >> >> On Wed, Aug 14, 2024 at 08:09:15PM +0800, Miao Wang via B4 Relay wrote: >>> From: Miao Wang <shankerwangmiao@xxxxxxxxx> >>> >>> When the IO resource given by _CRS method is not page aligned, especially >>> when the page size is larger than 4KB, serious problems will happen >>> because the misaligned address is passed down to pci_remap_iospace(), >>> then to vmap_page_range(), and finally to vmap_pte_range(), where the >>> length between addr and end is expected to be divisible by PAGE_SIZE, or >>> the loop will overrun till the pfn_none check fails. >> >> What does this problem look like to a user? Panic, oops, hang, >> warning backtrace? I assume this is not a regression, but maybe >> something you tripped over because of a BIOS defect? Does this need >> to be backported to stable kernels? > > Panic, or actually BUG in vmap_pte_range() at the !pte_none(ptep_get(pte)) > check, since misaligned addresses will cause the loop in vmap_pte_range > overrun and finally reach one of the already mapped pages. This happens on > a LS2k2000 machine, the buggy firmware of which declares the IO space of > the PCI root controller as follows: > > QWordIO (ResourceProducer, MinFixed, MaxFixed, PosDecode, EntireRange, > 0x0000000000000000, // Granularity > 0x0000000000004000, // Range Minimum > 0x0000000000009FFF, // Range Maximum > 0x000000FDFC000000, // Translation Offset > 0x0000000000006000, // Length > ,, , TypeStatic, DenseTranslation) > > At first, I thought there might be some overlapping address spaces. But when > I added some debug output in vmap_page_range(), I realized that it was > because a loop overrun. > > Normally, loongarch64 kernel is compiled using 16K page size, and thus the > length here is not page aligned. I tested my patch using a virtual machine > with a deliberately modified DSDT table to reproduce this issue. > >> It seems sort of weird to me that all those vmap_*_range() functions >> take the full page address (not a PFN) and depend on the addr/size >> being page-aligned, but they don't validate the alignment. But I'm >> not a VM person and I suppose there's a reason for passing the full >> address. > Ah, I also have this question. >> >> But it does mean that other users of vmap_page_range() are also >> potentially susceptible to this issue, e.g., vmap(), vm_map_ram(), >> ioremap_page_range(), etc., so I'm not sure that >> acpi_pci_root_remap_iospace() is the best place to check the >> alignment. > My first idea was that the misaligned address is introduced from DSDT > table and the check would be better to be done inside the ACPI system. > However, lets wait for replies from linux-mm to decide where should be > the best place. >> >>> Signed-off-by: Miao Wang <shankerwangmiao@xxxxxxxxx> >>> --- >>> Changes in v3: >>> - Adjust code formatting. >>> - Reword the commit message for further description of the possible reason >>> leading to misaligned IO resource addresses. >>> - Link to v2: https://lore.kernel.org/r/20240814-check_pci_probe_res-v2-1-a03c8c9b498b@xxxxxxxxx >>> >>> Changes in v2: >>> - Sorry for posting out the draft version in V1, fixed a silly compiling issue. >>> - Link to v1: https://lore.kernel.org/r/20240814-check_pci_probe_res-v1-1-122ee07821ab@xxxxxxxxx >>> --- >>> drivers/acpi/pci_root.c | 14 +++++++++++--- >>> 1 file changed, 11 insertions(+), 3 deletions(-) >>> >>> diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c >>> index d0bfb3706801..a425e93024f2 100644 >>> --- a/drivers/acpi/pci_root.c >>> +++ b/drivers/acpi/pci_root.c >>> @@ -858,7 +858,7 @@ static void acpi_pci_root_validate_resources(struct device *dev, >>> } >>> } >>> >>> -static void acpi_pci_root_remap_iospace(struct fwnode_handle *fwnode, >>> +static void acpi_pci_root_remap_iospace(struct acpi_device *device, >>> struct resource_entry *entry) >>> { >>> #ifdef PCI_IOBASE >>> @@ -868,7 +868,15 @@ static void acpi_pci_root_remap_iospace(struct fwnode_handle *fwnode, >>> resource_size_t length = resource_size(res); >>> unsigned long port; >>> >>> - if (pci_register_io_range(fwnode, cpu_addr, length)) >>> + if (!PAGE_ALIGNED(cpu_addr) || !PAGE_ALIGNED(length) || >>> + !PAGE_ALIGNED(pci_addr)) { >>> + dev_err(&device->dev, >>> + FW_BUG "I/O resource %pR or its offset %pa is not page aligned\n", >>> + res, &entry->offset); >>> + goto err; >>> + } >>> + >>> + if (pci_register_io_range(&device->fwnode, cpu_addr, length)) >>> goto err; >> >> This change verifies alignment for the ACPI case that leads to the >> pci_remap_iospace() -> vmap_page_range() -> vmap_pte_range() path, but >> there are others even in drivers/pci/, e.g., pci_remap_iospace() is >> also used in the DT path, where I suppose a defective DT could cause a >> similar issue. >> >>> port = pci_address_to_pio(cpu_addr); >>> @@ -910,7 +918,7 @@ int acpi_pci_probe_root_resources(struct acpi_pci_root_info *info) >>> else { >>> resource_list_for_each_entry_safe(entry, tmp, list) { >>> if (entry->res->flags & IORESOURCE_IO) >>> - acpi_pci_root_remap_iospace(&device->fwnode, >>> + acpi_pci_root_remap_iospace(device, >>> entry); >>> >>> if (entry->res->flags & IORESOURCE_DISABLED) >>> >>> --- >>> base-commit: 7c626ce4bae1ac14f60076d00eafe71af30450ba >>> change-id: 20240813-check_pci_probe_res-27e3e6df72b2 >>> >>> Best regards, >>> -- >>> Miao Wang <shankerwangmiao@xxxxxxxxx>