On Mon, Sep 20, 2021 at 6:44 AM Guenter Roeck <linux@xxxxxxxxxxxx> wrote: > > Details for build failures below. Several improvements since last week, > but it looks like the alpha related pci_iounmap patches still need some > tweaking (see error log at the very end). Bah. I thought I had tested sparc64, but I was wrong. Silly me, I had only tested the 32-bit case. That sparc64 thing is being particularly stupid: sparc64 uses GENERIC_PCI_IOMAP, but declares its own empty pci_iounmap() because it didn't use GENERIC_IOMAP that does this all right. And because it didn't use the generic iomap code, it's all actually entirely buggy, in that it seems to think that pci_iounmap() is about unmapping ports (like ioport_unmap) and thus a no-op. But no, pci_iounmap() is supposed to unmap anything that pci_iomap() mapped, which includes the actual MMIO range too. Basically, the whole idea of "pci_iomap()" is that you give it a PCI device and the index to the BAR in that device, and it maps that BAR - whether it is MMIO or PIO. And then you can use that __iomem pointer for ioread*() and friends (or you can use readl()/writel() if you know it was MMIO). You can give it a maximum length if you want, but by default it just maps the whole PCI BAR, so the default usage would just be void __iomem *map = pci_iomap(pdev, bar, 0); And then you do whatever IO using that 'map' base pointer, and once you're done you do "pci_iounmap()" on it all. And then the trick most cases use is that they know that the PIO case is just always a fixed map, so for PIO that "map/unmap" part os a no-op. But generally ONLY for the PIO case. And the sparc64 code seems to think it's only used for PIO, and makes pci_iounmap() a no-op in general. Which is all kinds of completely broken. This is the same bug that the broken inline function in <asm-generic/io.h> had, and that I added a big comment about in commit 316e8d79a095 ("pci_iounmap'2: Electric Boogaloo: try to make sense of it all"): + * This code is odd, and the ARCH_HAS/ARCH_WANTS #define logic comes + * from legacy <asm-generic/io.h> header file behavior. In particular, + * it would seem to make sense to do the iounmap(p) for the non-IO-space + * case here regardless, but that's not what the old header file code + * did. Probably incorrectly, but this is meant to be bug-for-bug + * compatible. but I intentionally didn't fix the bug in that commit, because I wanted to just try to keep the odd old logic as closely as possible. It looks like a big part of the "people do their own pci_iounmap()" thing is that they do it badly and with bugs. This was all meant to uncover and fix warnings, but it seems to be uncovering bigger issues. Of course, most of the time the "pci_iounmap()" only happens at driver unload time, so it's basically only a kernel virtual memory mapping leak, which may be why people didn't realize how buggy their own implementations were. What the normal GENERIC_IOMAP code does is: - it "fake maps" the PIO space at an invalid fixed virtual address Since we know that a PIO address on PCI is just a 16-bit number, this fake virtual window is small and easy to do: /* * We encode the physical PIO addresses (0-0xffff) into the * pointer by offsetting them with a constant (0x10000) and * assuming that all the low addresses are always PIO. That means * we can do some sanity checks on the low bits, and don't * need to just take things for granted. */ #define PIO_OFFSET 0x10000UL #define PIO_MASK 0x0ffffUL #define PIO_RESERVED 0x40000UL so the logic is basically that we can trivially test whether a "void __iomem *" pointer is a PIO pointer or not: if the pointer value is in that range of PIO_OFFSET..PIO_OFFSET+PIO_MASK range, it's PIO, otherwise it's mmio. - the MMIO space acts using all the normal ioremap() logic, and we can tell the end result apart from PIO with the above trivial thing. - the GENERIC_IOMAP code internally just has a IO_COND(adds, is_pio, is_mmio) helper macro, which sets "port" for the is_pio case, and "addr" for the is_mmio case, so you can do trivial things like this: unsigned int ioread8(const void __iomem *addr) { IO_COND(addr, return inb(port), return readb(addr)); return 0xff; } which does the "inb(port)" for the PIO case, and the "readb(addr)" for the MMIO case. - and lookie here what the GENERIC_IOMAP code for pci_iounmap() is: void pci_iounmap(struct pci_dev *dev, void __iomem * addr) { IO_COND(addr, /* nothing */, iounmap(addr)); } IOW, for the "is_pio" case it does nothing, and for the "is_mmio" case it does "iounmap()". So the GENERIC_IOMAP code is actually really simple and should just work for pretty much everybody. All it requires is that fake kernel virtual address range at PIO_OFFSET (you can override the default values if you want - maybe your architecture really wants to put MMIO in those virtual addresses, but I don't think there's a lot of reason to generally want to do it) But despite that, people think they should implement their own code, and then they clearly get it HORRIBLY WRONG. Anyway, this email ended up being a long explanation of what the code _should_ do, in the hope that some enterprising kernel developer decides "Oh, this sounds like an easy thing to fix". But you do need to be able to test the end result at least a tiny bit. Because I suspect that the real fix for sparc64 is to just get rid of its broken non-GENERIC_IOMAP code, and just do "select GENERIC_IOMAP" And I don't think sparc64 is the only architecture that should go "Oh, I should just use GENERIC_IOMAP instead of implementing it badly by hand". Anyone? Linus