On 2021-05-01 11:35 p.m., John Hubbard wrote: > On 4/8/21 10:01 AM, Logan Gunthorpe wrote: >> In order to use upstream_bridge_distance_warn() from a dma_map function, >> it must not sleep. However, pci_get_slot() takes the pci_bus_sem so it >> might sleep. >> >> In order to avoid this, try to get the host bridge's device from >> bus->self, and if that is not set, just get the first element in the >> device list. It should be impossible for the host bridge's device to >> go away while references are held on child devices, so the first element >> should not be able to change and, thus, this should be safe. >> >> Signed-off-by: Logan Gunthorpe <logang@xxxxxxxxxxxx> >> --- >> drivers/pci/p2pdma.c | 14 ++++++++++++-- >> 1 file changed, 12 insertions(+), 2 deletions(-) >> >> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c >> index bd89437faf06..473a08940fbc 100644 >> --- a/drivers/pci/p2pdma.c >> +++ b/drivers/pci/p2pdma.c >> @@ -311,16 +311,26 @@ static const struct pci_p2pdma_whitelist_entry { >> static bool __host_bridge_whitelist(struct pci_host_bridge *host, >> bool same_host_bridge) >> { >> - struct pci_dev *root = pci_get_slot(host->bus, PCI_DEVFN(0, 0)); >> const struct pci_p2pdma_whitelist_entry *entry; >> + struct pci_dev *root = host->bus->self; >> unsigned short vendor, device; >> >> + /* >> + * This makes the assumption that the first device on the bus is the >> + * bridge itself and it has the devfn of 00.0. This assumption should >> + * hold for the devices in the white list above, and if there are cases >> + * where this isn't true they will have to be dealt with when such a >> + * case is added to the whitelist. > > Actually, it makes the assumption that the first device *in the list* > (the host->bus-devices list) is 00.0. The previous code made the > assumption that you wrote. The comment notes two assumptions (although the grammar is poor, which I will fix). Yes, the previous code made the second assumption, the new code makes both assumptions. > By the way, pre-existing code comment: pci_p2pdma_whitelist[] seems > really short. From a naive point of view, I'd expect that there must be > a lot more CPUs/chipsets that can do pci p2p, what do you think? I > wonder if we have to be so super strict, anyway. It just seems extremely > limited, and I suspect there will be some additions to the list as soon > as we start to use this. Yes, well unfortunately we have no other way to determine what host bridges can communicate with P2P. We settled on a whitelist when the series was first patch. Nobody likes that situation, but nobody has found anything better. We've been hoping standards bodies would give us a flag but I haven't heard anything about that. At least AMD has been able to guarantee us that all CPUs newer than Zen will support so that covers a large swath. It would be nice if we could say something similar for Intel. > OK, yes this avoids taking the pci_bus_sem, but it's kind of cheating. > Why is it OK to avoid taking any locks in order to retrieve the > first entry from the list, but in order to retrieve any other entry, you > have to aquire the pci_bus_sem, and get a reference as well? Something > is inconsistent there. > > The new version here also no longer takes a reference on the device, > which is also cheating. But I'm guessing that the unstated assumption > here is that there is always at least one entry in the list. But if > that's true, then it's better to show clearly that assumption, instead > of hiding it in an implicit call that skips both locking and reference > counting. Because we hold a reference to a child device of the bus. So the host bus device can't go away until the child device has been released. An earlier version of the P2PDMA patchset had a lot more extraneous get device calls until someone else pointed this out. > You could add a new function, which is a cut-down version of pci_get_slot(), > like this, and call this from __host_bridge_whitelist(): > > /* > * A special purpose variant of pci_get_slot() that doesn't take the pci_bus_sem > * lock, and only looks for the 00.0 bus-device-function. Once the PCI bus is > * up, it is safe to call this, because there will always be a top-level PCI > * root device. > * > * Other assumptions: the root device is the first device in the list, and the > * root device is numbered 00.0. > */ > struct pci_dev *pci_get_root_slot(struct pci_bus *bus) > { > struct pci_dev *root; > unsigned devfn = PCI_DEVFN(0, 0); > > root = list_first_entry_or_null(&bus->devices, struct pci_dev, > bus_list); > if (root->devfn == devfn) > goto out; > > root = NULL; > out: > pci_dev_get(root); > return root; > } > EXPORT_SYMBOL(pci_get_root_slot); > > ...I think that's a lot clearer to the reader, about what's going on here. Per above, I think the reference count is unnecessary. But I could wrap it in a static function for clarity. (There's no reason to export this function). > Note that I'm not really sure if it *is* safe, I would need to ask other > PCIe subsystem developers with more experience. But I don't think anyone > is trying to make p2pdma calls so early that PCIe buses are uninitialized. Yeah, it's impossible to make a p2pdma call before the PCIe bus is initialized. They have to have access to at least one PCI device before they can even attempt it. Logan