On Tue, Jul 9, 2013 at 12:27 PM, Alex Williamson <alex.williamson@xxxxxxxxxx> wrote: > On Mon, 2013-07-08 at 15:51 -0600, Bjorn Helgaas wrote: >> On Mon, Jul 08, 2013 at 02:49:16PM -0600, Alex Williamson wrote: >> > On Mon, 2013-07-08 at 13:34 -0600, Bjorn Helgaas wrote: >> > > On Mon, Jul 08, 2013 at 11:07:20AM -0600, Alex Williamson wrote: >> > > > Joerg, >> > > > >> > > > Where do we stand on this series? You had a concern that the heuristic >> > > > used in patch 1/ could be dangerous. The suggestion for detecting the >> > > > issue was actually from Bjorn who replied with his rationale. Do you >> > > > want to go in the direction of a fixed whitelist or do you agree that >> > > > even if the heuristic breaks it provides better behavior than what we >> > > > have now? Thanks, >> > > >> > > I'm trying to take a step back and look at the overall design, not >> > > these specific patches. >> > > >> > > IOMMUs translate addresses based on their source, i.e., a PCIe >> > > requester ID. This is made more complicated by the fact that some >> > > bridges "take ownership" (change the requester ID as they forward >> > > transactions upstream), as well as the fact that conventional PCI has >> > > no requester ID at all. And some broken devices apparently generate >> > > DMA requests using the requester ID of another device. >> > > >> > > We currently deal with this using the pci_find_upstream_pcie_bridge() >> > > and pci_get_dma_source() interfaces, but I think there's too much >> > > assembly required by their users. pci_find_upstream_pcie_bridge() >> > > callers normally loop through all the bridges between the "upstream >> > > PCIe bridge" and the device, checking for bridges that might take >> > > ownership. They probably also ought to use pci_get_dma_source() to >> > > account for the broken devices, but most callers don't. >> > > >> > > Most of this is PCI-specific stuff that should be of interest to all >> > > IOMMU drivers, and the overall structure of calls and looping should >> > > be the same for all of them, so it would be nice to factor it out >> > > somehow. >> > > >> > > The attached patch is guaranteed not to even compile; it's just to >> > > make this idea more concrete. The basic idea is that since the IOMMU >> > > driver wants to perform some action for each possible requester ID the >> > > IOMMU might see, PCI could provide an iterator >> > > ("pci_for_each_requester_id()") to do that. >> > > >> > > Bjorn >> > > >> > > >> > > commit afad51492c6672b96c2b0735600d5695e30f7180 >> > > Author: Bjorn Helgaas <bhelgaas@xxxxxxxxxx> >> > > Date: Wed Jul 3 16:04:26 2013 -0600 >> > > >> > > pci-add-for-each-requester-id >> > > >> > > diff --git a/drivers/pci/search.c b/drivers/pci/search.c >> > > index d0627fa..380eb03 100644 >> > > --- a/drivers/pci/search.c >> > > +++ b/drivers/pci/search.c >> > > @@ -17,6 +17,89 @@ >> > > DECLARE_RWSEM(pci_bus_sem); >> > > EXPORT_SYMBOL_GPL(pci_bus_sem); >> > > >> > > +#define PCI_REQUESTER_ID(dev) (((dev)->bus->number << 8) | (dev)->devfn) >> > > +#define PCI_BRIDGE_REQUESTER_ID(bridge) ((bridge)->subordinate->number << 8) >> > > + >> > > +static inline bool pci_is_pcix(struct pci_dev *dev) >> > > +{ >> > > + return !!pci_pcix_cap(dev); /* XXX not implemented */ >> > > +} >> > > + >> > > +static bool pci_bridge_may_take_ownership(struct pci_dev *bridge) >> > > +{ >> > > + /* >> > > + * A PCIe to PCI/PCI-X bridge may take ownership per PCIe Bridge >> > > + * Spec v1.0, sec 2.3. >> > > + */ >> > > + if (pci_is_pcie(bridge) && >> > > + pci_pcie_type(bridge) == PCI_EXP_TYPE_PCI_BRIDGE) >> > > + return true; >> > >> > Assume we still need a quirk here, PCIe-to-PCI bridges without a PCIe >> > capability are exactly what causes the current bug. >> >> Likely so. We might be able to identify that as "secondary bus is not >> PCIe" or something. >> >> > > + >> > > + /* >> > > + * A PCI-X to PCI-X bridge need not take ownership because there >> > > + * are requester IDs on the secondary PCI-X bus. However, if a PCI >> > > + * device is added on the secondary bus, the bridge must revert to >> > > + * being a PCI-X to PCI bridge, and it then *would* take ownership. >> > > + * Assuming a PCI-X to PCI-X bridge takes ownership means we can >> > > + * tolerate a future hot-add without having to change existing >> > > + * IOMMU mappings. >> > > + */ >> > > + if (pci_is_pcix(bridge)) >> > > + return true; >> > > + >> > > + return false; >> > > +} >> > > + >> > > +static struct pci_dev *pci_bridge_to_dev(struct pci_bus *bus, >> > > + struct pci_dev *dev) >> > > +{ >> > > + struct pci_dev *bridge; >> > > + struct pci_bus *child_bus; >> > > + u8 secondary, subordinate, busn = dev->bus->number; >> > > + >> > > + if (dev->bus == bus) >> > > + return dev; >> > > + >> > > + /* >> > > + * There may be several devices on "bus". Find the one that is a >> > > + * bridge leading to "dev". >> > > + */ >> > > + list_for_each_entry(bridge, &bus->devices, bus_list) { >> > > + child_bus = bridge->subordinate; >> > > + if (child_bus) { >> > > + secondary = child_bus->busn_res.start; >> > > + subordinate = child_bus->busn_res.end; >> > > + if (secondary <= busn && busn <= subordinate) >> > > + return bridge; >> > > + } >> > > + } >> > > + return NULL; >> > > +} >> > > + >> > > +int pci_for_each_requester_id(struct pci_dev *bridge, struct pci_dev *dev, >> > > + int (*fn)(struct pci_dev *, void *), >> > >> > u16 requester_id >> > >> > > + void *data) >> > > +{ >> > > + int ret; >> > > + >> > > + dev = pci_get_dma_source(dev); /* XXX ref count screwup */ >> > >> > It's unfortunate that our dma quirk still seems to only get called for >> > the end device where we could have bridges that need it too. >> >> Sure, we can do that if necessary. >> >> I'm not trying to present this as any kind of finished solution. The >> point is to figure out if we can make an interface that's a little >> more abstract so we can pull the PCI topology issues out of the IOMMU >> drivers, and I just thought something that looked like code would be >> more concrete and easier to discuss. >> >> > > + >> > > + while (bridge != dev) { >> > >> > >> > I'm having a hard time understanding this function since bridge is never >> > defined in any of the below sample code. Would bridge be the root port >> > above a device? And what about devices connected to the RC? Seems like >> > we'd need to traverse up the bus first, so there's a bit of missing >> > magic. >> >> I was assuming that an IOMMU could be either at the root of a PCI >> hierarchy or buried farther down. If that's true, the IOMMU driver >> might need to supply the top of the PCI hierarchy for which it >> translates. That's what I'm trying to get at with the "bridge" >> argument. >> >> If the IOMMU is buried, I assumed there was no point in iterating over >> bridges upstream of the IOMMU because it wouldn't see requester IDs of >> those upstream bridges. PCI wouldn't have any way to figure out how >> far up to go, hence the argument. > > I don't think that getting the bridge is going to be quite so simple. > VT-d can have multiple IOMMUs, each handling a specific device or > hierarchy. One IOMMU per domain can be default IOMMU for any device not > otherwise specified. The IOMMU is not a PCI enumerable device. AMD-Vi > does have a PCI visible IOMMU, but it's an endpoint on the root complex, > not the top of a hierarchy. If the only possible topology is that the IOMMU sees transactions coming out of a device on a root bus, then a bridge parameter is unnecessary, and we can just look at every bridge between the root bus and the device. I just didn't know whether that was the only possible topology. If an IOMMU can be placed such that it's like a PCIe switch that happens to translate DMA addresses before forwarding upstream, then we might want a bridge parameter that identifies the "downstream port" of the IOMMU or the upstream port of a switch below the IOMMU. Otherwise we would generate requester IDs unnecessarily for bridges above the IOMMU. > I'm warming up to your approach, but I think I'd rather see something > like: > > struct pci_dev *pci_get_requester_id(struct pci_dev *pdev) This returns a "pci_dev *", not a requester ID, so for bridges that can take ownership, this depends on there being a pci_dev on the secondary bus with dev/fn == 0. Maybe that's always true; I'm not sure. > { > do { > /* Apply DMA quirks at every step */ > pdev = pci_get_dma_source(pdev); > > /* Done for PCIe devices or broken bridges, PCI-X TBD */ > if (pci_is_pcie(pdev) || /* broken pcie bridge test */) > break; > > } while (!pci_is_root_bus(pdev->bus) && (pdev = pdev->bus->self)); > > return pdev; I'm not sure it's sufficient to stop when we find a PCIe bridge. For example, in this topology: 00:1c.0 PCIe root port 01:00.0 PCIe-to-PCI bridge 02:00.0 PCI-to-PCIe reverse bridge 03:00.0 PCIe switch upstream port 04:00.0 PCIe switch downstream port 05:00.0 PCIe endpoint the IOMMU will only see requester IDs of 02:00.0, even though if we search up from 05:00.0 we find a PCIe device at 04:00.0. Maybe this is too contrived, but reverse bridges are sometimes used on add-in cards so a new part can be used in older systems, e.g., in this case the add-in card would contain 02:00.0 and everything below it. > } > > > /* Start with requester ID pdev and pdev->bus, this way we handle both peer > * and parent requester IDs */ > int pci_for_each_requester_id(struct pci_dev *pdev, struct pci_bus *bus, > int (*fn)(struct pci_dev *, void *), void *data) > { > struct pci_dev *tmp; > int ret; > > list_for_each_entry(tmp, &bus->devices, bus_list) { > if (pci_get_requester_id(tmp) == pdev) { > ret = fn(tmp, data); What does the IOMMU driver supply for "bus"? Presumably it doesn't supply "pdev->bus" because you're traversing downward ("tmp->subordinate" below). > if (ret) > ... > > if (tmp->subordinate) { > ret = pci_for_each_requester_id(pdev, tmp->subordinate, fn); > if (ret) > ... > } > } > } > > return ret; > } > > Seems like that would be sufficient to leverage the intel-iommu changes > you've drafted. Thanks, > > Alex > >> > > + if (pci_bridge_may_take_ownership(bridge)) { >> > > + ret = fn(dev, PCI_BRIDGE_REQUESTER_ID(bridge), data); >> > > + if (ret) >> > > + return ret; >> > > + } >> > > + bridge = pci_bridge_to_dev(bridge->subordinate, dev); >> > > + } >> > > + >> > > + if (pci_is_pcie(dev) || pci_is_pcix(dev)) >> > > + return fn(dev, PCI_REQUESTER_ID(dev), data); >> > > + >> > > + /* Conventional PCI has no requester ID */ >> > > + return 0; >> > > +} >> > > + >> > > /* >> > > * find the upstream PCIe-to-PCI bridge of a PCI device >> > > * if the device is PCIE, return NULL >> > > diff --git a/include/linux/pci.h b/include/linux/pci.h >> > > index 3a24e4f..cbcc82f 100644 >> > > --- a/include/linux/pci.h >> > > +++ b/include/linux/pci.h >> > > @@ -794,6 +794,8 @@ static inline struct pci_dev *pci_get_bus_and_slot(unsigned int bus, >> > > } >> > > struct pci_dev *pci_get_class(unsigned int class, struct pci_dev *from); >> > > int pci_dev_present(const struct pci_device_id *ids); >> > > +void pci_for_each_requester_id(struct pci_dev *bridge, struct pci_dev *dev, >> > > + int (*fn)(struct pci_dev *, void *), void *data); >> > > >> > > int pci_bus_read_config_byte(struct pci_bus *bus, unsigned int devfn, >> > > int where, u8 *val); >> > > >> > > commit 6e770d7bf4ef95207df4ac4c42ef4406ff7bc54e >> > > Author: Bjorn Helgaas <bhelgaas@xxxxxxxxxx> >> > > Date: Wed Jul 3 12:02:15 2013 -0600 >> > > >> > > intel-iommu-upstream-pcie >> > >> > I assume this one would is also going to apply dma quirks to dma_ops, >> > which I'm all for. >> > >> > > diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c >> > > index b91564d..bdb6e6a 100644 >> > > --- a/drivers/iommu/intel-iommu.c >> > > +++ b/drivers/iommu/intel-iommu.c >> > > @@ -1669,79 +1669,68 @@ static int domain_context_mapping_one(struct dmar_domain *domain, int segment, >> > > return 0; >> > > } >> > > >> > > +struct domain_context_info { >> > > + struct dmar_domain *domain; >> > > + struct pci_dev *dev; >> > > + int translation; >> > > + struct intel_iommu *iomumu; >> > >> > s/iomumu/iommu/ >> > >> > > + bool mapped; >> > > +}; >> > > + >> > > +static int context_mapping(struct pci_dev *dev, u16 requester_id, void *data) >> > > +{ >> > > + struct domain_context_info *dev_info = data; >> > > + int segment = pci_domain_nr(data->dev); >> > >> > dev_info->dev >> > >> > > + u8 bus = data->requester_id >> 8; >> > >> > s/data->// >> > >> > > + u8 devfn = data->requester_id & 0xff; >> > >> > s/data->// >> >> I told you it wouldn't even compile :) I'm only trying to present a >> possibility, not all the nitty-gritty details. >> >> > > + >> > > + return domain_context_mapping_one(data->domain, segment, >> > > + bus, devfn, data->translation); >> > > +} >> > > + >> > > static int domain_context_mapping(struct dmar_domain *domain, >> > > struct pci_dev *pdev, int translation) >> > > { >> > > - int ret; >> > > - struct pci_dev *tmp, *parent; >> > > + struct domain_context_info data; >> > > + struct pci_dev *bridge; >> > > >> > > - ret = domain_context_mapping_one(domain, pci_domain_nr(pdev->bus), >> > > - pdev->bus->number, pdev->devfn, >> > > - translation); >> > > - if (ret) >> > > - return ret; >> > > + data.domain = domain; >> > > + data.dev = pdev; >> > > + data.translation = translation; >> > > + bridge = xx; >> > >> > Darn, I was hoping you were going to reveal where bridge comes from. >> > >> > > + return pci_for_each_requester_id(bridge, pdev, context_mapping, &data); >> > > +} >> > > >> > > - /* dependent device mapping */ >> > > - tmp = pci_find_upstream_pcie_bridge(pdev); >> > > - if (!tmp) >> > > - return 0; >> > > - /* Secondary interface's bus number and devfn 0 */ >> > > - parent = pdev->bus->self; >> > > - while (parent != tmp) { >> > > - ret = domain_context_mapping_one(domain, >> > > - pci_domain_nr(parent->bus), >> > > - parent->bus->number, >> > > - parent->devfn, translation); >> > > - if (ret) >> > > - return ret; >> > > - parent = parent->bus->self; >> > > - } >> > > - if (pci_is_pcie(tmp)) /* this is a PCIe-to-PCI bridge */ >> > > - return domain_context_mapping_one(domain, >> > > - pci_domain_nr(tmp->subordinate), >> > > - tmp->subordinate->number, 0, >> > > - translation); >> > > - else /* this is a legacy PCI bridge */ >> > > - return domain_context_mapping_one(domain, >> > > - pci_domain_nr(tmp->bus), >> > > - tmp->bus->number, >> > > - tmp->devfn, >> > > - translation); >> > > +static int is_context_mapped(struct pci_dev *dev, u16 requester_id, void *data) >> > > +{ >> > > + struct domain_context_info *dev_info = data; >> > > + u8 bus = data->requester_id >> 8; >> > > + u8 devfn = data->requester_id & 0xff; >> > > + bool ret; >> > > + >> > > + ret = device_context_mapped(dev_info->iommu, bus, devfn); >> > > + if (!ret) >> > > + dev_info->mapped = false; >> > > + >> > > + return 0; >> > > } >> > > >> > > static bool domain_context_mapped(struct pci_dev *pdev) >> > > { >> > > - bool ret; >> > > - struct pci_dev *tmp, *parent; >> > > + struct domain_context_info data; >> > > struct intel_iommu *iommu; >> > > + struct pci_dev *bridge; >> > > >> > > iommu = device_to_iommu(pci_domain_nr(pdev->bus), pdev->bus->number, >> > > pdev->devfn); >> > > if (!iommu) >> > > return false; >> > > >> > > - ret = device_context_mapped(iommu, pdev->bus->number, pdev->devfn); >> > > - if (!ret) >> > > - return false; >> > > - /* dependent device mapping */ >> > > - tmp = pci_find_upstream_pcie_bridge(pdev); >> > > - if (!tmp) >> > > - return true; >> > > - /* Secondary interface's bus number and devfn 0 */ >> > > - parent = pdev->bus->self; >> > > - while (parent != tmp) { >> > > - ret = device_context_mapped(iommu, parent->bus->number, >> > > - parent->devfn); >> > > - if (!ret) >> > > - return false; >> > > - parent = parent->bus->self; >> > > - } >> > > - if (pci_is_pcie(tmp)) >> > > - return device_context_mapped(iommu, tmp->subordinate->number, >> > > - 0); >> > > - else >> > > - return device_context_mapped(iommu, tmp->bus->number, >> > > - tmp->devfn); >> > > + data.iommu = iommu; >> > > + data.mapped = true; >> > > + bridge = xx; >> > > + pci_for_each_requester_id(bridge, pdev, is_context_mapped, &data); >> > > + return data.mapped; >> > > } >> > > >> > > /* Returns a number of VTD pages, but aligned to MM page size */ >> > > @@ -1961,6 +1950,27 @@ static struct dmar_domain *find_domain(struct pci_dev *pdev) >> > > return NULL; >> > > } >> > > >> > > +static int find_domain2(struct pci_dev *dev, u16 requester_id, void *data) >> > > +{ >> > > + struct domain_context_info *dev_info = data; >> > > + int segment = pci_domain_nr(data->dev); >> > > + u8 bus = data->requester_id >> 8; >> > > + u8 devfn = data->requester_id & 0xff; >> > > + unsigned long flags; >> > > + struct device_domain_info *info; >> > > + >> > > + spin_lock_irqsave(&device_domain_lock, flags); >> > > + list_for_each_entry(info, &device_domain_list, global) { >> > > + if (info->segment == segment && >> > > + info->bus == bus && info->devfn == devfn) { >> > > + data->domain = info->domain; >> > > + break; >> > > + } >> > > + } >> > > + spin_unlock_irqrestore(&device_domain_lock, flags); >> > > + return 0; >> > > +} >> > > + >> > > /* domain is initialized */ >> > > static struct dmar_domain *get_domain_for_dev(struct pci_dev *pdev, int gaw) >> > > { >> > > @@ -1968,11 +1978,11 @@ static struct dmar_domain *get_domain_for_dev(struct pci_dev *pdev, int gaw) >> > > struct intel_iommu *iommu; >> > > struct dmar_drhd_unit *drhd; >> > > struct device_domain_info *info, *tmp; >> > > - struct pci_dev *dev_tmp; >> > > unsigned long flags; >> > > int bus = 0, devfn = 0; >> > > int segment; >> > > int ret; >> > > + struct domain_context_info data; >> > > >> > > domain = find_domain(pdev); >> > > if (domain) >> > > @@ -1980,29 +1990,13 @@ static struct dmar_domain *get_domain_for_dev(struct pci_dev *pdev, int gaw) >> > > >> > > segment = pci_domain_nr(pdev->bus); >> > > >> > > - dev_tmp = pci_find_upstream_pcie_bridge(pdev); >> > > - if (dev_tmp) { >> > > - if (pci_is_pcie(dev_tmp)) { >> > > - bus = dev_tmp->subordinate->number; >> > > - devfn = 0; >> > > - } else { >> > > - bus = dev_tmp->bus->number; >> > > - devfn = dev_tmp->devfn; >> > > - } >> > > - spin_lock_irqsave(&device_domain_lock, flags); >> > > - list_for_each_entry(info, &device_domain_list, global) { >> > > - if (info->segment == segment && >> > > - info->bus == bus && info->devfn == devfn) { >> > > - found = info->domain; >> > > - break; >> > > - } >> > > - } >> > > - spin_unlock_irqrestore(&device_domain_lock, flags); >> > > - /* pcie-pci bridge already has a domain, uses it */ >> > > - if (found) { >> > > - domain = found; >> > > - goto found_domain; >> > > - } >> > > + data.domain = NULL; >> > > + data.dev = pdev; >> > > + bridge = xx; >> > > + pci_for_each_requester_id(bridge, pdev, find_domain2, &data); >> > > + if (data.domain) { >> > > + domain = data.domain; >> > > + goto found_domain; >> > > } >> > > >> > > domain = alloc_domain(); >> > > @@ -3734,31 +3728,29 @@ int __init intel_iommu_init(void) >> > > return 0; >> > > } >> > > >> > > +static int detach_requester(struct pci_dev *dev, u16 requester_id, void *data) >> > > +{ >> > > + struct domain_context_info *dev_info = data; >> > > + struct intel_iommu *iommu = data->iommu; >> > > + u8 bus = data->requester_id >> 8; >> > > + u8 devfn = data->requester_id & 0xff; >> > > + >> > > + iommu_detach_dev(iommu, bus, devfn); >> > > + return 0; >> > > +} >> > > + >> > > static void iommu_detach_dependent_devices(struct intel_iommu *iommu, >> > > struct pci_dev *pdev) >> > > { >> > > - struct pci_dev *tmp, *parent; >> > > + struct domain_context_info data; >> > > >> > > if (!iommu || !pdev) >> > > return; >> > > >> > > /* dependent device detach */ >> > > - tmp = pci_find_upstream_pcie_bridge(pdev); >> > > - /* Secondary interface's bus number and devfn 0 */ >> > > - if (tmp) { >> > > - parent = pdev->bus->self; >> > > - while (parent != tmp) { >> > > - iommu_detach_dev(iommu, parent->bus->number, >> > > - parent->devfn); >> > > - parent = parent->bus->self; >> > > - } >> > > - if (pci_is_pcie(tmp)) /* this is a PCIe-to-PCI bridge */ >> > > - iommu_detach_dev(iommu, >> > > - tmp->subordinate->number, 0); >> > > - else /* this is a legacy PCI bridge */ >> > > - iommu_detach_dev(iommu, tmp->bus->number, >> > > - tmp->devfn); >> > > - } >> > > + data.iommu = iommu; >> > > + bridge = xx; >> > > + pci_for_each_requester_id(bridge, pdev, detach_requester, &data); >> > > } >> > > >> > > static void domain_remove_one_dev_info(struct dmar_domain *domain, >> > > -- >> > > To unsubscribe from this list: send the line "unsubscribe linux-pci" in >> > > the body of a message to majordomo@xxxxxxxxxxxxxxx >> > > More majordomo info at http://vger.kernel.org/majordomo-info.html >> > >> > >> > > > > -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html