On Mon, 2013-07-08 at 13:34 -0600, Bjorn Helgaas wrote: > On Mon, Jul 08, 2013 at 11:07:20AM -0600, Alex Williamson wrote: > > Joerg, > > > > Where do we stand on this series? You had a concern that the heuristic > > used in patch 1/ could be dangerous. The suggestion for detecting the > > issue was actually from Bjorn who replied with his rationale. Do you > > want to go in the direction of a fixed whitelist or do you agree that > > even if the heuristic breaks it provides better behavior than what we > > have now? Thanks, > > I'm trying to take a step back and look at the overall design, not > these specific patches. > > IOMMUs translate addresses based on their source, i.e., a PCIe > requester ID. This is made more complicated by the fact that some > bridges "take ownership" (change the requester ID as they forward > transactions upstream), as well as the fact that conventional PCI has > no requester ID at all. And some broken devices apparently generate > DMA requests using the requester ID of another device. > > We currently deal with this using the pci_find_upstream_pcie_bridge() > and pci_get_dma_source() interfaces, but I think there's too much > assembly required by their users. pci_find_upstream_pcie_bridge() > callers normally loop through all the bridges between the "upstream > PCIe bridge" and the device, checking for bridges that might take > ownership. They probably also ought to use pci_get_dma_source() to > account for the broken devices, but most callers don't. > > Most of this is PCI-specific stuff that should be of interest to all > IOMMU drivers, and the overall structure of calls and looping should > be the same for all of them, so it would be nice to factor it out > somehow. > > The attached patch is guaranteed not to even compile; it's just to > make this idea more concrete. The basic idea is that since the IOMMU > driver wants to perform some action for each possible requester ID the > IOMMU might see, PCI could provide an iterator > ("pci_for_each_requester_id()") to do that. > > Bjorn > > > commit afad51492c6672b96c2b0735600d5695e30f7180 > Author: Bjorn Helgaas <bhelgaas@xxxxxxxxxx> > Date: Wed Jul 3 16:04:26 2013 -0600 > > pci-add-for-each-requester-id > > diff --git a/drivers/pci/search.c b/drivers/pci/search.c > index d0627fa..380eb03 100644 > --- a/drivers/pci/search.c > +++ b/drivers/pci/search.c > @@ -17,6 +17,89 @@ > DECLARE_RWSEM(pci_bus_sem); > EXPORT_SYMBOL_GPL(pci_bus_sem); > > +#define PCI_REQUESTER_ID(dev) (((dev)->bus->number << 8) | (dev)->devfn) > +#define PCI_BRIDGE_REQUESTER_ID(bridge) ((bridge)->subordinate->number << 8) > + > +static inline bool pci_is_pcix(struct pci_dev *dev) > +{ > + return !!pci_pcix_cap(dev); /* XXX not implemented */ > +} > + > +static bool pci_bridge_may_take_ownership(struct pci_dev *bridge) > +{ > + /* > + * A PCIe to PCI/PCI-X bridge may take ownership per PCIe Bridge > + * Spec v1.0, sec 2.3. > + */ > + if (pci_is_pcie(bridge) && > + pci_pcie_type(bridge) == PCI_EXP_TYPE_PCI_BRIDGE) > + return true; Assume we still need a quirk here, PCIe-to-PCI bridges without a PCIe capability are exactly what causes the current bug. > + > + /* > + * A PCI-X to PCI-X bridge need not take ownership because there > + * are requester IDs on the secondary PCI-X bus. However, if a PCI > + * device is added on the secondary bus, the bridge must revert to > + * being a PCI-X to PCI bridge, and it then *would* take ownership. > + * Assuming a PCI-X to PCI-X bridge takes ownership means we can > + * tolerate a future hot-add without having to change existing > + * IOMMU mappings. > + */ > + if (pci_is_pcix(bridge)) > + return true; > + > + return false; > +} > + > +static struct pci_dev *pci_bridge_to_dev(struct pci_bus *bus, > + struct pci_dev *dev) > +{ > + struct pci_dev *bridge; > + struct pci_bus *child_bus; > + u8 secondary, subordinate, busn = dev->bus->number; > + > + if (dev->bus == bus) > + return dev; > + > + /* > + * There may be several devices on "bus". Find the one that is a > + * bridge leading to "dev". > + */ > + list_for_each_entry(bridge, &bus->devices, bus_list) { > + child_bus = bridge->subordinate; > + if (child_bus) { > + secondary = child_bus->busn_res.start; > + subordinate = child_bus->busn_res.end; > + if (secondary <= busn && busn <= subordinate) > + return bridge; > + } > + } > + return NULL; > +} > + > +int pci_for_each_requester_id(struct pci_dev *bridge, struct pci_dev *dev, > + int (*fn)(struct pci_dev *, void *), u16 requester_id > + void *data) > +{ > + int ret; > + > + dev = pci_get_dma_source(dev); /* XXX ref count screwup */ It's unfortunate that our dma quirk still seems to only get called for the end device where we could have bridges that need it too. > + > + while (bridge != dev) { I'm having a hard time understanding this function since bridge is never defined in any of the below sample code. Would bridge be the root port above a device? And what about devices connected to the RC? Seems like we'd need to traverse up the bus first, so there's a bit of missing magic. > + if (pci_bridge_may_take_ownership(bridge)) { > + ret = fn(dev, PCI_BRIDGE_REQUESTER_ID(bridge), data); > + if (ret) > + return ret; > + } > + bridge = pci_bridge_to_dev(bridge->subordinate, dev); > + } > + > + if (pci_is_pcie(dev) || pci_is_pcix(dev)) > + return fn(dev, PCI_REQUESTER_ID(dev), data); > + > + /* Conventional PCI has no requester ID */ > + return 0; > +} > + > /* > * find the upstream PCIe-to-PCI bridge of a PCI device > * if the device is PCIE, return NULL > diff --git a/include/linux/pci.h b/include/linux/pci.h > index 3a24e4f..cbcc82f 100644 > --- a/include/linux/pci.h > +++ b/include/linux/pci.h > @@ -794,6 +794,8 @@ static inline struct pci_dev *pci_get_bus_and_slot(unsigned int bus, > } > struct pci_dev *pci_get_class(unsigned int class, struct pci_dev *from); > int pci_dev_present(const struct pci_device_id *ids); > +void pci_for_each_requester_id(struct pci_dev *bridge, struct pci_dev *dev, > + int (*fn)(struct pci_dev *, void *), void *data); > > int pci_bus_read_config_byte(struct pci_bus *bus, unsigned int devfn, > int where, u8 *val); > > commit 6e770d7bf4ef95207df4ac4c42ef4406ff7bc54e > Author: Bjorn Helgaas <bhelgaas@xxxxxxxxxx> > Date: Wed Jul 3 12:02:15 2013 -0600 > > intel-iommu-upstream-pcie I assume this one would is also going to apply dma quirks to dma_ops, which I'm all for. > diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c > index b91564d..bdb6e6a 100644 > --- a/drivers/iommu/intel-iommu.c > +++ b/drivers/iommu/intel-iommu.c > @@ -1669,79 +1669,68 @@ static int domain_context_mapping_one(struct dmar_domain *domain, int segment, > return 0; > } > > +struct domain_context_info { > + struct dmar_domain *domain; > + struct pci_dev *dev; > + int translation; > + struct intel_iommu *iomumu; s/iomumu/iommu/ > + bool mapped; > +}; > + > +static int context_mapping(struct pci_dev *dev, u16 requester_id, void *data) > +{ > + struct domain_context_info *dev_info = data; > + int segment = pci_domain_nr(data->dev); dev_info->dev > + u8 bus = data->requester_id >> 8; s/data->// > + u8 devfn = data->requester_id & 0xff; s/data->// > + > + return domain_context_mapping_one(data->domain, segment, > + bus, devfn, data->translation); > +} > + > static int domain_context_mapping(struct dmar_domain *domain, > struct pci_dev *pdev, int translation) > { > - int ret; > - struct pci_dev *tmp, *parent; > + struct domain_context_info data; > + struct pci_dev *bridge; > > - ret = domain_context_mapping_one(domain, pci_domain_nr(pdev->bus), > - pdev->bus->number, pdev->devfn, > - translation); > - if (ret) > - return ret; > + data.domain = domain; > + data.dev = pdev; > + data.translation = translation; > + bridge = xx; Darn, I was hoping you were going to reveal where bridge comes from. > + return pci_for_each_requester_id(bridge, pdev, context_mapping, &data); > +} > > - /* dependent device mapping */ > - tmp = pci_find_upstream_pcie_bridge(pdev); > - if (!tmp) > - return 0; > - /* Secondary interface's bus number and devfn 0 */ > - parent = pdev->bus->self; > - while (parent != tmp) { > - ret = domain_context_mapping_one(domain, > - pci_domain_nr(parent->bus), > - parent->bus->number, > - parent->devfn, translation); > - if (ret) > - return ret; > - parent = parent->bus->self; > - } > - if (pci_is_pcie(tmp)) /* this is a PCIe-to-PCI bridge */ > - return domain_context_mapping_one(domain, > - pci_domain_nr(tmp->subordinate), > - tmp->subordinate->number, 0, > - translation); > - else /* this is a legacy PCI bridge */ > - return domain_context_mapping_one(domain, > - pci_domain_nr(tmp->bus), > - tmp->bus->number, > - tmp->devfn, > - translation); > +static int is_context_mapped(struct pci_dev *dev, u16 requester_id, void *data) > +{ > + struct domain_context_info *dev_info = data; > + u8 bus = data->requester_id >> 8; > + u8 devfn = data->requester_id & 0xff; > + bool ret; > + > + ret = device_context_mapped(dev_info->iommu, bus, devfn); > + if (!ret) > + dev_info->mapped = false; > + > + return 0; > } > > static bool domain_context_mapped(struct pci_dev *pdev) > { > - bool ret; > - struct pci_dev *tmp, *parent; > + struct domain_context_info data; > struct intel_iommu *iommu; > + struct pci_dev *bridge; > > iommu = device_to_iommu(pci_domain_nr(pdev->bus), pdev->bus->number, > pdev->devfn); > if (!iommu) > return false; > > - ret = device_context_mapped(iommu, pdev->bus->number, pdev->devfn); > - if (!ret) > - return false; > - /* dependent device mapping */ > - tmp = pci_find_upstream_pcie_bridge(pdev); > - if (!tmp) > - return true; > - /* Secondary interface's bus number and devfn 0 */ > - parent = pdev->bus->self; > - while (parent != tmp) { > - ret = device_context_mapped(iommu, parent->bus->number, > - parent->devfn); > - if (!ret) > - return false; > - parent = parent->bus->self; > - } > - if (pci_is_pcie(tmp)) > - return device_context_mapped(iommu, tmp->subordinate->number, > - 0); > - else > - return device_context_mapped(iommu, tmp->bus->number, > - tmp->devfn); > + data.iommu = iommu; > + data.mapped = true; > + bridge = xx; > + pci_for_each_requester_id(bridge, pdev, is_context_mapped, &data); > + return data.mapped; > } > > /* Returns a number of VTD pages, but aligned to MM page size */ > @@ -1961,6 +1950,27 @@ static struct dmar_domain *find_domain(struct pci_dev *pdev) > return NULL; > } > > +static int find_domain2(struct pci_dev *dev, u16 requester_id, void *data) > +{ > + struct domain_context_info *dev_info = data; > + int segment = pci_domain_nr(data->dev); > + u8 bus = data->requester_id >> 8; > + u8 devfn = data->requester_id & 0xff; > + unsigned long flags; > + struct device_domain_info *info; > + > + spin_lock_irqsave(&device_domain_lock, flags); > + list_for_each_entry(info, &device_domain_list, global) { > + if (info->segment == segment && > + info->bus == bus && info->devfn == devfn) { > + data->domain = info->domain; > + break; > + } > + } > + spin_unlock_irqrestore(&device_domain_lock, flags); > + return 0; > +} > + > /* domain is initialized */ > static struct dmar_domain *get_domain_for_dev(struct pci_dev *pdev, int gaw) > { > @@ -1968,11 +1978,11 @@ static struct dmar_domain *get_domain_for_dev(struct pci_dev *pdev, int gaw) > struct intel_iommu *iommu; > struct dmar_drhd_unit *drhd; > struct device_domain_info *info, *tmp; > - struct pci_dev *dev_tmp; > unsigned long flags; > int bus = 0, devfn = 0; > int segment; > int ret; > + struct domain_context_info data; > > domain = find_domain(pdev); > if (domain) > @@ -1980,29 +1990,13 @@ static struct dmar_domain *get_domain_for_dev(struct pci_dev *pdev, int gaw) > > segment = pci_domain_nr(pdev->bus); > > - dev_tmp = pci_find_upstream_pcie_bridge(pdev); > - if (dev_tmp) { > - if (pci_is_pcie(dev_tmp)) { > - bus = dev_tmp->subordinate->number; > - devfn = 0; > - } else { > - bus = dev_tmp->bus->number; > - devfn = dev_tmp->devfn; > - } > - spin_lock_irqsave(&device_domain_lock, flags); > - list_for_each_entry(info, &device_domain_list, global) { > - if (info->segment == segment && > - info->bus == bus && info->devfn == devfn) { > - found = info->domain; > - break; > - } > - } > - spin_unlock_irqrestore(&device_domain_lock, flags); > - /* pcie-pci bridge already has a domain, uses it */ > - if (found) { > - domain = found; > - goto found_domain; > - } > + data.domain = NULL; > + data.dev = pdev; > + bridge = xx; > + pci_for_each_requester_id(bridge, pdev, find_domain2, &data); > + if (data.domain) { > + domain = data.domain; > + goto found_domain; > } > > domain = alloc_domain(); > @@ -3734,31 +3728,29 @@ int __init intel_iommu_init(void) > return 0; > } > > +static int detach_requester(struct pci_dev *dev, u16 requester_id, void *data) > +{ > + struct domain_context_info *dev_info = data; > + struct intel_iommu *iommu = data->iommu; > + u8 bus = data->requester_id >> 8; > + u8 devfn = data->requester_id & 0xff; > + > + iommu_detach_dev(iommu, bus, devfn); > + return 0; > +} > + > static void iommu_detach_dependent_devices(struct intel_iommu *iommu, > struct pci_dev *pdev) > { > - struct pci_dev *tmp, *parent; > + struct domain_context_info data; > > if (!iommu || !pdev) > return; > > /* dependent device detach */ > - tmp = pci_find_upstream_pcie_bridge(pdev); > - /* Secondary interface's bus number and devfn 0 */ > - if (tmp) { > - parent = pdev->bus->self; > - while (parent != tmp) { > - iommu_detach_dev(iommu, parent->bus->number, > - parent->devfn); > - parent = parent->bus->self; > - } > - if (pci_is_pcie(tmp)) /* this is a PCIe-to-PCI bridge */ > - iommu_detach_dev(iommu, > - tmp->subordinate->number, 0); > - else /* this is a legacy PCI bridge */ > - iommu_detach_dev(iommu, tmp->bus->number, > - tmp->devfn); > - } > + data.iommu = iommu; > + bridge = xx; > + pci_for_each_requester_id(bridge, pdev, detach_requester, &data); > } > > static void domain_remove_one_dev_info(struct dmar_domain *domain, > -- > To unsubscribe from this list: send the line "unsubscribe linux-pci" in > the body of a message to majordomo@xxxxxxxxxxxxxxx > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html