Re: [PATCH v2 0/2] iommu/intel: Quirk non-compliant PCIe-to-PCI bridges

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, 2013-07-08 at 15:51 -0600, Bjorn Helgaas wrote:
> On Mon, Jul 08, 2013 at 02:49:16PM -0600, Alex Williamson wrote:
> > On Mon, 2013-07-08 at 13:34 -0600, Bjorn Helgaas wrote:
> > > On Mon, Jul 08, 2013 at 11:07:20AM -0600, Alex Williamson wrote:
> > > > Joerg,
> > > > 
> > > > Where do we stand on this series?  You had a concern that the heuristic
> > > > used in patch 1/ could be dangerous.  The suggestion for detecting the
> > > > issue was actually from Bjorn who replied with his rationale.  Do you
> > > > want to go in the direction of a fixed whitelist or do you agree that
> > > > even if the heuristic breaks it provides better behavior than what we
> > > > have now?  Thanks,
> > > 
> > > I'm trying to take a step back and look at the overall design, not
> > > these specific patches.
> > > 
> > > IOMMUs translate addresses based on their source, i.e., a PCIe
> > > requester ID.  This is made more complicated by the fact that some
> > > bridges "take ownership" (change the requester ID as they forward
> > > transactions upstream), as well as the fact that conventional PCI has
> > > no requester ID at all.  And some broken devices apparently generate
> > > DMA requests using the requester ID of another device.
> > > 
> > > We currently deal with this using the pci_find_upstream_pcie_bridge()
> > > and pci_get_dma_source() interfaces, but I think there's too much
> > > assembly required by their users.  pci_find_upstream_pcie_bridge()
> > > callers normally loop through all the bridges between the "upstream
> > > PCIe bridge" and the device, checking for bridges that might take
> > > ownership.  They probably also ought to use pci_get_dma_source() to
> > > account for the broken devices, but most callers don't.
> > > 
> > > Most of this is PCI-specific stuff that should be of interest to all
> > > IOMMU drivers, and the overall structure of calls and looping should
> > > be the same for all of them, so it would be nice to factor it out
> > > somehow.
> > > 
> > > The attached patch is guaranteed not to even compile; it's just to
> > > make this idea more concrete.  The basic idea is that since the IOMMU
> > > driver wants to perform some action for each possible requester ID the
> > > IOMMU might see, PCI could provide an iterator
> > > ("pci_for_each_requester_id()") to do that.
> > > 
> > > Bjorn
> > > 
> > > 
> > > commit afad51492c6672b96c2b0735600d5695e30f7180
> > > Author: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>
> > > Date:   Wed Jul 3 16:04:26 2013 -0600
> > > 
> > >     pci-add-for-each-requester-id
> > > 
> > > diff --git a/drivers/pci/search.c b/drivers/pci/search.c
> > > index d0627fa..380eb03 100644
> > > --- a/drivers/pci/search.c
> > > +++ b/drivers/pci/search.c
> > > @@ -17,6 +17,89 @@
> > >  DECLARE_RWSEM(pci_bus_sem);
> > >  EXPORT_SYMBOL_GPL(pci_bus_sem);
> > >  
> > > +#define PCI_REQUESTER_ID(dev)	(((dev)->bus->number << 8) | (dev)->devfn)
> > > +#define PCI_BRIDGE_REQUESTER_ID(bridge)	((bridge)->subordinate->number << 8)
> > > +
> > > +static inline bool pci_is_pcix(struct pci_dev *dev)
> > > +{
> > > +	return !!pci_pcix_cap(dev);	/* XXX not implemented */
> > > +}
> > > +
> > > +static bool pci_bridge_may_take_ownership(struct pci_dev *bridge)
> > > +{
> > > +	/*
> > > +	 * A PCIe to PCI/PCI-X bridge may take ownership per PCIe Bridge
> > > +	 * Spec v1.0, sec 2.3.
> > > +	 */
> > > +	if (pci_is_pcie(bridge) &&
> > > +	    pci_pcie_type(bridge) == PCI_EXP_TYPE_PCI_BRIDGE)
> > > +		return true;
> > 
> > Assume we still need a quirk here, PCIe-to-PCI bridges without a PCIe
> > capability are exactly what causes the current bug.
> 
> Likely so.  We might be able to identify that as "secondary bus is not
> PCIe" or something.
> 
> > > +
> > > +	/*
> > > +	 * A PCI-X to PCI-X bridge need not take ownership because there
> > > +	 * are requester IDs on the secondary PCI-X bus.  However, if a PCI
> > > +	 * device is added on the secondary bus, the bridge must revert to
> > > +	 * being a PCI-X to PCI bridge, and it then *would* take ownership.
> > > +	 * Assuming a PCI-X to PCI-X bridge takes ownership means we can
> > > +	 * tolerate a future hot-add without having to change existing
> > > +	 * IOMMU mappings.
> > > +	 */
> > > +	if (pci_is_pcix(bridge))
> > > +		return true;
> > > +
> > > +	return false;
> > > +}
> > > +
> > > +static struct pci_dev *pci_bridge_to_dev(struct pci_bus *bus,
> > > +					 struct pci_dev *dev)
> > > +{
> > > +	struct pci_dev *bridge;
> > > +	struct pci_bus *child_bus;
> > > +	u8 secondary, subordinate, busn = dev->bus->number;
> > > +
> > > +	if (dev->bus == bus)
> > > +		return dev;
> > > +
> > > +	/*
> > > +	 * There may be several devices on "bus".  Find the one that is a
> > > +	 * bridge leading to "dev".
> > > +	 */
> > > +	list_for_each_entry(bridge, &bus->devices, bus_list) {
> > > +		child_bus = bridge->subordinate;
> > > +		if (child_bus) {
> > > +			secondary = child_bus->busn_res.start;
> > > +			subordinate = child_bus->busn_res.end;
> > > +			if (secondary <= busn && busn <= subordinate)
> > > +				return bridge;
> > > +		}
> > > +	}
> > > +	return NULL;
> > > +}
> > > +
> > > +int pci_for_each_requester_id(struct pci_dev *bridge, struct pci_dev *dev,
> > > +			      int (*fn)(struct pci_dev *, void *),
> > 
> > u16 requester_id
> > 
> > > +			      void *data)
> > > +{
> > > +	int ret;
> > > +
> > > +	dev = pci_get_dma_source(dev);	/* XXX ref count screwup */
> > 
> > It's unfortunate that our dma quirk still seems to only get called for
> > the end device where we could have bridges that need it too.
> 
> Sure, we can do that if necessary.
> 
> I'm not trying to present this as any kind of finished solution.  The
> point is to figure out if we can make an interface that's a little
> more abstract so we can pull the PCI topology issues out of the IOMMU
> drivers, and I just thought something that looked like code would be
> more concrete and easier to discuss.
> 
> > > +
> > > +	while (bridge != dev) {
> > 
> > 
> > I'm having a hard time understanding this function since bridge is never
> > defined in any of the below sample code.  Would bridge be the root port
> > above a device?  And what about devices connected to the RC?  Seems like
> > we'd need to traverse up the bus first, so there's a bit of missing
> > magic.
> 
> I was assuming that an IOMMU could be either at the root of a PCI
> hierarchy or buried farther down.  If that's true, the IOMMU driver
> might need to supply the top of the PCI hierarchy for which it
> translates.  That's what I'm trying to get at with the "bridge"
> argument.
> 
> If the IOMMU is buried, I assumed there was no point in iterating over
> bridges upstream of the IOMMU because it wouldn't see requester IDs of
> those upstream bridges.  PCI wouldn't have any way to figure out how
> far up to go, hence the argument.

I don't think that getting the bridge is going to be quite so simple.
VT-d can have multiple IOMMUs, each handling a specific device or
hierarchy.  One IOMMU per domain can be default IOMMU for any device not
otherwise specified.  The IOMMU is not a PCI enumerable device.  AMD-Vi
does have a PCI visible IOMMU, but it's an endpoint on the root complex,
not the top of a hierarchy.

I'm warming up to your approach, but I think I'd rather see something
like:

struct pci_dev *pci_get_requester_id(struct pci_dev *pdev)
{
	do {
		/* Apply DMA quirks at every step */
		pdev = pci_get_dma_source(pdev);

		/* Done for PCIe devices or broken bridges, PCI-X TBD */
		if (pci_is_pcie(pdev) || /* broken pcie bridge test */)
			break;

	} while (!pci_is_root_bus(pdev->bus) && (pdev = pdev->bus->self));

	return pdev;
}


/* Start with requester ID pdev and pdev->bus, this way we handle both peer
 * and parent requester IDs */
int pci_for_each_requester_id(struct pci_dev *pdev, struct pci_bus *bus,
				     int (*fn)(struct pci_dev *, void *), void *data)
{
	struct pci_dev *tmp;
	int ret;

	list_for_each_entry(tmp, &bus->devices, bus_list) {
		if (pci_get_requester_id(tmp) == pdev) {
			ret = fn(tmp, data);
			if (ret)
				...

			if (tmp->subordinate) {
				ret = pci_for_each_requester_id(pdev, tmp->subordinate, fn);
				if (ret)
					...
			}
		}
	}
	
	return ret;
}

Seems like that would be sufficient to leverage the intel-iommu changes
you've drafted.  Thanks,

Alex

> > > +		if (pci_bridge_may_take_ownership(bridge)) {
> > > +			ret = fn(dev, PCI_BRIDGE_REQUESTER_ID(bridge), data);
> > > +			if (ret)
> > > +				return ret;
> > > +		}
> > > +		bridge = pci_bridge_to_dev(bridge->subordinate, dev);
> > > +	}
> > > +
> > > +	if (pci_is_pcie(dev) || pci_is_pcix(dev))
> > > +		return fn(dev, PCI_REQUESTER_ID(dev), data);
> > > +
> > > +	/* Conventional PCI has no requester ID */
> > > +	return 0;
> > > +}
> > > +
> > >  /*
> > >   * find the upstream PCIe-to-PCI bridge of a PCI device
> > >   * if the device is PCIE, return NULL
> > > diff --git a/include/linux/pci.h b/include/linux/pci.h
> > > index 3a24e4f..cbcc82f 100644
> > > --- a/include/linux/pci.h
> > > +++ b/include/linux/pci.h
> > > @@ -794,6 +794,8 @@ static inline struct pci_dev *pci_get_bus_and_slot(unsigned int bus,
> > >  }
> > >  struct pci_dev *pci_get_class(unsigned int class, struct pci_dev *from);
> > >  int pci_dev_present(const struct pci_device_id *ids);
> > > +void pci_for_each_requester_id(struct pci_dev *bridge, struct pci_dev *dev,
> > > +			       int (*fn)(struct pci_dev *, void *), void *data);
> > >  
> > >  int pci_bus_read_config_byte(struct pci_bus *bus, unsigned int devfn,
> > >  			     int where, u8 *val);
> > > 
> > > commit 6e770d7bf4ef95207df4ac4c42ef4406ff7bc54e
> > > Author: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>
> > > Date:   Wed Jul 3 12:02:15 2013 -0600
> > > 
> > >     intel-iommu-upstream-pcie
> > 
> > I assume this one would is also going to apply dma quirks to dma_ops,
> > which I'm all for.
> > 
> > > diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> > > index b91564d..bdb6e6a 100644
> > > --- a/drivers/iommu/intel-iommu.c
> > > +++ b/drivers/iommu/intel-iommu.c
> > > @@ -1669,79 +1669,68 @@ static int domain_context_mapping_one(struct dmar_domain *domain, int segment,
> > >  	return 0;
> > >  }
> > >  
> > > +struct domain_context_info {
> > > +	struct dmar_domain *domain;
> > > +	struct pci_dev *dev;
> > > +	int translation;
> > > +	struct intel_iommu *iomumu;
> > 
> > s/iomumu/iommu/
> > 
> > > +	bool mapped;
> > > +};
> > > +
> > > +static int context_mapping(struct pci_dev *dev, u16 requester_id, void *data)
> > > +{
> > > +	struct domain_context_info *dev_info = data;
> > > +	int segment = pci_domain_nr(data->dev);
> > 
> > dev_info->dev
> > 
> > > +	u8 bus = data->requester_id >> 8;
> > 
> > s/data->//
> > 
> > > +	u8 devfn = data->requester_id & 0xff;
> > 
> > s/data->//
> 
> I told you it wouldn't even compile :)  I'm only trying to present a
> possibility, not all the nitty-gritty details.
> 
> > > +
> > > +	return domain_context_mapping_one(data->domain, segment,
> > > +					  bus, devfn, data->translation);
> > > +}
> > > +
> > >  static int domain_context_mapping(struct dmar_domain *domain,
> > >  				  struct pci_dev *pdev, int translation)
> > >  {
> > > -	int ret;
> > > -	struct pci_dev *tmp, *parent;
> > > +	struct domain_context_info data;
> > > +	struct pci_dev *bridge;
> > >  
> > > -	ret = domain_context_mapping_one(domain, pci_domain_nr(pdev->bus),
> > > -					 pdev->bus->number, pdev->devfn,
> > > -					 translation);
> > > -	if (ret)
> > > -		return ret;
> > > +	data.domain = domain;
> > > +	data.dev = pdev;
> > > +	data.translation = translation;
> > > +	bridge = xx;
> > 
> > Darn, I was hoping you were going to reveal where bridge comes from.
> > 
> > > +	return pci_for_each_requester_id(bridge, pdev, context_mapping, &data);
> > > +}
> > >  
> > > -	/* dependent device mapping */
> > > -	tmp = pci_find_upstream_pcie_bridge(pdev);
> > > -	if (!tmp)
> > > -		return 0;
> > > -	/* Secondary interface's bus number and devfn 0 */
> > > -	parent = pdev->bus->self;
> > > -	while (parent != tmp) {
> > > -		ret = domain_context_mapping_one(domain,
> > > -						 pci_domain_nr(parent->bus),
> > > -						 parent->bus->number,
> > > -						 parent->devfn, translation);
> > > -		if (ret)
> > > -			return ret;
> > > -		parent = parent->bus->self;
> > > -	}
> > > -	if (pci_is_pcie(tmp)) /* this is a PCIe-to-PCI bridge */
> > > -		return domain_context_mapping_one(domain,
> > > -					pci_domain_nr(tmp->subordinate),
> > > -					tmp->subordinate->number, 0,
> > > -					translation);
> > > -	else /* this is a legacy PCI bridge */
> > > -		return domain_context_mapping_one(domain,
> > > -						  pci_domain_nr(tmp->bus),
> > > -						  tmp->bus->number,
> > > -						  tmp->devfn,
> > > -						  translation);
> > > +static int is_context_mapped(struct pci_dev *dev, u16 requester_id, void *data)
> > > +{
> > > +	struct domain_context_info *dev_info = data;
> > > +	u8 bus = data->requester_id >> 8;
> > > +	u8 devfn = data->requester_id & 0xff;
> > > +	bool ret;
> > > +
> > > +	ret = device_context_mapped(dev_info->iommu, bus, devfn);
> > > +	if (!ret)
> > > +		dev_info->mapped = false;
> > > +
> > > +	return 0;
> > >  }
> > >  
> > >  static bool domain_context_mapped(struct pci_dev *pdev)
> > >  {
> > > -	bool ret;
> > > -	struct pci_dev *tmp, *parent;
> > > +	struct domain_context_info data;
> > >  	struct intel_iommu *iommu;
> > > +	struct pci_dev *bridge;
> > >  
> > >  	iommu = device_to_iommu(pci_domain_nr(pdev->bus), pdev->bus->number,
> > >  				pdev->devfn);
> > >  	if (!iommu)
> > >  		return false;
> > >  
> > > -	ret = device_context_mapped(iommu, pdev->bus->number, pdev->devfn);
> > > -	if (!ret)
> > > -		return false;
> > > -	/* dependent device mapping */
> > > -	tmp = pci_find_upstream_pcie_bridge(pdev);
> > > -	if (!tmp)
> > > -		return true;
> > > -	/* Secondary interface's bus number and devfn 0 */
> > > -	parent = pdev->bus->self;
> > > -	while (parent != tmp) {
> > > -		ret = device_context_mapped(iommu, parent->bus->number,
> > > -					    parent->devfn);
> > > -		if (!ret)
> > > -			return false;
> > > -		parent = parent->bus->self;
> > > -	}
> > > -	if (pci_is_pcie(tmp))
> > > -		return device_context_mapped(iommu, tmp->subordinate->number,
> > > -					     0);
> > > -	else
> > > -		return device_context_mapped(iommu, tmp->bus->number,
> > > -					     tmp->devfn);
> > > +	data.iommu = iommu;
> > > +	data.mapped = true;
> > > +	bridge = xx;
> > > +	pci_for_each_requester_id(bridge, pdev, is_context_mapped, &data);
> > > +	return data.mapped;
> > >  }
> > >  
> > >  /* Returns a number of VTD pages, but aligned to MM page size */
> > > @@ -1961,6 +1950,27 @@ static struct dmar_domain *find_domain(struct pci_dev *pdev)
> > >  	return NULL;
> > >  }
> > >  
> > > +static int find_domain2(struct pci_dev *dev, u16 requester_id, void *data)
> > > +{
> > > +	struct domain_context_info *dev_info = data;
> > > +	int segment = pci_domain_nr(data->dev);
> > > +	u8 bus = data->requester_id >> 8;
> > > +	u8 devfn = data->requester_id & 0xff;
> > > +	unsigned long flags;
> > > +	struct device_domain_info *info;
> > > +
> > > +	spin_lock_irqsave(&device_domain_lock, flags);
> > > +	list_for_each_entry(info, &device_domain_list, global) {
> > > +		if (info->segment == segment &&
> > > +		    info->bus == bus && info->devfn == devfn) {
> > > +			data->domain = info->domain;
> > > +			break;
> > > +		}
> > > +	}
> > > +	spin_unlock_irqrestore(&device_domain_lock, flags);
> > > +	return 0;
> > > +}
> > > +
> > >  /* domain is initialized */
> > >  static struct dmar_domain *get_domain_for_dev(struct pci_dev *pdev, int gaw)
> > >  {
> > > @@ -1968,11 +1978,11 @@ static struct dmar_domain *get_domain_for_dev(struct pci_dev *pdev, int gaw)
> > >  	struct intel_iommu *iommu;
> > >  	struct dmar_drhd_unit *drhd;
> > >  	struct device_domain_info *info, *tmp;
> > > -	struct pci_dev *dev_tmp;
> > >  	unsigned long flags;
> > >  	int bus = 0, devfn = 0;
> > >  	int segment;
> > >  	int ret;
> > > +	struct domain_context_info data;
> > >  
> > >  	domain = find_domain(pdev);
> > >  	if (domain)
> > > @@ -1980,29 +1990,13 @@ static struct dmar_domain *get_domain_for_dev(struct pci_dev *pdev, int gaw)
> > >  
> > >  	segment = pci_domain_nr(pdev->bus);
> > >  
> > > -	dev_tmp = pci_find_upstream_pcie_bridge(pdev);
> > > -	if (dev_tmp) {
> > > -		if (pci_is_pcie(dev_tmp)) {
> > > -			bus = dev_tmp->subordinate->number;
> > > -			devfn = 0;
> > > -		} else {
> > > -			bus = dev_tmp->bus->number;
> > > -			devfn = dev_tmp->devfn;
> > > -		}
> > > -		spin_lock_irqsave(&device_domain_lock, flags);
> > > -		list_for_each_entry(info, &device_domain_list, global) {
> > > -			if (info->segment == segment &&
> > > -			    info->bus == bus && info->devfn == devfn) {
> > > -				found = info->domain;
> > > -				break;
> > > -			}
> > > -		}
> > > -		spin_unlock_irqrestore(&device_domain_lock, flags);
> > > -		/* pcie-pci bridge already has a domain, uses it */
> > > -		if (found) {
> > > -			domain = found;
> > > -			goto found_domain;
> > > -		}
> > > +	data.domain = NULL;
> > > +	data.dev = pdev;
> > > +	bridge = xx;
> > > +	pci_for_each_requester_id(bridge, pdev, find_domain2, &data);
> > > +	if (data.domain) {
> > > +		domain = data.domain;
> > > +		goto found_domain;
> > >  	}
> > >  
> > >  	domain = alloc_domain();
> > > @@ -3734,31 +3728,29 @@ int __init intel_iommu_init(void)
> > >  	return 0;
> > >  }
> > >  
> > > +static int detach_requester(struct pci_dev *dev, u16 requester_id, void *data)
> > > +{
> > > +	struct domain_context_info *dev_info = data;
> > > +	struct intel_iommu *iommu = data->iommu;
> > > +	u8 bus = data->requester_id >> 8;
> > > +	u8 devfn = data->requester_id & 0xff;
> > > +
> > > +	iommu_detach_dev(iommu, bus, devfn);
> > > +	return 0;
> > > +}
> > > +
> > >  static void iommu_detach_dependent_devices(struct intel_iommu *iommu,
> > >  					   struct pci_dev *pdev)
> > >  {
> > > -	struct pci_dev *tmp, *parent;
> > > +	struct domain_context_info data;
> > >  
> > >  	if (!iommu || !pdev)
> > >  		return;
> > >  
> > >  	/* dependent device detach */
> > > -	tmp = pci_find_upstream_pcie_bridge(pdev);
> > > -	/* Secondary interface's bus number and devfn 0 */
> > > -	if (tmp) {
> > > -		parent = pdev->bus->self;
> > > -		while (parent != tmp) {
> > > -			iommu_detach_dev(iommu, parent->bus->number,
> > > -					 parent->devfn);
> > > -			parent = parent->bus->self;
> > > -		}
> > > -		if (pci_is_pcie(tmp)) /* this is a PCIe-to-PCI bridge */
> > > -			iommu_detach_dev(iommu,
> > > -				tmp->subordinate->number, 0);
> > > -		else /* this is a legacy PCI bridge */
> > > -			iommu_detach_dev(iommu, tmp->bus->number,
> > > -					 tmp->devfn);
> > > -	}
> > > +	data.iommu = iommu;
> > > +	bridge = xx;
> > > +	pci_for_each_requester_id(bridge, pdev, detach_requester, &data);
> > >  }
> > >  
> > >  static void domain_remove_one_dev_info(struct dmar_domain *domain,
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> > > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 
> > 
> > 



--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux