Hi Robin,

On 07/12/2016 19:24, Robin Murphy wrote:
> On 07/12/16 15:02, Auger Eric wrote:
>> Hi Robin,
>> On 06/12/2016 19:55, Robin Murphy wrote:
>>> On 15/11/16 13:09, Eric Auger wrote:
>>>> The get() populates the list with the PCI host bridge windows
>>>> and the MSI IOVA range.
>>>>
>>>> At the moment an arbitrary MSI IOVA window is set at 0x8000000
>>>> of size 1MB. This will allow to report those info in iommu-group
>>>> sysfs?
>>
>>
>> First thank you for reviewing the series. This is definitely helpful!
>>>>
>>>> Signed-off-by: Eric Auger <eric.auger@xxxxxxxxxx>
>>>>
>>>> ---
>>>>
>>>> RFC v2 -> v3:
>>>> - use existing get/put_resv_regions
>>>>
>>>> RFC v1 -> v2:
>>>> - use defines for MSI IOVA base and length
>>>> ---
>>>>  drivers/iommu/arm-smmu.c | 52 ++++++++++++++++++++++++++++++++++++++++++++++++
>>>>  1 file changed, 52 insertions(+)
>>>>
>>>> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
>>>> index 8f72814..81f1a83 100644
>>>> --- a/drivers/iommu/arm-smmu.c
>>>> +++ b/drivers/iommu/arm-smmu.c
>>>> @@ -278,6 +278,9 @@ enum arm_smmu_s2cr_privcfg {
>>>>
>>>>  #define FSYNR0_WNR                      (1 << 4)
>>>>
>>>> +#define MSI_IOVA_BASE                   0x8000000
>>>> +#define MSI_IOVA_LENGTH                 0x100000
>>>> +
>>>>  static int force_stage;
>>>>  module_param(force_stage, int, S_IRUGO);
>>>>  MODULE_PARM_DESC(force_stage,
>>>> @@ -1545,6 +1548,53 @@ static int arm_smmu_of_xlate(struct device *dev, struct of_phandle_args *args)
>>>>          return iommu_fwspec_add_ids(dev, &fwid, 1);
>>>>  }
>>>>
>>>> +static void arm_smmu_get_resv_regions(struct device *dev,
>>>> +                                      struct list_head *head)
>>>> +{
>>>> +        struct iommu_resv_region *region;
>>>> +        struct pci_host_bridge *bridge;
>>>> +        struct resource_entry *window;
>>>> +
>>>> +        /* MSI region */
>>>> +        region = iommu_alloc_resv_region(MSI_IOVA_BASE, MSI_IOVA_LENGTH,
>>>> +                                         IOMMU_RESV_MSI);
>>>> +        if (!region)
>>>> +                return;
>>>> +
>>>> +        list_add_tail(&region->list, head);
>>>> +
>>>> +        if (!dev_is_pci(dev))
>>>> +                return;
>>>> +
>>>> +        bridge = pci_find_host_bridge(to_pci_dev(dev)->bus);
>>>> +
>>>> +        resource_list_for_each_entry(window, &bridge->windows) {
>>>> +                phys_addr_t start;
>>>> +                size_t length;
>>>> +
>>>> +                if (resource_type(window->res) != IORESOURCE_MEM &&
>>>> +                    resource_type(window->res) != IORESOURCE_IO)
>>>
>>> As Joerg commented elsewhere, considering anything other than memory
>>> resources isn't right (I appreciate you've merely copied my own mistake
>>> here). We need some other way to handle root complexes where the CPU
>>> MMIO views of PCI windows appear in PCI memory space - using the I/O
>>> address of I/O resources only works by chance on Juno, and it still
>>> doesn't account for config space. I suggest we just leave that out for
>>> the time being to make life easier (does it even apply to anything other
>>> than Juno?) and figure it out later.
>> OK so I understand I should remove IORESOURCE_IO check.
>>>
>>>> +                        continue;
>>>> +
>>>> +                start = window->res->start - window->offset;
>>>> +                length = window->res->end - window->res->start + 1;
>>>> +                region = iommu_alloc_resv_region(start, length,
>>>> +                                                 IOMMU_RESV_NOMAP);
>>>> +                if (!region)
>>>> +                        return;
>>>> +                list_add_tail(&region->list, head);
>>>> +        }
>>>> +}
>>>
>>> Either way, there's nothing SMMU-specific about PCI windows. The fact
>>> that we'd have to copy-paste all of this into the SMMUv3 driver
>>> unchanged suggests it should go somewhere common (although I would be
>>> inclined to leave the insertion of the fake MSI region to driver-private
>>> wrappers). As I said before, the current iova_reserve_pci_windows()
>>> simply wants splitting into appropriate public callbacks for
>>> get_resv_regions and apply_resv_regions.
>> Do you mean somewhere common in the arm-smmu subsystem (new file) or in
>> another subsystem (pci?)
>>
>> More generally the current implementation does not handle the case where
>> any of those PCIe host bridge windows collide with the MSI window. To me
>> this is a flaw.
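
To make the collision case above concrete, here is a minimal user-space
sketch in C. It is not kernel code and not part of the patch. It reuses
MSI_IOVA_BASE and MSI_IOVA_LENGTH from the patch and the same
start/length arithmetic as arm_smmu_get_resv_regions(); struct
iova_range, window_to_range(), ranges_overlap() and
first_msi_collision() are hypothetical names introduced only for
illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values taken from the patch under discussion. */
#define MSI_IOVA_BASE   0x8000000ULL
#define MSI_IOVA_LENGTH 0x100000ULL

/* Hypothetical user-space stand-in for a reserved IOVA range. */
struct iova_range {
        uint64_t start;
        uint64_t length;        /* must be non-zero */
};

/*
 * Bus-side view of a host bridge window, mirroring the patch:
 *   start  = res->start - offset
 *   length = res->end - res->start + 1
 */
static struct iova_range window_to_range(uint64_t res_start,
                                         uint64_t res_end,
                                         uint64_t offset)
{
        struct iova_range r;

        assert(res_end >= res_start);
        r.start = res_start - offset;
        r.length = res_end - res_start + 1;
        return r;
}

/* True if the two non-empty ranges share at least one address. */
static bool ranges_overlap(struct iova_range a, struct iova_range b)
{
        uint64_t a_last = a.start + a.length - 1;
        uint64_t b_last = b.start + b.length - 1;

        return a.start <= b_last && b.start <= a_last;
}

/* Index of the first window colliding with the MSI window, or -1. */
static int first_msi_collision(const struct iova_range *windows, size_t n)
{
        const struct iova_range msi = { MSI_IOVA_BASE, MSI_IOVA_LENGTH };
        size_t i;

        for (i = 0; i < n; i++)
                if (ranges_overlap(msi, windows[i]))
                        return (int)i;
        return -1;
}
```

With these definitions, a window covering 0x80ff000-0x8100fff is
reported as a collision, while a window starting at 0x8100000 (just
past the MSI range) is not. Whether a check like this belongs in the
kernel or in user space is the choice between the two options below.
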
>> 1) Either we take into account the PCIe windows and prevent any
>> collision when allocating the MSI window.
>> 2) or we do not care about PCIe host bridge windows at kernel level.
>
> Even more generally, the MSI window also needs to avoid any other
> IOMMU-specific reserved regions as well - fortunately I don't think
> there's any current intersection between platforms with RMRR-type
> reservations and platforms which require MSI mapping - so I think we've
> got enough freedom for the moment, but it's certainly an argument in
> favour of ultimately expressing PCI windows through the same mechanism
> to keep everything in the same place. The other big advantage of
> reserved regions is that they will automatically apply to DMA domains as
> well.
>
>> If 1) we are back to the original issue of where do we put the MSI
>> window. Obviously at a place which might not be QEMU friendly anymore.
>> What allocation policy shall we use?
>>
>> Second option - sorry I may look stubborn - which I definitely prefer
>> and which was also advocated by Alex, we handle PCI host bridge windows
>> at user level. MSI window is reported through the iommu group sysfs.
>> PCIe host bridge windows can be enumerated through /proc/iomem. Both x86
>> iommu and arm smmu would report an MSI reserved window. ARM MSI window
>> would become a de facto reserved window for guests.
>
> So from the ABI perspective, the sysfs iommu_group/*/reserved_regions
> represents a minimum set of regions (MSI, RMRR, etc.) which definitely
> *must* be reserved, but offers no guarantee that there aren't also other
> regions not represented there. That seems reasonable to start with, and
> still leaves us free to expand the scope of reserved regions in future
> without breaking anything.
>
>> Thoughts?
>
> I like the second option too - "grep PCI /proc/iomem" already catches
> more than enumerating the resources does (i.e. ECAM space) - and neither
> does it preclude growing the more extensive version on top over time.
>
> For the sake of moving forward, I'd be happy with just dropping the PCI
> stuff from here, and leaving the SMMU drivers exposing the single
> hard-coded MSI region directly (to be fair, it'd hardly be the first
> function which is identical between the two).

OK cool

Thanks

Eric

> We can take a look into
> making iommu-dma implement PCI windows as nomap resv_regions properly as
> an orthogonal thing (for the sake of DMA domains), after which we should
> be in a position to drop the hard-coding and start placing the MSI
> window dynamically where appropriate.
>
> Robin.
>
>>>> +static void arm_smmu_put_resv_regions(struct device *dev,
>>>> +                                      struct list_head *head)
>>>> +{
>>>> +        struct iommu_resv_region *entry, *next;
>>>> +
>>>> +        list_for_each_entry_safe(entry, next, head, list)
>>>> +                kfree(entry);
>>>> +}
>>>> +
>>>>  static struct iommu_ops arm_smmu_ops = {
>>>>          .capable = arm_smmu_capable,
>>>>          .domain_alloc = arm_smmu_domain_alloc,
>>>> @@ -1560,6 +1610,8 @@ static int arm_smmu_of_xlate(struct device *dev, struct of_phandle_args *args)
>>>>          .domain_get_attr = arm_smmu_domain_get_attr,
>>>>          .domain_set_attr = arm_smmu_domain_set_attr,
>>>>          .of_xlate = arm_smmu_of_xlate,
>>>> +        .get_resv_regions = arm_smmu_get_resv_regions,
>>>> +        .put_resv_regions = arm_smmu_put_resv_regions,
>>>>          .pgsize_bitmap = -1UL, /* Restricted during device attach */
>>>>  };
>>>>
>>>>
>>>
>>>
>>> _______________________________________________
>>> linux-arm-kernel mailing list
>>> linux-arm-kernel@xxxxxxxxxxxxxxxxxxx
>>> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>>>
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html