Hi Eric, > -----Original Message----- > From: Auger Eric [mailto:eric.auger@xxxxxxxxxx] > Sent: Wednesday, February 28, 2018 11:53 AM > To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@xxxxxxxxxx>; > Alex Williamson <alex.williamson@xxxxxxxxxx> > Cc: pmorel@xxxxxxxxxxxxxxxxxx; kvm@xxxxxxxxxxxxxxx; linux- > kernel@xxxxxxxxxxxxxxx; Linuxarm <linuxarm@xxxxxxxxxx>; John Garry > <john.garry@xxxxxxxxxx>; xuwei (O) <xuwei5@xxxxxxxxxx>; Robin Murphy > <robin.murphy@xxxxxxx> > Subject: Re: [PATCH v4 4/6] vfio/type1: check dma map request is within a valid > iova range > > Hi Shameer, > > On 28/02/18 10:25, Shameerali Kolothum Thodi wrote: > > > > > >> -----Original Message----- > >> From: Auger Eric [mailto:eric.auger@xxxxxxxxxx] > >> Sent: Wednesday, February 28, 2018 9:02 AM > >> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@xxxxxxxxxx>; > >> Alex Williamson <alex.williamson@xxxxxxxxxx> > >> Cc: pmorel@xxxxxxxxxxxxxxxxxx; kvm@xxxxxxxxxxxxxxx; linux- > >> kernel@xxxxxxxxxxxxxxx; Linuxarm <linuxarm@xxxxxxxxxx>; John Garry > >> <john.garry@xxxxxxxxxx>; xuwei (O) <xuwei5@xxxxxxxxxx>; Robin > Murphy > >> <robin.murphy@xxxxxxx> > >> Subject: Re: [PATCH v4 4/6] vfio/type1: check dma map request is within a > valid > >> iova range > >> > >> Hi Shameer, > >> > >> On 27/02/18 10:57, Shameerali Kolothum Thodi wrote: > >>> > >>> > >>>> -----Original Message----- > >>>> From: Auger Eric [mailto:eric.auger@xxxxxxxxxx] > >>>> Sent: Tuesday, February 27, 2018 8:27 AM > >>>> To: Alex Williamson <alex.williamson@xxxxxxxxxx> > >>>> Cc: Shameerali Kolothum Thodi > <shameerali.kolothum.thodi@xxxxxxxxxx>; > >>>> pmorel@xxxxxxxxxxxxxxxxxx; kvm@xxxxxxxxxxxxxxx; linux- > >>>> kernel@xxxxxxxxxxxxxxx; Linuxarm <linuxarm@xxxxxxxxxx>; John Garry > >>>> <john.garry@xxxxxxxxxx>; xuwei (O) <xuwei5@xxxxxxxxxx>; Robin > >> Murphy > >>>> <robin.murphy@xxxxxxx> > >>>> Subject: Re: [PATCH v4 4/6] vfio/type1: check dma map request is within a > >> valid > >>>> iova range > >>>> > >>>> Hi, > >>>> On 27/02/18 00:13, Alex Williamson wrote: > >>>>> On Mon, 26 Feb 2018 23:05:43 +0100 > >>>>> Auger Eric <eric.auger@xxxxxxxxxx> wrote: > >>>>> > >>>>>> Hi Shameer, > >>>>>> > >>>>>> [Adding Robin in CC] > >>>>>> On 21/02/18 13:22, Shameer Kolothum wrote: > >>>>>>> This checks and rejects any dma map request outside valid iova > >>>>>>> range. > >>>>>>> > >>>>>>> Signed-off-by: Shameer Kolothum > >>>> <shameerali.kolothum.thodi@xxxxxxxxxx> > >>>>>>> --- > >>>>>>> drivers/vfio/vfio_iommu_type1.c | 22 ++++++++++++++++++++++ > >>>>>>> 1 file changed, 22 insertions(+) > >>>>>>> > >>>>>>> diff --git a/drivers/vfio/vfio_iommu_type1.c > >>>> b/drivers/vfio/vfio_iommu_type1.c > >>>>>>> index a80884e..3049393 100644 > >>>>>>> --- a/drivers/vfio/vfio_iommu_type1.c > >>>>>>> +++ b/drivers/vfio/vfio_iommu_type1.c > >>>>>>> @@ -970,6 +970,23 @@ static int vfio_pin_map_dma(struct > vfio_iommu > >>>> *iommu, struct vfio_dma *dma, > >>>>>>> return ret; > >>>>>>> } > >>>>>>> > >>>>>>> +/* > >>>>>>> + * Check dma map request is within a valid iova range > >>>>>>> + */ > >>>>>>> +static bool vfio_iommu_iova_dma_valid(struct vfio_iommu *iommu, > >>>>>>> + dma_addr_t start, dma_addr_t end) > >>>>>>> +{ > >>>>>>> + struct list_head *iova = &iommu->iova_list; > >>>>>>> + struct vfio_iova *node; > >>>>>>> + > >>>>>>> + list_for_each_entry(node, iova, list) { > >>>>>>> + if ((start >= node->start) && (end <= node->end)) > >>>>>>> + return true; > >>>>>> I am now confused by the fact this change will prevent existing QEMU > >>>>>> from working with this series on some platforms. For instance QEMU > virt > >>>>>> machine GPA space collides with Seattle PCI host bridge windows. On > >> ARM > >>>>>> the smmu and smmuv3 drivers report the PCI host bridge windows as > >>>>>> reserved regions which does not seem to be the case on other > platforms. > >>>>>> The change happened in commit > >>>> 273df9635385b2156851c7ee49f40658d7bcb29d > >>>>>> (iommu/dma: Make PCI window reservation generic). > >>>>>> > >>>>>> For background, we already discussed the topic after LPC 2016. See > >>>>>> https://www.spinics.net/lists/kernel/msg2379607.html. > >>>>>> > >>>>>> So is it the right choice to expose PCI host bridge windows as reserved > >>>>>> regions? If yes shouldn't we make a difference between those and MSI > >>>>>> windows in this series and do not reject any user space DMA_MAP > >> attempt > >>>>>> within PCI host bridge windows. > >>>>> > >>>>> If the QEMU machine GPA collides with a reserved region today, then > >>>>> either: > >>>>> > >>>>> a) The mapping through the IOMMU works and the reserved region is > >> wrong > >>>>> > >>>>> or > >>>>> > >>>>> b) The mapping doesn't actually work, QEMU is at risk of data loss by > >>>>> being told that it worked, and we're justified in changing that > >>>>> behavior. > >>>>> > >>>>> Without knowing the specifics of SMMU, it doesn't particularly make > >>>>> sense to me to mark the entire PCI hierarchy MMIO range as reserved, > >>>>> unless perhaps the IOMMU is incapable of translating those IOVAs. > >>>> to me the limitation does not come from the smmu itself, which is a > >>>> separate HW block sitting between the root complex and the > interconnect. > >>>> If ACS is not enforced by the PCIe subsystem, the transaction will never > >>>> reach the IOMMU. > >>> > >>> True. And we do have one such platform where ACS is not enforced but > >>> reserving the regions and possibly creating holes while launching VM will > >>> make it secure. But I do wonder how we will solve the device grouping > >>> in such cases. > >>> > >>> The Seattle PCI host bridge windows case you mentioned has any pci quirk > >>> to claim that they support ACS? > >> No there is none to my knowledge. I am applying Alex' not upstream ACS > >> overwrite patch. > > > > Ok. But isn't that patch actually applicable to cases where ACS is really > supported > > by hardware but the capability is not available? > > My understanding is normally yes. If you apply the patch whereas the HW > practically does not support ACS, then you fool the kernel pretending > there is isolation whereas there is not. I don't know the exact > capability of the HW on AMD Seattle and effectively I should have cared > about it much earlier and if the HW capability were supported and not > properly exposed we should have implemented a clean quirk for this platform. Ok. Thanks for the details. > > I am just trying to see whether > > the argument that we should allow DMA MAP requests for this(non-ACS case) > > even if the Guest GPA conflict with reserved region holds good. The fact that > may > > be it was working before is that the Guest never actually allocated any GPA > from > > the reserved region or maybe I am missing something here. > > If my understanding is correct, in ideal world we would report the PCI > host bridge window as reserved only in case ACS is not supported. If you > apply the patch and overrides the ACS, then the DMA_MAP would be > allowed. In case the HW does not support ACS, then you could face > situations where one EP tries to access GPA that never reaches the IOMMU > (because it corresponds to the BAR of another downstream EP). Same can > happen at the moment. Yes, this is my understanding too. Just wondering the below changes to the iommu_dma_get_resv_regions() is good enough to take care this issue or not. Thanks, Shameer -- >8 -- diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index f05f3cf..b6e89d5 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -34,6 +34,8 @@ #define IOMMU_MAPPING_ERROR 0 +#define REQ_ACS_FLAGS (PCI_ACS_SV | PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_UF) + struct iommu_dma_msi_page { struct list_head list; dma_addr_t iova; @@ -183,6 +185,9 @@ void iommu_dma_get_resv_regions(struct device *dev, struct list_head *list) if (!dev_is_pci(dev)) return; + if (pci_acs_path_enabled(to_pci_dev(dev), NULL, REQ_ACS_FLAGS)) + return; + bridge = pci_find_host_bridge(to_pci_dev(dev)->bus); resource_list_for_each_entry(window, &bridge->windows) { struct iommu_resv_region *region; ---8-- > > Eric > > > > Thanks, > > Shameer > > > >> Thanks > >> > >> Eric > >>> > >>>> In the case of such overlap, shouldn't we just warn the end-user that > >>>> this situation is dangerous instead of forbidding the use case which > >>>> worked "in most cases" until now. > >>> > >>> Yes, may be something similar to the allow_unsafe_interrupts case, if > >>> that is acceptable. > >>> > >>> Thanks, > >>> Shameer > >>> > >>>>> Are we trying to prevent untranslated p2p with this reserved range? > >>>>> That's not necessarily a terrible idea, but it seems that doing it for > >>>>> that purpose would need to be a lot smarter, taking into account ACS > >>>>> and precisely selecting ranges within the peer address space that would > >>>>> be untranslated. Perhaps only populated MMIO within non-ACS > >>>>> hierarchies. Thanks, > >>>> > >>>> Indeed taking into account the ACS capability would refine the > >>>> situations where a risk exists. > >>>> > >>>> Thanks > >>>> > >>>> Eric > >>>>> > >>>>> Alex > >>>>>