Hi Shameer, On 28/02/18 14:39, Shameerali Kolothum Thodi wrote: > Hi Eric, > >> -----Original Message----- >> From: Auger Eric [mailto:eric.auger@xxxxxxxxxx] >> Sent: Wednesday, February 28, 2018 11:53 AM >> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@xxxxxxxxxx>; >> Alex Williamson <alex.williamson@xxxxxxxxxx> >> Cc: pmorel@xxxxxxxxxxxxxxxxxx; kvm@xxxxxxxxxxxxxxx; linux- >> kernel@xxxxxxxxxxxxxxx; Linuxarm <linuxarm@xxxxxxxxxx>; John Garry >> <john.garry@xxxxxxxxxx>; xuwei (O) <xuwei5@xxxxxxxxxx>; Robin Murphy >> <robin.murphy@xxxxxxx> >> Subject: Re: [PATCH v4 4/6] vfio/type1: check dma map request is within a valid >> iova range >> >> Hi Shameer, >> >> On 28/02/18 10:25, Shameerali Kolothum Thodi wrote: >>> >>> >>>> -----Original Message----- >>>> From: Auger Eric [mailto:eric.auger@xxxxxxxxxx] >>>> Sent: Wednesday, February 28, 2018 9:02 AM >>>> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@xxxxxxxxxx>; >>>> Alex Williamson <alex.williamson@xxxxxxxxxx> >>>> Cc: pmorel@xxxxxxxxxxxxxxxxxx; kvm@xxxxxxxxxxxxxxx; linux- >>>> kernel@xxxxxxxxxxxxxxx; Linuxarm <linuxarm@xxxxxxxxxx>; John Garry >>>> <john.garry@xxxxxxxxxx>; xuwei (O) <xuwei5@xxxxxxxxxx>; Robin >> Murphy >>>> <robin.murphy@xxxxxxx> >>>> Subject: Re: [PATCH v4 4/6] vfio/type1: check dma map request is within a >> valid >>>> iova range >>>> >>>> Hi Shameer, >>>> >>>> On 27/02/18 10:57, Shameerali Kolothum Thodi wrote: >>>>> >>>>> >>>>>> -----Original Message----- >>>>>> From: Auger Eric [mailto:eric.auger@xxxxxxxxxx] >>>>>> Sent: Tuesday, February 27, 2018 8:27 AM >>>>>> To: Alex Williamson <alex.williamson@xxxxxxxxxx> >>>>>> Cc: Shameerali Kolothum Thodi >> <shameerali.kolothum.thodi@xxxxxxxxxx>; >>>>>> pmorel@xxxxxxxxxxxxxxxxxx; kvm@xxxxxxxxxxxxxxx; linux- >>>>>> kernel@xxxxxxxxxxxxxxx; Linuxarm <linuxarm@xxxxxxxxxx>; John Garry >>>>>> <john.garry@xxxxxxxxxx>; xuwei (O) <xuwei5@xxxxxxxxxx>; Robin >>>> Murphy >>>>>> <robin.murphy@xxxxxxx> >>>>>> Subject: Re: [PATCH v4 4/6] vfio/type1: check dma map request is within a >>>> valid >>>>>> iova range >>>>>> >>>>>> Hi, >>>>>> On 27/02/18 00:13, Alex Williamson wrote: >>>>>>> On Mon, 26 Feb 2018 23:05:43 +0100 >>>>>>> Auger Eric <eric.auger@xxxxxxxxxx> wrote: >>>>>>> >>>>>>>> Hi Shameer, >>>>>>>> >>>>>>>> [Adding Robin in CC] >>>>>>>> On 21/02/18 13:22, Shameer Kolothum wrote: >>>>>>>>> This checks and rejects any dma map request outside valid iova >>>>>>>>> range. >>>>>>>>> >>>>>>>>> Signed-off-by: Shameer Kolothum >>>>>> <shameerali.kolothum.thodi@xxxxxxxxxx> >>>>>>>>> --- >>>>>>>>> drivers/vfio/vfio_iommu_type1.c | 22 ++++++++++++++++++++++ >>>>>>>>> 1 file changed, 22 insertions(+) >>>>>>>>> >>>>>>>>> diff --git a/drivers/vfio/vfio_iommu_type1.c >>>>>> b/drivers/vfio/vfio_iommu_type1.c >>>>>>>>> index a80884e..3049393 100644 >>>>>>>>> --- a/drivers/vfio/vfio_iommu_type1.c >>>>>>>>> +++ b/drivers/vfio/vfio_iommu_type1.c >>>>>>>>> @@ -970,6 +970,23 @@ static int vfio_pin_map_dma(struct >> vfio_iommu >>>>>> *iommu, struct vfio_dma *dma, >>>>>>>>> return ret; >>>>>>>>> } >>>>>>>>> >>>>>>>>> +/* >>>>>>>>> + * Check dma map request is within a valid iova range >>>>>>>>> + */ >>>>>>>>> +static bool vfio_iommu_iova_dma_valid(struct vfio_iommu *iommu, >>>>>>>>> + dma_addr_t start, dma_addr_t end) >>>>>>>>> +{ >>>>>>>>> + struct list_head *iova = &iommu->iova_list; >>>>>>>>> + struct vfio_iova *node; >>>>>>>>> + >>>>>>>>> + list_for_each_entry(node, iova, list) { >>>>>>>>> + if ((start >= node->start) && (end <= node->end)) >>>>>>>>> + return true; >>>>>>>> I am now confused by the fact this change will prevent existing QEMU >>>>>>>> from working with this series on some platforms. For instance QEMU >> virt >>>>>>>> machine GPA space collides with Seattle PCI host bridge windows. On >>>> ARM >>>>>>>> the smmu and smmuv3 drivers report the PCI host bridge windows as >>>>>>>> reserved regions which does not seem to be the case on other >> platforms. >>>>>>>> The change happened in commit >>>>>> 273df9635385b2156851c7ee49f40658d7bcb29d >>>>>>>> (iommu/dma: Make PCI window reservation generic). >>>>>>>> >>>>>>>> For background, we already discussed the topic after LPC 2016. See >>>>>>>> https://www.spinics.net/lists/kernel/msg2379607.html. >>>>>>>> >>>>>>>> So is it the right choice to expose PCI host bridge windows as reserved >>>>>>>> regions? If yes shouldn't we make a difference between those and MSI >>>>>>>> windows in this series and do not reject any user space DMA_MAP >>>> attempt >>>>>>>> within PCI host bridge windows. >>>>>>> >>>>>>> If the QEMU machine GPA collides with a reserved region today, then >>>>>>> either: >>>>>>> >>>>>>> a) The mapping through the IOMMU works and the reserved region is >>>> wrong >>>>>>> >>>>>>> or >>>>>>> >>>>>>> b) The mapping doesn't actually work, QEMU is at risk of data loss by >>>>>>> being told that it worked, and we're justified in changing that >>>>>>> behavior. >>>>>>> >>>>>>> Without knowing the specifics of SMMU, it doesn't particularly make >>>>>>> sense to me to mark the entire PCI hierarchy MMIO range as reserved, >>>>>>> unless perhaps the IOMMU is incapable of translating those IOVAs. >>>>>> to me the limitation does not come from the smmu itself, which is a >>>>>> separate HW block sitting between the root complex and the >> interconnect. >>>>>> If ACS is not enforced by the PCIe subsystem, the transaction will never >>>>>> reach the IOMMU. >>>>> >>>>> True. And we do have one such platform where ACS is not enforced but >>>>> reserving the regions and possibly creating holes while launching VM will >>>>> make it secure. But I do wonder how we will solve the device grouping >>>>> in such cases. >>>>> >>>>> The Seattle PCI host bridge windows case you mentioned has any pci quirk >>>>> to claim that they support ACS? >>>> No there is none to my knowledge. I am applying Alex' not upstream ACS >>>> overwrite patch. >>> >>> Ok. But isn't that patch actually applicable to cases where ACS is really >> supported >>> by hardware but the capability is not available? >> >> My understanding is normally yes. If you apply the patch whereas the HW >> practically does not support ACS, then you fool the kernel pretending >> there is isolation whereas there is not. I don't know the exact >> capability of the HW on AMD Seattle and effectively I should have cared >> about it much earlier and if the HW capability were supported and not >> properly exposed we should have implemented a clean quirk for this platform. > > Ok. Thanks for the details. > >> >> I am just trying to see whether >>> the argument that we should allow DMA MAP requests for this(non-ACS case) >>> even if the Guest GPA conflict with reserved region holds good. The fact that >> may >>> be it was working before is that the Guest never actually allocated any GPA >> from >>> the reserved region or maybe I am missing something here. >> >> If my understanding is correct, in ideal world we would report the PCI >> host bridge window as reserved only in case ACS is not supported. If you >> apply the patch and overrides the ACS, then the DMA_MAP would be >> allowed. In case the HW does not support ACS, then you could face >> situations where one EP tries to access GPA that never reaches the IOMMU >> (because it corresponds to the BAR of another downstream EP). Same can >> happen at the moment. > > Yes, this is my understanding too. > > Just wondering the below changes to the iommu_dma_get_resv_regions() is > good enough to take care this issue or not. This looks sensible to me. The only question I have is can this ACS computation result be altered later on for some reason. Otherwise I confirm this fixes the reported reserved regions (no more PCI host bridge windows) and assignment failure on Cavium CN88xx. Cavium CN88xx assignment would otherwise be affected too. Thanks Eric > > Thanks, > Shameer > > -- >8 -- > diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c > index f05f3cf..b6e89d5 100644 > --- a/drivers/iommu/dma-iommu.c > +++ b/drivers/iommu/dma-iommu.c > @@ -34,6 +34,8 @@ > > #define IOMMU_MAPPING_ERROR 0 > > +#define REQ_ACS_FLAGS (PCI_ACS_SV | PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_UF) > + > struct iommu_dma_msi_page { > struct list_head list; > dma_addr_t iova; > @@ -183,6 +185,9 @@ void iommu_dma_get_resv_regions(struct device *dev, struct list_head *list) > if (!dev_is_pci(dev)) > return; > > + if (pci_acs_path_enabled(to_pci_dev(dev), NULL, REQ_ACS_FLAGS)) > + return; > + > bridge = pci_find_host_bridge(to_pci_dev(dev)->bus); > resource_list_for_each_entry(window, &bridge->windows) { > struct iommu_resv_region *region; > ---8-- > > >> >> Eric >>> >>> Thanks, >>> Shameer >>> >>>> Thanks >>>> >>>> Eric >>>>> >>>>>> In the case of such overlap, shouldn't we just warn the end-user that >>>>>> this situation is dangerous instead of forbidding the use case which >>>>>> worked "in most cases" until now. >>>>> >>>>> Yes, may be something similar to the allow_unsafe_interrupts case, if >>>>> that is acceptable. >>>>> >>>>> Thanks, >>>>> Shameer >>>>> >>>>>>> Are we trying to prevent untranslated p2p with this reserved range? >>>>>>> That's not necessarily a terrible idea, but it seems that doing it for >>>>>>> that purpose would need to be a lot smarter, taking into account ACS >>>>>>> and precisely selecting ranges within the peer address space that would >>>>>>> be untranslated. Perhaps only populated MMIO within non-ACS >>>>>>> hierarchies. Thanks, >>>>>> >>>>>> Indeed taking into account the ACS capability would refine the >>>>>> situations where a risk exists. >>>>>> >>>>>> Thanks >>>>>> >>>>>> Eric >>>>>>> >>>>>>> Alex >>>>>>>