Hello Jörg, On 10/09/2015 at 04:45 PM, Joerg Roedel wrote: > Hi Andreas, > > On Thu, Oct 08, 2015 at 09:47:28PM +0200, Andreas Hartmann wrote: >> This time, the oops was caused by the second PCI card I'm passing >> through to another VM (the ath9k card worked fine this time - chance?). >> I added the lspci output to the attached file, too. > > I digged a little bit around here and found a 32bit PCI card and plugged > it into the AMD IOMMU box. I could reproduce the problem and here is > patch which fixes it for me. Can you test it too please? Works fine here. Thanks. But now, I can see the next big problem of v4.3-rc4: My VMs are connected on the host via tun/tap devices. They themselves are connected to a bridge device. If there is data sent between the VMs, the (system) load is more as double (!) of the load seen with 4.1.x e.g. . Having an allover throughput of about low 35 MBit/s seen by the host over all tun/tap devices / bridge creates a load of 3 (!!) with 3 VMs being involved. That's about half of the load produced by kernel compiling (make -j8). > I'd like to > send a pull-req with this fix included to Linus for rc5. > > Thanks, > > Joerg > > From d07307c04edffaaa045fb83713f8808e55ffa895 Mon Sep 17 00:00:00 2001 > From: Joerg Roedel <jroedel@xxxxxxx> > Date: Fri, 9 Oct 2015 16:23:33 +0200 > Subject: [PATCH] iommu/amd: Fix NULL pointer deref on device detach > > When a device group is detached from its domain, the iommu > core code calls into the iommu driver to detach each device > individually. > > Before this functionality went into the iommu core code, it > was implemented in the drivers, also in the AMD IOMMU > driver as the device alias handling code. > > This code is still present, as there might be aliases that > don't exist as real PCI devices (and are therefore invisible > to the iommu core code). > > Unfortunatly it might happen now, that a device is unbound > multiple times from its domain, first by the alias handling > code and then by the iommu core code (or vice verca). > > This ends up in the do_detach function which dereferences > the dev_data->domain pointer. When the device is already > detached, this pointer is NULL and we get a kernel oops. > > Removing the alias code completly is not an option, as that > would also remove the code which handles invisible aliases. > The code could be simplified, but this is too big of a > change outside the merge window. > > For now, just check the dev_data->domain pointer in > do_detach and bail out if it is NULL. > > Andreas Hartmann <andihartmann@xxxxxxxxxx> > Signed-off-by: Joerg Roedel <jroedel@xxxxxxx> > --- > drivers/iommu/amd_iommu.c | 9 +++++++++ > 1 file changed, 9 insertions(+) > > diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c > index f82060e7..08d2775 100644 > --- a/drivers/iommu/amd_iommu.c > +++ b/drivers/iommu/amd_iommu.c > @@ -2006,6 +2006,15 @@ static void do_detach(struct iommu_dev_data *dev_data) > { > struct amd_iommu *iommu; > > + /* > + * First check if the device is still attached. It might already > + * be detached from its domain because the generic > + * iommu_detach_group code detached it and we try again here in > + * our alias handling. > + */ > + if (!dev_data->domain) > + return; > + > iommu = amd_iommu_rlookup_table[dev_data->devid]; > > /* decrease reference counters */ > -- dm-devel mailing list dm-devel@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/dm-devel