On 15/08/19 17:23, Alex Williamson wrote:
> 0xffe00
> 0xfee00
> 0xfec00
> 0xc1000
> 0x80a000
> 0x800000
> 0x100000
>
> ie. I can effective only say that sp->gfn values of 0x0, 0x40000, and
> 0x80000 can take the continue branch without seeing bad behavior in the
> VM.
>
> The assigned GPU has BARs at GPAs:
>
> 0xc0000000-0xc0ffffff
> 0x800000000-0x808000000
> 0x808000000-0x809ffffff
>
> And the assigned companion audio function is at GPA:
>
> 0xc1080000-0xc1083fff
>
> Only one of those seems to align very well with a gfn base involved
> here. The virtio ethernet has an mmio range at GPA 0x80a000000,
> otherwise I don't find any other I/O devices coincident with the gfns
> above.

The IOAPIC and LAPIC are respectively gfn 0xfec00 and 0xfee00.

The audio function BAR is only 16 KiB, so the 2 MiB PDE starting at
0xc1000 includes both userspace-MMIO and device-MMIO memory. The
virtio-net BAR is also userspace-MMIO.

It seems like the problem occurs when the sp->gfn you "continue over"
includes a userspace-MMIO gfn. But since I have no better ideas right
now, I'm going to apply the revert (we don't know for sure that it only
happens with assigned devices).

Paolo

> I'm running the VM with 2MB hugepages, but I believe the issue still
> occurs with standard pages. When run with standard pages I see more
> hits to gfn values 0, 0x40000, 0x80000, but the same number of hits to
> the set above that cannot take the continue branch. I don't know if
> that means anything.
>
> Any further ideas what to look for? Thanks,
>
> Alex
>
> PS - I see the posted workaround patch, I'll test that in the interim.
>
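
For anyone who wants to repeat the gfn/GPA matching above, here is a rough sketch of the arithmetic. It is not kernel code. It assumes 4 KiB base pages (gfn = GPA >> 12) and treats each sp->gfn as the base of a 2 MiB region, as in the PDE remark above. The range labels are made up for readability, and the single-page sizes used for the IOAPIC, LAPIC and virtio-net entries are assumptions, since the thread only gives their base addresses.

```python
PAGE_SHIFT = 12          # assumption: 4 KiB base pages
GFNS_PER_2M = 1 << 9     # 512 gfns covered by one 2 MiB region

# (start GPA, end GPA inclusive). GPU and audio ranges are copied from the
# thread as written. Labels are illustrative only. The one-page sizes for
# virtio-net, IOAPIC and LAPIC are assumptions; only their bases are known.
MMIO_RANGES = {
    "GPU BAR @0xc0000000": (0xc0000000, 0xc0ffffff),
    "GPU BAR @0x800000000": (0x800000000, 0x808000000),
    "GPU BAR @0x808000000": (0x808000000, 0x809ffffff),
    "audio BAR": (0xc1080000, 0xc1083fff),
    "virtio-net MMIO": (0x80a000000, 0x80a000fff),
    "IOAPIC": (0xfec00000, 0xfec00fff),
    "LAPIC": (0xfee00000, 0xfee00fff),
}


def gpa_to_gfn(gpa):
    """Guest physical address -> guest frame number."""
    return gpa >> PAGE_SHIFT


def ranges_in_region(base_gfn, ngfns=GFNS_PER_2M):
    """Names of the ranges that overlap [base_gfn, base_gfn + ngfns)."""
    first, last = base_gfn, base_gfn + ngfns - 1
    return [
        name
        for name, (start, end) in MMIO_RANGES.items()
        if gpa_to_gfn(start) <= last and gpa_to_gfn(end) >= first
    ]


if __name__ == "__main__":
    # gfns reported as unable to take the continue branch, then the three that can.
    for gfn in (0xffe00, 0xfee00, 0xfec00, 0xc1000, 0x80a000, 0x800000,
                0x100000, 0x0, 0x40000, 0x80000):
        print(f"{gfn:#x}: {ranges_in_region(gfn) or '-'}")
```

With these inputs, 0xc1000 matches only the audio BAR, 0x80a000 only the virtio-net range, and 0xfec00/0xfee00 the IOAPIC/LAPIC pages, while 0x0, 0x40000 and 0x80000 match nothing in the table. Passing a different `ngfns` lets you check other region sizes.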