[Bug 197951] QEMU/KVM & VFIO & PCI passthru with Windows 10 x64 guest: memory access intermittently causes CRITICAL_STRUCTURE_CORRUPTION BSOD unless swap is disabled on host, since 4.12.13

https://bugzilla.kernel.org/show_bug.cgi?id=197951

--- Comment #12 from Ladi Prosek (lprosek@xxxxxxxxxx) ---
(In reply to Jimi from comment #11)
> I'm about to spend a few days with it installed to make sure, but it looks
> like this commit is probably our culprit:
> 
> $ git bisect good
> Bisecting: 0 revisions left to test after this (roughly 1 step)
> [9f7df0bca168528aba20794f400be134495551b8] xfs: XFS_IS_REALTIME_INODE()
> should be false if no rt device present

A few things hint at this being a red herring.

* It's the first commit before the 4.12.13 tag which means that you marked
4.12.13 as bad and everything else as good.

* There's nothing in it that would explain why it affects only virt and only
Windows guests.
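
To illustrate the first point, here is a minimal sketch of a bisection over a
linear history. It is a toy model, not git's actual algorithm; the commit
names and the bisect_linear helper are made up for illustration. If every
build that gets tested is marked good (for example because an intermittent
bug happened not to fire), the search ends up at the commit right before the
bad tag, no matter what that commit contains.

```python
def bisect_linear(commits, is_bad):
    """Binary-search a linear history.

    commits[0] is assumed known-good and commits[-1] known-bad.
    Returns (first_bad_commit, list_of_commits_that_were_tested).
    """
    lo, hi = 0, len(commits) - 1  # lo: newest known good, hi: oldest known bad
    tested = []
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tested.append(commits[mid])
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi], tested


# Hypothetical history: a good tag, seven placeholder commits, a bad tag.
history = ["v4.12.12"] + [f"c{i}" for i in range(1, 8)] + ["v4.12.13"]

# Every tested build is reported as good.
first_bad, tested = bisect_linear(history, lambda commit: False)
print("tested:", tested)
print("reported first bad:", first_bad)
```

In this toy run the last commit handed out for testing is c7, the one
immediately before the bad tag, which is the same position the XFS commit
occupies in the bisect output quoted above.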

> It looks like there's some evidence that this issue doesn't *only* come from
> 4.12.13. I want to reiterate, I was on 4.12.13 when this problem started
> happening to me, and I haven't had a single BSOD since downgrading to
> 4.12.12, including during this entire bisect. It was happening frequently
> enough that if 4.12.13 wasn't at least one of the culprits, I definitely
> would've had a few BSODs by now.

The bug is likely timing-sensitive, so merely rebuilding the kernel from the
same sources may make it more (or less) prone to the problem, depending on how
the binary is laid out, the exact compiler used, etc.

Also, we should not rule out the possibility that the problem has existed for a
long time and that Windows 10 only recently gained the ability to detect
certain corruptions via a Windows Update patch.

I hit it again yesterday and the BSOD analyzes to:

CRITICAL_STRUCTURE_CORRUPTION (109)
This bugcheck is generated when the kernel detects that critical kernel code or
data have been corrupted. There are generally three causes for a corruption:
1) A driver has inadvertently or deliberately modified critical kernel code
 or data. See http://www.microsoft.com/whdc/driver/kernel/64bitPatching.mspx
2) A developer attempted to set a normal kernel breakpoint using a kernel
 debugger that was not attached when the system was booted. Normal breakpoints,
 "bp", can only be set if the debugger is attached at boot time. Hardware
 breakpoints, "ba", can be set at any time.
3) A hardware corruption occurred, e.g. failing RAM holding kernel code or
data.
Arguments:
Arg1: a3a0206143b9d5b3, Reserved
Arg2: b3b72ce7963bad06, Reserved
Arg3: 0000032000000000, Failure type dependent information
Arg4: 0000000000000017, Type of corrupted region, can be
[...]
        16  : Critical floating point control register modification
        17  : Local APIC modification
        18  : Kernel notification callout modification
[...]


I'm pretty sure that last time I got it the type of corrupted region was 17 as
well.
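
For anyone comparing their own dumps, a small sketch of decoding Arg4 against
the table excerpt above. It assumes the table keys are hexadecimal, like the
bugcheck arguments themselves; the describe_region helper is hypothetical and
the mapping contains only the three entries visible in the excerpt.

```python
# Only the entries shown in the excerpt above; keys assumed to be hexadecimal.
REGION_TYPES = {
    0x16: "Critical floating point control register modification",
    0x17: "Local APIC modification",
    0x18: "Kernel notification callout modification",
}


def describe_region(arg4_hex):
    """Parse a hex Arg4 string and look it up in the excerpted table."""
    code = int(arg4_hex, 16)
    return code, REGION_TYPES.get(code, "not in the excerpt")


code, name = describe_region("0000000000000017")
print(f"Arg4 = {code:#x}: {name}")
```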



