Re: Core2 and v5.18-rc5 troubles

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 3.05.2022 18:20, Sean Christopherson wrote:
On Mon, May 02, 2022, Zdenek Kaspar wrote:
On Mon, 2 May 2022 02:29:59 +0200
Zdenek Kaspar <zkaspar82@xxxxxxxxx> wrote:

Hi, when I noticed fix for pre-EPYC and older Intel hardware I checked
v5.18-rc5 on my old Core2 machine and something else fails here.

Core2 means shadow paging, but I don't know how this is related
to pre-EPYC AMD chips (Family 10h and Bulldozers support NPT, too).
Do you mean AMD K8 support here?

When I kill -9 the first qemu attempt (see dmesg-1), then next
attempt is OK, after successful VM shutdown another attempt fails
again (see dmesg-2). After that it takes some time and machine needs
reset (see dmesg-3).

HTH, Z.

Oh crap, I had auto-applied mglru patch, sorry...

Try a raw v5.18-rc5 if your test kernel has some extra patches applied.

Now starting qemu does this:

20 root      20   0   0.0m R 100.0   0.0   0:28.33 kworker/1:0+rcu_gp
looks like:  94.36%  [kernel]                  [k] delay_tsc
kill -9 "qemu process" and noticed several D state processes:
     107 ?        D      0:04 [kworker/u8:7+events_unbound]
     108 ?        D      0:03 [kworker/u8:8+events_unbound]
     632 ?        Ds     0:00 /usr/lib/systemd/systemd-journald

machine is trashed, here's dmesg info:

[  172.349282] BUG: kernel NULL pointer dereference, address: 000000000000000b
[  172.349324] #PF: supervisor write access in kernel mode
[  172.349345] #PF: error_code(0x0002) - not-present page
[  172.349363] PGD 0 P4D 0
[  172.349375] Oops: 0002 [#1] PREEMPT SMP PTI
[  172.349393] CPU: 0 PID: 626 Comm: qemu-build Not tainted 5.18.0-rc5-2-amd64 #1
[  172.349420] Hardware name:  /DG35EC, BIOS ECG3510M.86A.0118.2010.0113.1426 01/13/2010
[  172.349446] RIP: 0010:kvm_replace_memslot+0xb0/0x2a0 [kvm]
[  172.349496] Code: ac 1c 80 04 00 00 4d 85 f6 74 65 48 89 e8 48 c1 e0 04 49 8b 4c 06 08 48 85 c9 74 21 4c 01 f0 48 8b 10 48 89 11 48 85 d2 74 04 <48> 89 4a 08 48 c7 40 08 00 00 00 00 48 c7 00 00 00 00 00 48 8d 44

...

[  172.349772] Call Trace:
[  172.349785]  <TASK>
[  172.349795]  kvm_set_memslot+0x3a8/0x5e0 [kvm]

My kernel build doesn't match exactly, but it's pretty close, and this call points
to the kvm_activate_memslot() call in the kvm_prepare_memory_region() error path.

	r = kvm_prepare_memory_region(kvm, old, new, change);
	if (r) {
		/*
		 * For DELETE/MOVE, revert the above INVALID change.  No
		 * modifications required since the original slot was preserved
		 * in the inactive slots.  Changing the active memslots also
		 * release slots_arch_lock.
		 */
		if (change == KVM_MR_DELETE || change == KVM_MR_MOVE) {
			kvm_activate_memslot(kvm, invalid_slot, old);
			kfree(invalid_slot);
		} else {
			mutex_unlock(&kvm->slots_arch_lock);
		}
		return r;
	}

What I can't figure out is how that can result in a NULL pointer dereference.  The
the faulting instruction is

	mov rcx, [rdx + 8]

which appears to be

	WRITE_ONCE(*pprev, next);

in __hlist_del() from hlist_del().

	if (!hlist_unhashed(n)) {
		__hlist_del(n);
		INIT_HLIST_NODE(n);
	}

which comes from

	hash_del(&old->id_node[idx]);

so the fault on 0xb means the pointer is '3'.  "old" is zero allocated, and idx
comes from slots->node_idx, which is effectively a constant '0' or '1'.

I can't see any way for "old" to be '3.  Even more confusing is that that would
imply kvm_prepare_memory_region() failed on a DELETE action, which should be
impossible.

I've tried hitting every edge case of this flow I can think of and haven't been
able to reproduce the behavior, e.g. triggering the new error path introduced by
commit 86931ff7207b ("KVM: x86/mmu: Do not create SPTEs for GFNs that exceed
host.MAXPHYADDR") is handled cleanly.  Furthermore, AFAIK QEMU doesn't MOVE slots,
so that path seems highly unlikely.

The other relevant datapoint is that fsnotify() hits a very similar NULL pointer
shortly after KVM's explosion, that one directly on address '3'.  That suggests
there's data corruption of some form going on, and KVM just happens to be the first
to be encounter a bad pointer.

Have you tried bisecting?  And/or can you provide your kernel config?  Maybe there's
a debug/sanitizer option that's causing problems.

+1 for bisecting.

Would also ask the reporter to try enabling KASAN,
if it is a genuine memory corruption bug it might show the place it originally
happens (rather than where the overwritten pointer ultimately gets dereferenced).

Thanks,
Maciej



[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux