On 5/26/2023 5:02 PM, Sean Christopherson wrote:
On Fri, May 26, 2023, Brian Rak wrote:
On 5/24/2023 9:39 AM, Brian Rak wrote:
On 5/23/2023 12:22 PM, Sean Christopherson wrote:
The other thing that would be helpful would be getting kernel stack traces of the
relevant tasks/threads. The vCPU stack traces won't be interesting, but it'll
likely help to see what the fallocate() tasks are doing.
I'll see what I can come up with here; I was running into some
difficulty getting useful stack traces out of the VM.
I didn't have any luck gathering guest-level stack traces - KASLR makes it
pretty difficult even if I have the guest kernel symbols.
Sorry, I was hoping to get host stack traces, not guest stack traces. I am hoping
to see what the fallocate() in the *host* is doing.
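(As an aside: besides attaching gdb, host-side kernel stacks can be sampled without stopping the process by walking procfs. A minimal sketch — the helper name is made up, and /proc/<pid>/task/<tid>/stack is only readable as root, so it falls back gracefully:)

```shell
# Sketch: print the kernel-side stack of every thread in a host process
# via procfs, no gdb attach needed. Hypothetical helper; reading
# /proc/<pid>/task/<tid>/stack requires root.
dump_task_stacks() {
    pid="$1"
    for t in /proc/"$pid"/task/*; do
        printf '=== tid %s (%s) ===\n' "${t##*/}" "$(cat "$t"/comm 2>/dev/null)"
        cat "$t"/stack 2>/dev/null || echo "  (unreadable: run as root)"
    done
}

# Example: the current shell's own threads.
dump_task_stacks $$
```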
Ah - here's a different instance of it with a full backtrace from the host:
(gdb) thread apply all bt

Thread 8 (Thread 0x7fbbd0bff700 (LWP 353251) "vnc_worker"):
#0 futex_wait_cancelable (private=0, expected=0, futex_word=0x7fbddd4f6b2c) at ../sysdeps/nptl/futex-internal.h:186
#1 __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x7fbddd4f6b38, cond=0x7fbddd4f6b00) at pthread_cond_wait.c:508
#2 __pthread_cond_wait (cond=cond@entry=0x7fbddd4f6b00, mutex=mutex@entry=0x7fbddd4f6b38) at pthread_cond_wait.c:638
#3 0x00005593ea9a232b in qemu_cond_wait_impl (cond=0x7fbddd4f6b00, mutex=0x7fbddd4f6b38, file=0x5593eaa152b4 "../../ui/vnc-jobs.c", line=248) at ../../util/qemu-thread-posix.c:220
#4 0x00005593ea4bfc33 in vnc_worker_thread_loop (queue=0x7fbddd4f6b00) at ../../ui/vnc-jobs.c:248
#5 0x00005593ea4c08f8 in vnc_worker_thread (arg=arg@entry=0x7fbddd4f6b00) at ../../ui/vnc-jobs.c:361
#6 0x00005593ea9a17e9 in qemu_thread_start (args=0x7fbbd0bfcf30) at ../../util/qemu-thread-posix.c:505
#7 0x00007fbddf51bea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#8 0x00007fbddf127a2f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 7 (Thread 0x7fbbd85ff700 (LWP 353248) "CPU 3/KVM"):
#0 0x00007fbddf11d237 in ioctl () at ../sysdeps/unix/syscall-template.S:120
#1 0x00005593ea844007 in kvm_vcpu_ioctl (cpu=cpu@entry=0x7fbddd42f7c0, type=type@entry=44672) at ../../accel/kvm/kvm-all.c:3035
#2 0x00005593ea844171 in kvm_cpu_exec (cpu=cpu@entry=0x7fbddd42f7c0) at ../../accel/kvm/kvm-all.c:2850
#3 0x00005593ea8457ed in kvm_vcpu_thread_fn (arg=arg@entry=0x7fbddd42f7c0) at ../../accel/kvm/kvm-accel-ops.c:51
#4 0x00005593ea9a17e9 in qemu_thread_start (args=0x7fbbd85fcf30) at ../../util/qemu-thread-posix.c:505
#5 0x00007fbddf51bea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#6 0x00007fbddf127a2f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 6 (Thread 0x7fbbd97ff700 (LWP 353247) "CPU 2/KVM"):
#0 0x00007fbddf11d237 in ioctl () at ../sysdeps/unix/syscall-template.S:120
#1 0x00005593ea844007 in kvm_vcpu_ioctl (cpu=cpu@entry=0x7fbddd4213c0, type=type@entry=44672) at ../../accel/kvm/kvm-all.c:3035
#2 0x00005593ea844171 in kvm_cpu_exec (cpu=cpu@entry=0x7fbddd4213c0) at ../../accel/kvm/kvm-all.c:2850
#3 0x00005593ea8457ed in kvm_vcpu_thread_fn (arg=arg@entry=0x7fbddd4213c0) at ../../accel/kvm/kvm-accel-ops.c:51
#4 0x00005593ea9a17e9 in qemu_thread_start (args=0x7fbbd97fcf30) at ../../util/qemu-thread-posix.c:505
#5 0x00007fbddf51bea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#6 0x00007fbddf127a2f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 5 (Thread 0x7fbbda9ff700 (LWP 353246) "CPU 1/KVM"):
#0 0x00007fbddf11d237 in ioctl () at ../sysdeps/unix/syscall-template.S:120
#1 0x00005593ea844007 in kvm_vcpu_ioctl (cpu=cpu@entry=0x7fbddd6f5e40, type=type@entry=44672) at ../../accel/kvm/kvm-all.c:3035
#2 0x00005593ea844171 in kvm_cpu_exec (cpu=cpu@entry=0x7fbddd6f5e40) at ../../accel/kvm/kvm-all.c:2850
#3 0x00005593ea8457ed in kvm_vcpu_thread_fn (arg=arg@entry=0x7fbddd6f5e40) at ../../accel/kvm/kvm-accel-ops.c:51
#4 0x00005593ea9a17e9 in qemu_thread_start (args=0x7fbbda9fcf30) at ../../util/qemu-thread-posix.c:505
#5 0x00007fbddf51bea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#6 0x00007fbddf127a2f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 4 (Thread 0x7fbbdbbff700 (LWP 353245) "CPU 0/KVM"):
#0 0x00007fbddf11d237 in ioctl () at ../sysdeps/unix/syscall-template.S:120
#1 0x00005593ea844007 in kvm_vcpu_ioctl (cpu=cpu@entry=0x7fbddd6a8c00, type=type@entry=44672) at ../../accel/kvm/kvm-all.c:3035
#2 0x00005593ea844171 in kvm_cpu_exec (cpu=cpu@entry=0x7fbddd6a8c00) at ../../accel/kvm/kvm-all.c:2850
#3 0x00005593ea8457ed in kvm_vcpu_thread_fn (arg=arg@entry=0x7fbddd6a8c00) at ../../accel/kvm/kvm-accel-ops.c:51
#4 0x00005593ea9a17e9 in qemu_thread_start (args=0x7fbbdbbfcf30) at ../../util/qemu-thread-posix.c:505
#5 0x00007fbddf51bea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#6 0x00007fbddf127a2f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 3 (Thread 0x7fbbdcfff700 (LWP 353244) "IO mon_iothread"):
#0 0x00007fbddf11b96f in __GI___poll (fds=0x7fbbdc209000, nfds=3, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
#1 0x00007fbddf6eb0ae in ?? () from target:/lib/x86_64-linux-gnu/libglib-2.0.so.0
#2 0x00007fbddf6eb40b in g_main_loop_run () from target:/lib/x86_64-linux-gnu/libglib-2.0.so.0
#3 0x00005593ea880509 in iothread_run (opaque=opaque@entry=0x7fbddd51db00) at ../../iothread.c:74
#4 0x00005593ea9a17e9 in qemu_thread_start (args=0x7fbbdcffcf30) at ../../util/qemu-thread-posix.c:505
#5 0x00007fbddf51bea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#6 0x00007fbddf127a2f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 2 (Thread 0x7fbdddfff700 (LWP 353235) "qemu-kvm"):
#0 syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1 0x00005593ea9a29aa in qemu_futex_wait (val=<optimized out>, f=<optimized out>) at ./include/qemu/futex.h:29
#2 qemu_event_wait (ev=ev@entry=0x5593eb1a21a8 <rcu_call_ready_event>) at ../../util/qemu-thread-posix.c:430
#3 0x00005593ea9abd80 in call_rcu_thread (opaque=opaque@entry=0x0) at ../../util/rcu.c:261
#4 0x00005593ea9a17e9 in qemu_thread_start (args=0x7fbdddffcf30) at ../../util/qemu-thread-posix.c:505
#5 0x00007fbddf51bea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#6 0x00007fbddf127a2f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 1 (Thread 0x7fbdde874680 (LWP 353228) "qemu-kvm"):
#0 0x00007fbddf11ba66 in __ppoll (fds=0x7fbbd39b9f00, nfds=21, timeout=<optimized out>, timeout@entry=0x7fff9ff32c90, sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:44
#1 0x00005593ea9b70f1 in ppoll (__ss=0x0, __timeout=0x7fff9ff32c90, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/x86_64-linux-gnu/bits/poll2.h:77
#2 qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=2429393072) at ../../util/qemu-timer.c:351
#3 0x00005593ea9b4955 in os_host_main_loop_wait (timeout=2429393072) at ../../util/main-loop.c:315
#4 main_loop_wait (nonblocking=nonblocking@entry=0) at ../../util/main-loop.c:606
#5 0x00005593ea64ca91 in qemu_main_loop () at ../../softmmu/runstate.c:739
#6 0x00005593ea84c8e7 in qemu_default_main () at ../../softmmu/main.c:37
#7 0x00007fbddf04fd0a in __libc_start_main (main=0x5593ea497620 <main>, argc=109, argv=0x7fff9ff32e58, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fff9ff32e48) at ../csu/libc-start.c:308
#8 0x00005593ea498dda in _start ()
And a trace with -e all from that same guest:
CPU-353246 [041] 5473778.770320: write_msr: 1d9, value 4000
CPU-353248 [027] 5473778.770321: write_msr: 1d9, value 4000
CPU-353246 [041] 5473778.770321: kvm_exit: reason EPT_VIOLATION rip 0xffffffffbc726140 info 683 0
CPU-353248 [027] 5473778.770321: kvm_exit: reason EPT_VIOLATION rip 0xffffffffbc7648b0 info 683 0
CPU-353245 [039] 5473778.770321: kvm_entry: vcpu 0, rip 0xffffffffbc7648b0
CPU-353246 [041] 5473778.770321: kvm_page_fault: vcpu 1 rip 0xffffffffbc726140 address 0x00000001170aaff8 error_code 0x683
CPU-353247 [019] 5473778.770321: write_msr: 1d9, value 4000
CPU-353248 [027] 5473778.770321: kvm_page_fault: vcpu 3 rip 0xffffffffbc7648b0 address 0x0000000274300ff8 error_code 0x683
CPU-353245 [039] 5473778.770321: rcu_utilization: Start context switch
CPU-353247 [019] 5473778.770321: kvm_exit: reason EPT_VIOLATION rip 0xffffffffbc7648b0 info 683 0
CPU-353245 [039] 5473778.770322: rcu_utilization: End context switch
CPU-353246 [041] 5473778.770322: kvm_entry: vcpu 1, rip 0xffffffffbc726140
CPU-353247 [019] 5473778.770322: kvm_page_fault: vcpu 2 rip 0xffffffffbc7648b0 address 0x000000026e580ff8 error_code 0x683
CPU-353248 [027] 5473778.770322: kvm_entry: vcpu 3, rip 0xffffffffbc7648b0
CPU-353246 [041] 5473778.770322: rcu_utilization: Start context switch
CPU-353246 [041] 5473778.770322: rcu_utilization: End context switch
CPU-353248 [027] 5473778.770322: rcu_utilization: Start context switch
CPU-353248 [027] 5473778.770322: rcu_utilization: End context switch
CPU-353247 [019] 5473778.770323: kvm_entry: vcpu 2, rip 0xffffffffbc7648b0
CPU-353245 [039] 5473778.770323: write_msr: 1d9, value 4000
CPU-353245 [039] 5473778.770323: kvm_exit: reason EPT_VIOLATION rip 0xffffffffbc7648b0 info 683 0
CPU-353247 [019] 5473778.770323: rcu_utilization: Start context switch
CPU-353246 [041] 5473778.770323: write_msr: 1d9, value 4000
CPU-353245 [039] 5473778.770323: kvm_page_fault: vcpu 0 rip 0xffffffffbc7648b0 address 0x0000000263670ff8 error_code 0x683
CPU-353246 [041] 5473778.770323: kvm_exit: reason EPT_VIOLATION rip 0xffffffffbc726140 info 683 0
CPU-353247 [019] 5473778.770323: rcu_utilization: End context switch
CPU-353248 [027] 5473778.770324: write_msr: 1d9, value 4000
CPU-353246 [041] 5473778.770324: kvm_page_fault: vcpu 1 rip 0xffffffffbc726140 address 0x00000001170aaff8 error_code 0x683
CPU-353248 [027] 5473778.770324: kvm_exit: reason EPT_VIOLATION rip 0xffffffffbc7648b0 info 683 0
CPU-353245 [039] 5473778.770324: kvm_entry: vcpu 0, rip 0xffffffffbc7648b0
CPU-353248 [027] 5473778.770324: kvm_page_fault: vcpu 3 rip 0xffffffffbc7648b0 address 0x0000000274300ff8 error_code 0x683
CPU-353246 [041] 5473778.770324: kvm_entry: vcpu 1, rip 0xffffffffbc726140
CPU-353245 [039] 5473778.770324: rcu_utilization: Start context switch
CPU-353246 [041] 5473778.770325: rcu_utilization: Start context switch
CPU-353247 [019] 5473778.770325: write_msr: 1d9, value 4000
CPU-353245 [039] 5473778.770325: rcu_utilization: End context switch
CPU-353246 [041] 5473778.770325: rcu_utilization: End context switch
CPU-353247 [019] 5473778.770325: kvm_exit: reason EPT_VIOLATION rip 0xffffffffbc7648b0 info 683 0
CPU-353248 [027] 5473778.770325: kvm_entry: vcpu 3, rip 0xffffffffbc7648b0
CPU-353247 [019] 5473778.770325: kvm_page_fault: vcpu 2 rip 0xffffffffbc7648b0 address 0x000000026e580ff8 error_code 0x683
CPU-353248 [027] 5473778.770325: rcu_utilization: Start context switch
CPU-353248 [027] 5473778.770326: rcu_utilization: End context switch
CPU-353245 [039] 5473778.770326: write_msr: 1d9, value 4000
CPU-353246 [041] 5473778.770326: write_msr: 1d9, value 4000
CPU-353245 [039] 5473778.770326: kvm_exit: reason EPT_VIOLATION rip 0xffffffffbc7648b0 info 683 0
CPU-353247 [019] 5473778.770326: kvm_entry: vcpu 2, rip 0xffffffffbc7648b0
CPU-353246 [041] 5473778.770326: kvm_exit: reason EPT_VIOLATION rip 0xffffffffbc726140 info 683 0
CPU-353245 [039] 5473778.770326: kvm_page_fault: vcpu 0 rip 0xffffffffbc7648b0 address 0x0000000263670ff8 error_code 0x683
CPU-353246 [041] 5473778.770326: kvm_page_fault: vcpu 1 rip 0xffffffffbc726140 address 0x00000001170aaff8 error_code 0x683
CPU-353247 [019] 5473778.770326: rcu_utilization: Start context switch
CPU-353247 [019] 5473778.770327: rcu_utilization: End context switch
CPU-353248 [027] 5473778.770327: write_msr: 1d9, value 4000
CPU-353246 [041] 5473778.770327: kvm_entry: vcpu 1, rip 0xffffffffbc726140
CPU-353248 [027] 5473778.770327: kvm_exit: reason EPT_VIOLATION rip 0xffffffffbc7648b0 info 683 0
CPU-353245 [039] 5473778.770327: kvm_entry: vcpu 0, rip 0xffffffffbc7648b0
CPU-353246 [041] 5473778.770327: rcu_utilization: Start context switch
CPU-353246 [041] 5473778.770328: rcu_utilization: End context switch
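(A quick way to tally exit reasons in a capture like this — the grep pattern is an assumption about the report format, matching the excerpt above; point it at the full report file instead of the embedded two-line sample:)

```shell
# Tally kvm_exit reasons in an ftrace/trace-cmd report. The sample file
# here holds two kvm_exit lines copied from the trace above.
cat > /tmp/kvm-trace-sample.txt <<'EOF'
CPU-353246 [041] 5473778.770321: kvm_exit: reason EPT_VIOLATION rip 0xffffffffbc726140 info 683 0
CPU-353248 [027] 5473778.770321: kvm_exit: reason EPT_VIOLATION rip 0xffffffffbc7648b0 info 683 0
EOF
# Extract the "reason <NAME>" field, then count occurrences per reason.
grep -o 'reason [A-Z_]*' /tmp/kvm-trace-sample.txt | sort | uniq -c | sort -rn
```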
Another datapoint that might provide insight would be seeing if/how KVM's page
faults stats change, e.g. look at /sys/kernel/debug/kvm/pf_* multiple times when
the guest is stuck.
It looks like pf_taken is the only real one incrementing:
# cd /sys/kernel/debug/kvm/353228-12
# tail -n1 pf*; sleep 5; echo "======"; tail -n1 pf*
==> pf_emulate <==
12353
==> pf_fast <==
56
==> pf_fixed <==
27039604264
==> pf_guest <==
0
==> pf_mmio_spte_created <==
12353
==> pf_spurious <==
2348694
==> pf_taken <==
74486522452
======
==> pf_emulate <==
12353
==> pf_fast <==
56
==> pf_fixed <==
27039604264
==> pf_guest <==
0
==> pf_mmio_spte_created <==
12353
==> pf_spurious <==
2348694
==> pf_taken <==
74499574212
# tail -n1 *tlb*; sleep 5; echo "======"; tail -n1 *tlb*
==> remote_tlb_flush <==
1731549762
==> remote_tlb_flush_requests <==
3092024754
==> tlb_flush <==
8203297646
======
==> remote_tlb_flush <==
1731549762
==> remote_tlb_flush_requests <==
3092024754
==> tlb_flush <==
8203297649
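(For scale: pf_taken advances while pf_fixed stays flat, i.e. the faults are being retried rather than resolved, and the two samples above put the retry rate in the millions per second:)

```shell
# Rough fault rate from the two pf_taken samples above, taken ~5 s apart.
t0=74486522452
t1=74499574212
interval=5
delta=$((t1 - t0))
# prints: pf_taken delta: 13051760 over 5s (~2610352 faults/sec)
echo "pf_taken delta: $delta over ${interval}s (~$((delta / interval)) faults/sec)"
```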
Are you able to run modified host kernels? If so, the easiest next step, assuming
stack traces don't provide a smoking gun, would be to add printks into the page
fault path to see why KVM is retrying instead of installing a SPTE.
We can, but it can take quite some time from when we do the update to
actually seeing results. This problem is inconsistent at best, and even
though we're seeing it a ton of times a day, it can show up anywhere.
Even if we rolled it out today, we'd still be looking at weeks/months
before we had any significant number of machines on it.