Hi,

On 12. 01. 23, 1:37, Pedro Falcato wrote:
> I just want to chime in and say that I've also hit this regression
> right as I (Arch) updated to 6.1 a few weeks ago. It completely ruined
> my qemu workflow, such that I had to fall back to an LTS kernel. Some
> data I've gathered:
>
> 1) It seems not to happen right after booting -- I'm unsure whether
> this is due to memory pressure, lower CPU load, or some other factor.
+1, as I wrote.
> 2) It seems to intensify after swapping a fair amount? At least this
> has been my experience.
I have no swap.
> 3) The largest slowdown seems to happen while qemu is booting the
> guest, possibly during heavy memory allocation -- problems range from
> "takes tens of seconds to boot" to "qemu is completely blocked and
> needs SIGKILL spam".
+1
> 4) While traditional process-monitoring tools break (likely due to
> mmap_lock getting hogged), I can tell empirically (using /bin/free)
> that the system seems to be swapping in/out quite a fair bit.
Yes -- as I wrote, htop/top/ps and the like are stuck reading /proc/<pid>/cmdline (waiting for the mmap lock).
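For reference, that read path goes through the target process's address space: the proc handler ends up copying the cmdline out of the target's mm, taking that task's mmap_lock for read, so a writer hogging the lock stalls all of these tools at once. A trivial demonstration of the same path (reading our own cmdline; a stuck reader's /proc/<pid>/stack should, with root, show it waiting in that code):

```shell
# /proc/<pid>/cmdline lives in the target's address space, so the kernel
# takes that task's mmap_lock (read side) to copy it out; a hogged
# writer blocks this read. Reading our own cmdline exercises the path:
tr '\0' ' ' < /proc/self/cmdline
echo
```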
> My 4) is particularly confusing to me, as I had originally blamed the
> problem on the MGLRU changes, while you don't seem to be swapping at
> all. Could this be related to the maple tree patches? Should we CC
> both the MGLRU folks and the maple tree folks?
>
> I have little insight into what the kernel's state actually is apart
> from this -- perf seems to break, and I have no kernel debugger, as
> this is my live personal machine :/ I would love it if someone hinted
> at possible things I/we could try in order to track this down. Is this
> not git-bisectable at all?
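On the bisect question: it should be bisectable in principle -- assuming the last good kernel was v6.0, the endpoints would be good=v6.0, bad=v6.1 -- it's just slow, since every step means building, installing, and booting a kernel, with "does the qemu guest boot" as the test. The mechanics, sketched on a throwaway repo (the repo, the file, and the "bug" below are made up purely for illustration):

```shell
set -e
repo=$(mktemp -d); cd "$repo"
git init -q
git config user.email bisect@example.com   # throwaway identity for the demo
git config user.name bisect-demo
for i in 1 2 3 4 5; do
  echo "$i" > file
  git add file
  git commit -qm "commit $i"
done
# Pretend commit 4 introduced the regression: "bad" means file >= 4.
git bisect start HEAD HEAD~4               # bad = tip, good = first commit
git bisect run sh -c 'test "$(cat file)" -lt 4'   # exit 0 = good, else bad
git bisect log | grep 'first bad commit'
```

For the real thing the `git bisect run` step is replaced by a manual build + boot + test cycle and `git bisect good`/`git bisect bad` by hand.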
I have rebooted into a fresh kernel which 1) has lockdep enabled, and 2) I have debuginfo for. So the next time this happens, I can print the held locks and dump a kcore (kdump is set up).
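Besides the kcore, the sysrq dumps might already tell us a lot on the next occurrence. A sketch (needs root and CONFIG_MAGIC_SYSRQ; the 'd' dump additionally needs a lockdep-enabled kernel, which the freshly booted one has):

```shell
# 'w' dumps uninterruptible (blocked) tasks with stack traces; 'd' dumps
# all currently held locks (lockdep builds only). Output lands in dmesg.
echo 1 > /proc/sys/kernel/sysrq     # allow all sysrq functions
echo w > /proc/sysrq-trigger        # show blocked tasks
echo d > /proc/sysrq-trigger        # show all held locks
dmesg | tail -n 100
```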
regards,
-- 
js
suse labs