RE: hv_balloon: kmsg about unhandled message is killing the system

"Michael Kelley (LINUX)" <mikelley@xxxxxxxxxxxxx> · Mon, 13 Dec 2021 05:54:19 +0000

From: Thomas Deutschmann <whissi@xxxxxxxxxx> Sent: Wednesday, October 20, 2021 8:10 AM
> To: linux-hyperv@xxxxxxxxxxxxxxx
> Subject: hv_balloon: kmsg about unhandled message is killing the system
> 
> Hi,
> 
> I am running a Hyper-V Gen2 VM with Gentoo Linux and have used the
> memory ballooning feature (8192MB RAM Minimum; 61440MB RAM Maximum; 20%
> memory buffer) for almost 2 years. Since kernel 5.14, the virtual
> machine will sometimes log _a lot_ of
> 
> > kernel: [ 1022.277623] hv_balloon: Unhandled message: type: 0
> > kernel: [ 1022.277624] hv_balloon: Unhandled message: type: 32768
> > kernel: [ 1022.277625] hv_balloon: Unhandled message: type: 51200
> > kernel: [ 1022.277625] hv_balloon: Unhandled message: type: 59392
> > kernel: [ 1022.277689] hv_balloon: Ballooned pages: 1519104
> 
> messages, causing the log mountpoint (in my case, the root mountpoint) to
> run out of disk space, which eventually kills the system.
> 
> I have never seen this before with any <5.14 kernel.
> 
> Of course, I tried to bisect the kernel multiple times, but I never was
> successful because it is not easy to trigger the problem. What seems to
> work best:
> 
> 1) After start, wait ~60 seconds for
> 
> > hv_balloon: Max. dynamic memory size: 61440 MB
> 
> message.
> 
> 2) Now allocate some memory causing the VM to request more memory from
> the host system:
> 
>    $ </dev/zero head -c 22G | pv -L 256M | tail
> 
>    (Note: You have to do that slowly because the host will only grant
>           more memory when memory pressure is constantly high,
>           but if you request memory too fast you will
>           run out of memory.)
> 
> 3) Now end the process (CTRL+C) and wait until the VM has returned
> memory back to host system.
> 
> 4) Now I start to compile chromium and firefox with 20 threads each in
> parallel.
> 
> If the kernel is faulty, in most cases I'll see the kmsgs about
> unhandled message types within 10 minutes. If I get the message
> 
> > hv_balloon: Balloon request will be partially fulfilled. Balloon floor reached
> 
> it's usually a sign of a working kernel.
> 
> But as I said at the beginning, this is not 100% reliable. I have already
> ended up with a kernel where I thought "This revision is fine" and suddenly
> the system died because millions of those messages were logged. And
> sometimes I am unable to trigger the failure again for a bad revision.
> See my last bisect attempt:
> 

My apologies that no one got back to you sooner on this issue.
A bug that is the likely cause was recently found.  See
https://lore.kernel.org/linux-hyperv/20211213014709.GA2316@anparri/T/#t
if you haven't already.  I think the proposed fix works, but there may
be some additional discussion about whether it is the best fix.
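
For anyone still trying to reproduce this while bisecting, the good/bad
signals described in the report can be checked mechanically. Below is a
minimal sketch, assuming the kernel log is available as plain text (for
example, captured from dmesg). The function name, the return labels, and
the flood threshold are hypothetical choices for illustration and are not
part of hv_balloon; the message strings are the ones quoted in the report.

```python
import re

# Message formats copied from the log lines quoted in the report.
UNHANDLED = re.compile(r"hv_balloon: Unhandled message: type: (\d+)")
FLOOR_REACHED = (
    "hv_balloon: Balloon request will be partially fulfilled. "
    "Balloon floor reached"
)


def classify_kernel_log(log_text, flood_threshold=100):
    """Classify one test run for a bisect step.

    Returns "bad" when the unhandled-message flood is present,
    "probably-good" when the balloon-floor message shows up without a
    flood, and "unknown" otherwise. flood_threshold is an arbitrary,
    assumed cut-off; the report only says the messages appear in
    very large numbers.
    """
    unhandled_count = len(UNHANDLED.findall(log_text))
    if unhandled_count >= flood_threshold:
        return "bad"
    if FLOOR_REACHED in log_text:
        # The report calls this "usually" a sign of a working kernel,
        # so it is only a hint, not proof.
        return "probably-good"
    return "unknown"
```

Since the report notes that reproduction is not 100% reliable, an
"unknown" or "probably-good" result would still call for repeating the
run before marking a revision as good.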

Michael Kelley