From: Thomas Deutschmann <whissi@xxxxxxxxxx> Sent: Wednesday, October 20, 2021 8:10 AM
> To: linux-hyperv@xxxxxxxxxxxxxxx
> Subject: hv_balloon: kmsg about unhandled message is killing the system
>
> Hi,
>
> I am running a Hyper-V Gen2 VM with Gentoo Linux where I make use of the
> memory ballooning feature (8192MB RAM Minimum; 61440MB RAM Maximum; 20%
> memory buffer) for almost 2 years. Since kernel 5.14, the virtual
> machine will sometimes log _a lot_ of
>
> > kernel: [ 1022.277623] hv_balloon: Unhandled message: type: 0
> > kernel: [ 1022.277624] hv_balloon: Unhandled message: type: 32768
> > kernel: [ 1022.277625] hv_balloon: Unhandled message: type: 51200
> > kernel: [ 1022.277625] hv_balloon: Unhandled message: type: 59392
> > kernel: [ 1022.277689] hv_balloon: Ballooned pages: 1519104
>
> messages, causing log mountpoint (in my case root mountpoint) to run
> out of disk space which will kill the system in the end.
>
> I have never seen this before with any <5.14 kernel.
>
> Of course, I tried to bisect the kernel multiple times, but I never was
> successful because it is not easy to trigger the problem. What seems to
> work best:
>
> 1) After start, wait ~60 seconds for
>
> > hv_balloon: Max. dynamic memory size: 61440 MB
>
> message.
>
> 2) Now allocate some memory causing the VM to request more memory from
> the host system:
>
> $ </dev/zero head -c 22G | pv -L 256M | tail
>
> (Note: You have to do that slowly because host will only grant
> more memory when memory pressure is constantly high
> but when you are requesting memory too fast you will
> run out of memory)
>
> 3) Now end the process (CTRL+C) and wait until the VM has returned
> memory back to host system.
>
> 4) Now I start to compile chromium and firefox with 20 threads each in
> parallel.
>
> If the kernel is faulty, in most cases I'll see the kmsgs about
> unhandled message types within 10 minutes. If I'll get the message
>
> > hv_balloon: Balloon request will be partially fulfilled. Balloon floor reached
>
> it's usually a sign for working kernel.
>
> But as said at the beginning, this is not 100% reliable. I already ended
> up with a kernel where I thought "This revision is fine" and suddenly
> the system died because millions of those messages were outputted. Or
> sometimes I am unable to trigger the failure again for a bad revision.
> See my last bisect attempt:
>

My apologies that someone did not get back to you sooner on this issue.
Someone has recently found a bug that is the likely cause. See
https://lore.kernel.org/linux-hyperv/20211213014709.GA2316@anparri/T/#t
if you haven't already. I think the proposed fix works, but there may be
some additional discussion about whether it is the best fix.

Michael Kelley
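
The reproduction steps quoted above depend on spotting the "Unhandled
message" lines in the kernel log. A minimal sketch of how that check
might be scripted is below. The function names count_unhandled and
classify_boot, and the idea of a numeric limit, are assumptions made for
illustration; they are not part of hv_balloon or of the original report.

```shell
#!/bin/sh
# Count hv_balloon "Unhandled message" lines in kernel log text on stdin.
count_unhandled() {
    # grep -c prints the count but exits 1 when the count is 0,
    # so "|| true" keeps the function usable under "set -e".
    grep -c 'hv_balloon: Unhandled message' || true
}

# Hypothetical helper: print "bad" when the count exceeds the given
# limit (default 0), otherwise "good".
classify_boot() {
    limit=${1:-0}
    count=$(count_unhandled)
    if [ "$count" -gt "$limit" ]; then
        echo bad
    else
        echo good
    fi
}
```

On a running VM this could be used as `dmesg | classify_boot 0`,
assuming dmesg is readable by the current user. As the report notes, the
problem does not trigger reliably, so a "good" result from one run is
weak evidence at best.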