From: Florian Müller <max06.net@xxxxxxxxxxx>
> From: Michael Kelley (LINUX) <mikelley@xxxxxxxxxxxxx>
> > > > Issues showed up when I set up a Kali Linux Guest. I
> > > > missed the memory configuration before booting up the
> > > > instance, so it started with 1GB of memory, and
> > > > ballooning active between 512MB and several TB of memory.
> > > > Hyper-V started to allocate more and more memory to this
> > > > guest since the reported memory requirements also
> > > > increased. The guest kernel didn't see any of that allocated
> > > > memory, as far as I can tell.
> > >
> > > Please do not forget about this (the report quoted above).
> >
> > Hmmm. Right off the bat, I don't know how to fix this. Hyper-V tells the
> > guest "Here is more memory". The hv_balloon driver adds the memory (but
> > doesn't mark it "online"), and sends a positive ACK to Hyper-V.
> > From Hyper-V's standpoint, it has successfully given the memory to the
> > guest. But if the guest hasn't onlined the memory and isn't using it, the
> > guest continues to report high memory pressure. Hyper-V assigns yet more
> > memory to the guest, still to no effect. Having the hv_balloon driver delay
> > the ACK until the memory comes online is fraught with problems, and of
> > course Hyper-V has no visibility into whether the guest has onlined the
> > memory.
> >
> > This may be one where the guest configuration really must be
> > correct. But I'm open to other suggestions for a possible solution.
> >
> > Michael
>
> From checking the driver's code, it looks like the guest reports only the
> free and committed memory, not the total. I can also see considerations
> about num_pages_onlined in the committed-calculation.
>
> I see 2 possible options at the moment: adding num_total to the message
> (changing the protocol), or stopping the reports if the guest fails to
> online memory after the first increase. A third, more complicated option
> would be checking for not onlined pages (I've seen functions for that in
> the code) and adding them to the free value in the report.
>

Changing the protocol with the Hyper-V host probably isn't practical. This
wouldn't be a high enough priority issue for the Hyper-V team to make the
needed changes on their side.

The second option also has problems. We really don't want to just stop
reporting (and I'm not sure what Hyper-V does if the guest stops reporting),
and it's always hard to know how long to wait for user space to do something.

The third option sounds feasible, though I haven't looked at the details.

> I'd love to write a patch for this if I had a clue how to test and debug
> it without rebuilding my kernel all the time.
>

Feel free to have at it. :-) I can also put the third option on our list
of things to look at, but it will be in the "when we can get to it" category.

Michael
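
For whoever picks this up, here is a toy model of the third option. It is not hv_balloon code. The host policy (hot-add whenever the reported free memory drops below a threshold), the page counts, and all names (`GuestMemory`, `build_report`, `simulate`) are assumptions made up for illustration. It only sketches why counting added-but-not-onlined pages as free might stop the runaway growth described in the thread.

```python
from dataclasses import dataclass


@dataclass
class GuestMemory:
    online_pages: int       # pages the guest kernel can actually use
    added_not_onlined: int  # pages hot-added by the host but never onlined
    used_pages: int         # pages currently in use by the guest workload


def build_report(mem: GuestMemory, count_not_onlined_as_free: bool):
    """Return hypothetical (free, committed) page counts for one report."""
    free = max(mem.online_pages - mem.used_pages, 0)
    if count_not_onlined_as_free:
        # Third option from the thread: treat hot-added pages that were
        # never onlined as free, so the host stops seeing memory pressure.
        free += mem.added_not_onlined
    committed = mem.used_pages
    return free, committed


def simulate(rounds: int, count_not_onlined_as_free: bool,
             low_free_threshold: int = 1000,
             hot_add_step: int = 100_000) -> int:
    """Run a made-up host policy and return the pages hot-added in total.

    Assumption: the host hot-adds `hot_add_step` pages whenever the guest
    reports fewer than `low_free_threshold` free pages. The guest in this
    model never onlines anything, mirroring the misconfigured guest.
    """
    mem = GuestMemory(online_pages=262_144, added_not_onlined=0,
                      used_pages=262_000)
    for _ in range(rounds):
        free, _committed = build_report(mem, count_not_onlined_as_free)
        if free < low_free_threshold:
            mem.added_not_onlined += hot_add_step
    return mem.added_not_onlined


if __name__ == "__main__":
    print("current reporting:", simulate(5, False), "pages hot-added")
    print("third option:     ", simulate(5, True), "pages hot-added")
```

In this model the current reporting keeps triggering hot-adds every round, while the adjusted report triggers one hot-add and then settles. Whether the real host reacts to the free value this way is something that would need checking.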