[...]

> I guess the question is if pressuring the guest to compact the memory
> to create more THP pages would add value versus letting the pressure
> from the inflation cause more potential fragmentation.

Would be interesting to see some actual numbers. Right now, it's just
speculation. I know that there are ideas to do proactive compaction,
maybe that has a similar effect.

[...]

>>>> There was some work on huge page ballooning in a paper I read. But
>>>> once the guest is out of huge pages to report, it would want to fall
>>>> back to smaller granularity (down to 4k, to create real memory
>>>> pressure), where you would end up in the very same situation you are
>>>> in right now. So it's - IMHO - only of limited use.
>>>
>>> I wouldn't think it would be that limited of a use case. By having the
>>> balloon inflate with higher-order pages you should be able to put more
>>> pressure on the guest to compact the memory and reduce fragmentation
>>> instead of increasing it. If you have the balloon flushing out the
>>> lower-order pages it is sitting on when there is pressure, it seems
>>> like it would be more likely to reduce fragmentation further.
>>
>> As we have balloon compaction in place and balloon pages are movable, I
>> guess fragmentation is not really an issue.
>
> I'm not sure that is truly the case. My concern is that by allocating
> the 4K pages we are breaking up the higher-order pages and we aren't
> necessarily guaranteed to obtain all pieces of the higher-order page
> when we break it up. As a result we could end up causing the THP pages
> to be broken up and scattered between the balloon and other consumers.

We are allocating movable memory. We should be working on/creating
movable pageblocks. Yes, other concurrent allocations can race with the
allocation. But after all, they are likely movable as well (because they
are allocating from a movable pageblock) and we do have compaction in
place.
There are corner cases, but I don't think they are very relevant.

[...]

>> Especially page compaction/migration in the guest might be tricky.
>> AFAIK it only works on order-0 pages. E.g., whenever you allocate a
>> higher-order page in the guest and report it to your hypervisor, you
>> want to split it into separate order-0 pages before adding them to the
>> balloon list. Otherwise, you won't be able to tag them as movable and
>> handle them via the existing balloon compaction framework - and that
>> would be a major step backwards, because you would be heavily
>> fragmenting your guest (and even turning MAX_ORDER - 1 pages into
>> unmovable pages means that memory offlining/alloc_contig_range() users
>> won't be able to move such pages around anymore).
>
> Yes, from what I can tell compaction will not touch anything that is
> pageblock size or larger. I am not sure if that is an issue or not.
>
> Migration is a bit of a different story. It looks like there is logic
> in place for migrating huge and transparent huge pages, but not
> higher-order pages. I'll have to take a look through the code some more
> to see just how difficult it would be to support migrating a 2M page. I
> can probably make it work if I just configure it as a transparent huge
> page with the appropriate flags and bits in the page being set.

Note: With virtio-balloon you actually don't necessarily want to migrate
the higher-order page. E.g., migrating a higher-order page might fail
because there is no migration target available. Instead, you would want
to "migrate" it to multiple smaller pieces. This is especially
interesting for alloc_contig_range() users - something the current 4k
pages can handle just nicely.

>
>> But then, balloon compaction will result in single 4k pages getting
>> moved and deflated+inflated. Once you have order-0 pages in your list,
>> deflating higher-order pages becomes trickier.
>
> I probably wouldn't want to maintain them as individual lists.
> In my
> mind it would make more sense to have two separate lists with separate
> handlers for each. Then, in the event of something such as a deflate,
> we could choose what we free based on the number of pages we need to
> free. That would allow us to deflate the balloon quicker in the case of
> a low-memory condition, which should improve our responsiveness. In
> addition, with the driver sitting on a reserve of higher-order pages it
> could help to alleviate fragmentation in such a case as well, since it
> could release larger contiguous blocks of memory.
>
>> E.g., have a look at the vmware balloon (drivers/misc/vmw_balloon.c).
>> It will allocate either 4k or 2MB pages, but won't be able to handle
>> them for balloon compaction. They don't even bother with other
>> granularities.
>>
>> Long story short: Inflating higher-order pages could be good for
>> inflation performance in some setups, but I think you'll have to fall
>> back to lower-order allocations + balloon compaction on 4k.
>
> I'm not entirely sure that is the case. It seems like with a few tweaks
> we could look at doing something like splitting the balloon so that we
> have a 4K and a 2M balloon. At that point it would just be a matter of
> registering a pair of address space handlers so that the 2M balloons
> are handled correctly if there is a request to migrate their memory. As
> far as compaction goes, that is another story, since it looks like 2M
> pages will not be compacted.

I am not convinced that what you describe is a real issue that needs
such a solution. Maybe we can come up with numbers that prove this
(e.g., #THP, fragmentation, benchmark performance in your guest, etc.).

I'll try digging out that huge page ballooning for KVM paper, maybe that
is of some value.

-- 
Thanks,

David / dhildenb