Sorry for the slow reply.
Re: Module parameters: I would prefer not to have module parameters, since they are controlled by the guest. In virtualized environments, the admins controlling the hypervisor are generally more knowledgeable about these things than the guest users. A feature bit seems more useful: the host then knows what the guest behavior will be and can adjust the host-side implementation to make the experience good for the guest.
I worry that requiring global_node_page_state(NR_FILE_PAGES) == 0 before allowing deflation is too strict. One of the benefits of the shrinker API is that it is invoked before vmscan.c has gone through heroic efforts to reclaim the world. I'm not familiar enough with the code to judge how this patch impacts this, but would it be beneficial to allow deflation when vmscan.c is trying "too hard" to reclaim pages? Is there some softer condition than "global_node_page_state(NR_FILE_PAGES) == 0"?
For my own understanding, does this patch work by returning 0 pages when asked for pages? Are there cases where that results in an unnecessary OOM? For example, if global_node_page_state(NR_FILE_PAGES) == 1, and the guest needs 2?
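To make that example concrete, here is a toy model in C. It is only a sketch and is not taken from the patch: the function names, the parameters, and the "softer" rule itself are hypothetical, and the strict rule reflects my reading of the patch rather than its actual code.

```c
/* Toy model only. All names here are hypothetical, not the patch's code. */

/*
 * Strict rule (my reading of the patch): the balloon gives pages back
 * only when the page cache is completely empty.
 */
static unsigned long scan_strict(unsigned long file_pages,
                                 unsigned long balloon_pages,
                                 unsigned long nr_to_scan)
{
    if (file_pages != 0)
        return 0;
    return nr_to_scan < balloon_pages ? nr_to_scan : balloon_pages;
}

/*
 * One possible softer rule (an assumption, for discussion): the balloon
 * covers whatever part of the request the page cache cannot satisfy.
 */
static unsigned long scan_soft(unsigned long file_pages,
                               unsigned long balloon_pages,
                               unsigned long nr_to_scan)
{
    unsigned long shortfall;

    if (file_pages >= nr_to_scan)
        return 0;
    shortfall = nr_to_scan - file_pages;
    return shortfall < balloon_pages ? shortfall : balloon_pages;
}
```

With one page of page cache, 100 pages in the balloon, and a request for 2 pages, scan_strict() returns 0 while scan_soft() returns 1. Whether the real reclaim path would actually go on to OOM in the strict case is exactly what I am asking.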
Regarding other shrinkers (like KVM MMU cache): Reclaiming other shrinkers first would match the behavior of DEFLATE_ON_OOM when it was using the OOM notifier callback. On the other hand (awkwardly), the memory stats reported on the stats queue for "available memory" do not count shrinker memory as "available". So a balloon implementation that aims to reclaim some amount of available memory would not be able to tell how much memory was in the shrinkers and probably doesn't expect to reclaim them. For this reason, I think only looking at page cache size is the right choice. There should be a 1:1 relationship between stats reported and when DEFLATE_ON_OOM is invoked. Maybe in the future we add another stat that reports shrinker sizes, in which case we should also add a feature bit that allows other shrinkers to be pressured.
Regarding NUMA awareness: I agree it's out of scope for this patch since all implementations so far are not NUMA aware.
Would it be possible to backport this patch to 4.19, where the change to the shrinker API was made?
On Tue, Feb 11, 2020 at 6:20 AM Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx> wrote:
On 2020/02/10 16:27, Wang, Wei W wrote:
>> Well, my comment is rather: "Do not try to reserve guest's memory. In other
>> words, do not try to maintain balloons on the guest side. Since host would
>> be able to cache file data on the host's cache, guests would be able to
>> quickly fetch file data from host's cache via normal I/O requests." ;-)
>
> Didn't get this one. The discussion was about guest pagecache pages vs. guest balloon pages.
> Why is host's pagecache here?
I'm expecting a mode: "Guests should try to minimize pagecache pages (and teach
host to treat reclaimed pages as if POSIX_FADV_DONTNEED) instead of managing
guest balloon pages". In other words, as if
while :; do sleep 5; echo 1 > /proc/sys/vm/drop_caches; done
is running in the guest's kernel. And as if
echo 2 > /proc/sys/vm/drop_caches
is triggered in the guest's kernel when host requested guests to reclaim
memory. No long-life balloons. Guest balloons do not need to care about
NUMA. Just leave the management of pagecache pages to the host.