I'm thinking about ways to improve the Xen balloon driver. This is the driver which allows the guest domain to expand or contract by either asking for more memory from the hypervisor, or giving unneeded memory back. From the kernel's perspective, it simply looks like a driver which allocates and frees pages; when it allocates memory, it hands the underlying physical page back to the hypervisor, and conversely, when it gets a page from the hypervisor, it glues it under a given pfn and releases that page back to the kernel for reuse.

At the moment it's very dumb, and is pure mechanism. It's told how much memory to target, and it either allocates or frees memory until the target is reached. Unfortunately, that means if it's asked to shrink to an unreasonably small size, it will do so without question, killing the domain in a thrash-storm in the process.

There are several problems:

1. it doesn't know what a reasonable lower limit is,
2. it doesn't moderate the rate of shrinkage to give the rest of the VM time to adjust to having less memory (by paging out, dropping inactive pages, etc), and
3. the only mechanism it has for applying memory pressure to the system is allocating memory.

It allocates with (GFP_HIGHUSER | __GFP_NOWARN | __GFP_NORETRY | __GFP_NOMEMALLOC), trying not to steal memory away from things that really need it. But in practice, it can still easily drive the machine into a massive unrecoverable swap storm.

So I guess what I need is some measurement of "memory use" which is perhaps akin to a system-wide RSS: a measure of the number of pages being actively used, which, if non-resident, would cause a large amount of paging. If you shrink the domain down to that number of pages plus some padding (x%?), then the system will run happily in a stable state. If that number increases, then the system will need new memory soon, to stop it from thrashing.
And if that number goes way below the domain's actual memory allocation, then it has "too much" memory.

Is this what "Active" accounts for? Is Active just active usermode/pagecache pages, or does it also include kernel allocations? Presumably Inactive Clean memory can be freed very easily with little impact on the system, while Inactive Dirty memory isn't needed but requires IO to free; is there some way to measure how big each class of memory is?

If you wanted to apply gentle memory pressure on the system to attempt to accelerate freeing memory, how would you go about doing that? Would simply allocating memory at a controlled rate achieve it?

I guess it also gets more complex when you bring nodes and zones into the picture. Does it mean that this computation would need to be done per node+zone rather than system-wide? Or is there some better way to implement all this?

Thanks,
    J

_______________________________________________
Virtualization mailing list
Virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linux-foundation.org/mailman/listinfo/virtualization