On Wed, Jan 30, 2019 at 1:28 PM Shakeel Butt <shakeelb@xxxxxxxxxx> wrote: > > On Wed, Jan 30, 2019 at 12:48 PM Ben Gardon <bgardon@xxxxxxxxxx> wrote: > > > > There are many KVM kernel memory allocations which are tied to the life of > > the VM process and should be charged to the VM process's cgroup. If the > > allocations aren't tied to the process, the OOM killer will not know > > that killing the process will free the associated kernel memory. > > Add __GFP_ACCOUNT flags to many of the allocations which are not yet being > > charged to the VM process's cgroup. > > > > Tested: > > Ran all kvm-unit-tests on a 64 bit Haswell machine, the patch > > introduced no new failures. > > Ran a kernel memory accounting test which creates a VM to touch > > memory and then checks that the kernel memory is within ceratin > > bounds. > > With this patch that memory accounting is more (see below) > > correct. > > > > There remain a few allocations which should be charged to the VM's > > cgroup but are not. In x86, they include: > > vcpu->run > > vcpu->arch.pio_data > > kvm->coalesced_mmio_ring > > There allocations are unaccounted in this patch because they are mapped > > to userspace, and accounting them to a cgroup causes problems. This > > should be addressed in a future patch. > > Yes, this is based on the assumption that memcg charged kmem should > not be mapped to userspace and PG_kmemcg is define on page->_mapcount > field. > > Seems like these fields are per-vcpu and are of PAGE_SIZE which I > think should not be ignored as system overhead. > > The easiest possible solution seems like to move PG_kmemcg to actual > page flags but page flags are very rare resource. The other way is to > have an explicit interface similar to mem_cgroup_charge_skmem(). > However that would be error prone. For network skbuff > allocations/deallocations are at one place but that's not the case > here. > > > > > Signed-off-by: Ben Gardon <bgardon@xxxxxxxxxx> > > Reviewed-by: Shakeel Butt <shakeelb@xxxxxxxxxx> BTW there are kmem_cache_allocs whose kmem_caches already have SLAB_ACCOUNT flag and thus no need to add GFP_KERNEL_ACCOUNT flag in the allocation but I think it's fine as this will remove any confusion. Shakeel