On 08.01.2013 16:46, Dave Allan wrote:
> On Tue, Jan 08, 2013 at 04:42:00PM +0100, Michal Privoznik wrote:
>> On 08.01.2013 16:24, Daniel P. Berrange wrote:
>>> On Tue, Jan 08, 2013 at 10:37:19AM +0100, Michal Privoznik wrote:
>>>> Currently, if there's no hard memory limit defined for a domain,
>>>> libvirt tries to calculate one, based on the domain definition and
>>>> a magic equation, and sets it upon domain startup. The rationale
>>>> was that if there's a memory leak or exploit in qemu, we should
>>>> prevent the host system from thrashing. However, the equation was
>>>> too tight, as it didn't reflect what the kernel counts into the
>>>> memory used by a process. Since many hosts do have swap, nobody
>>>> noticed anything: when the hard memory limit is reached, the
>>>> process can continue allocating memory in swap. However, if there
>>>> is no swap on the host, the process gets killed by the OOM killer.
>>>> In our case, that process is qemu.
>>>>
>>>> To prevent this, we need to relax the hard RSS limit. Moreover, we
>>>> should reflect more precisely the way the kernel accounts memory
>>>> for a process. That is, even kernel caches are counted within the
>>>> memory used by a process (within cgroups at least). Hence the magic
>>>> equation has to be changed:
>>>>
>>>>   limit = 1.5 * (domain memory + total video memory)
>>>>           + (32MB of cache per disk) + 200MB
>>>> ---
>>>>
>>>> There is a bit more that should be taken into account, e.g. shared
>>>> pages, where accounting is even more complicated:
>>>>
>>>>   "Shared pages are accounted on the basis of the first touch
>>>>    approach. The cgroup that first touches a page is accounted for
>>>>    the page." [1]
>>>>
>>>> I don't think we even want to try to reflect this in our code.
>>>> That's why the coefficient of domain memory has been lifted from
>>>> 1.02 to 1.5, in the hope that it will just be enough.
>>>>
>>>> 1: http://www.kernel.org/doc/Documentation/cgroups/memory.txt
>>>>
>>>>  src/qemu/qemu_cgroup.c | 15 +++++++++------
>>>>  1 file changed, 9 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/src/qemu/qemu_cgroup.c b/src/qemu/qemu_cgroup.c
>>>> index 7faf025..16a9d7c 100644
>>>> --- a/src/qemu/qemu_cgroup.c
>>>> +++ b/src/qemu/qemu_cgroup.c
>>>> @@ -343,15 +343,18 @@ int qemuSetupCgroup(virQEMUDriverPtr driver,
>>>>          unsigned long long hard_limit = vm->def->mem.hard_limit;
>>>>
>>>>          if (!hard_limit) {
>>>> -            /* If there is no hard_limit set, set a reasonable
>>>> -             * one to avoid system trashing caused by exploited qemu.
>>>> -             * As 'reasonable limit' has been chosen:
>>>> -             *     (1 + k) * (domain memory + total video memory) + F
>>>> -             * where k = 0.02 and F = 200MB. */
>>>> +            /* If there is no hard_limit set, set a reasonable one to avoid
>>>> +             * system trashing caused by exploited qemu. As 'reasonable limit'
>>>> +             * has been chosen:
>>>> +             *     (1 + k) * (domain memory + total video memory) + (32MB for
>>>> +             *     cache per each disk) + F
>>>> +             * where k = 0.5 and F = 200MB. The cache for disks is important as
>>>> +             * kernel cache on the host side counts into the RSS limit. */
>>>>              hard_limit = vm->def->mem.max_balloon;
>>>>              for (i = 0; i < vm->def->nvideos; i++)
>>>>                  hard_limit += vm->def->videos[i]->vram;
>>>> -            hard_limit = hard_limit * 1.02 + 204800;
>>>> +            hard_limit = hard_limit * 1.5 + 204800;
>>>> +            hard_limit += vm->def->ndisks * 32768;
>>>>          }
>>>>
>>>>          rc = virCgroupSetMemoryHardLimit(cgroup, hard_limit);
>>>
>>> ACK,
>>>
>>> can't say I'm a fan of our heuristics but I don't see a better way
>>> yet. Let's see how this new limit copes.
>>>
>>> Daniel
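To see what the new magic works out to, here's a minimal, standalone
sketch with made-up numbers; the real code uses the domain definition
(vm->def), as in the patch above. All values are in KiB, which is what
libvirt uses here: 204800 KiB = 200MB and 32768 KiB = 32MB.

#include <stdio.h>

int main(void)
{
    /* Hypothetical domain: 4 GiB of RAM, one 16 MiB video card,
     * and two disks. */
    unsigned long long balloon = 4ULL * 1024 * 1024;
    unsigned long long vram = 16 * 1024;
    unsigned long long ndisks = 2;

    /* limit = 1.5 * (domain memory + total video memory)
     *         + 32MB per disk + 200MB */
    unsigned long long hard_limit = balloon + vram;
    hard_limit = hard_limit * 1.5 + 204800;
    hard_limit += ndisks * 32768;

    printf("hard limit: %llu KiB (~%.2f GiB)\n",
           hard_limit, hard_limit / (1024.0 * 1024.0));
    return 0;
}

For this domain the limit comes out at 6586368 KiB, roughly 6.3 GiB,
which is considerably more headroom than the old 1.02 coefficient
would have allowed.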
>>
>> Yeah, it's sort of magic. Pushed now. Thanks.
>
> How does one turn off the limits?
>
> Dave

Either disable the memory cgroup (e.g. by unmounting it), or set your
own limit in the domain XML (libvirt won't even try to calculate a new
one then).
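For the latter, this is roughly what the override looks like in the
domain XML (the 6 GiB value is just an example):

  <memtune>
    <!-- an explicit hard limit, in KiB; libvirt takes it as-is and
         skips the heuristic above -->
    <hard_limit unit='KiB'>6291456</hard_limit>
  </memtune>

Michal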