On 30.11.2016 14:22, Viktor Mihajlovski wrote: > On 30.11.2016 13:20, Martin Kletzander wrote: >> On Mon, Nov 28, 2016 at 06:03:36PM +0100, Viktor Mihajlovski >> wrote: >>> With kernel 3.18 (since commit >>> 3e32cb2e0a12b6915056ff04601cf1bb9b44f967) the "unlimited" value >>> for cgroup memory limits has changed once again as its byte >>> value is now computed from a page counter. The new "unlimited" >>> value reported by the cgroup fs is therefore 2**51-1 pages which >>> is (VIR_DOMAIN_MEMORY_PARAM_UNLIMITED - 3072). This results e.g. >>> in virsh memtune displaying 9007199254740988 instead of unlimited >>> for the limits. >>> >>> This patch uses the value of memory.limit_in_bytes from the >>> cgroup memory root which is the system's "real" unlimited value >>> for comparison. >>> >>> See also libvirt commit 231656bbeb9e4d3bedc44362784c35eee21cf0f4 >>> for the history for kernel 3.12 and before. >>> >>> I've tested this on F24 with the following configurations: - no >>> memory cgroup controller mounted - memory cgroup controller >>> mounted but not configured for libvirt - memory cgroup >>> controller mounted and configured The first two fail as expected >>> (and as before), the third case works as expected. >>> >>> Testing on other kernel versions highly welcome! >>> >>> Not perfect yet in that we still provide a fallback to the old >>> value. We might consider failing right away if we can't get the >>> system value. I'd be inclined to do that, since we're probably >>> facing principal cgroup issues in this case. >>> >> >> Since the code is called only after reading another value worked, >> it _should not_ fail =) But I'm OK with both failing and falling >> back to the old value. Mostly because I don't think it will make >> any (significant) difference. >> > OK, this tips me towards the no fallback. >>> Further, it's not the most efficient implementation. Obviously, >>> the unlimited value can be read once and cached. However, I'd >>> like to see the question above resolved first. >>> >> >> But I would really like to cache the value in a global variable. >> You can use VIR_ONCE_GLOBAL_INIT for that, but maybe it's too much, >> especially if you init the value before any other thread could >> access it. >> > Sure, even if the initialization was racy, I see no way the global > long long value could be corrupted. >>> Signed-off-by: Viktor Mihajlovski <mihajlov@xxxxxxxxxxxxxxxxxx> >>> --- src/util/vircgroup.c | 61 >>> ++++++++++++++++++++++++++++++++++++++++++++++------ 1 file >>> changed, 55 insertions(+), 6 deletions(-) >>> >>> diff --git a/src/util/vircgroup.c b/src/util/vircgroup.c index >>> f151193..969dca5 100644 --- a/src/util/vircgroup.c +++ >>> b/src/util/vircgroup.c @@ -2452,6 +2452,40 @@ >>> virCgroupGetBlkioDeviceWeight(virCgroupPtr group, } >>> >>> >>> +/* + * Retrieve the "memory.limit_in_bytes" value from the >>> memory controller + * root dir. This value cannot be modified by >>> userspace and therefore + * is the maximum limit value supported >>> by cgroups on the local system. + */ +static int >>> +virCgroupGetMemoryUnlimited(unsigned long long int * >>> mem_unlimited) +{ + int ret = -1; + virCgroupPtr group; + >> >> Also, all this ↓ >> >>> + if (VIR_ALLOC(group)) + goto cleanup; + + if >>> (virCgroupDetectMounts(group)) + goto cleanup; + + if >>> (!group->controllers[VIR_CGROUP_CONTROLLER_MEMORY].mountPoint) + >>> goto cleanup; + + if >>> (VIR_STRDUP(group->controllers[VIR_CGROUP_CONTROLLER_MEMORY].placement, >>> >>> >>> > + "/.") < 0) >>> + goto cleanup; + >> >> ↑ would be cleaner this way: >> >> if (virCgroupNew(-1, "/", NULL, NULL, &group) < 0) return -1; >> >> if (!virCgroupHasController(cgroup, VIR_CGROUP_CONTROLLER_MEMORY)) >> goto cleanup; >> >> I'm not passing VIR_CGROUP_CONTROLLER_MEMORY to virCgroupNew() so >> that it doesn't fail with a cryptic message. >> > Looks cleaner indeed ... I'll give it a try. >>> + ret = virCgroupGetValueU64(group, + >>> VIR_CGROUP_CONTROLLER_MEMORY, + "memory.limit_in_bytes", + >>> mem_unlimited); + cleanup: + virCgroupFree(&group); + return >>> ret; +} + + /** * virCgroupSetMemory: * @@ -2534,6 +2568,7 @@ int >>> virCgroupGetMemoryHardLimit(virCgroupPtr group, unsigned long >>> long *kb) { long long unsigned int limit_in_bytes; + long long >>> unsigned int unlimited_in_bytes; int ret = -1; >>> >>> if (virCgroupGetValueU64(group, @@ -2541,9 +2576,13 @@ >>> virCgroupGetMemoryHardLimit(virCgroupPtr group, unsigned long >>> long *kb) "memory.limit_in_bytes", &limit_in_bytes) < 0) goto >>> cleanup; >>> >>> - *kb = limit_in_bytes >> 10; - if (*kb > >>> VIR_DOMAIN_MEMORY_PARAM_UNLIMITED) + if >>> (virCgroupGetMemoryUnlimited(&unlimited_in_bytes) < 0) + >>> unlimited_in_bytes = VIR_DOMAIN_MEMORY_PARAM_UNLIMITED << 10; + >>> + if (limit_in_bytes == unlimited_in_bytes) >> >> I don't know why, but I would feel more comfortable with '>=' >> there, although it doesn't make any difference (or sense). >> >>> *kb = VIR_DOMAIN_MEMORY_PARAM_UNLIMITED; + else + *kb >>> = limit_in_bytes >> 10; >>> >>> ret = 0; cleanup: >> >> As a nit, helper function for these would be nice. >> >> Otherwise, I like it. > Thanks for the feedback. > FYI, I've posted new version that uses thread-safe initialization https://www.redhat.com/archives/libvir-list/2016-December/msg00176.html According to some documents I've read it seems that on ARM 64-bit memory access may be interrupted, so I didn't want to take a chance. The checking code is not really performance critical, as we have to walk the cgroup filesyste anyway in order to get the process' limits. -- Mit freundlichen Grüßen/Kind Regards Viktor Mihajlovski IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Martina Köderitz Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen Registergericht: Amtsgericht Stuttgart, HRB 243294 -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list