On Thu 29-11-18 18:46:17, Niklas Hambüchen wrote:
> Hello,
>
> I'm trying to investigate the memory behaviour of a program that uses madvise(MADV_FREE) to tell the kernel that it no longer uses some pages.
>
> I'm seeing some things I can't quite explain, concerning when freeing happens and how it is accounted for in /proc/pid/smaps.
>
> `man madvise` shows:
>
>     MADV_FREE (since Linux 4.5)
>         The application no longer requires the pages in the range
>         specified by addr and len. The kernel can thus free these
>         pages, but the freeing could be delayed until memory pressure
>         occurs.
>         ...
>         On a swapless system, freeing
>         pages in a given range happens instantly, regardless of memory
>         pressure.

This part has been outdated since 93e06c7a6453 ("mm: enable MADV_FREE for
swapless system"), i.e. since 4.12. Something to fix in the man page. I will
send a patch for that. Thanks for pointing it out.

> https://www.kernel.org/doc/Documentation/filesystems/proc.txt says:
>
>     "LazyFree" shows the amount of memory which is marked by madvise(MADV_FREE).
>     The memory isn't freed immediately with madvise(). It's freed in memory
>     pressure if the memory is clean. Please note that the printed value might
>     be lower than the real value due to optimizations used in the current
>     implementation. If this is not desirable please file a bug report.
>
> First, I am on a swapless system.
> Nevertheless, I do not observe freeing happening instantly.
> Instead, freeing happens only under memory pressure.

Yes, this is how MADV_FREE is implemented.

> For example, on a 64 GB RAM machine I have a process taking 30 GB resident memory ("RES" in tools like htop). After I put on memory pressure (for example using `stress-ng --vm-bytes 1G --vm-keep -m 50` to allocate and touch 50 GB), RES for that process decreases to 10 GB.
>
> At the same time, I can see the number in LazyFree decrease during this operation.

Those pages get reclaimed under memory pressure.

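For anyone who wants to watch the LazyFree number while applying memory
pressure, here is a minimal sketch that sums the per-mapping "LazyFree:"
fields of /proc/<pid>/smaps. It assumes the usual "LazyFree: <n> kB" line
format; the helper names lazyfree_kb and read_lazyfree_kb are made up for
illustration. As the proc.txt text quoted above says, the printed value can
be lower than the real one.

```python
def lazyfree_kb(smaps_text):
    """Sum every 'LazyFree:' field (in kB) found in smaps-formatted text."""
    total = 0
    for line in smaps_text.splitlines():
        # Assumed line format: "LazyFree:            1234 kB"
        if line.startswith("LazyFree:"):
            total += int(line.split()[1])
    return total


def read_lazyfree_kb(pid="self"):
    """Read /proc/<pid>/smaps (Linux only) and return the LazyFree total in kB."""
    with open(f"/proc/{pid}/smaps") as f:
        return lazyfree_kb(f.read())
```

Calling read_lazyfree_kb(pid) in a loop while stress-ng is running should
show the total going down as the pages get reclaimed.
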
> According to the man page, I would not expect this "ballooning" to be
> necessary given that I have no swap.
>
> Question 1:
> Is `man madvise` outdated? Or am I measuring wrong?

Yep, the man page is outdated.

> Question 2:
> Is the swap condition really binary? E.g. if the man page is accurate, would me adding 1 MB swap already make a difference in the behaviour, or are there more sophisticated rules at play?

It used to be like that, before the change in 4.12.

> Second, as you can see above, the proc-documentation of LazyFree does not mention any special swap rules.
>
> Third, can anybody elaborate on "the printed value might be lower
> than the real value due to optimizations used in the current
> implementation"? How far off might the reported LazyFree be?

We batch multiple pages before they become really lazyfree. This means that
those pages are sitting on a per-cpu list (see mark_page_lazyfree). So the
drift in the number depends on the number of CPUs.

> For my investigation it would be very useful if I could get accurate accounting.
> How much work would the "If this is not desirable please file a bug report" bit entail?

What would be the reason to get the exact number?

--
Michal Hocko
SUSE Labs