Add the necessary calls to track VM anonymous page usage (only).

V3 changes:
* Use vma->vm_mm instead of current->mm when charging pages, for clarity
* Document that reclaim is not possible with only anonymous page accounting
  so the OOM-killer is invoked when a limit is exceeded
* Add TODO to implement file cache (reclaim) support or optimize away
  page_cgroup->lru

V2 changes:
* Added update of memory cgroup documentation
* Clarify use of 'file' to distinguish anonymous mappings

Signed-off-by: Steven J. Magnani <steve@xxxxxxxxxxxxxxx>
---
diff -uprN a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt
--- a/Documentation/cgroups/memory.txt	2010-10-05 09:14:36.000000000 -0500
+++ b/Documentation/cgroups/memory.txt	2010-10-21 07:25:24.000000000 -0500
@@ -34,6 +34,7 @@ Current Status: linux-2.6.34-mmotm(devel
 Features:
  - accounting anonymous pages, file caches, swap caches usage and limiting them.
+   NOTE: On NOMMU systems, only anonymous pages are accounted.
 - private LRU and reclaim routine.
   (system's global LRU and private LRU work independently from each other)
 - optionally, memory+swap usage can be accounted and limited.
@@ -640,13 +641,41 @@ At reading, current status of OOM is sho
 	under_oom	 0 or 1 (if 1, the memory cgroup is under OOM, tasks may
 			 be stopped.)
 
-11. TODO
+11. NOMMU Support
+
+Systems without a Memory Management Unit do not support virtual memory,
+swapping, page faults, or migration, and are therefore limited to operating
+entirely within the system's RAM. On such systems, maintaining an ability to
+allocate sufficiently large blocks of contiguous memory is usually a challenge.
+This makes the overhead involved in memory cgroup support more of a concern,
+particularly when the memory page size is small.
+
+Typically, embedded systems are comparatively simple and deterministic, and are
+required to remain stable over long periods. Invocation of the OOM-killer, were
+it to occur in an uncontrolled manner, would likely destabilize such systems.
+
+Even a well-designed system may be presented with external stimuli that could
+lead to OOM conditions. One example is a system that is required to check a
+user-supplied removable FAT filesystem. As there is no way to bound the size
+or coherence of the user's filesystem, the memory required to run dosfsck on
+it may exceed the system's capacity. Running dosfsck in a memory cgroup
+can preserve system stability even in the face of excessive memory demands.
+
+At the present time, only anonymous pages are included in NOMMU memory cgroup
+accounting. As anonymous pages are not reclaimable, when a memory cgroup
+exceeds its limit, reclaim will fail and the OOM-killer will be invoked.
+See the Reclaim section of this document.
+
+12. TODO
 
 1. Add support for accounting huge pages (as a separate controller)
 2. Make per-cgroup scanner reclaim not-shared pages first
 3. Teach controller to account for shared-pages
 4. Start reclamation in the background when the limit is not yet hit
    but the usage is getting closer
+5. NOMMU: implement file cache accounting (which would support reclaim)
+   or optimize away page_cgroup->lru, which is just per-page overhead when
+   reclaim is not supported.
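(Illustrative aside, not part of the patch.)  The dosfsck scenario described
in the new NOMMU Support section can be driven entirely from userspace through
the standard memcg control files.  The sketch below shows one way that might
look; it assumes the cgroup v1 memory controller is mounted at
/sys/fs/cgroup/memory, and the "fsck" group name, the 16M limit, and the
write_string() helper are arbitrary example choices.  memory.limit_in_bytes
and tasks are the control files already documented in memory.txt.

    /*
     * Confine dosfsck to a memory cgroup so that, if it needs more memory
     * than the system can spare, only it is targeted by the memcg OOM killer.
     * Mount point, group name and limit are example values, not requirements.
     */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/stat.h>
    #include <unistd.h>

    #define MEMCG_DIR "/sys/fs/cgroup/memory/fsck"

    static int write_string(const char *path, const char *val)
    {
            int fd = open(path, O_WRONLY);

            if (fd < 0)
                    return -1;
            if (write(fd, val, strlen(val)) < 0) {
                    close(fd);
                    return -1;
            }
            return close(fd);
    }

    int main(int argc, char *argv[])
    {
            char pid[32];

            if (argc < 2) {
                    fprintf(stderr, "usage: %s <device>\n", argv[0]);
                    return 1;
            }

            /* Create the group (ignore EEXIST) and cap its usage at 16 MB. */
            mkdir(MEMCG_DIR, 0755);
            if (write_string(MEMCG_DIR "/memory.limit_in_bytes", "16M") < 0) {
                    perror("memory.limit_in_bytes");
                    return 1;
            }

            /* Move this process into the group before exec'ing dosfsck. */
            snprintf(pid, sizeof(pid), "%d", getpid());
            if (write_string(MEMCG_DIR "/tasks", pid) < 0) {
                    perror("tasks");
                    return 1;
            }

            execlp("dosfsck", "dosfsck", "-a", argv[1], NULL);
            perror("execlp");
            return 1;
    }

Because NOMMU kernels have no fork(), a wrapper like this would normally move
itself into the cgroup and then exec dosfsck directly, as above.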
diff -uprN a/mm/nommu.c b/mm/nommu.c
--- a/mm/nommu.c	2010-10-13 08:20:38.000000000 -0500
+++ b/mm/nommu.c	2010-10-20 07:34:11.000000000 -0500
@@ -524,8 +524,10 @@ static void delete_nommu_region(struct v
 /*
  * free a contiguous series of pages
  */
-static void free_page_series(unsigned long from, unsigned long to)
+static void free_page_series(unsigned long from, unsigned long to,
+			     const struct file *file)
 {
+	mem_cgroup_uncharge_start();
 	for (; from < to; from += PAGE_SIZE) {
 		struct page *page = virt_to_page(from);
@@ -534,8 +536,13 @@ static void free_page_series(unsigned lo
 		if (page_count(page) != 1)
 			kdebug("free page %p: refcount not one: %d",
 			       page, page_count(page));
+		/* Only anonymous pages are charged, currently */
+		if (!file)
+			mem_cgroup_uncharge_page(page);
+
 		put_page(page);
 	}
+	mem_cgroup_uncharge_end();
 }
@@ -563,7 +570,8 @@ static void __put_nommu_region(struct vm
 		 * from ramfs/tmpfs mustn't be released here */
 		if (region->vm_flags & VM_MAPPED_COPY) {
 			kdebug("free series");
-			free_page_series(region->vm_start, region->vm_top);
+			free_page_series(region->vm_start, region->vm_top,
+					 region->vm_file);
 		}
 		kmem_cache_free(vm_region_jar, region);
 	} else {
@@ -1117,9 +1125,27 @@ static int do_mmap_private(struct vm_are
 	set_page_refcounted(&pages[point]);
 
 	base = page_address(pages);
-	region->vm_flags = vma->vm_flags |= VM_MAPPED_COPY;
+
 	region->vm_start = (unsigned long) base;
 	region->vm_end   = region->vm_start + rlen;
+
+	/* Only anonymous pages are charged, currently */
+	if (!vma->vm_file) {
+		for (point = 0; point < total; point++) {
+			int charge_failed =
+				mem_cgroup_newpage_charge(&pages[point],
+							  vma->vm_mm,
+							  GFP_KERNEL);
+			if (charge_failed) {
+				free_page_series(region->vm_start,
+						 region->vm_end, NULL);
+				region->vm_start = region->vm_end = 0;
+				goto enomem;
+			}
+		}
+	}
+
+	region->vm_flags = vma->vm_flags |= VM_MAPPED_COPY;
 	region->vm_top   = region->vm_start + (total << PAGE_SHIFT);
 
 	vma->vm_start = region->vm_start;
@@ -1150,7 +1176,7 @@ static int do_mmap_private(struct vm_are
 	return 0;
 
 error_free:
-	free_page_series(region->vm_start, region->vm_end);
+	free_page_series(region->vm_start, region->vm_end, vma->vm_file);
 	region->vm_start = vma->vm_start = 0;
 	region->vm_end = vma->vm_end = 0;
 	region->vm_top = 0;
@@ -1213,16 +1239,15 @@ unsigned long do_mmap_pgoff(struct file
 	INIT_LIST_HEAD(&vma->anon_vma_chain);
 	vma->vm_flags = vm_flags;
 	vma->vm_pgoff = pgoff;
+	vma->vm_mm = current->mm;
 
 	if (file) {
 		region->vm_file = file;
 		get_file(file);
 		vma->vm_file = file;
 		get_file(file);
-		if (vm_flags & VM_EXECUTABLE) {
+		if (vm_flags & VM_EXECUTABLE)
 			added_exe_file_vma(current->mm);
-			vma->vm_mm = current->mm;
-		}
 	}
 
 	down_write(&nommu_region_sem);
@@ -1555,7 +1580,7 @@ static int shrink_vma(struct mm_struct *
 	add_nommu_region(region);
 	up_write(&nommu_region_sem);
 
-	free_page_series(from, to);
+	free_page_series(from, to, vma->vm_file);
 	return 0;
 }
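(Illustrative aside, not part of the patch.)  The heart of the change to
do_mmap_private() is an all-or-nothing charge: each page of the private
mapping is charged to vma->vm_mm's cgroup, and if any charge fails the
partially charged series is released again (the patch does this via
free_page_series() with a NULL file, which uncharges and frees the pages)
before the mapping attempt is abandoned.  The standalone sketch below shows
only the shape of that logic; charge_page() and uncharge_page() are
hypothetical stand-ins for mem_cgroup_newpage_charge() and
mem_cgroup_uncharge_page(), and the fixed budget stands in for the cgroup
limit.

    /* Shape of the charge-or-roll-back pattern; not kernel code. */
    #include <stdbool.h>
    #include <stdio.h>

    #define NPAGES          8
    #define CHARGE_BUDGET   5       /* pretend the cgroup allows only 5 pages */

    static int charged;

    static bool charge_page(int page)
    {
            if (charged >= CHARGE_BUDGET)
                    return false;           /* over limit: charge fails */
            charged++;
            printf("charged page %d (total %d)\n", page, charged);
            return true;
    }

    static void uncharge_page(int page)
    {
            charged--;
            printf("uncharged page %d (total %d)\n", page, charged);
    }

    /* Mirrors the loop added to do_mmap_private(): charge every page or none. */
    static int charge_mapping(int npages)
    {
            int i;

            for (i = 0; i < npages; i++) {
                    if (!charge_page(i))
                            goto rollback;
            }
            return 0;

    rollback:
            /* Undo the partial charges, newest first. */
            while (i-- > 0)
                    uncharge_page(i);
            return -1;      /* the caller treats this as an ENOMEM failure */
    }

    int main(void)
    {
            if (charge_mapping(NPAGES))
                    printf("mapping failed: charge exceeded the limit\n");
            return 0;
    }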