From: Suren Baghdasaryan <surenb@xxxxxxxxxx> Subject: mm: prevent vm_area_struct::anon_name refcount saturation A deep process chain with many vmas could grow really high. With default sysctl_max_map_count (64k) and default pid_max (32k) the max number of vmas in the system is 2147450880 and the refcounter has headroom of 1073774592 before it reaches REFCOUNT_SATURATED (3221225472). Therefore it's unlikely that an anonymous name refcounter will overflow with these defaults. Currently the max for pid_max is PID_MAX_LIMIT (4194304) and for sysctl_max_map_count it's INT_MAX (2147483647). In this configuration anon_vma_name refcount overflow becomes theoretically possible (that still require heavy sharing of that anon_vma_name between processes). kref refcounting interface used in anon_vma_name structure will detect a counter overflow when it reaches REFCOUNT_SATURATED value but will only generate a warning and freeze the ref counter. This would lead to the refcounted object never being freed. A determined attacker could leak memory like that but it would be rather expensive and inefficient way to do so. To ensure anon_vma_name refcount does not overflow, stop anon_vma_name sharing when the refcount reaches REFCOUNT_MAX (2147483647), which still leaves INT_MAX/2 (1073741823) values before the counter reaches REFCOUNT_SATURATED. This should provide enough headroom for raising the refcounts temporarily. Link: https://lkml.kernel.org/r/20220223153613.835563-2-surenb@xxxxxxxxxx Link: https://lkml.kernel.org/r/20220223153613.835563-2-surenb@xxxxxxxxxx Signed-off-by: Suren Baghdasaryan <surenb@xxxxxxxxxx> Suggested-by: Michal Hocko <mhocko@xxxxxxxx> Acked-by: Michal Hocko <mhocko@xxxxxxxx> Cc: Alexey Gladkov <legion@xxxxxxxxxx> Cc: Chris Hyser <chris.hyser@xxxxxxxxxx> Cc: Christian Brauner <brauner@xxxxxxxxxx> Cc: Colin Cross <ccross@xxxxxxxxxx> Cc: Cyrill Gorcunov <gorcunov@xxxxxxxxx> Cc: Dave Hansen <dave.hansen@xxxxxxxxx> Cc: David Hildenbrand <david@xxxxxxxxxx> Cc: Davidlohr Bueso <dave@xxxxxxxxxxxx> Cc: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx> Cc: Johannes Weiner <hannes@xxxxxxxxxxx> Cc: Kees Cook <keescook@xxxxxxxxxxxx> Cc: "Kirill A. Shutemov" <kirill.shutemov@xxxxxxxxxxxxxxx> Cc: Matthew Wilcox <willy@xxxxxxxxxxxxx> Cc: Peter Collingbourne <pcc@xxxxxxxxxx> Cc: Sasha Levin <sashal@xxxxxxxxxx> Cc: Sumit Semwal <sumit.semwal@xxxxxxxxxx> Cc: Vlastimil Babka <vbabka@xxxxxxx> Cc: Xiaofeng Cao <caoxiaofeng@xxxxxxxxxx> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> --- include/linux/mm_inline.h | 18 ++++++++++++++---- mm/madvise.c | 3 +-- 2 files changed, 15 insertions(+), 6 deletions(-) --- a/include/linux/mm_inline.h~mm-prevent-vm_area_struct-anon_name-refcount-saturation +++ a/include/linux/mm_inline.h @@ -161,15 +161,25 @@ static inline void anon_vma_name_put(str kref_put(&anon_name->kref, anon_vma_name_free); } +static inline +struct anon_vma_name *anon_vma_name_reuse(struct anon_vma_name *anon_name) +{ + /* Prevent anon_name refcount saturation early on */ + if (kref_read(&anon_name->kref) < REFCOUNT_MAX) { + anon_vma_name_get(anon_name); + return anon_name; + + } + return anon_vma_name_alloc(anon_name->name); +} + static inline void dup_anon_vma_name(struct vm_area_struct *orig_vma, struct vm_area_struct *new_vma) { struct anon_vma_name *anon_name = anon_vma_name(orig_vma); - if (anon_name) { - anon_vma_name_get(anon_name); - new_vma->anon_name = anon_name; - } + if (anon_name) + new_vma->anon_name = anon_vma_name_reuse(anon_name); } static inline void free_anon_vma_name(struct vm_area_struct *vma) --- a/mm/madvise.c~mm-prevent-vm_area_struct-anon_name-refcount-saturation +++ a/mm/madvise.c @@ -113,8 +113,7 @@ static int replace_anon_vma_name(struct if (anon_vma_name_eq(orig_name, anon_name)) return 0; - anon_vma_name_get(anon_name); - vma->anon_name = anon_name; + vma->anon_name = anon_vma_name_reuse(anon_name); anon_vma_name_put(orig_name); return 0; _