On Fri, Aug 30, 2024 at 11:56:36AM GMT, Petr Spacek wrote:
> From: Petr Spacek <pspacek@xxxxxxx>
>
> Raise default sysctl vm.max_map_count to INT_MAX, which effectively
> disables the limit for all sane purposes. The sysctl is kept around in
> case there is some use-case for this limit.
>
> The old default value of vm.max_map_count=65530 provided compatibility
> with ELF format predating year 2000 and with binutils predating 2010. At
> the same time the old default caused issues with applications deployed
> in 2024.
>
> State since 2012: Linux 3.2.0 correctly generates coredump from a
> process with 100 000 mmapped files. GDB 7.4.1, binutils 2.22 work with
> this coredump fine and can actually read data from the mmaped addresses.
>
> Signed-off-by: Petr Spacek <pspacek@xxxxxxx>

NACK.

> ---
>
> Downstream distributions started to override the default a while ago.
> Individual distributions are summarized at the end of this message:
> https://lists.archlinux.org/archives/list/arch-dev-public@xxxxxxxxxxxxxxxxxxx/thread/5GU7ZUFI25T2IRXIQ62YYERQKIPE3U6E/

Did they change them to 2.14 billion?

>
> Please note it's not only games in emulator which hit this default
> limit. Larger instances of server applications are also suffering from
> this. Couple examples here:
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2057792/comments/24
>
> SAP documentation behind paywall also mentions this limit:
> https://service.sap.com/sap/support/notes/2002167
>
> And finally, it is also an issue for BIND DNS server compiled against
> jemalloc, which is what brought me here.
>
> System V gABI draft dated 2000-07-17 already extended the ELF numbering:
> https://www.sco.com/developers/gabi/2000-07-17/ch4.sheader.html
>
> binutils support is in commit ecd12bc14d85421fcf992cda5af1d534cc8736e0
> dated 2010-01-19. IIUC this goes a bit beyond what is described in the
> gABI document and extends ELF's e_phnum.
>
> Linux coredumper support is in commit
> 8d9032bbe4671dc481261ccd4e161cd96e54b118 dated 2010-03-06.
>
> As mentioned above, this all works for the last 12 years and the
> conservative limit seems to do more harm than good.
>
>  include/linux/mm.h | 21 +++++++++------------
>  1 file changed, 9 insertions(+), 12 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 6549d0979..3e1ed3b80 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -178,22 +178,19 @@ static inline void __mm_zero_struct_page(struct page *page)
>
>  /*
>   * Default maximum number of active map areas, this limits the number of vmas
> - * per mm struct. Users can overwrite this number by sysctl but there is a
> - * problem.
> + * per mm struct. Users can overwrite this number by sysctl. Historically
> + * this limit was a compatibility measure for ELF format predating year 2000.
>   *
>   * When a program's coredump is generated as ELF format, a section is created
> - * per a vma. In ELF, the number of sections is represented in unsigned short.
> - * This means the number of sections should be smaller than 65535 at coredump.
> - * Because the kernel adds some informative sections to a image of program at
> - * generating coredump, we need some margin. The number of extra sections is
> - * 1-3 now and depends on arch. We use "5" as safe margin, here.
> + * per a vma. In ELF before year 2000, the number of sections was represented
> + * as unsigned short e_shnum. This means the number of sections should be
> + * smaller than 65535 at coredump.
>   *
> - * ELF extended numbering allows more than 65535 sections, so 16-bit bound is
> - * not a hard limit any more. Although some userspace tools can be surprised by
> - * that.
> + * ELF extended numbering was added into System V gABI spec around 2000.
> + * It allows more than 65535 sections, so 16-bit bound is not a hard limit any
> + * more.
>   */
> -#define MAPCOUNT_ELF_CORE_MARGIN (5)
> -#define DEFAULT_MAX_MAP_COUNT (USHRT_MAX - MAPCOUNT_ELF_CORE_MARGIN)
> +#define DEFAULT_MAX_MAP_COUNT INT_MAX

NACK, you can't arbitrarily change an established limit like this.

Also, VMAs have a non-zero size. On my system, 184 bytes. So your change
allows for ~395 GB (about 368 GiB) to be assigned to VMAs. Does that seem
reasonable?

It _might_ be sensible to increase the minimum, but not to INT_MAX.

Also note that you _can_ change this limit, it's a tunable. It's not
egregious to, you know, change a tunable.

Also, please cc the MEMORY MAPPING reviewers for changes like this. It
wasn't obvious because include/linux/mm.h isn't included in the MAINTAINERS
block, but that's me, Liam and Vlastimil, cc'd now.

>
>  extern int sysctl_max_map_count;
>
>
> base-commit: d5d547aa7b51467b15d9caa86b116f8c2507c72a
> --
> 2.46.0
>
>
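
The VMA memory estimate in the reply above can be checked with a few lines of C. This is only a sketch of the arithmetic, not kernel code: the 184-byte figure is the per-VMA size reported for one system in the reply and varies with kernel version and configuration, and the names OLD_*, ASSUMED_VMA_SIZE_BYTES, vma_metadata_bytes() and check_estimates() are hypothetical, introduced here for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Old default from the quoted patch: USHRT_MAX minus a 5-entry margin. */
#define OLD_MAPCOUNT_ELF_CORE_MARGIN 5
#define OLD_DEFAULT_MAX_MAP_COUNT (USHRT_MAX - OLD_MAPCOUNT_ELF_CORE_MARGIN)

/*
 * Assumption: 184 bytes per VMA, the size quoted in the reply for one
 * particular system. The real size depends on the kernel build.
 */
#define ASSUMED_VMA_SIZE_BYTES 184ULL

/* Hypothetical helper: bytes of VMA metadata if one mm hits the limit. */
static uint64_t vma_metadata_bytes(uint64_t map_count, uint64_t vma_size)
{
    return map_count * vma_size;
}

/* Sanity checks for the figures discussed in the thread. */
static void check_estimates(void)
{
    /* 65535 - 5, the vm.max_map_count default the patch wants to replace. */
    assert(OLD_DEFAULT_MAX_MAP_COUNT == 65530);

    /* INT_MAX VMAs at 184 bytes each. */
    assert(vma_metadata_bytes(INT_MAX, ASSUMED_VMA_SIZE_BYTES)
           == 395136991048ULL);
}
```

Under these assumptions the product at INT_MAX comes to 395,136,991,048 bytes, which is about 395 GB in decimal units, or roughly 368 GiB, per address space. At the old default of 65530 the same calculation gives 12,057,520 bytes, about 11.5 MiB.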