Re: [PATCH RFC] mm: mmap: Change DEFAULT_MAX_MAP_COUNT to INT_MAX

On 30. 08. 24 17:04, Pedro Falcato wrote:
On Fri, Aug 30, 2024 at 04:28:33PM GMT, Petr Špaček wrote:
Now I understand your concern. From the docs and code comments I've seen, it was not clear that the limit serves any purpose _other_ than being a mere compatibility shim for old ELF tools.

It is a NACK, but it's a NACK because the proposed limit is so high.

With Steam, I believe it is a product of how it performs allocations, and
unfortunately this causes it to allocate quite a bit more than you would
expect.

FTR, some non-game applications:

Elasticsearch and OpenSearch insist on at least 262144.
The BIND 9.18.28 DNS server linked to jemalloc 5.2.1 was observed with usage around 700000.
The OpenJDK GC sometimes weeps about values < 737280.
The SAP docs I was able to access use 1000000.
MariaDB is tested by their QA with 1048576.
The Fedora, Ubuntu, NixOS, and Arch distros went with the value 1048576.

Is it worth sending a patch with the default raised to 1048576?


With jemalloc() that seems strange, perhaps buggy behaviour?

Good question. In the case of the BIND DNS server, jemalloc handles mmap() and we
keep statistics about the bytes requested from malloc().

When we hit the max_map_count limit,

  (sum of not-yet-freed malloc(size)) / (vm.max_map_count)

gives an average mmapped block size of ~100 kB.
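To put a rough number on it (assuming the default vm.max_map_count of 65530 was in effect when the limit was hit): 65530 mappings * ~100 kB per mapping works out to roughly 6.5 GB of not-yet-freed allocations at that point.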

Is 100 kB way too low / does it indicate a bug? It does not seem terrible to
me - the application is handling ~100-1500 B packets at a rate somewhere
between 10-200 k packets per second, so it is expected to do lots of small,
short-lived allocations.

A complicating factor is that the process itself does not see the current
counter value (unless BPF is involved), so it is hard to monitor this until
the limit is hit.
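As a rough workaround (assuming /proc is mounted), a process can approximate the counter itself by counting the lines in /proc/self/maps, since each line corresponds to one mapping counted against vm.max_map_count. An untested sketch:

#include <stdio.h>

/* Approximate the number of mappings the kernel counts against
 * vm.max_map_count by counting lines in /proc/self/maps. Not exact
 * (VMAs can merge or split between reads), but close enough to watch
 * how near a long-running process is getting to the limit. */
static long count_own_mappings(void)
{
    FILE *f = fopen("/proc/self/maps", "r");
    long lines = 0;
    int c;

    if (f == NULL)
        return -1;
    while ((c = fgetc(f)) != EOF)
        if (c == '\n')
            lines++;
    fclose(f);
    return lines;
}

int main(void)
{
    printf("current mappings: %ld\n", count_own_mappings());
    return 0;
}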

Can you get us a dump of /proc/<pid>/maps? It'd be interesting to see how
exactly you're hitting this.

The only thing I have immediately available is a coredump from hitting the default limit. GDB apparently does not show these regions in "info proc mappings", but I was able to extract the section addresses from the coredump:
https://users.isc.org/~pspacek/sf1717/elf-sections.csv

The distribution of section sizes and their counts, in "size,count" format, is here:
https://users.isc.org/~pspacek/sf1717/sizes.csv

If you want to see some cumulative stats, they are available as an OpenDocument spreadsheet here:
https://users.isc.org/~pspacek/sf1717/sizes.ods

From a quick glance it is obvious that single-page blocks eat most of the quota.

I don't know if it is a bug or just memory fragmentation caused by a long-running server application.

I can try to get data from a production system to you next week if needed.
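If it helps, a small tool along these lines should be able to produce the same "size,count" distribution directly from a live process by parsing /proc/<pid>/maps (untested sketch, naive aggregation):

#include <stdio.h>

#define MAX_SIZES 4096

int main(int argc, char **argv)
{
    char path[64], line[512];
    unsigned long long start, end;
    unsigned long long sizes[MAX_SIZES];
    long counts[MAX_SIZES];
    int nsizes = 0;
    FILE *f;

    if (argc != 2) {
        fprintf(stderr, "usage: %s <pid>\n", argv[0]);
        return 1;
    }
    snprintf(path, sizeof(path), "/proc/%s/maps", argv[1]);
    f = fopen(path, "r");
    if (f == NULL) {
        perror(path);
        return 1;
    }

    /* Each line of the maps file describes one mapping:
     * "start-end perms offset dev inode path". */
    while (fgets(line, sizeof(line), f) != NULL) {
        int i;

        if (sscanf(line, "%llx-%llx", &start, &end) != 2)
            continue;
        for (i = 0; i < nsizes; i++)
            if (sizes[i] == end - start)
                break;
        if (i == nsizes && nsizes < MAX_SIZES) {
            sizes[nsizes] = end - start;
            counts[nsizes] = 0;
            nsizes++;
        }
        if (i < nsizes)
            counts[i]++;
    }
    fclose(f);

    /* One distinct mapping size per output line, unsorted. */
    for (int i = 0; i < nsizes; i++)
        printf("%llu,%ld\n", sizes[i], counts[i]);
    return 0;
}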

--
Petr Špaček
Internet Systems Consortium




