On 9/3/20 11:26 AM, Colin Cross wrote:
>> FWIW, I don't have any objections to this.
>>
>> Refcounting strings was what I think I had the strongest reaction to
>> back in the good old days of 2013.  strdup() on split plus strcmp() on
>> merge doesn't sound awful to me, and it is darn straightforward.  The
>> biggest downside is probably kernel memory consumption.  We should
>> probably just think through whether having so many duplicates changes
>> things materially.
>>
>> For instance, should/could we penalize a task's vm.max_map_count when
>> it's using this mechanism?
> Just to provide some concrete numbers, the ART process I examined
> (https://pastebin.com/YNUTvZyz) had 280 named anonymous mappings using
> a total of 6566 bytes for the names.  There were only 63 unique names,
> using 1925 bytes.  On my personal usage device, there are currently a
> total of 59769 named anonymous devices across all processes using
> 1224119 bytes, 5843 of them unique using 121754 bytes.  The vast
> majority of the unique names are of the form "stack_and_tls:999",
> which are dynamically allocated in the userspace process' heap.  There
> are only 132 names that do not contain stack_and_tls using 9540 bytes,
> repeated 49938 times using 1030651 bytes (108x).  Most of those are
> constant strings, meaning the pointer is into the .rodata section of a
> file mapping that is shared between all processes.

Thanks for the data!  That seems like totally reasonable memory
consumption in the normal cases.

I'm mostly concerned about the worst-case kernel memory consumption.
It's a DoS if there's a big change to the amount of kernel memory a
process can consume.

For instance, each VMA is 192 bytes.  If the name limit was, say, 4k, it
makes the per-VMA kernel memory consumption go up by a factor of roughly
20, which isn't nice.  But, if it's, say, 16 bytes, it's only about a 10%
increase over what each VMA consumes today.

> Is fork a concern?  It would have to strdup every name.

Should be pretty easy to confirm the worst case experimentally.  Create
a process with ~64k VMAs and time the fork()s.  Then do the same with
the longest allowed name set on each of the 64k VMAs.  If you can't
measure a difference, you'll have a strong argument that someone with 10
of these things in a process won't notice.
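
Something like the following could serve as a starting point for that
experiment.  It is only a rough sketch under a few assumptions: that the
test kernel exposes the naming interface as
prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, ...), that names are capped
somewhere at or above the 79 characters used here, and that 60000 VMAs
fit under the default vm.max_map_count.  The helper names build_vmas()
and time_fork_ns() are made up for illustration.  The pages are never
touched, so the fork() cost should be dominated by copying VMAs rather
than page tables.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/prctl.h>
#include <sys/wait.h>

/* Fallbacks in case the libc headers do not carry these yet (assumed values). */
#ifndef PR_SET_VMA
#define PR_SET_VMA 0x53564d41
#endif
#ifndef PR_SET_VMA_ANON_NAME
#define PR_SET_VMA_ANON_NAME 0
#endif

/*
 * Map n pages and flip every odd page to read-only so that neighbouring
 * pages cannot merge, which leaves n separate VMAs.  If name_len > 0,
 * give each VMA a name of that many 'x' characters.
 * Returns 0 on success, -1 on any failure.
 */
int build_vmas(size_t n, size_t name_len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    char name[80];

    if (name_len >= sizeof(name))
        return -1;
    memset(name, 'x', name_len);
    name[name_len] = '\0';

    char *base = mmap(NULL, n * page, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return -1;

    for (size_t i = 0; i < n; i++) {
        char *addr = base + i * page;

        if ((i & 1) && mprotect(addr, page, PROT_READ) != 0)
            return -1;
        if (name_len > 0 &&
            prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, (unsigned long)addr,
                  (unsigned long)page, (unsigned long)name) != 0)
            return -1;
    }
    return 0;
}

/* Time one fork() as seen by the parent.  Returns nanoseconds, or -1. */
long long time_fork_ns(void)
{
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    pid_t pid = fork();
    if (pid == 0)
        _exit(0);               /* child does nothing */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    if (pid < 0)
        return -1;
    waitpid(pid, NULL, 0);
    return (t1.tv_sec - t0.tv_sec) * 1000000000LL +
           (t1.tv_nsec - t0.tv_nsec);
}

#ifdef FORK_BENCH_MAIN
int main(int argc, char **argv)
{
    /* argv[1] is the name length; 0 (the default) leaves VMAs unnamed. */
    size_t name_len = argc > 1 ? strtoul(argv[1], NULL, 10) : 0;

    if (build_vmas(60000, name_len) != 0) {
        perror("build_vmas");
        return 1;
    }
    for (int i = 0; i < 10; i++)
        printf("fork: %lld ns\n", time_fork_ns());
    return 0;
}
#endif
```

Built with -DFORK_BENCH_MAIN, running it once with a name length of 0
and once with 79 gives the two sets of fork() timings to compare.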