Re: [PATCH v7 3/3] mm: add a field to store names for private anonymous memory

Dave Hansen <dave.hansen@xxxxxxxxx> · Thu, 3 Sep 2020 11:40:06 -0700

On 9/3/20 11:26 AM, Colin Cross wrote:
>> FWIW, I don't have any objections to this.
>>
>> Refcounting strings was what I think I had the strongest reaction to
>> back in the good old days of 2013.  strdup() on split plus strcmp() on
>> merge doesn't sound awful to me, and it is darn straightforward.  The
>> biggest downside is probably kernel memory consumption.  We should
>> probably just think through whether having so many duplicates changes
>> things materially.
>>
>> For instance, should/could we penalize a task's vm.max_map_count when
>> it's using this mechanism?
> Just to provide some concrete numbers, the ART process I examined
> (https://pastebin.com/YNUTvZyz) had 280 named anonymous mappings using
> a total of 6566 bytes for the names.  There were only 63 unique names,
> using 1925 bytes.  On my personal usage device, there are currently a
> total of 59769 named anonymous mappings across all processes using
> 1224119 bytes, 5843 of them unique using 121754 bytes.  The vast
> majority of the unique names are of the form "stack_and_tls:999",
> which are dynamically allocated in the userspace process' heap.  There
> are only 132 names that do not contain stack_and_tls using 9540 bytes,
> repeated 49938 times using 1030651 bytes (108x).  Most of those are
> constant strings, meaning the pointer is into the .rodata section of a
> file mapping that is shared between all processes.

Thanks for the data!  That seems like totally reasonable memory
consumption in the normal cases.

I'm mostly concerned about the worst-case kernel memory consumption.
It's a DoS vector if there's a big change in the amount of kernel memory
a process can consume.  For instance, each VMA is 192 bytes.  If the
name limit were, say, 4k, per-VMA kernel memory consumption would go up
by a factor of more than 20, which isn't nice.  But if it's, say, 16
bytes, it's a bit under a 10% increase over what each VMA consumes
today.
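
To make the arithmetic above concrete, here is a minimal sketch.  The
helper name vma_growth_factor() is made up for illustration; the 192
bytes is the per-VMA figure quoted above, and the model assumes the
name is the only extra allocation, ignoring slab rounding and any
bookkeeping the patch might add.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: ratio of (VMA + name) to the VMA alone.
 * Assumes the name string is the only extra per-VMA allocation and
 * ignores allocator rounding, so treat the result as a rough figure.
 */
double vma_growth_factor(size_t vma_bytes, size_t name_bytes)
{
    assert(vma_bytes > 0);
    return (double)(vma_bytes + name_bytes) / (double)vma_bytes;
}
```

With vma_bytes = 192, a 4096-byte name gives a factor of about 22.3,
and a 16-byte name gives about 1.083, i.e. roughly 8% growth.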

> Is fork a concern?  It would have to strdup every name.

Should be pretty easy to confirm the worst case experimentally.  Create
a process with ~64k VMAs and time the fork()s.  Then do the same with
64k VMAs with the longest names set on each VMA.  If you can't measure
a difference, you'll have a strong argument that someone with 10 of
these things in a process won't notice.
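
A rough sketch of that experiment might look like the following.  It
is only an illustration under stated assumptions: make_vmas() and
time_fork_ns() are hypothetical helper names, the VMAs are kept from
merging by alternating page protections, and the naming call assumes
the prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, ...) interface is present
in the kernel under test.  If it is not, the names are simply reported
as not applied.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Fallback definitions for libc headers that predate this prctl. */
#ifndef PR_SET_VMA
#define PR_SET_VMA 0x53564d41
#define PR_SET_VMA_ANON_NAME 0
#endif

/*
 * Hypothetical helper: map n pages, then make every other page
 * read-only so neighbouring VMAs differ in protection and cannot
 * merge, leaving n separate VMAs.  If name is non-NULL, try to name
 * each page-sized VMA; *named reports whether every name was accepted.
 * Returns the base address, or NULL on failure.
 */
void *make_vmas(size_t n, const char *name, int *named)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    char *base = mmap(NULL, n * page, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;

    *named = (name != NULL);
    for (size_t i = 0; i < n; i++) {
        char *p = base + i * page;
        if ((i & 1) && mprotect(p, page, PROT_READ) != 0) {
            munmap(base, n * page);   /* e.g. hit vm.max_map_count */
            return NULL;
        }
        if (name && prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME,
                          (unsigned long)p, page,
                          (unsigned long)name) != 0)
            *named = 0;   /* no kernel support, or name rejected */
    }
    return base;
}

/*
 * Hypothetical helper: average wall-clock time, in nanoseconds, for
 * fork() to return in the parent.  The child exits immediately.
 * Returns a negative value if fork() fails.
 */
double time_fork_ns(int iters)
{
    assert(iters > 0);
    double total = 0.0;
    for (int i = 0; i < iters; i++) {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        pid_t pid = fork();
        if (pid == 0)
            _exit(0);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        if (pid < 0)
            return -1.0;
        waitpid(pid, NULL, 0);
        total += (double)(t1.tv_sec - t0.tv_sec) * 1e9
               + (double)(t1.tv_nsec - t0.tv_nsec);
    }
    return total / iters;
}
```

One way to use it: call time_fork_ns() once after make_vmas(n, NULL,
&named) and once, in a fresh process, after make_vmas(n, longest_name,
&named), with n a little below vm.max_map_count, and compare the two
averages.  The longest permitted name is whatever the kernel under
test enforces.  The pages are never touched, so the measurement is
dominated by duplicating the VMAs rather than copying page tables.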