On Thu, Sep 3, 2020 at 11:09 AM Dave Hansen <dave.hansen@xxxxxxxxx> wrote:
>
> On 9/3/20 11:00 AM, Kees Cook wrote:
> > Why is a kernel-copied string insufficient for this? I don't think VMA
> > merging is a fast-path operation, so doing a strcmp isn't going to wreck
> > anything...
> >
> > Let me try to find the earlier thread with Dave Hansen... okay, found it:
> > https://lore.kernel.org/linux-mm/51DDF071.5000309@xxxxxxxxx/
> >
> > Right, so, this idea predates userfaultfd. :)
> >
> > More notes below, but I *really* think this should not be a userspace
> > pointer. And since a separate union has been found, let's just do a
> > strndup_user() for the name, validate it as containing only printable
> > characters without \n \r \v \f and move the merging logic into a
> > separate patch.
>
> FWIW, I don't have any objections to this.
>
> Refcounting strings was what I think I had the strongest reaction to
> back in the good old days of 2013. strdup() on split plus strcmp() on
> merge doesn't sound awful to me, and it is darn straightforward. The
> biggest downside is probably kernel memory consumption. We should
> probably just think through whether having so many duplicates changes
> things materially.
>
> For instance, should/could we penalize a task's vm.max_map_count when
> it's using this mechanism?

Just to provide some concrete numbers, the ART process I examined
(https://pastebin.com/YNUTvZyz) had 280 named anonymous mappings using
a total of 6566 bytes for the names. There were only 63 unique names,
using 1925 bytes.

On my personal usage device, there are currently a total of 59769
named anonymous mappings across all processes using 1224119 bytes,
5843 of them unique using 121754 bytes. The vast majority of the
unique names are of the form "stack_and_tls:999", which are
dynamically allocated in the userspace process's heap. There are only
132 names that do not contain stack_and_tls, using 9540 bytes; they
are repeated 49938 times using 1030651 bytes (about 108x the unique
size). Most of those are constant strings, meaning the pointer is into
the .rodata section of a file mapping that is shared between all
processes.

Is fork a concern? It would have to strdup every name.
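
To make the "strdup() on split plus strcmp() on merge" idea concrete,
here is a minimal userspace sketch. It is not kernel code and not what
any posted patch does: struct region, names_equal(), can_merge() and
split_region() are hypothetical names made up for illustration, and
malloc()/memcpy() stand in for whatever allocator the kernel would
use. It only models the ownership rule under discussion, where every
region holds its own private copy of the name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace stand-in for a VMA; not vm_area_struct. */
struct region {
	unsigned long start;
	unsigned long end;
	char *name;	/* privately owned copy, or NULL when unnamed */
};

/* Names match when both are absent or both compare equal. */
static bool names_equal(const char *a, const char *b)
{
	if (a == NULL || b == NULL)
		return a == b;
	return strcmp(a, b) == 0;
}

/* Merging is allowed only for adjacent regions with the same name. */
static bool can_merge(const struct region *prev, const struct region *next)
{
	return prev->end == next->start &&
	       names_equal(prev->name, next->name);
}

/*
 * Split r at addr: r keeps [start, addr) and *tail receives
 * [addr, end) together with its own duplicate of the name.
 * Returns 0 on success, or -1 (leaving r untouched) when addr is
 * not strictly inside r or the allocation fails.
 */
static int split_region(struct region *r, unsigned long addr,
			struct region *tail)
{
	char *copy = NULL;

	assert(r != NULL && tail != NULL);

	if (addr <= r->start || addr >= r->end)
		return -1;

	if (r->name != NULL) {
		size_t len = strlen(r->name) + 1;

		copy = malloc(len);
		if (copy == NULL)
			return -1;
		memcpy(copy, r->name, len);
	}

	tail->start = addr;
	tail->end = r->end;
	tail->name = copy;
	r->end = addr;
	return 0;
}
```

In this model every split costs one allocation of strlen(name) + 1
bytes, and a fork would pay the same cost once per named mapping,
which is the duplication the numbers above are trying to size.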