On 14.10.21 22:16, Suren Baghdasaryan wrote: > On Tue, Oct 12, 2021 at 10:01 AM Suren Baghdasaryan <surenb@xxxxxxxxxx> wrote: >> >> On Tue, Oct 12, 2021 at 12:44 AM David Hildenbrand <david@xxxxxxxxxx> wrote: >>> >>>> I'm still evaluating the proposal to use memfds but I'm not sure if >>>> the issue that David Hildenbrand mentioned about additional memory >>>> consumed in pagecache (which has to be addressed) is the only one we >>>> will encounter with this approach. If anyone knows of any potential >>>> issues with using memfds as named anonymous memory, I would really >>>> appreciate your feedback before I go too far in that direction. >>> >>> [MAP_PRIVATE memfd only behave that way with 4k, not with huge pages, so >>> I think it just has to be fixed. It doesn't make any sense to allocate a >>> page for the pagecache ("populate the file") when accessing via a >>> private mapping that's supposed to leave the file untouched] >>> >>> My gut feeling is if you really need a string as identifier, then try >>> going with memfds. Yes, we might hit some road blocks to be sorted out, >>> but it just logically makes sense to me: Files have names. These names >>> exist before mapping and after mapping. They "name" the content. >> >> I'm investigating this direction. I don't have much background with >> memfds, so I'll need to digest the code first. > > I've done some investigation into the possibility of using memfds to > name anonymous VMAs. Here are my findings: Thanks for exploring the alternatives! > > 1. Forking a process with anonymous vmas named using memfd is 5-15% > slower than with prctl (depends on the number of VMAs in the process > being forked). Profiling shows that i_mmap_lock_write() dominates > dup_mmap(). Exit path is also slower by roughly 9% with > free_pgtables() and fput() dominating exit_mmap(). Fork performance is > important for Android because almost all processes are forked from > zygote, therefore this limitation already makes this approach > prohibitive. Interesting, naturally I wonder if that can be optimized. > > 2. mremap() usage to grow the mapping has an issue when used with memfds: > > fd = memfd_create(name, MFD_ALLOW_SEALING); > ftruncate(fd, size_bytes); > ptr = mmap(NULL, size_bytes, prot, MAP_PRIVATE, fd, 0); > close(fd); > ptr = mremap(ptr, size_bytes, size_bytes * 2, MREMAP_MAYMOVE); > touch_mem(ptr, size_bytes * 2); > > This would generate a SIGBUS in touch_mem(). I believe it's because > ftruncate() specified the size to be size_bytes and we are accessing > more than that after remapping. prctl() does not have this limitation > and we do have a usecase for growing a named VMA. Can't you simply size the memfd much larger? I mean, it doesn't really cost much, does it? > > 3. Leaves an fd exposed, even briefly, which may lead to unexpected > flaws (e.g. anything using mmap MAP_SHARED could allow exposures or > overwrites). Even MAP_PRIVATE, if an attacker writes into the file > after ftruncate() and before mmap(), can cause private memory to be > initialized with unexpected data. I don't quite follow. Can you elaborate what exactly the issue here is? We use a temporary fd, yes, but how is that a problem? Any attacker can just write any random memory memory in the address space, so I don't see the issue. > > 4. There is a usecase in the Android userspace where vma naming > happens after memory was allocated. Bionic linker does in-memory > relocations and then names some relocated sections. Would renaming a memfd be an option or is that "too late" ? > > In the light of these findings, could the current patchset be reconsidered? > Thanks, > Suren. > -- Thanks, David / dhildenb