On Tue, May 28, 2019 at 12:15:16PM +0300, Kirill Tkhai wrote:
> On 28.05.2019 02:30, Kirill A. Shutemov wrote:
> > On Fri, May 24, 2019 at 05:00:32PM +0300, Kirill Tkhai wrote:
> >> On 24.05.2019 14:52, Kirill A. Shutemov wrote:
> >>> On Fri, May 24, 2019 at 01:45:50PM +0300, Kirill Tkhai wrote:
> >>>> On 22.05.2019 18:22, Kirill A. Shutemov wrote:
> >>>>> On Mon, May 20, 2019 at 05:00:01PM +0300, Kirill Tkhai wrote:
> >>>>>> This patchset adds a new syscall, which makes possible
> >>>>>> to clone a VMA from a process to current process.
> >>>>>> The syscall supplements the functionality provided
> >>>>>> by process_vm_writev() and process_vm_readv() syscalls,
> >>>>>> and it may be useful in many situation.
> >>>>>
> >>>>> Kirill, could you explain how the change affects rmap and how it is safe.
> >>>>>
> >>>>> My concern is that the patchset allows to map the same page multiple times
> >>>>> within one process or even map page allocated by child to the parrent.
> >>>>>
> >>>>> It was not allowed before.
> >>>>>
> >>>>> In the best case it makes reasoning about rmap substantially more difficult.
> >>>>>
> >>>>> But I'm worry it will introduce hard-to-debug bugs, like described in
> >>>>> https://lwn.net/Articles/383162/.
> >>>>
> >>>> Andy suggested to unmap PTEs from source page table, and this make the single
> >>>> page never be mapped in the same process twice. This is OK for my use case,
> >>>> and here we will just do a small step "allow to inherit VMA by a child process",
> >>>> which we didn't have before this. If someone still needs to continue the work
> >>>> to allow the same page be mapped twice in a single process in the future, this
> >>>> person will have a supported basis we do in this small step. I believe, someone
> >>>> like debugger may want to have this to make a fast snapshot of a process private
> >>>> memory (when the task is stopped for a small time to get its memory). But for
> >>>> me remapping is enough at the moment.
> >>>>
> >>>> What do you think about this?
> >>>
> >>> I don't think that unmapping alone will do. Consider the following
> >>> scenario:
> >>>
> >>> 1. Task A creates and populates the mapping.
> >>> 2. Task A forks. We have now Task B mapping the same pages, but
> >>>    write-protected.
> >>> 3. Task B calls process_vm_mmap() and passes the mapping to the parent.
> >>>
> >>> After this Task A will have the same anon pages mapped twice.
> >>
> >> Ah, sure.
> >>
> >>> One possible way out would be to force CoW on all pages in the mapping,
> >>> before passing the mapping to the new process.
> >>
> >> This will pop all swapped pages up, which is the thing the patchset aims
> >> to prevent.
> >>
> >> Hm, what about allow remapping only VMA, which anon_vma::rb_root contain
> >> only chain and which vma->anon_vma_chain contains single entry? This is
> >> a vma, which were faulted, but its mm never were duplicated (or which
> >> forks already died).
> >
> > The requirement for the VMA to be faulted (have any pages mapped) looks
> > excessive to me, but the general idea may work.
> >
> > One issue I see is that userspace may not have full control to create such
> > VMA. vma_merge() can merge the VMA to the next one without any consent
> > from userspace and you'll get anon_vma inherited from the VMA you've
> > justed merged with.
> >
> > I don't have any valid idea on how to get around this.
>
> Technically it is possible by creating boundary 1-page VMAs with another protection:
> one above and one below the desired region, then map the desired mapping. But this
> is not comfortable.
>
> I don't think it's difficult to find a natural limitation, which prevents mapping
> a single page twice if we want to avoid this at least on start. Another suggestion:
>
> prohibit to map a remote process's VMA only in case of its vm_area_struct::anon_vma::root
> is the same as root of one of local process's VMA.
>
> What about this?
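
The restriction proposed just above can be read as a simple predicate over anon_vma roots. What follows is a toy userspace model of that reading, not kernel code: the toy_* structs and functions are hypothetical names made up for illustration, and only the idea that a forked child's anon_vma shares its parent's root is taken from the kernel. It also assumes that a remote VMA with no anon_vma yet is always allowed, which the proposal does not spell out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the kernel structures; only the fields needed here. */
struct toy_anon_vma {
    struct toy_anon_vma *root;      /* top of the fork tree; a root points to itself */
};

struct toy_vma {
    struct toy_anon_vma *anon_vma;  /* NULL until the VMA gets anonymous pages */
};

/*
 * Hypothetical check: refuse the remote VMA if its anon_vma root is also
 * the root of any VMA in the local process.
 * Assumption (not stated in the thread): a remote VMA without an anon_vma
 * never conflicts.
 */
static bool toy_may_map_remote(const struct toy_vma *remote,
                               const struct toy_vma *local, size_t nr_local)
{
    assert(remote != NULL);

    if (remote->anon_vma == NULL)
        return true;

    for (size_t i = 0; i < nr_local; i++) {
        const struct toy_anon_vma *av = local[i].anon_vma;

        if (av != NULL && av->root == remote->anon_vma->root)
            return false;
    }
    return true;
}

/* Models fork(): the child gets its own anon_vma that shares the parent's root. */
static struct toy_anon_vma toy_fork_anon_vma(const struct toy_anon_vma *parent)
{
    struct toy_anon_vma child = { .root = parent->root };
    return child;
}
```

In this model the forked child's VMA from step 2 of the earlier scenario is refused when the parent tries to map it, while a VMA from an unrelated process is accepted. The anon_vma root that decides between the two outcomes is kernel-internal state that, as far as I know, userspace cannot inspect directly.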
I don't see anything immediately wrong with this, but it's still going to
produce puzzling errors for a user. How would you document such a limitation
in a way that makes sense to a userspace developer?

-- 
 Kirill A. Shutemov