On Tue, Sep 25, 2018 at 12:22:47AM +0300, Kirill A. Shutemov wrote:
> On Mon, Sep 24, 2018 at 04:08:52PM +0300, Yury Norov wrote:
> > After mlock() on newly mmap()ed shared memory I observe page faults.
> >
> > The problem is that populate_vma_page_range() doesn't set the FOLL_WRITE
> > flag for writable shared memory in the mlock() path, justifying it
> > like this:
> > /*
> >  * We want to touch writable mappings with a write fault in order
> >  * to break COW, except for shared mappings because these don't COW
> >  * and we would not want to dirty them for nothing.
> >  */
> >
> > But they are actually COWed. The most straightforward way to avoid it
> > is to set the FOLL_WRITE flag for shared mappings as well as for
> > private ones.
>
> Huh? How do shared mappings get CoWed?
>
> In this context CoW means to create a private copy of the page for the
> process. It only makes sense for private mappings, as all pages in shared
> mappings do not belong to the process.
>
> Shared mappings will still get faults, but a bit later: after the page
> is written back to disk, the page gets cleaned and write-protected to
> catch the next write access.
>
> A notable exception is tmpfs/shmem. These pages do not go through the
> normal writeback process. But the code path is used for other
> filesystems as well.
>
> Therefore, NAK. You only create unneeded writeback traffic.

Hi Kirill,

(My first reaction was exactly like yours, but) on my real system
(Cavium OcteonTX2) and in my qemu simulation I can reproduce the same
behavior: freshly mlock()ed memory causes faults. Those faults happen
because the page is mapped into the process read-only, while the
underlying VMA is read-write. So the faults are resolved simply by
granting write access to the page.

Maybe I use the term COW wrongly here, but this is how faultin_page()
works, and it sets the FOLL_COW bit before returning (which is ignored
at the upper level).

I realize that a proper fix may be more complex, and if so I'll gratefully
take it and drop this patch from my tree, but this is all I have so far
to address the problem.

The user code below is a reproducer.

Thanks,
Yury

	#include <errno.h>
	#include <stdio.h>
	#include <stdlib.h>
	#include <sys/mman.h>
	#include <unistd.h>

	int main(void)
	{
		int i, ret, len = getpagesize() * 1000;
		char tmpfile[] = "/tmp/my_tmp-XXXXXX";
		char *ptr = MAP_FAILED;
		int fd = mkstemp(tmpfile);

		if (fd < 0) {
			printf("Failed to mkstemp: %d\n", errno);
			return 1;
		}

		ret = ftruncate(fd, len);
		if (ret) {
			printf("Failed to ftruncate: %d\n", errno);
			goto out;
		}

		/* Writable shared file mapping: the case under discussion. */
		ptr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
		if (ptr == MAP_FAILED) {
			printf("Failed to mmap memory: %d\n", errno);
			ret = -1;
			goto out;
		}

		ret = mlock(ptr, len);
		if (ret) {
			printf("Failed to mlock: %d\n", errno);
			goto out;
		}

		printf("Touch...\n");

		for (i = 0; i < len; i++)
			ptr[i] = (char) i; /* Faults here. */

		printf("\t... done\n");

	out:
		if (ptr != MAP_FAILED)
			munmap(ptr, len);
		close(fd);
		unlink(tmpfile);
		return ret ? 1 : 0;
	}
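
The reproducer marks where the faults occur but does not count them. As a
minimal sketch (not part of the original patch or reproducer), the faults
taken by the write loop could be measured with getrusage(2) by comparing
ru_minflt before and after touching the pages. The helper names
minor_faults() and faults_after_mlock() are hypothetical, and the sketch
assumes /tmp is backed by a filesystem relevant to the discussion (tmpfs
may behave differently, as noted in the quoted text) and that
RLIMIT_MEMLOCK is large enough for mlock() to succeed.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Minor page faults taken by this process so far, or -1 on error. */
long minor_faults(void)
{
	struct rusage ru;

	if (getrusage(RUSAGE_SELF, &ru))
		return -1;
	return ru.ru_minflt;
}

/*
 * Hypothetical helper: mlock() a fresh writable MAP_SHARED file mapping
 * of npages pages, write one byte to each page, and return the number of
 * minor faults taken by those writes. Returns -1 if any step fails
 * (for example, mlock() hitting RLIMIT_MEMLOCK).
 */
long faults_after_mlock(size_t npages)
{
	char tmpfile[] = "/tmp/my_tmp-XXXXXX";
	size_t pagesz = (size_t)getpagesize();
	size_t len = npages * pagesz;
	long before, after, result = -1;
	char *ptr;
	size_t off;
	int fd;

	assert(npages > 0);

	fd = mkstemp(tmpfile);
	if (fd < 0)
		return -1;
	unlink(tmpfile);	/* the file stays alive until fd is closed */

	if (ftruncate(fd, (off_t)len))
		goto out_close;

	ptr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (ptr == MAP_FAILED)
		goto out_close;

	if (mlock(ptr, len))
		goto out_unmap;

	/* Count only the faults caused by the first write to each page. */
	before = minor_faults();
	for (off = 0; off < len; off += pagesz)
		ptr[off] = 1;
	after = minor_faults();

	if (before >= 0 && after >= 0)
		result = after - before;

out_unmap:
	munmap(ptr, len);
out_close:
	close(fd);
	return result;
}
```

Calling faults_after_mlock(1000) from main() and printing the result would
give a number to compare with and without the patch. If the behaviour
described above is present, one would expect a count on the order of the
number of pages; a count near zero would suggest the pages were already
mapped writable by mlock().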