Michael, there have been no replies, but I still think it would be better to apply
the following patch to man-pages.

Thanks.

---- 8< ----
From: Kirill Smelkov <kirr@xxxxxxxxxx>
Subject: [patch] mmap.2: Add link to commit which broke MAP_POPULATE | MAP_NONBLOCK to be noop

Signed-off-by: Kirill Smelkov <kirr@xxxxxxxxxx>
---
 man2/mmap.2 | 1 +
 1 file changed, 1 insertion(+)

diff --git a/man2/mmap.2 b/man2/mmap.2
index 96875e486..f6fd56523 100644
--- a/man2/mmap.2
+++ b/man2/mmap.2
@@ -300,6 +300,7 @@ Don't perform read-ahead:
 create page tables entries only for pages
 that are already present in RAM.
 Since Linux 2.6.23, this flag causes
+.\" commit 54cb8821de07f2ffcd28c380ce9b93d5784b40d7
 .BR MAP_POPULATE
 to do nothing.
 One day, the combination of
-- 
2.11.0
---- 8< ----

On Mon, Mar 20, 2017 at 11:06:44PM +0300, Kirill Smelkov wrote:
> Michael, first of all thanks for feedback.
>
> On Mon, Mar 20, 2017 at 08:38:50PM +0100, Michael Kerrisk (man-pages) wrote:
> > [CC += Michel Lespinasse <walken@xxxxxxxxxx>]
> >
> > Kirill,
> >
> > I need some help here.
> >
> > On 20 March 2017 at 16:59, Kirill Smelkov <kirr@xxxxxxxxxx> wrote:
> > > On Sat, Mar 18, 2017 at 10:40:10PM +0300, Kirill Smelkov wrote:
> > >> Signed-off-by: Kirill Smelkov <kirr@xxxxxxxxxx>
> > >> ---
> > >>  man2/mmap.2 | 1 +
> > >>  1 file changed, 1 insertion(+)
> > >>
> > >> diff --git a/man2/mmap.2 b/man2/mmap.2
> > >> index 96875e486..f6fd56523 100644
> > >> --- a/man2/mmap.2
> > >> +++ b/man2/mmap.2
> > >> @@ -300,6 +300,7 @@ Don't perform read-ahead:
> > >>  create page tables entries only for pages
> > >>  that are already present in RAM.
> > >>  Since Linux 2.6.23, this flag causes
> > >> +.\" commit 54cb8821de07f2ffcd28c380ce9b93d5784b40d7
> > >>  .BR MAP_POPULATE
> > >>  to do nothing.
> > >>  One day, the combination of
> > >
> > > Please also find below benchmark which explains why
> > >
> > >         mmap(MAP_POPULATE | MAP_NONBLOCK)
> > >
> > > is actually needed.
> >
> > Okay -- clearly things have changed (but I received no man-pages
> > patch).
>
> Strange it was sent. Let me show it once again here (git am -s):
>
> ---- 8< ----
> From: Kirill Smelkov <kirr@xxxxxxxxxx>
> Subject: [patch] mmap.2: Add link to commit which broke MAP_POPULATE | MAP_NONBLOCK to be noop
>
> Signed-off-by: Kirill Smelkov <kirr@xxxxxxxxxx>
> ---
>  man2/mmap.2 | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/man2/mmap.2 b/man2/mmap.2
> index 96875e486..f6fd56523 100644
> --- a/man2/mmap.2
> +++ b/man2/mmap.2
> @@ -300,6 +300,7 @@ Don't perform read-ahead:
>  create page tables entries only for pages
>  that are already present in RAM.
>  Since Linux 2.6.23, this flag causes
> +.\" commit 54cb8821de07f2ffcd28c380ce9b93d5784b40d7
>  .BR MAP_POPULATE
>  to do nothing.
>  One day, the combination of
> -- 
> 2.11.0
> ---- 8< ----
>
> > What do you believe the man page should now say.
>
> What man page says today correctly describes current behaviour:
>
> ---- 8< ----
>        MAP_NONBLOCK (since Linux 2.5.46)
>               This flag is meaningful only in conjunction with MAP_POPULATE.
>               Don't perform read-ahead: create page tables entries only for
>               pages that are already present in RAM.  Since Linux 2.6.23,
>               this flag causes MAP_POPULATE to do nothing.  One day, the
>               combination of MAP_POPULATE and MAP_NONBLOCK may be
>               reimplemented.
> ---- 8< ----
>
> For now I've just added reference to commit corresponding to "Since Linux
> 2.6.23, this flag causes MAP_POPULATE to do nothing."
>
> > Or, perhaps we can ask Michel:
> >
> > commit bebeb3d68b24bb4132d452c5707fe321208bcbcd
> > Author: Michel Lespinasse <walken@xxxxxxxxxx>
> > Date:   Fri Feb 22 16:32:37 2013 -0800
> >
> > The above commit (which went into Linux 3.9) seems to be the source of
> > the change.
> >
> > Michael, can you suggest to us what the mmap() man page should now say
> > about MAP_POPULATE?
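The man page text quoted above says MAP_NONBLOCK should "create page tables entries only for pages that are already present in RAM". User space can only approximate this. Below is a minimal sketch, assuming Linux with glibc. It uses mincore(2) to find resident pages and reads one byte from each. `prefault_resident()` is a hypothetical helper name, not a kernel or libc API.

This is not equivalent to the pre-2.6.23 kernel behaviour. Every touch is still an ordinary minor fault, so the per-page cost the benchmark measures is paid up front rather than avoided. Newer kernels may also restrict what mincore() reports for files the caller cannot write; see mincore(2).

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>

/*
 * prefault_resident (hypothetical helper): for the mapping
 * [addr, addr + len), ask mincore() which pages are resident and read one
 * byte from each resident page so that its page table entry is set up now.
 * Non-resident pages are skipped, so no read-ahead is requested for them.
 *
 * Returns the number of pages touched, or -1 on error.
 * addr must be page-aligned, as returned by mmap().
 *
 * Caveat: a page can be evicted between mincore() and the touch, in which
 * case the touch faults it back in (possibly with I/O).
 */
long prefault_resident(const unsigned char *addr, size_t len)
{
	size_t pagesize = (size_t)sysconf(_SC_PAGESIZE);
	size_t npages = (len + pagesize - 1) / pagesize;
	volatile unsigned char sink = 0;	/* keeps the reads from being optimized out */
	unsigned char *vec;
	long touched = 0;
	size_t i;

	if (npages == 0)
		return 0;
	vec = malloc(npages);
	if (vec == NULL)
		return -1;
	if (mincore((void *)addr, len, vec) == -1) {
		free(vec);
		return -1;
	}
	for (i = 0; i < npages; i++) {
		if (vec[i] & 1) {	/* bit 0 set: page is resident */
			sink += addr[i * pagesize];
			touched++;
		}
	}
	free(vec);
	(void)sink;
	return touched;
}
```

In the benchmark, such a helper could be called right after the plain `MAP_SHARED` mmap(), in place of the `MAP_POPULATE | MAP_NONBLOCK` variant. Its cost should then be timed separately from the access loop.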
>
> It is good to have feedback from relevant people, but as my patch to
> man-pages says, if I understand it correctly, the original patch which
> changed behaviour is this:
>
> ---- 8< ----
> commit 54cb8821de07f2ffcd28c380ce9b93d5784b40d7
> Author: Nick Piggin <npiggin@xxxxxxx>
> Date:   Thu Jul 19 01:46:59 2007 -0700
>
>     mm: merge populate and nopage into fault (fixes nonlinear)
>
>     ...
>
>     After this patch, MAP_NONBLOCK no longer sets up ptes for pages present in    <-- NOTE here
>     pagecache. Seems like a fringe functionality anyway.
>
>     ...
>
>     [akpm@xxxxxxxxxxxxxxxxxxxx: cleanup]
>     [randy.dunlap@xxxxxxxxxx: doc. fixes for readahead]
>     [akpm@xxxxxxxxxxxxxxxxxxxx: build fix]
>     Signed-off-by: Nick Piggin <npiggin@xxxxxxx>
>     Signed-off-by: Randy Dunlap <randy.dunlap@xxxxxxxxxx>
>     Cc: Mark Fasheh <mark.fasheh@xxxxxxxxxx>
>     Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
>     Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> ---- 8< ----
>
> Adding all people involved to Cc - please have a look at quoted benchmark below
> which justifies usage of mmap(MAP_POPULATE | MAP_NONBLOCK).
>
> Thanks,
> Kirill
>
> >
> > > ---- 8< ---- (https://lab.nexedi.com/kirr/misc/blob/5a25f4ae/t_sysmmap_c.c)
> > > /* This program benchmarks pagefault time.
> > >  *
> > >  * Unfortunately as of 2017-Mar-20 for data in pagecache the situation is as
> > >  * follows (i7-6600U, Linux 4.9.13):
> > >  *
> > >  * 1. minor pagefault: ~ 1200ns
> > >  *    (this program)
> > >  *
> > >  * 2. read syscall + whole page copy: ~ 215ns
> > >  *    (https://github.com/golang/go/issues/19563#issuecomment-287423654)
> > >  *
> > >  * 3. it is not possible to mmap(MAP_POPULATE | MAP_NONBLOCK) (i.e. prefault
> > >  *    those PTE that are already in pagecache).
> > >  *    ( http://www.spinics.net/lists/linux-man/msg11420.html,
> > >  *      https://git.kernel.org/linus/54cb8821de07f2ffcd28c380ce9b93d5784b40d7 )
> > >  *
> > >  * 4. (Q) I'm not sure a mechanism exists in the kernel to automatically
> > >  *    subscribe a VMA so that when a page becomes pagecached, associated PTE is
> > >  *    adjusted so that programs won't need to pay minor pagefault time on
> > >  *    access.
> > >  *
> > >  *    unless 3 and 4 are solved mmap unfortunately seems to be slower choice
> > >  *    compared to just pread.
> > >  */
> > > #define _GNU_SOURCE
> > > #include <sys/types.h>
> > > #include <sys/stat.h>
> > > #include <fcntl.h>
> > > #include <unistd.h>
> > > #include <stdio.h>
> > > #include <stdlib.h>
> > > #include <sys/time.h>
> > > #include <sys/user.h>
> > > #include <sys/mman.h>
> > >
> > > //                   12345678
> > > #define NITER        500000
> > >
> > > // microtime returns current time as double
> > > double microtime() {
> > > 	int err;
> > > 	struct timeval tv;
> > >
> > > 	err = gettimeofday(&tv, NULL);
> > > 	if (err == -1) {
> > > 		perror("gettimeofday");
> > > 		abort();
> > > 	}
> > >
> > > 	return tv.tv_sec + 1E-6 * tv.tv_usec;
> > > }
> > >
> > >
> > > int main() {
> > > 	unsigned char *addr, sum = 0;
> > > 	int fd, err, i;
> > > 	size_t size;
> > > 	double Tstart, Tend;
> > >
> > > 	fd = open("/dev/shm/y.dat", O_RDWR | O_CREAT | O_TRUNC, 0666);
> > > 	if (fd == -1) {
> > > 		perror("open");
> > > 		abort();
> > > 	}
> > >
> > > 	size = NITER * PAGE_SIZE;
> > >
> > > 	err = ftruncate(fd, size);
> > > 	if (err == -1) {
> > > 		perror("ftruncate");
> > > 		abort();
> > > 	}
> > >
> > > #if 1
> > > 	// make sure RAM is actually allocated
> > > 	Tstart = microtime();
> > > 	err = fallocate(fd, /*mode*/0, 0, size);
> > > 	Tend = microtime();
> > > 	if (err == -1) {
> > > 		perror("fallocate");
> > > 		abort();
> > > 	}
> > > 	printf("T(fallocate):\t%.1f\t%6.1f ns / page\n", Tend - Tstart, (Tend - Tstart) * 1E9 / NITER);
> > > #endif
> > >
> > > 	Tstart = microtime();
> > > 	addr = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0);
> > > 	//addr = mmap(NULL, size, PROT_READ, MAP_SHARED | MAP_POPULATE, fd, 0);
> > > 	//addr = mmap(NULL, size, PROT_READ, MAP_SHARED | MAP_POPULATE | MAP_NONBLOCK, fd, 0);
> > > 	if (addr == MAP_FAILED) {
> > > 		perror("mmap");
> > > 		abort();
> > > 	}
> > > 	Tend = microtime();
> > > 	printf("T(mmap):\t%.1f\t%6.1f ns / page\n", Tend - Tstart, (Tend - Tstart) * 1E9 / NITER);
> > >
> > > 	Tstart = microtime();
> > > 	//for (int j=0; j < 100; j++)
> > > 	for (i=0; i<NITER; i++) {
> > > 		sum += addr[i*PAGE_SIZE];
> > > 	}
> > > 	Tend = microtime();
> > >
> > > 	printf("T(pagefault):\t%.1f\t%6.1f ns / page\t(%i)\n", Tend - Tstart, (Tend - Tstart) * 1E9 / NITER, sum);
> > >
> > > 	return 0;
> > > }
> > > ---- 8< ----
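Item 2 in the benchmark comment gives the "read syscall + whole page copy" figure from the linked Go issue, not from this program. A pread()-based counterpart would allow a side-by-side run on the same machine. Below is a minimal sketch of one; `pread_ns_per_page()` and `now_ns()` are hypothetical names. The sketch uses clock_gettime(CLOCK_MONOTONIC) instead of the benchmark's gettimeofday()-based microtime(), which is a choice of this sketch rather than the original program. No measurements are claimed here.

```c
#define _GNU_SOURCE
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/* now_ns (hypothetical helper) returns a monotonic timestamp in ns. */
static double now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec * 1E9 + ts.tv_nsec;
}

/*
 * pread_ns_per_page (hypothetical helper): read the first npages pages of
 * fd one page at a time with pread() into a private buffer and return the
 * average time per page in ns, or -1 on error (including a short read).
 * Like the mmap loop in the benchmark, it folds one byte per page into
 * *sum_out (if non-NULL) so the data is actually used.
 *
 * The file is assumed to be in pagecache already (e.g. created in
 * /dev/shm as the benchmark does), so this times syscall + copy, not I/O.
 */
double pread_ns_per_page(int fd, size_t npages, unsigned char *sum_out)
{
	size_t pagesize = (size_t)sysconf(_SC_PAGESIZE);
	unsigned char *buf, sum = 0;
	double t0, t1;
	size_t i;

	if (npages == 0)
		return -1;
	buf = malloc(pagesize);
	if (buf == NULL)
		return -1;

	t0 = now_ns();
	for (i = 0; i < npages; i++) {
		ssize_t n = pread(fd, buf, pagesize, (off_t)(i * pagesize));
		if (n != (ssize_t)pagesize) {
			free(buf);
			return -1;
		}
		sum += buf[0];
	}
	t1 = now_ns();

	free(buf);
	if (sum_out != NULL)
		*sum_out = sum;
	return (t1 - t0) / npages;
}
```

To reuse the benchmark's setup, call `pread_ns_per_page(fd, NITER, &sum)` on the /dev/shm/y.dat descriptor after the fallocate() step. Then print its result next to the `T(pagefault)` line.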