Thanks for clarifying.

I have modified the mmap test program (see attached) to optionally read
in the entire file when the WORKAROUND= environment variable is set,
thereby preventing the FUSE reads in the write phase. I can now see a
batch of reads, followed by a batch of writes.

What’s interesting: when polling using

    while :; do grep ^Bdi /sys/kernel/debug/bdi/0:93/stats; sleep 0.1; done

and running the mmap test program, I see:

BdiDirtied: 3566304 kB
BdiWritten: 3563616 kB
BdiWriteBandwidth: 13596 kBps

BdiDirtied: 3566304 kB
BdiWritten: 3563616 kB
BdiWriteBandwidth: 13596 kBps

BdiDirtied: 3566528 kB (+224 kB) <-- starting to dirty pages
BdiWritten: 3564064 kB (+448 kB) <-- starting to write
BdiWriteBandwidth: 10700 kBps <-- only bandwidth update!

BdiDirtied: 3668224 kB (+ 101696 kB) <-- all pages dirtied
BdiWritten: 3565632 kB (+1568 kB)
BdiWriteBandwidth: 10700 kBps

BdiDirtied: 3668224 kB
BdiWritten: 3665536 kB (+ 99904 kB) <-- all pages written
BdiWriteBandwidth: 10700 kBps

BdiDirtied: 3668224 kB
BdiWritten: 3665536 kB
BdiWriteBandwidth: 10700 kBps

This seems to suggest that the bandwidth measurements only capture the
rising slope of the transfer, but not the bulk of the transfer itself,
resulting in inaccurate measurements. This effect is worsened when the
test program doesn’t pre-read the output file and hence the kernel gets
fewer FUSE write requests out.

On Mon, Mar 9, 2020 at 3:36 PM Miklos Szeredi <miklos@xxxxxxxxxx> wrote:
>
> On Mon, Mar 9, 2020 at 3:32 PM Michael Stapelberg
> <michael+lkml@xxxxxxxxxxxxx> wrote:
> >
> > Here’s one more thing I noticed: when polling
> > /sys/kernel/debug/bdi/0:93/stats, I see that BdiDirtied and BdiWritten
> > remain at their original values while the kernel sends FUSE read
> > requests, and only goes up when the kernel transitions into sending
> > FUSE write requests. Notably, the page dirtying throttling happens in
> > the read phase, which is most likely why the write bandwidth is
> > (correctly) measured as 0.
> >
> > Do we have any ideas on why the kernel sends FUSE reads at all?
>
> Memory writes (stores) need the memory page to be up-to-date wrt. the
> backing file before proceeding.  This means that if the page hasn't
> yet been cached by the kernel, it needs to be read first.
>
> Thanks,
> Miklos

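For anyone who wants to set the BdiWritten deltas against the reported
BdiWriteBandwidth, here is a minimal sketch in C. It is not part of the
attached test program: struct bdi_sample, parse_bdi_sample() and
observed_write_kbps() are hypothetical names introduced for
illustration, and the sketch assumes the stats lines keep the
"Name: value kB" shape shown in the output above. The caller has to
measure the time between two samples itself, since the shell loop's
"sleep 0.1" is only a lower bound on the real interval.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One snapshot of the three Bdi* counters (hypothetical helper type). */
struct bdi_sample {
    long dirtied_kb;     /* BdiDirtied, in kB */
    long written_kb;     /* BdiWritten, in kB */
    long write_bw_kbps;  /* BdiWriteBandwidth, in kBps */
};

/*
 * Parse one dump of "grep ^Bdi .../stats" held in a NUL-terminated string.
 * Lines that do not start with one of the three names are ignored.
 * Returns how many of the three counters were found (0 to 3).
 */
static int parse_bdi_sample(const char *text, struct bdi_sample *out)
{
    int found = 0;
    const char *line = text;

    while (line != NULL && *line != '\0') {
        long v;

        if (sscanf(line, "BdiDirtied: %ld", &v) == 1) {
            out->dirtied_kb = v;
            found++;
        } else if (sscanf(line, "BdiWritten: %ld", &v) == 1) {
            out->written_kb = v;
            found++;
        } else if (sscanf(line, "BdiWriteBandwidth: %ld", &v) == 1) {
            out->write_bw_kbps = v;
            found++;
        }

        /* Advance to the character after the next newline, if any. */
        line = strchr(line, '\n');
        if (line != NULL)
            line++;
    }
    return found;
}

/*
 * Average write rate in kB/s between two samples that were taken
 * interval_s seconds apart (the interval is measured by the caller).
 */
static double observed_write_kbps(const struct bdi_sample *prev,
                                  const struct bdi_sample *cur,
                                  double interval_s)
{
    assert(interval_s > 0.0);
    return (double)(cur->written_kb - prev->written_kb) / interval_s;
}
```

Fed the fourth and fifth samples above, the BdiWritten delta is
99904 kB. Dividing that by the measured interval gives a rate that can
be compared with the 10700 kBps the kernel reported.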
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <stdint.h>

/*
 * An implementation of copy ("cp") that uses memory maps.  Various
 * error checking has been removed to promote readability
 */

// Where we want the source file's memory map to live in virtual memory
// The destination file resides immediately after the source file
#define MAP_LOCATION 0x6100

int main (int argc, char *argv[]) {
  int fdin, fdout;
  char *src, *dst;
  struct stat statbuf;
  off_t fileSize = 0;

  if (argc != 3) {
    printf ("usage: a.out <fromfile> <tofile>\n");
    exit(0);
  }

  /* open the input file */
  if ((fdin = open (argv[1], O_RDONLY)) < 0) {
    printf ("can't open %s for reading\n", argv[1]);
    exit(0);
  }

  /* open/create the output file */
  if ((fdout = open (argv[2], O_RDWR | O_CREAT | O_TRUNC, 0600)) < 0) {
    printf ("can't create %s for writing\n", argv[2]);
    exit(0);
  }

  /* find size of input file */
  fstat (fdin,&statbuf) ;
  fileSize = statbuf.st_size;

  /* go to the location corresponding to the last byte */
  if (lseek (fdout, fileSize - 1, SEEK_SET) == -1) {
    printf ("lseek error\n");
    exit(0);
  }

  /* write a dummy byte at the last location */
  write (fdout, "", 1);

  /*
   * memory map the input file.  Only the first two arguments are
   * interesting: 1) the location and 2) the size of the memory map
   * in virtual memory space. Note that the location is only a "hint";
   * the OS can choose to return a different virtual memory address.
   * This is illustrated by the printf command below.
   */
  src = mmap ((void*) MAP_LOCATION, fileSize,
              PROT_READ, MAP_SHARED | MAP_POPULATE, fdin, 0);

  /* memory map the output file after the input file */
  /* (char* cast: arithmetic on void* is a GNU extension, not standard C) */
  dst = mmap ((char*) MAP_LOCATION + fileSize, fileSize,
              PROT_READ | PROT_WRITE, MAP_SHARED, fdout, 0);

  printf("pid: %d\n", getpid());

  /* %p already prints the 0x prefix with glibc */
  printf("Mapped src: %p and dst: %p\n", (void*)src, (void*)dst);

  if (getenv("WORKAROUND") != NULL) {
    printf("workaround: reading output file before dirtying its pages\n");
    uint8_t sum = 0;
    uint8_t *ptr = (uint8_t*)dst;
    for (off_t i = 0; i < fileSize; i++) {
      sum += *ptr;
      ptr++;
    }
    printf("sum: %d\n", sum);
    sleep(1);
    printf("writing\n");
  }

  /* Copy the input file to the output file */
  memcpy (dst, src, fileSize);

  printf("memcpy done\n");

  // we should probably unmap memory and close the files
} /* main */
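
The closing comment in the program notes that the mappings and file
descriptors are never released. A minimal sketch of the same copy with
error checks, msync() and cleanup might look as follows. It is a
variation, not the attached test program: mmap_copy() is a hypothetical
helper name, it sizes the output file with ftruncate() instead of
lseek() plus a one-byte write, and it drops the address hint,
MAP_POPULATE and the WORKAROUND pre-read, so it is not a drop-in
replacement for reproducing the measurements in this thread.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Copy src_path to dst_path through two shared mappings.
 * Returns 0 on success and -1 if any step fails.
 */
static int mmap_copy(const char *src_path, const char *dst_path)
{
    int ret = -1;
    int fdin = -1, fdout = -1;
    void *src = MAP_FAILED, *dst = MAP_FAILED;
    struct stat st;
    size_t len = 0;

    assert(src_path != NULL && dst_path != NULL);

    if ((fdin = open(src_path, O_RDONLY)) < 0)
        goto out;
    if ((fdout = open(dst_path, O_RDWR | O_CREAT | O_TRUNC, 0600)) < 0)
        goto out;
    if (fstat(fdin, &st) < 0)
        goto out;

    len = (size_t)st.st_size;
    if (len == 0) {
        /* mmap() rejects zero-length mappings; an empty copy is done. */
        ret = 0;
        goto out;
    }

    /* Give the output file its final size before mapping it. */
    if (ftruncate(fdout, st.st_size) < 0)
        goto out;

    src = mmap(NULL, len, PROT_READ, MAP_SHARED, fdin, 0);
    if (src == MAP_FAILED)
        goto out;
    dst = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fdout, 0);
    if (dst == MAP_FAILED)
        goto out;

    /* The stores in memcpy() are what dirty the output pages. */
    memcpy(dst, src, len);

    /* Request writeback of the dirtied pages and wait for it to finish. */
    if (msync(dst, len, MS_SYNC) < 0)
        goto out;

    ret = 0;
out:
    /* Release whatever was acquired, in reverse order. */
    if (dst != MAP_FAILED)
        munmap(dst, len);
    if (src != MAP_FAILED)
        munmap(src, len);
    if (fdout >= 0)
        close(fdout);
    if (fdin >= 0)
        close(fdin);
    return ret;
}
```

Calling msync() with MS_SYNC before munmap() makes the writeback happen
at a known point rather than whenever the kernel's flusher gets to it,
which may make the write phase easier to pick out when polling the
stats file.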