On Mon, Feb 15, 2021 at 11:22:20PM -0600, Steve French wrote:
> On Mon, Feb 15, 2021 at 8:10 PM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
> > The switch from readpages to readahead does help in a couple of corner
> > cases.  For example, if you have two processes reading the same file at
> > the same time, one will now block on the other (due to the page lock)
> > rather than submitting a mess of overlapping and partial reads.
>
> Do you have a simple repro example of this we could try (fio, dbench,
> iozone etc) to get some objective perf data?

I don't.  The problem was noted by the f2fs people, so maybe they have a
reproducer.

> My biggest worry is making sure that the switch to netfs doesn't degrade
> performance (which might be a low bar now since current network file copy
> perf seems to significantly lag at least Windows), and in some
> easy-to-understand scenarios I want to make sure it actually helps perf.

I had a question about that ... you've mentioned having 4x4MB reads
outstanding as being the way to get optimum performance.  Is there a
significant performance difference between 4x4MB, 16x1MB and 64x256kB?

I'm concerned about having "too large" an I/O on the wire at a given time.
For example, with a 1Gbps link, you get roughly 125MB/s.  That's a minimum
latency of about 33us for a 4kB page, but about 33ms for a 4MB page.  "For
very simple tasks, people can perceive latencies down to 2 ms or less"
(https://danluu.com/input-lag/), so going all the way to 4MB I/Os takes us
well into the perceptible latency range, whereas a 256kB I/O is only
around 2ms.

So could you do some experiments with fio doing direct I/O to see if it
takes significantly longer to do, say, 1TB of I/O in 4MB chunks vs 256kB
chunks?  Obviously use threads to keep lots of I/Os outstanding.
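
Something like the fio job file below is roughly what I have in mind; it's
only a sketch.  The path, the 1TB size and the libaio/direct=1 combination
are just placeholders (any async engine that keeps the queue depth up is
fine), and the iodepths are chosen so each job keeps about the same 16MB
outstanding, i.e. 4x4MB vs 16x1MB vs 64x256kB:

[global]
ioengine=libaio
direct=1
rw=read
filename=/mnt/cifs/testfile
size=1T

[4x4MB]
bs=4M
iodepth=4
stonewall

[16x1MB]
bs=1M
iodepth=16
stonewall

[64x256kB]
bs=256k
iodepth=64
stonewall

The stonewall lines make the three jobs run back to back rather than in
parallel, so comparing the bandwidth and completion-latency numbers for
each job should show whether the smaller requests actually cost us any
throughput.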