On Mon, Sep 14, 2015 at 01:06:25PM -0700, Linus Torvalds wrote:
> On Sun, Sep 13, 2015 at 4:12 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> >
> > Really need to run these numbers on slower disks where block layer
> > merging makes a difference to performance.
>
> Yeah. We've seen plugging and io schedulers not make much difference
> for high-performance flash (although I think the people who argued
> that noop should generally be used for non-rotating media were wrong,
> I think - the elevator ends up still being critical to merging, and
> while merging isn't a life-or-death situation, it tends to still
> help).

Yeah, my big concern was that holding the plug longer would result in
lower overall perf because we weren't keeping the flash busy. So I
started with the flash boxes to make sure we weren't regressing past 4.2
levels at least.

I'm still worried about that, but this probably isn't the right
benchmark to show it. And if it's really a problem, it'll happen
everywhere we plug and not just here.

>
> For rotating rust with nasty seek times, the plugging is likely to
> make the biggest difference.

For rotating storage, I grabbed a big box and did the fs_mark run
against 8 spindles. These are all behind a megaraid card as jbods, so I
flipped the card's cache to write-through.

I changed around the run a bit, making enough files for fs_mark to run
for ~10 minutes, and I took out the sync. I ran only xfs to cut down on
the iterations, and after the fs_mark run, I did a short 30 second run
with blktrace in the background to capture the io sizes.

v4.2:    178K files/sec
Chinner: 192K files/sec
Mason:   192K files/sec
Linus:   193K files/sec

I added support to iowatcher to graph IO size, and attached the graph.

Short version, Linus' patch still gives bigger IOs and similar perf to
Dave's original. I should have done the blktrace runs for 60 seconds
instead of 30; I suspect that would even out the average sizes between
the three patches.

-chris
Attachment: fs_mark.png