On Wed, Nov 10, 2021 at 12:40:50PM +0100, Patrick Steinhardt wrote: > Please note that I didn't yet add any performance numbers or tests. > Performance tests didn't show any conclusive results on my machine given > that I couldn't observe any noticeable impact at all, and I didn't write > tests yet given that I first wanted to get some feedback on this series. > If we agree that this is the correct way to go forward I'll of course > put in some more time to add them. Here's a mild adjustment of the quick test I showed earlier in [1]. It uses your version for all cases, but swaps between the three config options, and it uses --atomic to put everything into a single transaction: $ hyperfine -L v false,true,batch -p ./setup 'git -C dst.git config core.fsyncreffiles {v}; git.compile push --atomic dst.git refs/heads/*' Benchmark 1: git -C dst.git config core.fsyncreffiles false; git.compile push --atomic dst.git refs/heads/* Time (mean ± σ): 10.5 ms ± 0.2 ms [User: 7.8 ms, System: 3.9 ms] Range (min … max): 10.1 ms … 11.0 ms 108 runs Benchmark 2: git -C dst.git config core.fsyncreffiles true; git.compile push --atomic dst.git refs/heads/* Time (mean ± σ): 402.5 ms ± 6.2 ms [User: 13.9 ms, System: 10.7 ms] Range (min … max): 387.3 ms … 408.9 ms 10 runs Benchmark 3: git -C dst.git config core.fsyncreffiles batch; git.compile push --atomic dst.git refs/heads/* Time (mean ± σ): 21.4 ms ± 0.6 ms [User: 8.0 ms, System: 5.2 ms] Range (min … max): 20.7 ms … 24.9 ms 99 runs Summary 'git -C dst.git config core.fsyncreffiles false; git.compile push --atomic dst.git refs/heads/*' ran 2.05 ± 0.07 times faster than 'git -C dst.git config core.fsyncreffiles batch; git.compile push --atomic dst.git refs/heads/*' 38.51 ± 1.06 times faster than 'git -C dst.git config core.fsyncreffiles true; git.compile push --atomic dst.git refs/heads/*' So yes, the "batch" case is much improved. It's still more expensive than skipping the fsync entirely, but it's within reason. The trick, though is the --atomic. If I drop that flag, then "batch" takes as long as "true". And of course that's the default. I wonder if that means we'd want an extra mode: do the WRITEOUT_ONLY flush, but _don't_ do a HARDWARE_FLUSH for each transaction. That doesn't give you durability, but it (in theory) solves the out-of-order journal problem without paying any performance cost. I say "in theory" because I'm just assuming that the WRITEOUT thing works as advertised. I don't have an easy way to test the outcome on a power failure at the right moment here. -Peff [1] https://lore.kernel.org/git/YYTaiIlEKxHRVdCy@xxxxxxxxxxxxxxxxxxxxxxx/