[PATCH 0/2] [RFC] Implement a bulk-checkin option for core.fsyncObjectFiles

Git for Windows has had fsyncing of object files enabled since "409cae91eb
(mingw: change core.fsyncObjectFiles = 1 by default, 2017-09-04)".

There have been requests to make core.fsyncObjectFiles the default
everywhere, but there are concerns about its performance cost (perf results
below). There's a long and gory thread here:
https://lore.kernel.org/git/87a7xcw8sa.fsf@xxxxxxxxxxxxxx/t/.

My change introduces a new 'core.fsyncObjectFiles = 2' setting, which
batches the data-integrity FLUSH command sent to the disk across multiple
loose object files as they are added to the object database.
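
As a rough illustration of the batching idea (this is not the code in the
series), the sketch below contrasts one flush per file with one flush per
batch. The function names write_per_file and write_batched are hypothetical,
and writing to a ".tmp" name and renaming only after the flush is an
assumption of the sketch. The plain `sync` utility stands in for whatever
flush primitive the platform actually provides.

```shell
#!/bin/sh
# Conceptual sketch only; write_per_file and write_batched are
# hypothetical helpers, not functions from Git.

# Baseline: write one file, flush, then expose it under its final name.
write_per_file() {
    dest=$1; shift
    for name in "$@"; do
        printf '%s\n' "$name" > "$dest/$name.tmp"
        sync                        # one flush for every file written
        mv "$dest/$name.tmp" "$dest/$name"
    done
}

# Batched: write every file first, flush once, then rename them all.
write_batched() {
    dest=$1; shift
    for name in "$@"; do
        printf '%s\n' "$name" > "$dest/$name.tmp"
    done
    sync                            # a single flush covers the whole batch
    for name in "$@"; do
        mv "$dest/$name.tmp" "$dest/$name"
    done
}
```

With N files, the first function issues N flushes and the second issues one,
which is the cost difference the new setting is meant to capture.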

We take advantage of the bulk-checkin hooks already in the add command and
add similar hooks to the update-index command (which is used internally by
stash). Details are in the last patch of the series.

Here's a simple performance test script:

    #!/bin/sh
    git clone https://github.com/nodejs/node.git node-repo-cache
    git clone node-repo-cache node-repo
    cd node-repo
    git --version
    
    find . -name "*.c" -exec sh -c 'echo foo1 >> "$1"' -- {} \;
    echo "----GIT stash fsync"
    time git -c core.fsyncObjectFiles=true stash push
    
    find . -name "*.c" -exec sh -c 'echo foo2 >> "$1"' -- {} \;
    echo "----GIT stash fsync_defer"
    time git -c core.fsyncObjectFiles=2 stash push
    
    find . -name "*.c" -exec sh -c 'echo foo3 >> "$1"' -- {} \;
    echo "----GIT stash no_fsync"
    time git -c core.fsyncObjectFiles=false stash push
    
    cd ..
    rm -r -f node-repo


Hardware:

 * Mac - Mac Mini 2018 running macOS 11.5.1, APFS with a 1TB Apple NVMe SSD.
 * Linux - Ubuntu 20.04 - ext4 running on a Hyper-V VM with a fixed VHDX
   backed by a Samsung PM981.
 * Win - Windows NTFS - Same Hyper-V host as Linux.

Operation        | Mac     | Linux | Windows
-----------------|---------|-------|----------
git fsync        | 40.6 s  | 7.8 s | 6.9 s
git fsync_defer  | 6.5 s   | 2.1 s | 3.8 s
git no_fsync     | 1.7 s   | 1.0 s | 2.6 s
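
To make the relative cost easier to read, the ratio of the per-file fsync
time to the batched (fsync_defer) time can be computed from the numbers
above. This is a small sketch; the helper name speedups is hypothetical and
the figures are copied from the results above, not re-measured.

```shell
#!/bin/sh
# Columns: platform, fsync, fsync_defer, no_fsync (seconds, from the results above).
# speedups is a hypothetical helper that prints fsync / fsync_defer per platform.
speedups() {
    awk '{ printf "%s %.1fx\n", $1, $2 / $3 }' <<'EOF'
Mac 40.6 6.5 1.7
Linux 7.8 2.1 1.0
Windows 6.9 3.8 2.6
EOF
}

speedups
# prints:
# Mac 6.2x
# Linux 3.7x
# Windows 1.8x
```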

The Windows version of Git differs slightly:
https://github.com/git-for-windows/git/pull/3391. I also used a
Windows-specific test script there.

I hope I'm CC'ing a reasonable set of people on this patch, based on the
last discussion.

Thanks,
Neeraj Singh
Windows Core File Systems

Neeraj Singh (2):
  object-file: use futimes rather than utime
  core.fsyncobjectfiles: batch disk flushes

 Documentation/config/core.txt |  17 ++++--
 Makefile                      |   4 ++
 builtin/add.c                 |   3 +-
 builtin/update-index.c        |   3 +
 bulk-checkin.c                | 105 +++++++++++++++++++++++++++++++---
 bulk-checkin.h                |   4 +-
 compat/mingw.c                |  42 +++++++++-----
 compat/mingw.h                |   2 +
 config.c                      |   4 +-
 config.mak.uname              |   2 +
 configure.ac                  |   8 +++
 git-compat-util.h             |   7 +++
 object-file.c                 |  23 ++------
 wrapper.c                     |  36 ++++++++++++
 write-or-die.c                |   2 +-
 15 files changed, 213 insertions(+), 49 deletions(-)


base-commit: 225bc32a989d7a22fa6addafd4ce7dcd04675dbf
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1076%2Fneerajsi-msft%2Fneerajsi%2Fbulk-fsync-object-files-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1076/neerajsi-msft/neerajsi/bulk-fsync-object-files-v1
Pull-Request: https://github.com/git/git/pull/1076
-- 
gitgitgadget
