Thanks to everyone for review so far! Changes since v3: * Fix core.fsyncobjectfiles option parsing as suggested by Junio: We now accept no value to mean "true" and we require 'batch' to be lowercase. * Leave the default fsync mode as 'false'. Git for windows can change its default when this series makes it over to that fork. * Use a switch statement in git_fsync, as suggested by Junio. * Add regression test cases for core.fsyncobjectfiles=batch. This should keep the batch functionality basically working in upstream git even if few users adopt batch mode initially. I expect git-for-windows will provide a good baking area for the new mode. Changes since v2: * Removed an unused Makefile define (FSYNC_DOESNT_FLUSH) that slipped in from an intermediate change. * Drop the futimens part of the patch and return to just calling utime, now within the new bulk_checkin code. The utime to futimens change seemed to be problematic for some platforms (thanks Randall Becker), and is really orthogonal to the rest of the patch series. * (Optional commit) Enable batch mode by default so that we can shake loose any issues relating to deferring the renames until the unplug_bulk_checkin. Changes since v1: * Switch from futimes(2) to futimens(2), which is in POSIX.1-2008. Contrary to dscho's suggestion, I'm still implementing the Windows version in the same patch and I'm not doing autoconf detection since this is a POSIX function. * Introduce a separate preparatory patch to the bulk-checkin infrastructure to separate the 'plugged' variable and rename the 'state' variable, as suggested by dscho. * Add performance numbers to the commit message of the main bulk fsync patch, as suggested by dscho. * Add a comment about the non-thread-safety of the bulk-checkin infrastructure, as suggested by avarab. * Rename the experimental mode to core.fsyncobjectfiles=batch, as suggested by dscho and avarab and others. * Add more details to Documentation/config/core.txt about the various settings and their intended effects, as suggested by avarab. * Switch to the string-list API to hold the rename state, as suggested by avarab. * Create a separate update-index patch to use bulk-checkin as suggested by dscho. * Add Windows support in the upstream git. This is done in a way that should not conflict with git-for-windows. * Add new performance tests that shows the delta based on fsync mode. NOTE: Based on Christoph Hellwig's comments, the 'batch' mode is not correct on Linux, since sync_file_range does not provide data integrity guarantees. There is currently no kernel interface suitable to achieve disk flush batching as is, but he suggested that he might implement a 'syncfs' variant on top of this patchset. This code is still useful on macOS and Windows, and the config documentation makes that clear. Neeraj Singh (6): bulk-checkin: rename 'state' variable and separate 'plugged' boolean core.fsyncobjectfiles: batched disk flushes core.fsyncobjectfiles: add windows support for batch mode update-index: use the bulk-checkin infrastructure core.fsyncobjectfiles: tests for batch mode core.fsyncobjectfiles: performance tests for add and stash Documentation/config/core.txt | 26 +++++-- Makefile | 6 ++ builtin/add.c | 3 +- builtin/update-index.c | 3 + bulk-checkin.c | 103 +++++++++++++++++++++++++--- bulk-checkin.h | 5 +- cache.h | 8 ++- compat/mingw.h | 3 + compat/win32/flush.c | 29 ++++++++ config.c | 7 +- config.mak.uname | 3 + configure.ac | 8 +++ contrib/buildsystems/CMakeLists.txt | 3 +- environment.c | 2 +- git-compat-util.h | 7 ++ object-file.c | 22 +----- t/lib-unique-files.sh | 34 +++++++++ t/perf/p3700-add.sh | 43 ++++++++++++ t/perf/p3900-stash.sh | 46 +++++++++++++ t/t3700-add.sh | 11 +++ t/t3903-stash.sh | 14 ++++ wrapper.c | 48 +++++++++++++ write-or-die.c | 2 +- 23 files changed, 392 insertions(+), 44 deletions(-) create mode 100644 compat/win32/flush.c create mode 100644 t/lib-unique-files.sh create mode 100755 t/perf/p3700-add.sh create mode 100755 t/perf/p3900-stash.sh base-commit: 8b7c11b8668b4e774f81a9f0b4c30144b818f1d1 Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1076%2Fneerajsi-msft%2Fneerajsi%2Fbulk-fsync-object-files-v4 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1076/neerajsi-msft/neerajsi/bulk-fsync-object-files-v4 Pull-Request: https://github.com/git/git/pull/1076 Range-diff vs v3: 1: d5893e28df1 = 1: d5893e28df1 bulk-checkin: rename 'state' variable and separate 'plugged' boolean 2: f8b5b709e9e ! 2: 12cad737635 core.fsyncobjectfiles: batched disk flushes @@ config.c: static int git_default_core_config(const char *var, const char *value, if (!strcmp(var, "core.fsyncobjectfiles")) { - fsync_object_files = git_config_bool(var, value); -+ if (!value) -+ return config_error_nonbool(var); -+ if (!strcasecmp(value, "batch")) ++ if (value && !strcmp(value, "batch")) + fsync_object_files = FSYNC_OBJECT_FILES_BATCH; ++ else if (git_config_bool(var, value)) ++ fsync_object_files = FSYNC_OBJECT_FILES_ON; + else -+ fsync_object_files = git_config_bool(var, value) -+ ? FSYNC_OBJECT_FILES_ON : FSYNC_OBJECT_FILES_OFF; ++ fsync_object_files = FSYNC_OBJECT_FILES_OFF; return 0; } @@ wrapper.c: int xmkstemp_mode(char *filename_template, int mode) +int git_fsync(int fd, enum fsync_action action) +{ -+ if (action == FSYNC_WRITEOUT_ONLY) { ++ switch (action) { ++ case FSYNC_WRITEOUT_ONLY: ++ +#ifdef __APPLE__ + /* -+ * on Mac OS X, fsync just causes filesystem cache writeback but does not ++ * on macOS, fsync just causes filesystem cache writeback but does not + * flush hardware caches. + */ + return fsync(fd); @@ wrapper.c: int xmkstemp_mode(char *filename_template, int mode) + + errno = ENOSYS; + return -1; -+ } ++ ++ case FSYNC_HARDWARE_FLUSH: + +#ifdef __APPLE__ -+ return fcntl(fd, F_FULLFSYNC); ++ return fcntl(fd, F_FULLFSYNC); +#else -+ return fsync(fd); ++ return fsync(fd); +#endif ++ ++ default: ++ BUG("unexpected git_fsync(%d) call", action); ++ } ++ +} + static int warn_if_unremovable(const char *op, const char *file, int rc) 3: 815a862e229 ! 3: a5b3e21b762 core.fsyncobjectfiles: add windows support for batch mode @@ wrapper.c: int git_fsync(int fd, enum fsync_action action) + errno = ENOSYS; return -1; - } + 4: 6b576038986 = 4: f7f756f3932 update-index: use the bulk-checkin infrastructure -: ----------- > 5: afb0028e796 core.fsyncobjectfiles: tests for batch mode 5: b7ca3ba9302 ! 6: 3e6b80b5fa2 core.fsyncobjectfiles: performance tests for add and stash @@ Commit message Signed-off-by: Neeraj Singh <neerajsi@xxxxxxxxxxxxx> - ## t/perf/lib-unique-files.sh (new) ## -@@ -+# Helper to create files with unique contents -+ -+test_create_unique_files_base__=$(date -u) -+test_create_unique_files_counter__=0 -+ -+# Create multiple files with unique contents. Takes the number of -+# directories, the number of files in each directory, and the base -+# directory. -+# -+# test_create_unique_files 2 3 . -- Creates 2 directories with 3 files -+# each in the current directory, all -+# with unique contents. -+ -+test_create_unique_files() { -+ test "$#" -ne 3 && BUG "3 param" -+ -+ local dirs=$1 -+ local files=$2 -+ local basedir=$3 -+ -+ for i in $(test_seq $dirs) -+ do -+ local dir=$basedir/dir$i -+ -+ mkdir -p "$dir" > /dev/null -+ for j in $(test_seq $files) -+ do -+ test_create_unique_files_counter__=$((test_create_unique_files_counter__ + 1)) -+ echo "$test_create_unique_files_base__.$test_create_unique_files_counter__" >"$dir/file$j.txt" -+ done -+ done -+} - ## t/perf/p3700-add.sh (new) ## @@ +#!/bin/sh @@ t/perf/p3700-add.sh (new) + +. ./perf-lib.sh + -+. $TEST_DIRECTORY/perf/lib-unique-files.sh ++. $TEST_DIRECTORY/lib-unique-files.sh + +test_perf_default_repo +test_checkout_worktree @@ t/perf/p3700-add.sh (new) +# We need to create the files each time we run the perf test, but +# we do not want to measure the cost of creating the files, so run +# the tet once. -+if test "$GIT_PERF_REPEAT_COUNT" -ne 1 ++if test "${GIT_PERF_REPEAT_COUNT-1}" -ne 1 +then + echo "warning: Setting GIT_PERF_REPEAT_COUNT=1" >&2 + GIT_PERF_REPEAT_COUNT=1 @@ t/perf/p3900-stash.sh (new) + +. ./perf-lib.sh + -+. $TEST_DIRECTORY/perf/lib-unique-files.sh ++. $TEST_DIRECTORY/lib-unique-files.sh + +test_perf_default_repo +test_checkout_worktree @@ t/perf/p3900-stash.sh (new) +# We need to create the files each time we run the perf test, but +# we do not want to measure the cost of creating the files, so run +# the tet once. -+if test "$GIT_PERF_REPEAT_COUNT" -ne 1 ++if test "${GIT_PERF_REPEAT_COUNT-1}" -ne 1 +then + echo "warning: Setting GIT_PERF_REPEAT_COUNT=1" >&2 + GIT_PERF_REPEAT_COUNT=1 6: 55a40fc8fd5 < -: ----------- core.fsyncobjectfiles: enable batch mode for testing -- gitgitgadget