From: Neeraj Singh <neerajsi@xxxxxxxxxxxxx> GGG closed this series erroneously, so I'm trying out git-send-email. Apologies for any mistakes. This series is also available at https://github.com/neerajsi-msft/git/git.git ns/batched-fsync-v6. V6 changes: * Based on master at faa21c1 to pick up ns/fsync-or-die-message-fix. Also resolved a conflict with 8aa0209 in t/perf/p7519-fsmonitor.sh. * Some independent patches were submitted separately on-list. This series is now dependent on ns/fsync-or-die-message-fix. * Rename bulk_checkin_state to bulk_checkin_packfile to discourage future authors from adding any non-packfile related stuff to it. Each individual component of bulk_checkin should have its own state variable(s) going forward, and they should only be tied together by odb_transaction_nesting. * Rename finish_bulk_checkin and do_batch_fsync to flush_bulk_checkin and flush_batch_fsync. The "finish" step is going to be the end_odb_transaction. The "flush" terminology should be consistently used for making changes visible. * Add flush_odb_transaction and use it in update-index before printing verbose output to mitigate risk of missing objects for a tricky stdin feeder. * Re-add shell "local with assignment": now these are all on a separate line with quotes around any values, to comply with dash. I'm running on ubuntu 20.04 LTS where I saw some of the dash issues before. Neeraj Singh (12): bulk-checkin: rename 'state' variable and separate 'plugged' boolean bulk-checkin: rebrand plug/unplug APIs as 'odb transactions' core.fsyncmethod: batched disk flushes for loose-objects cache-tree: use ODB transaction around writing a tree builtin/add: add ODB transaction around add_files_to_cache update-index: use the bulk-checkin infrastructure unpack-objects: use the bulk-checkin infrastructure core.fsync: use batch mode and sync loose objects by default on Windows test-lib-functions: add parsing helpers for ls-files and ls-tree core.fsyncmethod: tests for batch mode t/perf: add iteration setup mechanism to perf-lib core.fsyncmethod: performance tests for batch mode Documentation/config/core.txt | 8 ++ builtin/add.c | 13 ++- builtin/unpack-objects.c | 3 + builtin/update-index.c | 20 +++++ bulk-checkin.c | 117 +++++++++++++++++++++---- bulk-checkin.h | 27 +++++- cache-tree.c | 3 + cache.h | 12 ++- compat/mingw.h | 3 + config.c | 4 +- git-compat-util.h | 2 + object-file.c | 7 +- t/lib-unique-files.sh | 34 +++++++ t/perf/p0008-odb-fsync.sh | 82 +++++++++++++++++ t/perf/p4220-log-grep-engines.sh | 3 +- t/perf/p4221-log-grep-engines-fixed.sh | 3 +- t/perf/p5302-pack-index.sh | 15 ++-- t/perf/p7519-fsmonitor.sh | 18 +--- t/perf/p7820-grep-engines.sh | 6 +- t/perf/perf-lib.sh | 63 +++++++++++-- t/t3700-add.sh | 28 ++++++ t/t3903-stash.sh | 20 +++++ t/t5300-pack-object.sh | 41 ++++++--- t/t5317-pack-objects-filter-objects.sh | 91 ++++++++++--------- t/test-lib-functions.sh | 10 +++ 25 files changed, 513 insertions(+), 120 deletions(-) create mode 100644 t/lib-unique-files.sh create mode 100755 t/perf/p0008-odb-fsync.sh Range-diff against v5: 1: c7a2a7efe6d < -: ----------- bulk-checkin: rename 'state' variable and separate 'plugged' boolean -: ----------- > 1: adabdaa0290 bulk-checkin: rename 'state' variable and separate 'plugged' boolean 2: d045b13795b ! 2: 72a6cd36c9c bulk-checkin: rebrand plug/unplug APIs as 'odb transactions' @@ Commit message writing code can optimize their operations without caring whether the top-level code has a transaction active. + Add a flush_odb_transaction API that will be used in update-index to + make objects visible even if a transaction is active. The flush call may + also be useful in future cases if we hold a transaction active around + calling hooks. + Signed-off-by: Neeraj Singh <neerajsi@xxxxxxxxxxxxx> ## builtin/add.c ## @@ bulk-checkin.c -static int bulk_checkin_plugged; +static int odb_transaction_nesting; - static struct bulk_checkin_state { + static struct bulk_checkin_packfile { char *pack_tmp_name; @@ bulk-checkin.c: int index_bulk_checkin(struct object_id *oid, { - int status = deflate_to_pack(&bulk_checkin_state, oid, fd, size, type, + int status = deflate_to_pack(&bulk_checkin_packfile, oid, fd, size, type, path, flags); - if (!bulk_checkin_plugged) + if (!odb_transaction_nesting) - finish_bulk_checkin(&bulk_checkin_state); + flush_bulk_checkin_packfile(&bulk_checkin_packfile); return status; } @@ bulk-checkin.c: int index_bulk_checkin(struct object_id *oid, } -void unplug_bulk_checkin(void) -+void end_odb_transaction(void) ++void flush_odb_transaction(void) { - assert(bulk_checkin_plugged); - bulk_checkin_plugged = 0; + flush_bulk_checkin_packfile(&bulk_checkin_packfile); + } ++ ++void end_odb_transaction(void) ++{ + odb_transaction_nesting -= 1; + if (odb_transaction_nesting < 0) + BUG("Unbalanced ODB transaction nesting"); @@ bulk-checkin.c: int index_bulk_checkin(struct object_id *oid, + if (odb_transaction_nesting) + return; + - if (bulk_checkin_state.f) - finish_bulk_checkin(&bulk_checkin_state); - } ++ flush_odb_transaction(); ++} ## bulk-checkin.h ## @@ bulk-checkin.h: int index_bulk_checkin(struct object_id *oid, @@ bulk-checkin.h: int index_bulk_checkin(struct object_id *oid, +/* + * Tell the object database to optimize for adding + * multiple objects. end_odb_transaction must be called -+ * to make new objects visible. ++ * to make new objects visible. Transactions can be nested, ++ * and objects are only visible after the outermost transaction ++ * is complete or the transaction is flushed. + */ +void begin_odb_transaction(void); + +/* ++ * Make any objects that are currently part of a pending object ++ * database transaction visible. It is valid to call this function ++ * even if no transaction is active. ++ */ ++void flush_odb_transaction(void); ++ ++/* + * Tell the object database to make any objects from the -+ * current transaction visible. ++ * current transaction visible if this is the final nested ++ * transaction. + */ +void end_odb_transaction(void); 3: 2d1bc4568ac < -: ----------- object-file: pass filename to fsync_or_die 4: 9e7ae22fa4a ! 3: 57539f104ef core.fsyncmethod: batched disk flushes for loose-objects @@ Documentation/config/core.txt: core.fsyncMethod:: + updates in the disk writeback cache and then does a single full fsync of + a dummy file to trigger the disk cache flush at the end of the operation. ++ -+ Currently `batch` mode only applies to loose-object files. Other repository -+ data is made durable as if `fsync` was specified. This mode is expected to -+ be as safe as `fsync` on macOS for repos stored on HFS+ or APFS filesystems -+ and on Windows for repos stored on NTFS or ReFS filesystems. ++Currently `batch` mode only applies to loose-object files. Other repository ++data is made durable as if `fsync` was specified. This mode is expected to ++be as safe as `fsync` on macOS for repos stored on HFS+ or APFS filesystems ++and on Windows for repos stored on NTFS or ReFS filesystems. core.fsyncObjectFiles:: This boolean will enable 'fsync()' when writing object files. @@ bulk-checkin.c +static struct tmp_objdir *bulk_fsync_objdir; + - static struct bulk_checkin_state { + static struct bulk_checkin_packfile { char *pack_tmp_name; struct hashfile *f; -@@ bulk-checkin.c: static void finish_bulk_checkin(struct bulk_checkin_state *state) +@@ bulk-checkin.c: static void flush_bulk_checkin_packfile(struct bulk_checkin_packfile *state) reprepare_packed_git(the_repository); } +/* + * Cleanup after batch-mode fsync_object_files. + */ -+static void do_batch_fsync(void) ++static void flush_batch_fsync(void) +{ + struct strbuf temp_path = STRBUF_INIT; + struct tempfile *temp; @@ bulk-checkin.c: static void finish_bulk_checkin(struct bulk_checkin_state *state + bulk_fsync_objdir = NULL; +} + - static int already_written(struct bulk_checkin_state *state, struct object_id *oid) + static int already_written(struct bulk_checkin_packfile *state, struct object_id *oid) { int i; -@@ bulk-checkin.c: static int deflate_to_pack(struct bulk_checkin_state *state, +@@ bulk-checkin.c: static int deflate_to_pack(struct bulk_checkin_packfile *state, return 0; } @@ bulk-checkin.c: static int deflate_to_pack(struct bulk_checkin_state *state, + * If we have an active ODB transaction, we issue a call that + * cleans the filesystem page cache but avoids a hardware flush + * command. Later on we will issue a single hardware flush -+ * before as part of do_batch_fsync. ++ * before renaming the objects to their final names as part of ++ * flush_batch_fsync. + */ + if (!bulk_fsync_objdir || + git_fsync(fd, FSYNC_WRITEOUT_ONLY) < 0) { @@ bulk-checkin.c: static int deflate_to_pack(struct bulk_checkin_state *state, int index_bulk_checkin(struct object_id *oid, int fd, size_t size, enum object_type type, const char *path, unsigned flags) -@@ bulk-checkin.c: void end_odb_transaction(void) +@@ bulk-checkin.c: void begin_odb_transaction(void) - if (bulk_checkin_state.f) - finish_bulk_checkin(&bulk_checkin_state); -+ -+ do_batch_fsync(); + void flush_odb_transaction(void) + { ++ flush_batch_fsync(); + flush_bulk_checkin_packfile(&bulk_checkin_packfile); } + ## bulk-checkin.h ## @@ 5: 83fa4a5f3a5 = 4: f47631e6a28 cache-tree: use ODB transaction around writing a tree 6: d514842ad49 = 5: 08c9b234942 builtin/add: add ODB transaction around add_files_to_cache 7: 8cac94598a5 ! 6: bc37cdbd226 update-index: use the bulk-checkin infrastructure @@ Commit message There is some risk with this change, since under batch fsync, the object files will be in a tmp-objdir until update-index is complete, so callers using the --stdin option will not see them until update-index is done. - This risk is mitigated by not keeping an ODB transaction open around - --stdin processing if in --verbose mode. Without --verbose mode, - a caller feeding update-index via --stdin wouldn't know when - update-index adds an object, event without an ODB transaction. + This risk is mitigated by flushing the ODB transaction prior to + reporting any verbose output so that objects will be visible to callers + that are synchronizing with update-index by snooping its output. Signed-off-by: Neeraj Singh <neerajsi@xxxxxxxxxxxxx> @@ builtin/update-index.c #include "config.h" #include "lockfile.h" #include "quote.h" +@@ builtin/update-index.c: static void report(const char *fmt, ...) + if (!verbose) + return; + ++ /* ++ * It is possible, though unlikely, that a caller could use the verbose ++ * output to synchronize with addition of objects to the object ++ * database. The current implementation of ODB transactions leaves ++ * objects invisible while a transaction is active, so flush the ++ * transaction here before reporting a change made by update-index. ++ */ ++ flush_odb_transaction(); + va_start(vp, fmt); + vprintf(fmt, vp); + putchar('\n'); @@ builtin/update-index.c: int cmd_update_index(int argc, const char **argv, const char *prefix) */ parse_options_start(&ctx, argc, argv, prefix, @@ builtin/update-index.c: int cmd_update_index(int argc, const char **argv, const while (ctx.argc) { if (parseopt_state != PARSE_OPT_DONE) parseopt_state = parse_options_step(&ctx, options, -@@ builtin/update-index.c: int cmd_update_index(int argc, const char **argv, const char *prefix) - the_index.version = preferred_index_format; - } - -+ /* -+ * It is possible, though unlikely, that a caller could use the verbose -+ * output to synchronize with addition of objects to the object -+ * database. The current implementation of ODB transactions leaves -+ * objects invisible while a transaction is active, so end the -+ * transaction here if verbose output is enabled. -+ */ -+ -+ if (verbose) -+ end_odb_transaction(); -+ - if (read_from_stdin) { - struct strbuf buf = STRBUF_INIT; - struct strbuf unquoted = STRBUF_INIT; @@ builtin/update-index.c: int cmd_update_index(int argc, const char **argv, const char *prefix) strbuf_release(&buf); } @@ builtin/update-index.c: int cmd_update_index(int argc, const char **argv, const + /* + * By now we have added all of the new objects + */ -+ if (!verbose) -+ end_odb_transaction(); ++ end_odb_transaction(); + if (split_index > 0) { if (git_config_get_split_index() == 0) 8: 523e5fbd63e = 7: 9cf584cdb67 unpack-objects: use the bulk-checkin infrastructure 9: faacc19aab2 = 8: 5039b596064 core.fsync: use batch mode and sync loose objects by default on Windows 10: 4de7300a7b0 = 9: 67205b3ac25 test-lib-functions: add parsing helpers for ls-files and ls-tree 11: 1a4aff8c350 ! 10: 148e562ddb8 core.fsyncmethod: tests for batch mode @@ t/lib-unique-files.sh (new) + local dirs="$1" && + local files="$2" && + local basedir="$3" && -+ local counter=0 && ++ local counter="0" && + local i && + local j && + test_tick && -+ local basedata=$basedir$test_tick && ++ local basedata="$basedir$test_tick" && + rm -rf "$basedir" && + for i in $(test_seq $dirs) + do -+ local dir=$basedir/dir$i && ++ local dir="$basedir/dir$i" && + mkdir -p "$dir" && + for j in $(test_seq $files) + do 12: 47cc63e1dda ! 11: 1a8320828f7 t/perf: add iteration setup mechanism to perf-lib @@ t/perf/p7519-fsmonitor.sh: then - fi -fi - - trace_start() { + trace_start () { if test -n "$GIT_PERF_7519_TRACE" then -@@ t/perf/p7519-fsmonitor.sh: setup_for_fsmonitor() { +@@ t/perf/p7519-fsmonitor.sh: setup_for_fsmonitor_hook () { test_perf_w_drop_caches () { if test -n "$GIT_PERF_7519_DROP_CACHE"; then @@ t/perf/p7519-fsmonitor.sh: setup_for_fsmonitor() { - test_perf "$@" } - test_fsmonitor_suite() { + test_fsmonitor_suite () { ## t/perf/p7820-grep-engines.sh ## @@ t/perf/p7820-grep-engines.sh: do @@ t/perf/perf-lib.sh: exit $ret' >&3 2>&4 } test_wrapper_ () { -+ local test_wrapper_func_ test_title_ - test_wrapper_func_=$1; shift -+ test_title_=$1; shift +- test_wrapper_func_=$1; shift ++ local test_wrapper_func_="$1"; shift ++ local test_title_="$1"; shift test_start_ - test "$#" = 3 && { test_prereq=$1; shift; } || test_prereq= - test "$#" = 2 || 13: 26be6ecb28b ! 12: 3eb5a6720cb core.fsyncmethod: performance tests for batch mode @@ t/perf/p0008-odb-fsync.sh (new) +} + +test_perf_fsync_cfgs () { -+ local method cfg && ++ local method && ++ local cfg && + for method in none fsync batch writeout-only + do + case $method in 14: 88c1f84d4c3 < -: ----------- core.fsyncmethod: correctly camel-case warning message -- 2.34.1.78.g86e39b8f8d