Re: [PATCH v5 0/7] Implement a batched fsync option for core.fsyncObjectFiles

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Sep 24, 2021 at 1:12 PM Neeraj K. Singh via GitGitGadget
<gitgitgadget@xxxxxxxxx> wrote:
>
> Thanks to everyone for review so far! Changes since v4, all in response to
> review feedback from Ævar Arnfjörð Bjarmason:
>
>  * Update core.fsyncobjectfiles documentation to specify 'loose' objects and
>    to add a statement about not fsyncing parent directories.
>
>    * I still don't want to make any promises on behalf of the Linux FS developers
>      in the documentation. However, according to [v4.1] and my understanding
>      of how XFS journals are documented to work, it looks like recent versions
>      of Linux running on XFS should be as safe as Windows or macOS in 'batch'
>      mode. I don't know about ext4, since it's not clear to me when metadata
>      updates are made visible to the journal.
>
>
>  * Rewrite the core batched fsync change to use the tmp-objdir lib. As Ævar
>    pointed out, this lets us access the added loose objects immediately,
>    rather than only after unplugging the bulk checkin. This is a hard
>    requirement in unpack-objects for resolving OBJ_REF_DELTA packed objects.
>
>    * As a preparatory patch, the object-file code now doesn't do a rename if it's in a
>      tmp objdir (as determined by the quarantine environment variable).
>
>    * I added support to the tmp-objdir lib to replace the 'main' writable odb.
>
>    * Instead of using a lockfile for the final full fsync, we now use a new dummy
>      temp file. Doing that makes the below unpack-objects change easier.
>
>
>  * Add bulk-checkin support to unpack-objects, which is used in fetch and
>    push. In addition to making those operations faster, it allows us to
>    directly compare performance of packfiles against loose objects. Please
>    see [v4.2] for a measurement of 'git push' to a local upstream with
>    different numbers of unique new files.
>
>  * Rename FSYNC_OBJECT_FILES_MODE to fsync_object_files_mode.
>
>  * Remove comment with link to NtFlushBuffersFileEx documentation.
>
>  * Make t/lib-unique-files.sh a bit cleaner. We are still creating unique
>    contents, but now this uses test_tick, so it should be deterministic from
>    run to run.
>
>  * Ensure there are tests for all of the modified commands. Make the
>    unpack-objects tests validate that the unpacked objects are really
>    available in the ODB.
>
> References for v4: [v4.1]
> https://lore.kernel.org/linux-fsdevel/20190419072938.31320-1-amir73il@xxxxxxxxx/#t
>
> [v4.2]
> https://docs.google.com/spreadsheets/d/1uxMBkEXFFnQ1Y3lXKqcKpw6Mq44BzhpCAcPex14T-QQ/edit#gid=1898936117
>
> Changes since v3:
>
>  * Fix core.fsyncobjectfiles option parsing as suggested by Junio: We now
>    accept no value to mean "true" and we require 'batch' to be lowercase.
>
>  * Leave the default fsync mode as 'false'. Git for windows can change its
>    default when this series makes it over to that fork.
>
>  * Use a switch statement in git_fsync, as suggested by Junio.
>
>  * Add regression test cases for core.fsyncobjectfiles=batch. This should
>    keep the batch functionality basically working in upstream git even if
>    few users adopt batch mode initially. I expect git-for-windows will
>    provide a good baking area for the new mode.
>
> Neeraj Singh (7):
>   object-file.c: do not rename in a temp odb
>   bulk-checkin: rename 'state' variable and separate 'plugged' boolean
>   core.fsyncobjectfiles: batched disk flushes
>   update-index: use the bulk-checkin infrastructure
>   unpack-objects: use the bulk-checkin infrastructure
>   core.fsyncobjectfiles: tests for batch mode
>   core.fsyncobjectfiles: performance tests for add and stash
>
>  Documentation/config/core.txt |  29 +++++++--
>  Makefile                      |   6 ++
>  builtin/add.c                 |   1 +
>  builtin/unpack-objects.c      |   3 +
>  builtin/update-index.c        |   6 ++
>  bulk-checkin.c                |  92 +++++++++++++++++++++++---
>  bulk-checkin.h                |   2 +
>  cache.h                       |   8 ++-
>  config.c                      |   7 +-
>  config.mak.uname              |   1 +
>  configure.ac                  |   8 +++
>  environment.c                 |   6 +-
>  git-compat-util.h             |   7 ++
>  object-file.c                 | 118 +++++++++++++++++++++++++++++-----
>  object-store.h                |  22 +++++++
>  object.c                      |   2 +-
>  repository.c                  |   2 +
>  repository.h                  |   1 +
>  t/lib-unique-files.sh         |  36 +++++++++++
>  t/perf/p3700-add.sh           |  43 +++++++++++++
>  t/perf/p3900-stash.sh         |  46 +++++++++++++
>  t/t3700-add.sh                |  20 ++++++
>  t/t3903-stash.sh              |  14 ++++
>  t/t5300-pack-object.sh        |  30 +++++----
>  tmp-objdir.c                  |  20 +++++-
>  tmp-objdir.h                  |   6 ++
>  wrapper.c                     |  44 +++++++++++++
>  write-or-die.c                |   2 +-
>  28 files changed, 532 insertions(+), 50 deletions(-)
>  create mode 100644 t/lib-unique-files.sh
>  create mode 100755 t/perf/p3700-add.sh
>  create mode 100755 t/perf/p3900-stash.sh
>
>
> base-commit: 8b7c11b8668b4e774f81a9f0b4c30144b818f1d1
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1076%2Fneerajsi-msft%2Fneerajsi%2Fbulk-fsync-object-files-v5
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1076/neerajsi-msft/neerajsi/bulk-fsync-object-files-v5
> Pull-Request: https://github.com/git/git/pull/1076
>
> Range-diff vs v4:
>
>  -:  ----------- > 1:  95315f35a28 object-file.c: do not rename in a temp odb
>  1:  d5893e28df1 = 2:  df6fab94d67 bulk-checkin: rename 'state' variable and separate 'plugged' boolean
>  2:  12cad737635 ! 3:  fe19cdfc930 core.fsyncobjectfiles: batched disk flushes
>      @@ Commit message
>
>           One major source of the cost of fsync is the implied flush of the
>           hardware writeback cache within the disk drive. Fortunately, Windows,
>      -    macOS, and Linux each offer mechanisms to write data from the filesystem
>      -    page cache without initiating a hardware flush.
>      +    and macOS offer mechanisms to write data from the filesystem page cache
>      +    without initiating a hardware flush. Linux has the sync_file_range API,
>      +    which issues a pagecache writeback request reliably after version 5.2.
>
>           This patch introduces a new 'core.fsyncObjectFiles = batch' option that
>      -    takes advantage of the bulk-checkin infrastructure to batch up hardware
>      -    flushes.
>      +    batches up hardware flushes. It hooks into the bulk-checkin plugging and
>      +    unplugging functionality and takes advantage of tmp-objdir.
>
>      -    When the new mode is enabled we do the following for new objects:
>      -
>      -    1. Create a tmp_obj_XXXX file and write the object data to it.
>      +    When the new mode is enabled we do the following for each new object:
>      +    1. Create the object in a tmp-objdir.
>           2. Issue a pagecache writeback request and wait for it to complete.
>      -    3. Record the tmp name and the final name in the bulk-checkin state for
>      -       later rename.
>
>      -    At the end of the entire transaction we:
>      -    1. Issue a fsync against the lock file to flush the hardware writeback
>      -       cache, which should by now have processed the tmp file writes.
>      -    2. Rename all of the temp files to their final names.
>      +    At the end of the entire transaction when unplugging bulk checkin we:
>      +    1. Issue an fsync against a dummy file to flush the hardware writeback
>      +       cache, which should by now have processed the tmp-objdir writes.
>      +    2. Rename all of the tmp-objdir files to their final names.
>           3. When updating the index and/or refs, we assume that Git will issue
>      -       another fsync internal to that operation.
>      +       another fsync internal to that operation. This is not the case today,
>      +       but may be a good extension to those components.
>
>           On a filesystem with a singular journal that is updated during name
>      -    operations (e.g. create, link, rename, etc), such as NTFS and HFS+, we
>      +    operations (e.g. create, link, rename, etc), such as NTFS, HFS+, or XFS we
>           would expect the fsync to trigger a journal writeout so that this
>           sequence is enough to ensure that the user's data is durable by the time
>           the git command returns.
>      @@ Documentation/config/core.txt: core.whitespace::
>       + A value indicating the level of effort Git will expend in
>       + trying to make objects added to the repo durable in the event
>       + of an unclean system shutdown. This setting currently only
>      -+ controls the object store, so updates to any refs or the
>      -+ index may not be equally durable.
>      ++ controls loose objects in the object store, so updates to any
>      ++ refs or the index may not be equally durable.
>       ++
>       +* `false` allows data to remain in file system caches according to
>       +  operating system policy, whence it may be lost if the system loses power
>       +  or crashes.
>      -+* `true` triggers a data integrity flush for each object added to the
>      ++* `true` triggers a data integrity flush for each loose object added to the
>       +  object store. This is the safest setting that is likely to ensure durability
>       +  across all operating systems and file systems that honor the 'fsync' system
>       +  call. However, this setting comes with a significant performance cost on
>      -+  common hardware.
>      ++  common hardware. Git does not currently fsync parent directories for
>      ++  newly-added files, so some filesystems may still allow data to be lost on
>      ++  system crash.
>       +* `batch` enables an experimental mode that uses interfaces available in some
>      -+  operating systems to write object data with a minimal set of FLUSH CACHE
>      -+  (or equivalent) commands sent to the storage controller. If the operating
>      -+  system interfaces are not available, this mode behaves the same as `true`.
>      -+  This mode is expected to be safe on macOS for repos stored on HFS+ or APFS
>      -+  filesystems and on Windows for repos stored on NTFS or ReFS.
>      ++  operating systems to write loose object data with a minimal set of FLUSH
>      ++  CACHE (or equivalent) commands sent to the storage controller. If the
>      ++  operating system interfaces are not available, this mode behaves the same as
>      ++  `true`. This mode is expected to be as safe as `true` on macOS for repos
>      ++  stored on HFS+ or APFS filesystems and on Windows for repos stored on NTFS or
>      ++  ReFS.
>
>        core.preloadIndex::
>         Enable parallel index preload for operations like 'git diff'
>      @@ builtin/add.c: int cmd_add(int argc, const char **argv, const char *prefix)
>
>         if (chmod_arg && pathspec.nr)
>                 exit_status |= chmod_pathspec(&pathspec, chmod_arg[0], show_only);
>      -- unplug_bulk_checkin();
>       +
>      -+ unplug_bulk_checkin(&lock_file);
>      +  unplug_bulk_checkin();
>
>        finish:
>      -  if (write_locked_index(&the_index, &lock_file,
>
>        ## bulk-checkin.c ##
>       @@
>      @@ bulk-checkin.c
>        #include "pack.h"
>        #include "strbuf.h"
>       +#include "string-list.h"
>      ++#include "tmp-objdir.h"
>        #include "packfile.h"
>        #include "object-store.h"
>
>        static int bulk_checkin_plugged;
>      -
>      -+static struct string_list bulk_fsync_state = STRING_LIST_INIT_DUP;
>      ++static int needs_batch_fsync;
>       +
>      ++static struct tmp_objdir *bulk_fsync_objdir;
>      +
>        static struct bulk_checkin_state {
>         char *pack_tmp_name;
>      -  struct hashfile *f;
>       @@ bulk-checkin.c: clear_exit:
>         reprepare_packed_git(the_repository);
>        }
>
>      -+static void do_sync_and_rename(struct string_list *fsync_state, struct lock_file *lock_file)
>      ++/*
>      ++ * Cleanup after batch-mode fsync_object_files.
>      ++ */
>      ++static void do_batch_fsync(void)
>       +{
>      -+ if (fsync_state->nr) {
>      -+         struct string_list_item *rename;
>      -+
>      -+         /*
>      -+          * Issue a full hardware flush against the lock file to ensure
>      -+          * that all objects are durable before any renames occur.
>      -+          * The code in fsync_and_close_loose_object_bulk_checkin has
>      -+          * already ensured that writeout has occurred, but it has not
>      -+          * flushed any writeback cache in the storage hardware.
>      -+          */
>      -+         fsync_or_die(get_lock_file_fd(lock_file), get_lock_file_path(lock_file));
>      -+
>      -+         for_each_string_list_item(rename, fsync_state) {
>      -+                 const char *src = rename->string;
>      -+                 const char *dst = rename->util;
>      -+
>      -+                 if (finalize_object_file(src, dst))
>      -+                         die_errno(_("could not rename '%s' to '%s'"), src, dst);
>      -+         }
>      -+
>      -+         string_list_clear(fsync_state, 1);
>      ++ /*
>      ++  * Issue a full hardware flush against a temporary file to ensure
>      ++  * that all objects are durable before any renames occur.  The code in
>      ++  * fsync_loose_object_bulk_checkin has already issued a writeout
>      ++  * request, but it has not flushed any writeback cache in the storage
>      ++  * hardware.
>      ++  */
>      ++
>      ++ if (needs_batch_fsync) {
>      ++         struct strbuf temp_path = STRBUF_INIT;
>      ++         struct tempfile *temp;
>      ++
>      ++         strbuf_addf(&temp_path, "%s/bulk_fsync_XXXXXX", get_object_directory());
>      ++         temp = xmks_tempfile(temp_path.buf);
>      ++         fsync_or_die(get_tempfile_fd(temp), get_tempfile_path(temp));
>      ++         delete_tempfile(&temp);
>      ++         strbuf_release(&temp_path);
>       + }
>      ++
>      ++ if (bulk_fsync_objdir)
>      ++         tmp_objdir_migrate(bulk_fsync_objdir);
>       +}
>       +
>        static int already_written(struct bulk_checkin_state *state, struct object_id *oid)
>      @@ bulk-checkin.c: static int deflate_to_pack(struct bulk_checkin_state *state,
>         return 0;
>        }
>
>      -+static void add_rename_bulk_checkin(struct string_list *fsync_state,
>      -+                             const char *src, const char *dst)
>      ++void fsync_loose_object_bulk_checkin(int fd)
>       +{
>      -+ string_list_insert(fsync_state, src)->util = xstrdup(dst);
>      -+}
>      -+
>      -+int fsync_and_close_loose_object_bulk_checkin(int fd, const char *tmpfile,
>      -+                                       const char *filename, time_t mtime)
>      -+{
>      -+ int do_finalize = 1;
>      -+ int ret = 0;
>      -+
>      -+ if (fsync_object_files != FSYNC_OBJECT_FILES_OFF) {
>      -+         /*
>      -+          * If we have a plugged bulk checkin, we issue a call that
>      -+          * cleans the filesystem page cache but avoids a hardware flush
>      -+          * command. Later on we will issue a single hardware flush
>      -+          * before renaming files as part of do_sync_and_rename.
>      -+          */
>      -+         if (bulk_checkin_plugged &&
>      -+             fsync_object_files == FSYNC_OBJECT_FILES_BATCH &&
>      -+             git_fsync(fd, FSYNC_WRITEOUT_ONLY) >= 0) {
>      -+                 add_rename_bulk_checkin(&bulk_fsync_state, tmpfile, filename);
>      -+                 do_finalize = 0;
>      -+
>      -+         } else {
>      -+                 fsync_or_die(fd, "loose object file");
>      -+         }
>      -+ }
>      -+
>      -+ if (close(fd))
>      -+         die_errno(_("error when closing loose object file"));
>      -+
>      -+ if (mtime) {
>      -+         struct utimbuf utb;
>      -+         utb.actime = mtime;
>      -+         utb.modtime = mtime;
>      -+         if (utime(tmpfile, &utb) < 0)
>      -+                 warning_errno(_("failed utime() on %s"), tmpfile);
>      ++ assert(fsync_object_files == FSYNC_OBJECT_FILES_BATCH);
>      ++
>      ++ /*
>      ++  * If we have a plugged bulk checkin, we issue a call that
>      ++  * cleans the filesystem page cache but avoids a hardware flush
>      ++  * command. Later on we will issue a single hardware flush
>      ++  * before as part of do_batch_fsync.
>      ++  */
>      ++ if (bulk_checkin_plugged &&
>      ++     git_fsync(fd, FSYNC_WRITEOUT_ONLY) >= 0) {
>      ++         assert(the_repository->objects->odb->is_temp);
>      ++         if (!needs_batch_fsync)
>      ++                 needs_batch_fsync = 1;
>      ++ } else {
>      ++         fsync_or_die(fd, "loose object file");
>       + }
>      -+
>      -+ if (do_finalize)
>      -+         ret = finalize_object_file(tmpfile, filename);
>      -+
>      -+ return ret;
>       +}
>       +
>        int index_bulk_checkin(struct object_id *oid,
>                        int fd, size_t size, enum object_type type,
>                        const char *path, unsigned flags)
>      -@@ bulk-checkin.c: void plug_bulk_checkin(void)
>      +@@ bulk-checkin.c: int index_bulk_checkin(struct object_id *oid,
>      + void plug_bulk_checkin(void)
>      + {
>      +  assert(!bulk_checkin_plugged);
>      ++
>      ++ /*
>      ++  * Create a temporary object directory if the current
>      ++  * object directory is not already temporary.
>      ++  */
>      ++ if (fsync_object_files == FSYNC_OBJECT_FILES_BATCH &&
>      ++     !the_repository->objects->odb->is_temp) {
>      ++         bulk_fsync_objdir = tmp_objdir_create();
>      ++         if (!bulk_fsync_objdir)
>      ++                 die(_("Could not create temporary object directory for core.fsyncobjectfiles=batch"));
>      ++
>      ++         tmp_objdir_replace_main_odb(bulk_fsync_objdir);
>      ++ }
>      ++
>         bulk_checkin_plugged = 1;
>        }
>
>      --void unplug_bulk_checkin(void)
>      -+void unplug_bulk_checkin(struct lock_file *lock_file)
>      - {
>      -  assert(bulk_checkin_plugged);
>      +@@ bulk-checkin.c: void unplug_bulk_checkin(void)
>         bulk_checkin_plugged = 0;
>         if (bulk_checkin_state.f)
>                 finish_bulk_checkin(&bulk_checkin_state);
>       +
>      -+ do_sync_and_rename(&bulk_fsync_state, lock_file);
>      ++ do_batch_fsync();
>        }
>
>        ## bulk-checkin.h ##
>      @@ bulk-checkin.h
>
>        #include "cache.h"
>
>      -+int fsync_and_close_loose_object_bulk_checkin(int fd, const char *tmpfile,
>      -+                                       const char *filename, time_t mtime);
>      ++void fsync_loose_object_bulk_checkin(int fd);
>       +
>        int index_bulk_checkin(struct object_id *oid,
>                        int fd, size_t size, enum object_type type,
>                        const char *path, unsigned flags);
>      -
>      - void plug_bulk_checkin(void);
>      --void unplug_bulk_checkin(void);
>      -+void unplug_bulk_checkin(struct lock_file *);
>      -
>      - #endif
>
>        ## cache.h ##
>       @@ cache.h: void reset_shared_repository(void);
>      @@ cache.h: void reset_shared_repository(void);
>        extern char *git_replace_ref_base;
>
>       -extern int fsync_object_files;
>      -+enum FSYNC_OBJECT_FILES_MODE {
>      ++enum fsync_object_files_mode {
>       +    FSYNC_OBJECT_FILES_OFF,
>       +    FSYNC_OBJECT_FILES_ON,
>       +    FSYNC_OBJECT_FILES_BATCH
>       +};
>       +
>      -+extern enum FSYNC_OBJECT_FILES_MODE fsync_object_files;
>      ++extern enum fsync_object_files_mode fsync_object_files;
>        extern int core_preload_index;
>        extern int precomposed_unicode;
>        extern int protect_hfs;
>      @@ environment.c: const char *git_hooks_path;
>        int core_compression_level;
>        int pack_compression_level = Z_DEFAULT_COMPRESSION;
>       -int fsync_object_files;
>      -+enum FSYNC_OBJECT_FILES_MODE fsync_object_files;
>      ++enum fsync_object_files_mode fsync_object_files;
>        size_t packed_git_window_size = DEFAULT_PACKED_GIT_WINDOW_SIZE;
>        size_t packed_git_limit = DEFAULT_PACKED_GIT_LIMIT;
>        size_t delta_base_cache_limit = 96 * 1024 * 1024;
>      @@ git-compat-util.h: __attribute__((format (printf, 1, 2))) NORETURN
>         * Returns 0 on success, which includes trying to unlink an object that does
>
>        ## object-file.c ##
>      -@@ object-file.c: int hash_object_file(const struct git_hash_algo *algo, const void *buf,
>      -  return 0;
>      +@@ object-file.c: void add_to_alternates_memory(const char *reference)
>      +                       '\n', NULL, 0);
>        }
>
>      --/* Finalize a file on disk, and close it. */
>      --static void close_loose_object(int fd)
>      --{
>      ++struct object_directory *set_temporary_main_odb(const char *dir)
>      ++{
>      ++ struct object_directory *main_odb, *new_odb, *old_next;
>      ++
>      ++ /*
>      ++  * Make sure alternates are initialized, or else our entry may be
>      ++  * overwritten when they are.
>      ++  */
>      ++ prepare_alt_odb(the_repository);
>      ++
>      ++ /* Copy the existing object directory and make it an alternate. */
>      ++ main_odb = the_repository->objects->odb;
>      ++ new_odb = xmalloc(sizeof(*new_odb));
>      ++ *new_odb = *main_odb;
>      ++ *the_repository->objects->odb_tail = new_odb;
>      ++ the_repository->objects->odb_tail = &(new_odb->next);
>      ++ new_odb->next = NULL;
>      ++
>      ++ /*
>      ++  * Reinitialize the main odb with the specified path, being careful
>      ++  * to keep the next pointer value.
>      ++  */
>      ++ old_next = main_odb->next;
>      ++ memset(main_odb, 0, sizeof(*main_odb));
>      ++ main_odb->next = old_next;
>      ++ main_odb->is_temp = 1;
>      ++ main_odb->path = xstrdup(dir);
>      ++ return new_odb;
>      ++}
>      ++
>      ++void restore_main_odb(struct object_directory *odb)
>      ++{
>      ++ struct object_directory **prev, *main_odb;
>      ++
>      ++ /* Unlink the saved previous main ODB from the list. */
>      ++ prev = &the_repository->objects->odb->next;
>      ++ assert(*prev);
>      ++ while (*prev != odb) {
>      ++         prev = &(*prev)->next;
>      ++ }
>      ++ *prev = odb->next;
>      ++ if (*prev == NULL)
>      ++         the_repository->objects->odb_tail = prev;
>      ++
>      ++ /*
>      ++  * Restore the data from the old main odb, being careful to
>      ++  * keep the next pointer value
>      ++  */
>      ++ main_odb = the_repository->objects->odb;
>      ++ SWAP(*main_odb, *odb);
>      ++ main_odb->next = odb->next;
>      ++ free_object_directory(odb);
>      ++}
>      ++
>      + /*
>      +  * Compute the exact path an alternate is at and returns it. In case of
>      +  * error NULL is returned and the human readable error is added to `err`
>      +@@ object-file.c: int hash_object_file(const struct git_hash_algo *algo, const void *buf,
>      + /* Finalize a file on disk, and close it. */
>      + static void close_loose_object(int fd)
>      + {
>       - if (fsync_object_files)
>      --         fsync_or_die(fd, "loose object file");
>      -- if (close(fd) != 0)
>      --         die_errno(_("error when closing loose object file"));
>      --}
>      --
>      - /* Size of directory component, including the ending '/' */
>      - static inline int directory_size(const char *filename)
>      ++ switch (fsync_object_files) {
>      ++ case FSYNC_OBJECT_FILES_OFF:
>      ++         break;
>      ++ case FSYNC_OBJECT_FILES_ON:
>      +          fsync_or_die(fd, "loose object file");
>      ++         break;
>      ++ case FSYNC_OBJECT_FILES_BATCH:
>      ++         fsync_loose_object_bulk_checkin(fd);
>      ++         break;
>      ++ default:
>      ++         BUG("Invalid fsync_object_files mode.");
>      ++ }
>      ++
>      +  if (close(fd) != 0)
>      +          die_errno(_("error when closing loose object file"));
>      + }
>      +
>      + ## object-store.h ##
>      +@@ object-store.h: void add_to_alternates_file(const char *dir);
>      +  */
>      + void add_to_alternates_memory(const char *dir);
>      +
>      ++/*
>      ++ * Replace the current main object directory with the specified temporary
>      ++ * object directory. We make a copy of the former main object directory,
>      ++ * add it as an in-memory alternate, and return the copy so that it can
>      ++ * be restored via restore_main_odb.
>      ++ */
>      ++struct object_directory *set_temporary_main_odb(const char *dir);
>      ++
>      ++/*
>      ++ * Restore a previous ODB replaced by set_temporary_main_odb.
>      ++ */
>      ++void restore_main_odb(struct object_directory *odb);
>      ++
>      + /*
>      +  * Populate and return the loose object cache array corresponding to the
>      +  * given object ID.
>      +@@ object-store.h: struct oidtree *odb_loose_cache(struct object_directory *odb,
>      + /* Empty the loose object cache for the specified object directory. */
>      + void odb_clear_loose_cache(struct object_directory *odb);
>      +
>      ++/* Clear and free the specified object directory */
>      ++void free_object_directory(struct object_directory *odb);
>      ++
>      + struct packed_git {
>      +  struct hashmap_entry packmap_ent;
>      +  struct packed_git *next;
>      +
>      + ## object.c ##
>      +@@ object.c: struct raw_object_store *raw_object_store_new(void)
>      +  return o;
>      + }
>      +
>      +-static void free_object_directory(struct object_directory *odb)
>      ++void free_object_directory(struct object_directory *odb)
>        {
>      -@@ object-file.c: static int write_loose_object(const struct object_id *oid, char *hdr,
>      -          die(_("confused by unstable object source data for %s"),
>      -              oid_to_hex(oid));
>      +  free(odb->path);
>      +  odb_clear_loose_cache(odb);
>      +
>      + ## tmp-objdir.c ##
>      +@@
>      + struct tmp_objdir {
>      +  struct strbuf path;
>      +  struct strvec env;
>      ++ struct object_directory *prev_main_odb;
>      + };
>
>      -- close_loose_object(fd);
>      --
>      -- if (mtime) {
>      --         struct utimbuf utb;
>      --         utb.actime = mtime;
>      --         utb.modtime = mtime;
>      --         if (utime(tmp_file.buf, &utb) < 0)
>      --                 warning_errno(_("failed utime() on %s"), tmp_file.buf);
>      -- }
>      --
>      -- return finalize_object_file(tmp_file.buf, filename.buf);
>      -+ return fsync_and_close_loose_object_bulk_checkin(fd, tmp_file.buf,
>      -+                                                  filename.buf, mtime);
>      + /*
>      +@@ tmp-objdir.c: static int tmp_objdir_destroy_1(struct tmp_objdir *t, int on_signal)
>      +   * freeing memory; it may cause a deadlock if the signal
>      +   * arrived while libc's allocator lock is held.
>      +   */
>      +- if (!on_signal)
>      ++ if (!on_signal) {
>      ++         if (t->prev_main_odb)
>      ++                 restore_main_odb(t->prev_main_odb);
>      +          tmp_objdir_free(t);
>      ++ }
>      ++
>      +  return err;
>        }
>
>      - static int freshen_loose_object(const struct object_id *oid)
>      +@@ tmp-objdir.c: struct tmp_objdir *tmp_objdir_create(void)
>      +  t = xmalloc(sizeof(*t));
>      +  strbuf_init(&t->path, 0);
>      +  strvec_init(&t->env);
>      ++ t->prev_main_odb = NULL;
>      +
>      +  strbuf_addf(&t->path, "%s/incoming-XXXXXX", get_object_directory());
>      +
>      +@@ tmp-objdir.c: int tmp_objdir_migrate(struct tmp_objdir *t)
>      +  if (!t)
>      +          return 0;
>      +
>      ++ if (t->prev_main_odb) {
>      ++         restore_main_odb(t->prev_main_odb);
>      ++         t->prev_main_odb = NULL;
>      ++ }
>      ++
>      +  strbuf_addbuf(&src, &t->path);
>      +  strbuf_addstr(&dst, get_object_directory());
>      +
>      +@@ tmp-objdir.c: void tmp_objdir_add_as_alternate(const struct tmp_objdir *t)
>      + {
>      +  add_to_alternates_memory(t->path.buf);
>      + }
>      ++
>      ++void tmp_objdir_replace_main_odb(struct tmp_objdir *t)
>      ++{
>      ++ if (t->prev_main_odb)
>      ++         BUG("the main object database is already replaced");
>      ++ t->prev_main_odb = set_temporary_main_odb(t->path.buf);
>      ++}
>      +
>      + ## tmp-objdir.h ##
>      +@@ tmp-objdir.h: int tmp_objdir_destroy(struct tmp_objdir *);
>      +  */
>      + void tmp_objdir_add_as_alternate(const struct tmp_objdir *);
>      +
>      ++/*
>      ++ * Replaces the main object store in the current process with the temporary
>      ++ * object directory and makes the former main object store an alternate.
>      ++ */
>      ++void tmp_objdir_replace_main_odb(struct tmp_objdir *);
>      ++
>      + #endif /* TMP_OBJDIR_H */
>
>        ## wrapper.c ##
>       @@ wrapper.c: int xmkstemp_mode(char *filename_template, int mode)
>  3:  a5b3e21b762 < -:  ----------- core.fsyncobjectfiles: add windows support for batch mode
>  4:  f7f756f3932 ! 4:  485b4a767df update-index: use the bulk-checkin infrastructure
>      @@ Commit message
>           There is some risk with this change, since under batch fsync, the object
>           files will not be available until the update-index is entirely complete.
>           This usage is unlikely, since any tool invoking update-index and
>      -    expecting to see objects would have to snoop the output of --verbose to
>      -    find out when update-index has actually processed a given path.
>      -    Additionally the index is locked for the duration of the update.
>      +    expecting to see objects would have to synchronize with the update-index
>      +    process after passing it a file path.
>
>           Signed-off-by: Neeraj Singh <neerajsi@xxxxxxxxxxxxx>
>
>      @@ builtin/update-index.c
>        #include "lockfile.h"
>        #include "quote.h"
>       @@ builtin/update-index.c: int cmd_update_index(int argc, const char **argv, const char *prefix)
>      -          struct strbuf unquoted = STRBUF_INIT;
>
>      -          setup_work_tree();
>      -+         plug_bulk_checkin();
>      -          while (getline_fn(&buf, stdin) != EOF) {
>      -                  char *p;
>      -                  if (!nul_term_line && buf.buf[0] == '"') {
>      +  the_index.updated_skipworktree = 1;
>      +
>      ++ /* we might be adding many objects to the object database */
>      ++ plug_bulk_checkin();
>      ++
>      +  /*
>      +   * Custom copy of parse_options() because we want to handle
>      +   * filename arguments as they come.
>       @@ builtin/update-index.c: int cmd_update_index(int argc, const char **argv, const char *prefix)
>      -                          chmod_path(set_executable_bit, p);
>      -                  free(p);
>      -          }
>      -+         unplug_bulk_checkin(&lock_file);
>      -          strbuf_release(&unquoted);
>                 strbuf_release(&buf);
>         }
>      +
>      ++ /* by now we must have added all of the new objects */
>      ++ unplug_bulk_checkin();
>      +  if (split_index > 0) {
>      +          if (git_config_get_split_index() == 0)
>      +                  warning(_("core.splitIndex is set to false; "
>  -:  ----------- > 5:  889e7668760 unpack-objects: use the bulk-checkin infrastructure
>  5:  afb0028e796 ! 6:  0f2e3b25759 core.fsyncobjectfiles: tests for batch mode
>      @@ Metadata
>        ## Commit message ##
>           core.fsyncobjectfiles: tests for batch mode
>
>      -    Add test cases to exercise batch mode for 'git add'
>      -    and 'git stash'. These tests ensure that the added
>      -    data winds up in the object database.
>      +    Add test cases to exercise batch mode for:
>      +     * 'git add'
>      +     * 'git stash'
>      +     * 'git update-index'
>      +     * 'git unpack-objects'
>
>      -    I verified the tests by introducing an incorrect rename
>      -    in do_sync_and_rename.
>      +    These tests ensure that the added data winds up in the object database.
>      +
>      +    In this change we introduce a new test helper lib-unique-files.sh. The
>      +    goal of this library is to create a tree of files that have different
>      +    oids from any other files that may have been created in the current test
>      +    repo. This helps us avoid missing validation of an object being added due
>      +    to it already being in the repo.
>
>           Signed-off-by: Neeraj Singh <neerajsi@xxxxxxxxxxxxx>
>
>      @@ t/lib-unique-files.sh (new)
>       @@
>       +# Helper to create files with unique contents
>       +
>      -+test_create_unique_files_base__=$(date -u)
>      -+test_create_unique_files_counter__=0
>       +
>       +# Create multiple files with unique contents. Takes the number of
>       +# directories, the number of files in each directory, and the base
>       +# directory.
>       +#
>      -+# test_create_unique_files 2 3 . -- Creates 2 directories with 3 files
>      -+#                                    each in the specified directory, all
>      -+#                                    with unique contents.
>      ++# test_create_unique_files 2 3 my_dir -- Creates 2 directories with 3 files
>      ++#                                         each in my_dir, all with unique
>      ++#                                         contents.
>       +
>       +test_create_unique_files() {
>       + test "$#" -ne 3 && BUG "3 param"
>      @@ t/lib-unique-files.sh (new)
>       + local dirs=$1
>       + local files=$2
>       + local basedir=$3
>      ++ local counter=0
>      ++ test_tick
>      ++ local basedata=$test_tick
>      ++
>       +
>      -+ rm -rf $basedir >/dev/null
>      ++ rm -rf $basedir
>       +
>       + for i in $(test_seq $dirs)
>       + do
>       +         local dir=$basedir/dir$i
>       +
>      -+         mkdir -p "$dir" > /dev/null
>      ++         mkdir -p "$dir"
>       +         for j in $(test_seq $files)
>       +         do
>      -+                 test_create_unique_files_counter__=$((test_create_unique_files_counter__ + 1))
>      -+                 echo "$test_create_unique_files_base__.$test_create_unique_files_counter__"  >"$dir/file$j.txt"
>      ++                 counter=$((counter + 1))
>      ++                 echo "$basedata.$counter"  >"$dir/file$j.txt"
>       +         done
>       + done
>       +}
>      @@ t/t3700-add.sh: test_expect_success \
>       + rm -f fsynced_files &&
>       + git ls-files --stage fsync-files/ > fsynced_files &&
>       + test_line_count = 8 fsynced_files &&
>      -+ cat fsynced_files | awk '{print \$2}' | xargs -n1 git cat-file -e
>      ++ awk -- '{print \$2}' fsynced_files | xargs -n1 git cat-file -e
>      ++"
>      ++
>      ++test_expect_success 'git update-index: core.fsyncobjectfiles=batch' "
>      ++ test_create_unique_files 2 4 fsync-files2 &&
>      ++ find fsync-files2 ! -type d -print | xargs git -c core.fsyncobjectfiles=batch update-index --add -- &&
>      ++ rm -f fsynced_files2 &&
>      ++ git ls-files --stage fsync-files2/ > fsynced_files2 &&
>      ++ test_line_count = 8 fsynced_files2 &&
>      ++ awk -- '{print \$2}' fsynced_files2 | xargs -n1 git cat-file -e
>       +"
>       +
>        test_expect_success \
>      @@ t/t3903-stash.sh: test_expect_success 'stash handles skip-worktree entries nicel
>       + # which contains the untracked files
>       + git ls-tree -r stash^3 -- ./fsync-files/ > fsynced_files &&
>       + test_line_count = 8 fsynced_files &&
>      -+ cat fsynced_files | awk '{print \$3}' | xargs -n1 git cat-file -e
>      ++ awk -- '{print \$3}' fsynced_files | xargs -n1 git cat-file -e
>       +"
>       +
>       +
>        test_expect_success 'stash -c stash.useBuiltin=false warning ' '
>         expected="stash.useBuiltin support has been removed" &&
>
>      +
>      + ## t/t5300-pack-object.sh ##
>      +@@ t/t5300-pack-object.sh: test_expect_success 'pack-objects with bogus arguments' '
>      +
>      + check_unpack () {
>      +  test_when_finished "rm -rf git2" &&
>      +- git init --bare git2 &&
>      +- git -C git2 unpack-objects -n <"$1".pack &&
>      +- git -C git2 unpack-objects <"$1".pack &&
>      +- (cd .git && find objects -type f -print) |
>      +- while read path
>      +- do
>      +-         cmp git2/$path .git/$path || {
>      +-                 echo $path differs.
>      +-                 return 1
>      +-         }
>      +- done
>      ++ git $2 init --bare git2 &&
>      ++ (
>      ++         git $2 -C git2 unpack-objects -n <"$1".pack &&
>      ++         git $2 -C git2 unpack-objects <"$1".pack &&
>      ++         git $2 -C git2 cat-file --batch-check="%(objectname)"
>      ++ ) <obj-list >current &&
>      ++ cmp obj-list current
>      + }
>      +
>      + test_expect_success 'unpack without delta' '
>      +  check_unpack test-1-${packname_1}
>      + '
>      +
>      ++test_expect_success 'unpack without delta (core.fsyncobjectfiles=batch)' '
>      ++ check_unpack test-1-${packname_1} "-c core.fsyncobjectfiles=batch"
>      ++'
>      ++
>      + test_expect_success 'pack with REF_DELTA' '
>      +  packname_2=$(git pack-objects --progress test-2 <obj-list 2>stderr) &&
>      +  check_deltas stderr -gt 0
>      +@@ t/t5300-pack-object.sh: test_expect_success 'unpack with REF_DELTA' '
>      +  check_unpack test-2-${packname_2}
>      + '
>      +
>      ++test_expect_success 'unpack with REF_DELTA (core.fsyncobjectfiles=batch)' '
>      ++       check_unpack test-2-${packname_2} "-c core.fsyncobjectfiles=batch"
>      ++'
>      ++
>      + test_expect_success 'pack with OFS_DELTA' '
>      +  packname_3=$(git pack-objects --progress --delta-base-offset test-3 \
>      +                  <obj-list 2>stderr) &&
>      +@@ t/t5300-pack-object.sh: test_expect_success 'unpack with OFS_DELTA' '
>      +  check_unpack test-3-${packname_3}
>      + '
>      +
>      ++test_expect_success 'unpack with OFS_DELTA (core.fsyncobjectfiles=batch)' '
>      ++       check_unpack test-3-${packname_3} "-c core.fsyncobjectfiles=batch"
>      ++'
>      ++
>      + test_expect_success 'compare delta flavors' '
>      +  perl -e '\''
>      +          defined($_ = -s $_) or die for @ARGV;
>  6:  3e6b80b5fa2 = 7:  6543564376a core.fsyncobjectfiles: performance tests for add and stash
>
> --
> gitgitgadget

Apologies for the spam, I'll be submitting a v6 shortly since there
were several things wrong with
this version.




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux