Goal ~~~~ This is a patch series about libifying `git apply` functionality, and using this libified functionality in `git am`, so that no 'git apply' process is spawn anymore. This makes `git am` significantly faster, so `git rebase`, when it uses the am backend, is also significantly faster. Previous discussions and patches series ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ This has initially been discussed in the following thread: http://thread.gmane.org/gmane.comp.version-control.git/287236/ Then a RFC patch series was sent: http://thread.gmane.org/gmane.comp.version-control.git/288489/ Then v1 was sent: http://thread.gmane.org/gmane.comp.version-control.git/292324/ Highlevel view of the pqtches in the series ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ This new patch series is built on top of the above previous work. - Patches 1/94 to 47/94 are here from the RFC patch series. They get rid of most of the global variables in builtin/apply.c by moving the variables into a new 'struct apply_state', and they refactor the code around a bit. They haven't changed much since v1. There are mostly variables that have been moved around inside 'struct apply_state' or inside match_fragment(), new lines that have removed in 'struct apply_state' and function parameters that have been renamed. - Patches 48/94 to 84/94 were new in v1. They finish libifying the apply functionality that was in builtin/apply.c and move it into apply.{c,h}. And they use this libified functionality in git am so that it doesn't launch git apply processes any more. There are many changes in these patches, as there were a lot of great suggestions from many reviewers to improve on v1: - Lockfile management changes as suggested by Junio. - Many memory leaks were fixed as suggested by Eric. - A patch was splitted into two as suggested by Duy. - Some file descriptors have been closed as suggested by Dscho. - Some early return were moved up as suggested by Eric. - Many commit messages were improved. I am not very happy that I added some gotos, but it looked like a not too bad way to make sure that the memory and file descriptor cleanups were done before returning from some functions. - Patches 85/94 to 94/94 are new and can be considered RFC. They implement a way to make the libified apply silent. It is a new feature in the libified apply functionality. I consider that the apply functionality is properly libified before these patches, and that they should be in a separate series, but unfortunately using the libified apply in "git am" unmasks the fact that "git am", since it was a shell script, has been silencing the apply functionality by "futzing with file descriptors". And unfortunately this makes some reviewers unhappy. The last patch 94/94 adds a --silent command line option to git apply. This is not necessary, but some scripts could perhaps use it, and it could make it easier to test the new silent feature. By the way there are no tests yet for this new feature, and I am not sure at all that "--silent" and "be_silent" are good names. Also I will perhaps be asked to move or merge this part of this series, to remove the "futzing with file descriptors" earlier in the former patches. That's why this part of the series can be considered RFC. General comments ~~~~~~~~~~~~~~~~ Sorry if this patch series is long. I can split it into two or more series if it is prefered. I can also send diffs between this version and the previous one, but for now I'd rather not send them in this email, as it would make it very long. The benefits are not just related to not creating new processes. When `git am` launched a `git apply` process, this new process had to read the index from disk. Then after the `git apply`process had terminated, `git am` dropped its index and read the index from disk to get the index that had been modified by the `git apply`process. This was inefficient and also prevented the split-index mechanism to provide many performance benefits. Using this series as rebase material, Duy explains it like this: > Without the series, the picture is not so surprising. We run git-apply > 80+ times, each consists of this sequence > > read index > write index (cache tree updates only) > read index again > optionally initialize name hash (when new entries are added, I guess) > read packed-refs > write index > > With this series, we run a single git-apply which does > > read index (and sharedindex too if in split-index mode) > initialize name hash > write index 80+ times (See: http://thread.gmane.org/gmane.comp.version-control.git/292324/focus=292460) Links ~~~~~ This patch series is available here: https://github.com/chriscool/git/commits/libify-apply-use-in-am54 The previous version, v1, is available here: https://github.com/chriscool/git/commits/libify-apply-use-in-am25 Performance numbers ~~~~~~~~~~~~~~~~~~~ Only tests on Linux have been performed. It could be interesting to test on other platforms especially Windows and perhaps OSX too. - Around mid April Ævar did a huge many-hundred commit rebase on the kernel with untracked cache. command: git rebase --onto 1993b17 52bef0c 29dde7c Vanilla "next" without split index: 1m54.953s Vanilla "next" with split index: 1m22.476s This series on top of "next" without split index: 1m12.034s This series on top of "next" with split index: 0m15.678s Ævar used his Debian laptop with SSD. - Around mid April I tested rebasing 13 commits in Booking.com's monorepo on a Red Hat 6.5 server with split-index and GIT_TRACE_PERFORMANCE=1. With Git v2.8.0, the rebase took 6.375888383 s, with the git am command launched by the rebase command taking 3.705677431 s. With this series on top of next, the rebase took 3.044529494 s, with the git am command launched by the rebase command taking 0.583521168 s. Christian Couder (94): builtin/apply: make gitdiff_verify_name() return void builtin/apply: avoid parameter shadowing 'p_value' global builtin/apply: avoid parameter shadowing 'linenr' global builtin/apply: avoid local variable shadowing 'len' parameter builtin/apply: extract line_by_line_fuzzy_match() from match_fragment() builtin/apply: move 'options' variable into cmd_apply() builtin/apply: move 'read_stdin' global into cmd_apply() builtin/apply: introduce 'struct apply_state' to start libifying builtin/apply: move 'state' init into init_apply_state() builtin/apply: move 'unidiff_zero' global into 'struct apply_state' builtin/apply: move 'check' global into 'struct apply_state' builtin/apply: move 'check_index' global into 'struct apply_state' builtin/apply: move 'apply_in_reverse' global into 'struct apply_state' builtin/apply: move 'apply_with_reject' global into 'struct apply_state' builtin/apply: move 'apply_verbosely' global into 'struct apply_state' builtin/apply: move 'update_index' global into 'struct apply_state' builtin/apply: move 'allow_overlap' global into 'struct apply_state' builtin/apply: move 'cached' global into 'struct apply_state' builtin/apply: move 'diffstat' global into 'struct apply_state' builtin/apply: move 'numstat' global into 'struct apply_state' builtin/apply: move 'summary' global into 'struct apply_state' builtin/apply: move 'threeway' global into 'struct apply_state' builtin/apply: move 'no_add' global into 'struct apply_state' builtin/apply: move 'unsafe_paths' global into 'struct apply_state' builtin/apply: move 'line_termination' global into 'struct apply_state' builtin/apply: move 'fake_ancestor' global into 'struct apply_state' builtin/apply: move 'p_context' global into 'struct apply_state' builtin/apply: move 'apply' global into 'struct apply_state' builtin/apply: move 'patch_input_file' global into 'struct apply_state' builtin/apply: move 'limit_by_name' global into 'struct apply_state' builtin/apply: move 'has_include' global into 'struct apply_state' builtin/apply: move 'p_value' global into 'struct apply_state' builtin/apply: move 'p_value_known' global into 'struct apply_state' builtin/apply: move 'root' global into 'struct apply_state' builtin/apply: move 'whitespace_error' global into 'struct apply_state' builtin/apply: move 'whitespace_option' into 'struct apply_state' builtin/apply: remove whitespace_option arg from set_default_whitespace_mode() builtin/apply: move 'squelch_whitespace_errors' into 'struct apply_state' builtin/apply: move 'applied_after_fixing_ws' into 'struct apply_state' builtin/apply: move 'ws_error_action' into 'struct apply_state' builtin/apply: move 'ws_ignore_action' into 'struct apply_state' builtin/apply: move 'max_change' and 'max_len' into 'struct apply_state' builtin/apply: move 'state_linenr' global into 'struct apply_state' builtin/apply: move 'fn_table' global into 'struct apply_state' builtin/apply: move 'symlink_changes' global into 'struct apply_state' builtin/apply: move 'state' check into check_apply_state() builtin/apply: move applying patches into apply_all_patches() builtin/apply: rename 'prefix_' parameter to 'prefix' builtin/apply: move 'lock_file' global into 'struct apply_state' builtin/apply: move 'newfd' global into 'struct apply_state' builtin/apply: make apply_patch() return -1 instead of die()ing builtin/apply: read_patch_file() return -1 instead of die()ing builtin/apply: make find_header() return -1 instead of die()ing builtin/apply: make parse_chunk() return a negative integer on error builtin/apply: make parse_single_patch() return -1 on error apply: move 'struct apply_state' to apply.h builtin/apply: make parse_whitespace_option() return -1 instead of die()ing builtin/apply: make parse_ignorewhitespace_option() return -1 instead of die()ing builtin/apply: move init_apply_state() to apply.c apply: make init_apply_state() return -1 instead of exit()ing builtin/apply: make check_apply_state() return -1 instead of die()ing builtin/apply: move check_apply_state() to apply.c builtin/apply: make apply_all_patches() return -1 on error builtin/apply: make parse_traditional_patch() return -1 on error builtin/apply: make gitdiff_*() return 1 at end of header builtin/apply: make gitdiff_*() return -1 on error builtin/apply: change die_on_unsafe_path() to check_unsafe_path() builtin/apply: make build_fake_ancestor() return -1 on error builtin/apply: make remove_file() return -1 on error builtin/apply: make add_conflicted_stages_file() return -1 on error builtin/apply: make add_index_file() return -1 on error builtin/apply: make create_file() return -1 on error builtin/apply: make write_out_one_result() return -1 on error builtin/apply: make write_out_results() return -1 on error builtin/apply: make try_create_file() return -1 on error builtin/apply: make create_one_file() return -1 on error builtin/apply: rename option parsing functions apply: rename and move opt constants to apply.h Move libified code from builtin/apply.c to apply.{c,h} apply: make some parsing functions static again run-command: make dup_devnull() non static apply: roll back index lock file in case of error environment: add set_index_file() builtin/am: use apply api in run_apply() write_or_die: use warning() instead of fprintf(stderr, ...) apply: add 'be_silent' variable to 'struct apply_state' apply: make 'be_silent' incomatible with 'apply_verbosely' apply: don't print on stdout when be_silent is set usage: add set_warn_routine() usage: add get_error_routine() and get_warn_routine() apply: change error_routine when be_silent is set am: use be_silent in 'struct apply_state' to shut up applying patches run-command: make dup_devnull() static again builtin/apply: add a cli option for be_silent Makefile | 1 + builtin/apply.c => apply.c | 1750 ++++++++-------- apply.h | 146 ++ builtin/am.c | 91 +- builtin/apply.c | 4762 +------------------------------------------- cache.h | 1 + environment.c | 10 + git-compat-util.h | 3 + run-command.c | 2 +- t/t4012-diff-binary.sh | 4 +- t/t4254-am-corrupt.sh | 2 +- usage.c | 15 + write_or_die.c | 6 +- 13 files changed, 1324 insertions(+), 5469 deletions(-) copy builtin/apply.c => apply.c (73%) create mode 100644 apply.h rewrite builtin/apply.c (98%) -- 2.8.2.490.g3dabe57 -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html