On Mon, Nov 12, 2012 at 09:44:01AM -0800, Junio C Hamano wrote: > Jeff King <peff@xxxxxxxx> writes: > > > The only ident-parsing function we currently provide is > > get_author_ident_from_commit. This is not very > > flexible for two reasons: > > > > 1. It takes a commit as an argument, and can't read from > > commit headers saved on disk. > > > > 2. It will only parse authors, not committers. > > > > This patch provides a more flexible interface which will > > parse multiple idents from a commit provide on stdin. We can > > easily use it as a building block for the current function > > to retain compatibility. > > > > Signed-off-by: Jeff King <peff@xxxxxxxx> > > --- > > Since we are counting processes in this series, I should note that this > > actually adds a subshell invocation for each call, since it went from: > > > > script='...' > > sed $script > > > > to: > > > > sed "$(make_script)" > > > > For filter-branch, which is really the only high-performance caller we > > have, this is negated by the fact that it will do author and committer > > at the same time, saving us an extra subshell (in addition to an extra > > sed invocation). > > Given that pick-ident-script is a const function, a caller that > repeatedly call is could call it once and use it in a variable, no? The problem is that it is a helper called from parse_ident_from_commit. And that function just passes along its arguments, so it does not know that it is being called repeatedly with the same arguments. So you'd have to either change the interface or memoize internally. I don't think memoization is a good option for two reasons: 1. Storing the arguments to compare to later is complex. You don't want to just store "$*" from the last run and see if we got the same arguments. You'd have to quote your delimiter (e.g., you would not want to confuse ("foo", "bar") with ("foo bar"). Though in this instance, we know that our args do not have spaces, so we could get away with that. 2. If you are in a subshell or even a while loop, your memoized variable will not be retained. So unless somebody has some clever scheme for memoizing shell functions without any process overhead, it is probably not worth it. Changing the interface for get_author_ident_from_commit would be a pain, but if we just wanted to help filter-branch, we could do something like this: diff --git a/git-filter-branch.sh b/git-filter-branch.sh index 5314249..7a693ba 100755 --- a/git-filter-branch.sh +++ b/git-filter-branch.sh @@ -74,7 +74,7 @@ finish_ident() { } set_ident () { - parse_ident_from_commit author AUTHOR committer COMMITTER + parse_ident_from_commit_via_script "$ident_script" finish_ident AUTHOR finish_ident COMMITTER } @@ -93,6 +93,7 @@ if [ "$(is_bare_repository)" = false ]; then require_clean_work_tree 'rewrite branches' fi +ident_script=$(pick_ident_script author AUTHOR committer COMMITTER) tempdir=.git-rewrite filter_env= filter_tree= diff --git a/git-sh-setup.sh b/git-sh-setup.sh index 22f0aed..1e20e17 100644 --- a/git-sh-setup.sh +++ b/git-sh-setup.sh @@ -225,10 +225,17 @@ pick_ident_script () { echo '/^$/q' } +# Feed a pick_ident_script return value to sed. Use this instead of +# parse_ident_from_commit below if you are going to be parsing commits in a +# tight loop and want to save a process. +parse_ident_from_commit_via_script() { + LANG=C LC_ALL=C sed -ne "$1" +} + # Create a pick-script as above and feed it to sed. Stdout is suitable for # feeding to eval. parse_ident_from_commit () { - LANG=C LC_ALL=C sed -ne "$(pick_ident_script "$@")" + parse_ident_from_commit_via_script "$(pick_ident_script "$@")" } # Parse the author from a commit given as an argument. Stdout is suitable for -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html