Re: [PATCH v10 9/9] ls-tree.c: introduce "--format" option

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Jan 13 2022, Teng Long wrote:

> Signed-off-by: Teng Long <dyroneteng@xxxxxxxxx>
> ---
>  Documentation/git-ls-tree.txt |  51 +++++++++++++-
>  builtin/ls-tree.c             | 129 +++++++++++++++++++++++++++++++++-
>  t/t3105-ls-tree-format.sh     |  55 +++++++++++++++
>  3 files changed, 230 insertions(+), 5 deletions(-)
>  create mode 100755 t/t3105-ls-tree-format.sh
>
> diff --git a/Documentation/git-ls-tree.txt b/Documentation/git-ls-tree.txt
> index 729370f235..ebdde6eae3 100644
> --- a/Documentation/git-ls-tree.txt
> +++ b/Documentation/git-ls-tree.txt
> @@ -10,9 +10,9 @@ SYNOPSIS
>  --------
>  [verse]
>  'git ls-tree' [-d] [-r] [-t] [-l] [-z]
> -	    [--name-only] [--name-status] [--object-only] [--full-name] [--full-tree] [--abbrev[=<n>]]
> -	    <tree-ish> [<path>...]
> -
> +	    [--name-only] [--name-status] [--object-only]
> +	    [--full-name] [--full-tree] [--abbrev[=<n>]] 

Let's split up this re-flow only change into its own commit? I.e. the
only non-whitespace change here is beginning with [--format].

If it was the right thing to do to re-flow this then we didn't need
[--format=<format>] to exist to do so...

> +	    [--format=<format>] <tree-ish> [<path>...]
>  DESCRIPTION

Removing this \n breaks the formatting in the file. See "make man && man
./Documentation/git-ls-tree.1". The ./Documentation/doc-diff utility is
also handy for sanity checking the documentation formatting.

>  -----------
>  Lists the contents of a given tree object, like what "/bin/ls -a" does
> @@ -79,6 +79,16 @@ OPTIONS
>  	Do not limit the listing to the current working directory.
>  	Implies --full-name.
>  
> +--format=<format>::
> +	A string that interpolates `%(fieldname)` from the result
> +	being shown. It also interpolates `%%` to `%`, and
> +	`%xx` where `xx` are hex digits interpolates to character
> +	with hex code `xx`; for example `%00` interpolates to
> +	`\0` (NUL), `%09` to `\t` (TAB) and `%0a` to `\n` (LF).
> +	When specified, `--format` cannot be combined with other
> +	format-altering options, including `--long`, `--name-only`
> +	and `--object-only`.
> +

These new docs make sense & seem to cover all the basis, thanks!

> +
> +Default format:
> +
>          <mode> SP <type> SP <object> TAB <file>

Here because we've added --format discussing the previous pseudo-format
as a "default" format becomes confusing. Let's instead say:

        The output format of `ls-tree` is determined by either the `--format` option,
        or other format-altering options such as `--name-long` etc. (see `--format` above).

        The use of certain `--format` directives is equivalent to using those options,
        but invoking the full formatting machinery can be slower than using an appropriate
        formatting option.

        In cases where the `--format` would exactly map to an existing option `ls-tree` will
        use the appropriate faster path. Thus the default format is equivalent to:
        ---
        %(mode) %(type) %(object)%x09%(file)
        ---

Or something like that. We could then discuss e.g. --name-long being
`%(mode) %(type) %(object) %(size:padded)%x09%(file)` when we discuss
that option.

>  This output format is compatible with what `--index-info --stdin` of
> @@ -105,6 +118,38 @@ quoted as explained for the configuration variable `core.quotePath`
>  (see linkgit:git-config[1]).  Using `-z` the filename is output
>  verbatim and the line is terminated by a NUL byte.
>  
> +Customized format:
> +
> +It is possible to print in a custom format by using the `--format` option,
> +which is able to interpolate different fields using a `%(fieldname)` notation.
> +For example, if you want to only print the <object> and <file> fields with a
> +JSON style, executing with a specific "--format" like
> +
> +        git ls-tree --format='{"object":"%(object)", "file":"%(file)"}' <tree-ish>
> +
> +The output format changes to:
> +
> +        {"object":"<object>", "file":"<file>"}

This one-liner is guaranteed to result in invalid JSON on some
repositories, both because JSON is inherently a bad fit for git's data
model (JSON needs to be in one Unicode encoding, Git's tree data might
me in a mixture of encodings), and because it'll break if the file
includes a '"'.

I think it's better to just replace this with some example involving -z,
or at least prominently note that this is broken in the general case,
but can be used ad-hoc to quickly check things with "jq" or whatever.

> +FIELD NAMES
> +-----------
> +
> +Various values from structured fields can be used to interpolate
> +into the resulting output. For each outputing line, the following
> +names can be used:
> +
> +mode::
> +	The mode of the object.
> +type::
> +	The type of the object (`blob` or `tree`).
> +object::
> +	The name of the object.
> +size[:padded]::
> +	The size of the object ("-" if it's a tree).
> +	It also supports a padded format of size with "%(size:padded)".
> +file::
> +	The filename of the object.

In
https://lore.kernel.org/git/cover.1641043500.git.dyroneteng@xxxxxxxxx/
you noted that you changed the field names of e.g. "objectname" to
"object" etc. You're right that I picked these as-is from the
git-for-each-ref formatting.

1/3 of your reasoning for doing so was to make it consistent with the
documentation examples of e.g.:

     <mode> SP <type> SP <object> TAB <file>

I think in any case (as noted above) we should change those to use the
--format), so that leaves just:

 - "I prefer to make the name more simple to memorize and type"
 - "I think the names with "object" prefix are [from git-for-each-ref
   and the object* prefixes aren't redundant there, but would be here]".

I think both of those still apply, but I think having these consistent
with git-for-each-ref outweighs the slight benefit of shorter names.

Right now only a handful of things support these sort of --format
directives, but we've already got RFC/WIP patches to add that to
git-cat-file, and are likely to add more in the future.

I'd also like us to eventually be able to combine what are now separate
built-ins with their own --format to expose more deeply some internal
APIs via IPC. E.g. now you can do this:

    git for-each-ref --format='%(refname) %(tree)'

But to list each of those trees you'd need to pipe that output into this
new 'git ls-tree --format. But imagine being able to do something like:

    git for-each-ref --format='%(refname) %(git-ls-tree --format %%(objectname) %(tree))'

Where we'd just invoke git-ls-tree for you without running a full
sub-process. I think both for that hypothetical and working with the two
--formats now having to use %(type) in some places but %(objecttype)
etc. in others is just needlessly confusing. Let's just consistently use
the same format names everywhere.

Specifically for your s/path/file/ name change, that's just inaccurate, consider:

    $ ./git ls-tree --format="%(mode) %(type) %(file)" -t HEAD -- t/README
    040000 tree t
    100644 blob t/README

And:

    $ $ (cd t && ../git ls-tree --format="%(mode) %(type) %(file)" -t -r HEAD -- README)
    040000 tree ./
    100644 blob README

I.e. we talk about <path> in the existing SYNOPSIS for a reason. That we
had a "<file>" in the existing format demo was a bug/shorthand that we
shouldn't be propagating further.

> [...]
> +static const char *format;
> +static const char *default_format = "%(mode) %(type) %(object)%x09%(file)";
> +static const char *long_format = "%(mode) %(type) %(object) %(size:padded)%x09%(file)";
> +static const char *name_only_format = "%(file)";
> +static const char *object_only_format = "%(object)";
> +

One advantage of keeping the variable names I picked in
https://lore.kernel.org/git/RFC-patch-6.7-eac299f06ff-20211217T131635Z-avarab@xxxxxxxxx/
is that they align, so you can instantly see that the first two are
equivalent until the "%x09".

It also makes it easier to review to avoid such churn, to see what you
really changed I'm looking at a local version of a range-diff where I
renamed these, the struct you renamed etc. back just to see what you
/really/ changed. I.e. what are functional v.s. renaming changes.

>  static int parse_shown_fields(void)
>  {
>  	if (cmdmode == MODE_NAME_ONLY) {
> @@ -76,6 +82,72 @@ static int parse_shown_fields(void)
>  	return 1;
>  }
>  
> +static void expand_objectsize(struct strbuf *line, const struct object_id *oid,
> +			      const enum object_type type, unsigned int padded)
> +{
> +	if (type == OBJ_BLOB) {
> +		unsigned long size;
> +		if (oid_object_info(the_repository, oid, &size) < 0)
> +			die(_("could not get object info about '%s'"),
> +			    oid_to_hex(oid));
> +		if (padded)
> +			strbuf_addf(line, "%7" PRIuMAX, (uintmax_t)size);
> +		else
> +			strbuf_addf(line, "%" PRIuMAX, (uintmax_t)size);

Here you changed my '"%"PRIuMAX' to '"%" PRIuMAX'. The former is the
prevailing style in this codebase, and avoiding the formatting churn
makes the inter-diff easier to read.

> +	} else if (padded) {
> +		strbuf_addf(line, "%7s", "-");
> +	} else {
> +		strbuf_addstr(line, "-");
> +	}
> +}

Ditto some harder to review interdiff due to renaming
churn. I.e. s/line/sb/ in both this and expand_show_tree(). I really
wouldn't care at all except because of all the manual work in reviewing
the inter-diff between my original version & this derived version.

In the case of "line" that's not even an improvement. With a --format
we're not building a "line", the user is free to insert any arbitrary
directives including \n's, so we might be working on multiple lines.

> +test_expect_success 'ls-tree --format usage' '
> +	test_expect_code 129 git ls-tree --format=fmt -l HEAD &&
> +	test_expect_code 129 git ls-tree --format=fmt --name-only HEAD &&
> +	test_expect_code 129 git ls-tree --format=fmt --name-status HEAD &&
> +	test_expect_code 129 git ls-tree --format=fmt --object-only HEAD
> +'

This & several other changes v.s. my version are good, e.g. here I seem
to have repeated the logic error I noted for your version (i.e omitting
"HEAD"), oops!

> +test_expect_success 'setup' '
> +	mkdir dir &&
> +	test_commit dir/sub-file &&
> +	test_commit top-file
> +'
> +
> +test_ls_tree_format () {
> +	format=$1 &&
> +	opts=$2 &&
> +	shift 2 &&
> +	git ls-tree $opts -r HEAD >expect.raw &&
> +	sed "s/^/> /" >expect <expect.raw &&
> +	git ls-tree --format="> $format" -r HEAD >actual &&
> +	test_cmp expect actual
> +}
> +
> +test_expect_success 'ls-tree --format=<default-like>' '
> +	test_ls_tree_format \
> +		"%(mode) %(type) %(object)%x09%(file)" \
> +		""
> +'
> +
> +test_expect_success 'ls-tree --format=<long-like>' '
> +	test_ls_tree_format \
> +		"%(mode) %(type) %(object) %(size:padded)%x09%(file)" \
> +		"--long"
> +'
> +
> +test_expect_success 'ls-tree --format=<name-only-like>' '
> +	test_ls_tree_format \
> +		"%(file)" \
> +		"--name-only"
> +'
> +
> +test_expect_success 'ls-tree --format=<object-only-like>' '
> +	test_ls_tree_format \
> +		"%(object)" \
> +		"--object-only"
> +'
> +
> +test_done

As I noted in my RFC CL (https://lore.kernel.org/git/RFC-cover-0.7-00000000000-20211217T131635Z-avarab@xxxxxxxxx/):

	"the tests for ls-tree are really
	lacking. E.g. I seem to have a rather obvious bug in how -t and the
	--format interact here, but no test catches it."

So first, in my version of adding --format I was careful to make
--name-only etc. imply a given --format, and then only at the last
minute would we take the "fast path":
https://lore.kernel.org/git/RFC-patch-6.7-eac299f06ff-20211217T131635Z-avarab@xxxxxxxxx/

You rewrote that in
https://lore.kernel.org/git/e0add802fbbabde7e7b3743127b2d4047f1ce760.1641043500.git.dyroneteng@xxxxxxxxx/
and qremoved the limited "GIT_TEST_LS_TREE_FORMAT_BACKEND" testing I
added, so now the internal --format machinery can't be run through the
existing tests we do have.

Even with that re-added I really wouldn't trust that this code is doing
the right thing (and as noted, I don't trust my own RFC version
either). I think e.g. our "coverage" Makefile targets would be a good
start as a first approximation, i.e. running the /ls-tree/ tests and
seeing if we have full coverage.

> Signed-off-by: Teng Long <dyroneteng@xxxxxxxxx>

As I noted in 7/9 I think this patch is 9/9 still mostly something I
wrote, so that the "author" and Signed-off-by should be preserved. The
below is a range-diff of an amended version I've been looking at in
trying to review this. It undoes several (but not all) of your
formatting/renaming-only changes, just so that I could see what the
non-formatting changes were:

1:  6c96dff15c5 ! 1:  917bb168d45 ls-tree: add a --format=<fmt> option
    @@
      ## Metadata ##
    -Author: Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx>
    +Author: Teng Long <dyroneteng@xxxxxxxxx>
     
      ## Commit message ##
    -    ls-tree: add a --format=<fmt> option
    +    ls-tree.c: introduce "--format" option
     
         Add a --format option to ls-tree. It has an existing default output,
         and then --long and --name-only options to emit the default output
    @@ Commit message
     
         The --format implementation is slower than the existing code, but this
         change does not cause any performance regressions. We'll leave the
    -    existing show_tree() unchanged, and only run show_tree_format() in if
    +    existing show_tree() unchanged, and only run show_tree_fmt() in if
         a --format different than the hardcoded built-in ones corresponding to
         the existing modes is provided.
     
    -    "Slower" here can bee seen via the the following "hyperfine"
    -    command. This uses GIT_TEST_LS_TREE_FORMAT_BACKEND=<bool> to force the
    -    use of the new backend:
    -
    -        $ hyperfine -L env false,true -L f "-r,-r -l,-r --name-only,-r --format='%(objectname)'" 'GIT_TEST_LS_TREE_FORMAT_BACKEND={env} ./git -C ~/g/linux ls-tree {f} HEAD' -r 10
    -        Benchmark 1: GIT_TEST_LS_TREE_FORMAT_BACKEND=false ./git -C ~/g/linux ls-tree -r HEAD
    -          Time (mean ± σ):      86.1 ms ±   0.6 ms    [User: 65.2 ms, System: 20.9 ms]
    -          Range (min … max):    85.2 ms …  87.5 ms    10 runs
    +    I.e. something like the "--long" output would be much slower with
    +    this, mainly due to how we need to allocate various things to do with
    +    quote.c instead of spewing the output directly to stdout.
     
    -        Benchmark 2: GIT_TEST_LS_TREE_FORMAT_BACKEND=true ./git -C ~/g/linux ls-tree -r HEAD
    -          Time (mean ± σ):     122.5 ms ±   0.6 ms    [User: 101.3 ms, System: 21.1 ms]
    -          Range (min … max):   121.8 ms … 123.4 ms    10 runs
    +    The new option of '--format' comes from Ævar Arnfjörð Bjarmasonn's
    +    idea and suggestion, this commit makes modifications in terms of the
    +    original discussion on community [1].
     
    -        Benchmark 3: GIT_TEST_LS_TREE_FORMAT_BACKEND=false ./git -C ~/g/linux ls-tree -r -l HEAD
    -          Time (mean ± σ):     277.7 ms ±   1.3 ms    [User: 234.6 ms, System: 43.0 ms]
    -          Range (min … max):   275.9 ms … 279.7 ms    10 runs
    +    Here is the statistics about performance tests:
     
    -        Benchmark 4: GIT_TEST_LS_TREE_FORMAT_BACKEND=true ./git -C ~/g/linux ls-tree -r -l HEAD
    -          Time (mean ± σ):     332.8 ms ±   2.6 ms    [User: 282.0 ms, System: 50.7 ms]
    -          Range (min … max):   329.6 ms … 338.2 ms    10 runs
    +    1. Default format (hitten the builtin formats):
     
    -        Benchmark 5: GIT_TEST_LS_TREE_FORMAT_BACKEND=false ./git -C ~/g/linux ls-tree -r --name-only HEAD
    -          Time (mean ± σ):      71.8 ms ±   0.4 ms    [User: 54.1 ms, System: 17.6 ms]
    -          Range (min … max):    71.2 ms …  72.5 ms    10 runs
    +        "git ls-tree <tree-ish>" vs "--format='%(mode) %(type) %(object)%x09%(file)'"
     
    -        Benchmark 6: GIT_TEST_LS_TREE_FORMAT_BACKEND=true ./git -C ~/g/linux ls-tree -r --name-only HEAD
    -          Time (mean ± σ):      86.6 ms ±   0.5 ms    [User: 65.7 ms, System: 20.7 ms]
    -          Range (min … max):    85.9 ms …  87.4 ms    10 runs
    +        $hyperfine --warmup=10 "/opt/git/master/bin/git ls-tree -r HEAD"
    +        Benchmark 1: /opt/git/master/bin/git ls-tree -r HEAD
    +        Time (mean ± σ):     105.2 ms ±   3.3 ms    [User: 84.3 ms, System: 20.8 ms]
    +        Range (min … max):    99.2 ms … 113.2 ms    28 runs
     
    -        Benchmark 7: GIT_TEST_LS_TREE_FORMAT_BACKEND=false ./git -C ~/g/linux ls-tree -r --format='%(objectname)' HEAD
    -          Time (mean ± σ):      85.8 ms ±   0.6 ms    [User: 66.2 ms, System: 19.5 ms]
    -          Range (min … max):    85.0 ms …  86.9 ms    10 runs
    +        $hyperfine --warmup=10 "/opt/git/ls-tree-oid-only/bin/git ls-tree -r --format='%(mode) %(type) %(object)%x09%(file)'  HEAD"
    +        Benchmark 1: /opt/git/ls-tree-oid-only/bin/git ls-tree -r --format='%(mode) %(type) %(object)%x09%(file)'  HEAD
    +        Time (mean ± σ):     106.4 ms ±   2.7 ms    [User: 86.1 ms, System: 20.2 ms]
    +        Range (min … max):   100.2 ms … 110.5 ms    29 runs
     
    -        Benchmark 8: GIT_TEST_LS_TREE_FORMAT_BACKEND=true ./git -C ~/g/linux ls-tree -r --format='%(objectname)' HEAD
    -          Time (mean ± σ):      85.3 ms ±   0.2 ms    [User: 66.6 ms, System: 18.7 ms]
    -          Range (min … max):    85.0 ms …  85.7 ms    10 runs
    +    2. Default format includes object size (hitten the builtin formats):
     
    -        Summary
    -          'GIT_TEST_LS_TREE_FORMAT_BACKEND=false ./git -C ~/g/linux ls-tree -r --name-only HEAD' ran
    -            1.19 ± 0.01 times faster than 'GIT_TEST_LS_TREE_FORMAT_BACKEND=true ./git -C ~/g/linux ls-tree -r --format='%(objectname)' HEAD'
    -            1.19 ± 0.01 times faster than 'GIT_TEST_LS_TREE_FORMAT_BACKEND=false ./git -C ~/g/linux ls-tree -r --format='%(objectname)' HEAD'
    -            1.20 ± 0.01 times faster than 'GIT_TEST_LS_TREE_FORMAT_BACKEND=false ./git -C ~/g/linux ls-tree -r HEAD'
    -            1.21 ± 0.01 times faster than 'GIT_TEST_LS_TREE_FORMAT_BACKEND=true ./git -C ~/g/linux ls-tree -r --name-only HEAD'
    -            1.71 ± 0.01 times faster than 'GIT_TEST_LS_TREE_FORMAT_BACKEND=true ./git -C ~/g/linux ls-tree -r HEAD'
    -            3.87 ± 0.03 times faster than 'GIT_TEST_LS_TREE_FORMAT_BACKEND=false ./git -C ~/g/linux ls-tree -r -l HEAD'
    -            4.64 ± 0.05 times faster than 'GIT_TEST_LS_TREE_FORMAT_BACKEND=true ./git -C ~/g/linux ls-tree -r -l HEAD'
    +        "git ls-tree -l <tree-ish>" vs "--format='%(mode) %(type) %(object) %(size:padded)%x09%(file)'"
     
    -    I.e. something like the "--long" output would be much slower with
    -    this, mainly due to how we need to allocate various things to do with
    -    quote.c instead of spewing the output directly to stdout.
    +        $hyperfine --warmup=10 "/opt/git/master/bin/git ls-tree -r -l HEAD"
    +        Benchmark 1: /opt/git/master/bin/git ls-tree -r -l HEAD
    +        Time (mean ± σ):     335.1 ms ±   6.5 ms    [User: 304.6 ms, System: 30.4 ms]
    +        Range (min … max):   327.5 ms … 348.4 ms    10 runs
     
    -    But even a --format='%(objectname)' is fast with the new backend, so
    -    this is viable as a replacement for adding new formats, and we'll pay
    -    for this added complexity as a one-off, and not again every time a new
    -    format needs to be added. See [1] for an example of what it would
    -    otherwise take to add an --object-name flag.
    +        $hyperfine --warmup=10 "/opt/git/ls-tree-oid-only/bin/git ls-tree -r --format='%(mode) %(type) %(object) %(size:padded)%x09%(file)'  HEAD"
    +        Benchmark 1: /opt/git/ls-tree-oid-only/bin/git ls-tree -r --format='%(mode) %(type) %(object) %(size:padded)%x09%(file)'  HEAD
    +        Time (mean ± σ):     337.2 ms ±   8.2 ms    [User: 309.2 ms, System: 27.9 ms]
    +        Range (min … max):   328.8 ms … 349.4 ms    10 runs
     
    -    1. https://lore.kernel.org/git/2e449d1c792ff81da5f22c8bf65ed33c393d62f8.1639721750.git.dyroneteng@xxxxxxxxx/
    +    Links:
    +            [1] https://public-inbox.org/git/RFC-patch-6.7-eac299f06ff-20211217T131635Z-avarab@xxxxxxxxx/
     
    -    Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx>
    +    Signed-off-by: Teng Long <dyroneteng@xxxxxxxxx>
     
    - ## builtin/ls-tree.c ##
    -@@ builtin/ls-tree.c: static struct pathspec pathspec;
    - static int chomp_prefix;
    - static const char *ls_tree_prefix;
    + ## Documentation/git-ls-tree.txt ##
    +@@ Documentation/git-ls-tree.txt: SYNOPSIS
    + --------
    + [verse]
    + 'git ls-tree' [-d] [-r] [-t] [-l] [-z]
    +-	    [--name-only] [--name-status] [--object-only] [--full-name] [--full-tree] [--abbrev[=<n>]]
    +-	    <tree-ish> [<path>...]
    ++	    [--name-only] [--name-status] [--object-only]
    ++	    [--full-name] [--full-tree] [--abbrev[=<n>]]
    ++	    [--format=<format>] <tree-ish> [<path>...]
      
    -+/*
    -+ * The format equivalents that show_tree() is prepared to handle.
    -+ */
    -+static const char *ls_tree_format_d = "%(objectmode) %(objecttype) %(objectname)%x09%(path)";
    -+static const char *ls_tree_format_l = "%(objectmode) %(objecttype) %(objectname) %(objectsize:padded)%x09%(path)";
    -+static const char *ls_tree_format_n = "%(path)";
    + DESCRIPTION
    + -----------
    +@@ Documentation/git-ls-tree.txt: OPTIONS
    + 	Do not limit the listing to the current working directory.
    + 	Implies --full-name.
    + 
    ++--format=<format>::
    ++	A string that interpolates `%(fieldname)` from the result
    ++	being shown. It also interpolates `%%` to `%`, and
    ++	`%xx` where `xx` are hex digits interpolates to character
    ++	with hex code `xx`; for example `%00` interpolates to
    ++	`\0` (NUL), `%09` to `\t` (TAB) and `%0a` to `\n` (LF).
    ++	When specified, `--format` cannot be combined with other
    ++	format-altering options, including `--long`, `--name-only`
    ++	and `--object-only`.
     +
    - static const  char * const ls_tree_usage[] = {
    - 	N_("git ls-tree [<options>] <tree-ish> [<path>...]"),
    - 	NULL
    - };
    + [<path>...]::
    + 	When paths are given, show them (note that this isn't really raw
    + 	pathnames, but rather a list of patterns to match).  Otherwise
    +@@ Documentation/git-ls-tree.txt: OPTIONS
      
    -+struct read_tree_ls_tree_data {
    -+	const char *format;
    -+	struct strbuf sb_scratch;
    -+	struct strbuf sb_tmp;
    -+};
    + Output Format
    + -------------
    ++
    ++Default format:
    ++
    +         <mode> SP <type> SP <object> TAB <file>
    + 
    + This output format is compatible with what `--index-info --stdin` of
    +@@ Documentation/git-ls-tree.txt: quoted as explained for the configuration variable `core.quotePath`
    + (see linkgit:git-config[1]).  Using `-z` the filename is output
    + verbatim and the line is terminated by a NUL byte.
    + 
    ++Customized format:
     +
    ++It is possible to print in a custom format by using the `--format` option,
    ++which is able to interpolate different fields using a `%(fieldname)` notation.
    ++For example, if you want to only print the <object> and <file> fields with a
    ++JSON style, executing with a specific "--format" like
    ++
    ++        git ls-tree --format='{"object":"%(object)", "file":"%(file)"}' <tree-ish>
    ++
    ++The output format changes to:
    ++
    ++        {"object":"<object>", "file":"<file>"}
    ++
    ++FIELD NAMES
    ++-----------
    ++
    ++Various values from structured fields can be used to interpolate
    ++into the resulting output. For each outputing line, the following
    ++names can be used:
    ++
    ++mode::
    ++	The mode of the object.
    ++type::
    ++	The type of the object (`blob` or `tree`).
    ++object::
    ++	The name of the object.
    ++size[:padded]::
    ++	The size of the object ("-" if it's a tree).
    ++	It also supports a padded format of size with "%(size:padded)".
    ++file::
    ++	The filename of the object.
    ++
    + GIT
    + ---
    + Part of the linkgit:git[1] suite
    +
    + ## builtin/ls-tree.c ##
    +@@ builtin/ls-tree.c: static unsigned int shown_fields;
    + #define FIELD_DEFAULT 29 /* 11101 size is not shown to output by default */
    + #define FIELD_LONG_DEFAULT  (FIELD_DEFAULT | FIELD_SIZE)
    + 
     +struct expand_ls_tree_data {
     +	unsigned mode;
     +	enum object_type type;
     +	const struct object_id *oid;
     +	const char *pathname;
    -+	const char *basebuf;
    -+	struct strbuf *sb_scratch;
    -+	struct strbuf *sb_tmp;
    ++	struct strbuf *base;
     +};
     +
    - static int show_recursive(const char *base, size_t baselen, const char *pathname)
    + static const  char * const ls_tree_usage[] = {
    + 	N_("git ls-tree [<options>] <tree-ish> [<path>...]"),
    + 	NULL
    +@@ builtin/ls-tree.c: enum {
    + 
    + static int cmdmode = MODE_UNSPECIFIED;
    + 
    ++static const char *format;
    ++static const char *ls_tree_format_d = "%(objectmode) %(objecttype) %(objectname)%x09%(path)";
    ++static const char *ls_tree_format_l = "%(objectmode) %(objecttype) %(objectname) %(objectsize:padded)%x09%(path)";
    ++static const char *ls_tree_format_n = "%(path)";
    ++static const char *ls_tree_format_o = "%(objectname)";
    ++
    + static int parse_shown_fields(void)
      {
    - 	int i;
    -@@ builtin/ls-tree.c: static int show_recursive(const char *base, size_t baselen, const char *pathname
    - 	return 0;
    + 	if (cmdmode == MODE_NAME_ONLY) {
    +@@ builtin/ls-tree.c: static int parse_shown_fields(void)
    + 	return 1;
      }
      
    -+static void expand_objectsize(struct strbuf *sb,
    -+			      const struct object_id *oid,
    -+			      const enum object_type type,
    -+			      unsigned int padded)
    ++static void expand_objectsize(struct strbuf *sb, const struct object_id *oid,
    ++			      const enum object_type type, unsigned int padded)
     +{
     +	if (type == OBJ_BLOB) {
     +		unsigned long size;
     +		if (oid_object_info(the_repository, oid, &size) < 0)
    -+			die(_("could not get object info about '%s'"), oid_to_hex(oid));
    ++			die(_("could not get object info about '%s'"),
    ++			    oid_to_hex(oid));
     +		if (padded)
     +			strbuf_addf(sb, "%7"PRIuMAX, (uintmax_t)size);
     +		else
    @@ builtin/ls-tree.c: static int show_recursive(const char *base, size_t baselen, c
     +	}
     +}
     +
    -+static size_t expand_show_tree(struct strbuf *sb,
    -+			       const char *start,
    ++static size_t expand_show_tree(struct strbuf *sb, const char *start,
     +			       void *context)
     +{
     +	struct expand_ls_tree_data *data = context;
     +	const char *end;
     +	const char *p;
    -+	size_t len;
    ++	unsigned int errlen;
    ++	size_t len = strbuf_expand_literal_cb(sb, start, NULL);
     +
    -+	len = strbuf_expand_literal_cb(sb, start, NULL);
     +	if (len)
     +		return len;
    -+
     +	if (*start != '(')
    -+		die(_("bad format as of '%s'"), start);
    ++		die(_("bad ls-tree format: as '%s'"), start);
    ++
     +	end = strchr(start + 1, ')');
     +	if (!end)
    -+		die(_("ls-tree format element '%s' does not end in ')'"),
    -+		    start);
    -+	len = end - start + 1;
    ++		die(_("bad ls-tree format: element '%s' does not end in ')'"), start);
     +
    ++	len = end - start + 1;
     +	if (skip_prefix(start, "(objectmode)", &p)) {
     +		strbuf_addf(sb, "%06o", data->mode);
     +	} else if (skip_prefix(start, "(objecttype)", &p)) {
    @@ builtin/ls-tree.c: static int show_recursive(const char *base, size_t baselen, c
     +	} else if (skip_prefix(start, "(objectsize)", &p)) {
     +		expand_objectsize(sb, data->oid, data->type, 0);
     +	} else if (skip_prefix(start, "(objectname)", &p)) {
    -+		strbuf_addstr(sb, find_unique_abbrev(data->oid, abbrev));
    ++		strbuf_add_unique_abbrev(sb, data->oid, abbrev);
     +	} else if (skip_prefix(start, "(path)", &p)) {
    -+		const char *name = data->basebuf;
    ++		const char *name = data->base->buf;
     +		const char *prefix = chomp_prefix ? ls_tree_prefix : NULL;
    -+
    -+		if (prefix)
    -+			name = relative_path(name, prefix, data->sb_scratch);
    -+		quote_c_style(name, data->sb_tmp, NULL, 0);
    -+		strbuf_add(sb, data->sb_tmp->buf, data->sb_tmp->len);
    -+
    -+		strbuf_reset(data->sb_tmp);
    -+		/* The relative_path() function resets "scratch" */
    ++		struct strbuf quoted = STRBUF_INIT;
    ++		struct strbuf s = STRBUF_INIT;
    ++		strbuf_addstr(data->base, data->pathname);
    ++		name = relative_path(data->base->buf, prefix, &s);
    ++		quote_c_style(name, &quoted, NULL, 0);
    ++		strbuf_addbuf(sb, &quoted);
    ++		strbuf_release(&s);
    ++		strbuf_release(&quoted);
     +	} else {
    -+		unsigned int errlen = (unsigned long)len;
    -+		die(_("bad ls-tree format specifiec %%%.*s"), errlen, start);
    ++		errlen = (unsigned long)len;
    ++		die(_("bad ls-tree format: %%%.*s"), errlen, start);
     +	}
    -+
     +	return len;
     +}
     +
    - static int show_tree_init(enum object_type *type, struct strbuf *base,
    - 			  const char *pathname, unsigned mode, int *retval)
    + static int show_recursive(const char *base, size_t baselen,
    + 			  const char *pathname)
      {
    -@@ builtin/ls-tree.c: static int show_tree_init(enum object_type *type, struct strbuf *base,
    +@@ builtin/ls-tree.c: static int show_recursive(const char *base, size_t baselen,
      	return 0;
      }
      
    +-static int show_default(const struct object_id *oid, enum object_type type,
    +-			const char *pathname, unsigned mode,
    +-			struct strbuf *base)
     +static int show_tree_fmt(const struct object_id *oid, struct strbuf *base,
     +			 const char *pathname, unsigned mode, void *context)
    -+{
    -+	struct read_tree_ls_tree_data *data = context;
    -+	struct expand_ls_tree_data my_data = {
    + {
    +-	size_t baselen = base->len;
    ++	size_t baselen;
    ++	int recurse = 0;
    ++	struct strbuf line = STRBUF_INIT;
    ++	enum object_type type = object_type(mode);
    ++
    ++	struct expand_ls_tree_data data = {
     +		.mode = mode,
    -+		.type = OBJ_BLOB,
    ++		.type = type,
     +		.oid = oid,
     +		.pathname = pathname,
    -+		.sb_scratch = &data->sb_scratch,
    -+		.sb_tmp = &data->sb_tmp,
    ++		.base = base,
     +	};
    -+	struct strbuf sb = STRBUF_INIT;
    -+	int retval = 0;
    -+	size_t baselen;
     +
    -+	if (show_tree_init(&my_data.type, base, pathname, mode, &retval))
    -+		return retval;
    ++	if (type == OBJ_TREE && show_recursive(base->buf, base->len, pathname))
    ++		recurse = READ_TREE_RECURSIVE;
    ++	if (type == OBJ_TREE && recurse && !(ls_options & LS_SHOW_TREES))
    ++		return recurse;
    ++	if (type == OBJ_BLOB && (ls_options & LS_TREE_ONLY))
    ++		return 0;
     +
     +	baselen = base->len;
    -+	strbuf_addstr(base, pathname);
    -+	strbuf_reset(&sb);
    -+	my_data.basebuf = base->buf;
    -+
    -+	strbuf_expand(&sb, data->format, expand_show_tree, &my_data);
    -+	strbuf_addch(&sb, line_termination);
    -+	fwrite(sb.buf, sb.len, 1, stdout);
    ++	strbuf_expand(&line, format, expand_show_tree, &data);
    ++	strbuf_addch(&line, line_termination);
    ++	fwrite(line.buf, line.len, 1, stdout);
    ++	strbuf_release(&line);
     +	strbuf_setlen(base, baselen);
    -+
    -+	return retval;
    ++	return recurse;
     +}
     +
    - static int show_tree(const struct object_id *oid, struct strbuf *base,
    - 		const char *pathname, unsigned mode, void *context)
    - {
    ++static int show_default(struct expand_ls_tree_data *data)
    ++{
    ++	size_t baselen = data->base->len;
    + 
    + 	if (shown_fields & FIELD_SIZE) {
    + 		char size_text[24];
    +-		if (type == OBJ_BLOB) {
    ++		if (data->type == OBJ_BLOB) {
    + 			unsigned long size;
    +-			if (oid_object_info(the_repository, oid, &size) == OBJ_BAD)
    ++			if (oid_object_info(the_repository, data->oid, &size) == OBJ_BAD)
    + 				xsnprintf(size_text, sizeof(size_text), "BAD");
    + 			else
    + 				xsnprintf(size_text, sizeof(size_text),
    +@@ builtin/ls-tree.c: static int show_default(const struct object_id *oid, enum object_type type,
    + 		} else {
    + 			xsnprintf(size_text, sizeof(size_text), "-");
    + 		}
    +-		printf("%06o %s %s %7s\t", mode, type_name(type),
    +-		find_unique_abbrev(oid, abbrev), size_text);
    ++		printf("%06o %s %s %7s\t", data->mode, type_name(data->type),
    ++		find_unique_abbrev(data->oid, abbrev), size_text);
    + 	} else {
    +-		printf("%06o %s %s\t", mode, type_name(type),
    +-		find_unique_abbrev(oid, abbrev));
    ++		printf("%06o %s %s\t", data->mode, type_name(data->type),
    ++		find_unique_abbrev(data->oid, abbrev));
    + 	}
    +-	baselen = base->len;
    +-	strbuf_addstr(base, pathname);
    +-	write_name_quoted_relative(base->buf,
    ++	baselen = data->base->len;
    ++	strbuf_addstr(data->base, data->pathname);
    ++	write_name_quoted_relative(data->base->buf,
    + 				   chomp_prefix ? ls_tree_prefix : NULL, stdout,
    + 				   line_termination);
    +-	strbuf_setlen(base, baselen);
    ++	strbuf_setlen(data->base, baselen);
    + 	return 1;
    + }
    + 
    +@@ builtin/ls-tree.c: static int show_tree(const struct object_id *oid, struct strbuf *base,
    + 	size_t baselen;
    + 	enum object_type type = object_type(mode);
    + 
    ++	struct expand_ls_tree_data data = {
    ++		.mode = mode,
    ++		.type = type,
    ++		.oid = oid,
    ++		.pathname = pathname,
    ++		.base = base,
    ++	};
    ++
    + 	if (type == OBJ_TREE && show_recursive(base->buf, base->len, pathname))
    + 		recurse = READ_TREE_RECURSIVE;
    + 	if (type == OBJ_TREE && recurse && !(ls_options & LS_SHOW_TREES))
    +@@ builtin/ls-tree.c: static int show_tree(const struct object_id *oid, struct strbuf *base,
    + 	}
    + 
    + 	if (shown_fields >= FIELD_DEFAULT)
    +-		show_default(oid, type, pathname, mode, base);
    ++		show_default(&data);
    + 
    + 	return recurse;
    + }
     @@ builtin/ls-tree.c: int cmd_ls_tree(int argc, const char **argv, const char *prefix)
      	struct object_id oid;
      	struct tree *tree;
      	int i, full_tree = 0;
    -+	const char *implicit_format = NULL;
    -+	const char *format = NULL;
    -+	struct read_tree_ls_tree_data read_tree_cb_data = {
    -+		.sb_scratch = STRBUF_INIT,
    -+		.sb_tmp = STRBUF_INIT,
    -+	};
    ++	read_tree_fn_t fn = show_tree;
      	const struct option ls_tree_options[] = {
      		OPT_BIT('d', NULL, &ls_options, N_("only show trees"),
      			LS_TREE_ONLY),
    @@ builtin/ls-tree.c: int cmd_ls_tree(int argc, const char **argv, const char *pref
      		OPT_BOOL(0, "full-tree", &full_tree,
      			 N_("list entire tree; not just current directory "
      			    "(implies --full-name)")),
    -+		OPT_STRING_F(0 , "format", &format, N_("format"),
    -+			     N_("format to use for the output"), PARSE_OPT_NONEG),
    ++		OPT_STRING_F(0, "format", &format, N_("format"),
    ++			     N_("format to use for the output"),
    ++			     PARSE_OPT_NONEG),
      		OPT__ABBREV(&abbrev),
      		OPT_END()
      	};
    -+	read_tree_fn_t fn = show_tree;
    - 
    - 	git_config(git_default_config, NULL);
    - 	ls_tree_prefix = prefix;
     @@ builtin/ls-tree.c: int cmd_ls_tree(int argc, const char **argv, const char *prefix)
    - 	if ( (LS_TREE_ONLY|LS_RECURSIVE) ==
      	    ((LS_TREE_ONLY|LS_RECURSIVE) & ls_options))
      		ls_options |= LS_SHOW_TREES;
    -+	if (ls_options & LS_NAME_ONLY)
    -+		implicit_format = ls_tree_format_n;
    -+	if (ls_options & LS_SHOW_SIZE)
    -+		implicit_format = ls_tree_format_l;
    -+
    -+	if (format && implicit_format)
    -+		usage_msg_opt(_("providing --format cannot be combined with other format-altering options"),
    -+			      ls_tree_usage, ls_tree_options);
    -+	if (implicit_format)
    -+		format = implicit_format;
    -+	if (!format)
    -+		format = ls_tree_format_d;
      
    ++	if (format && cmdmode)
    ++		usage_msg_opt(
    ++			_("--format can't be combined with other format-altering options"),
    ++			ls_tree_usage, ls_tree_options);
      	if (argc < 1)
      		usage_with_options(ls_tree_usage, ls_tree_options);
    + 	if (get_oid(argv[0], &oid))
     @@ builtin/ls-tree.c: int cmd_ls_tree(int argc, const char **argv, const char *prefix)
      	tree = parse_tree_indirect(&oid);
      	if (!tree)
      		die("not a tree object");
    +-	return !!read_tree(the_repository, tree,
    +-			   &pathspec, show_tree, NULL);
     +
     +	/*
     +	 * The generic show_tree_fmt() is slower than show_tree(), so
     +	 * take the fast path if possible.
     +	 */
    -+	if (format && (!strcmp(format, ls_tree_format_d) ||
    -+		       !strcmp(format, ls_tree_format_l) ||
    -+		       !strcmp(format, ls_tree_format_n)))
    ++	if (format &&
    ++	    (!strcmp(format, ls_tree_format_d) ||
    ++	     !strcmp(format, ls_tree_format_l) ||
    ++	     !strcmp(format, ls_tree_format_n) ||
    ++	     !strcmp(format, ls_tree_format_o)))
     +		fn = show_tree;
     +	else if (format)
     +		fn = show_tree_fmt;
    -+	/*
    -+	 * Allow forcing the show_tree_fmt(), to test that it can
    -+	 * handle the test suite.
    -+	 */
    -+	if (git_env_bool("GIT_TEST_LS_TREE_FORMAT_BACKEND", 0))
    -+		fn = show_tree_fmt;
     +
    -+	read_tree_cb_data.format = format;
    - 	return !!read_tree(the_repository, tree,
    --			   &pathspec, show_tree, NULL);
    -+			   &pathspec, fn, &read_tree_cb_data);
    ++	return !!read_tree(the_repository, tree, &pathspec, fn, NULL);
      }
     
      ## t/t3105-ls-tree-format.sh (new) ##
    @@ t/t3105-ls-tree-format.sh (new)
     +. ./test-lib.sh
     +
     +test_expect_success 'ls-tree --format usage' '
    -+	test_expect_code 129 git ls-tree --format=fmt -l &&
    -+	test_expect_code 129 git ls-tree --format=fmt --name-only &&
    -+	test_expect_code 129 git ls-tree --format=fmt --name-status
    ++	test_expect_code 129 git ls-tree --format=fmt -l HEAD &&
    ++	test_expect_code 129 git ls-tree --format=fmt --name-only HEAD &&
    ++	test_expect_code 129 git ls-tree --format=fmt --name-status HEAD &&
    ++	test_expect_code 129 git ls-tree --format=fmt --object-only HEAD
     +'
     +
     +test_expect_success 'setup' '
    @@ t/t3105-ls-tree-format.sh (new)
     +	test_ls_tree_format \
     +		"%(path)" \
     +		"--name-only"
    ++'
     +
    ++test_expect_success 'ls-tree --format=<object-only-like>' '
    ++	test_ls_tree_format \
    ++		"%(objectname)" \
    ++		"--object-only"
     +'
     +
     +test_done




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux