Re: [PATCH v2] ls-files: adding support for submodules

Junio C Hamano <gitster@xxxxxxxxx> · Tue, 13 Sep 2016 09:31:18 -0700

Brandon Williams <bmwill@xxxxxxxxxx> writes:

> Allow ls-files to recognize submodules in order to retrieve a list of
> files from a repository's submodules.  This is done by forking off a
> process to recursively call ls-files on all submodules. Also added an
> output-path-prefix command in order to prepend paths to child processes.
>
> Signed-off-by: Brandon Williams <bmwill@xxxxxxxxxx>

> @@ -68,6 +71,21 @@ static void write_eolinfo(const struct cache_entry *ce, const char *path)
>  static void write_name(const char *name)
>  {
>  	/*
> +	 * NEEDSWORK: To make this thread-safe, full_name would have to be owned
> +	 * by the caller.
> +	 *
> +	 * full_name get reused across output lines to minimize the allocation
> +	 * churn.
> +	 */
> +	static struct strbuf full_name = STRBUF_INIT;
> +	if (output_path_prefix != '\0') {
> +		strbuf_reset(&full_name);
> +		strbuf_addstr(&full_name, output_path_prefix);
> +		strbuf_addstr(&full_name, name);
> +		name = full_name.buf;
> +	}

At first glance it was surprising that no test caught this lack of
dereference; the reason is that you initialize output_path_prefix
to an empty string, not NULL, so the pointer comparison is always
true and full_name.buf is always used, which has no effect on the
output.

I think initializing it to NULL is a more typical way to say "this
option has not been given", and if you took that route, the
condition would become

	if (output_path_prefix && *output_path_prefix) {
		...
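
To illustrate the difference between testing the pointer and testing
what it points at, here is a minimal standalone sketch.  It is not
the patch's code: prefixed_name() and its caller-supplied buffer are
hypothetical stand-ins for write_name() and the static strbuf.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for the prefixing step in write_name().
 * A NULL prefix means "option not given"; an empty string means
 * "given, but nothing to prepend".  Both leave the name untouched.
 */
static const char *prefixed_name(const char *prefix, const char *name,
				 char *buf, size_t buflen)
{
	/* Check the pointer first, then the first byte it points at. */
	if (prefix && *prefix) {
		snprintf(buf, buflen, "%s%s", prefix, name);
		return buf;
	}
	return name;
}
```

With only "prefix != '\0'" as the condition, an empty-string prefix
would still take the copying branch, which is harmless but wasteful.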

In any case, the fact that only this much change was required to add
output-path-prefix shows two good things: (1) the original code was
already well structured, funneling any pathname we need to emit
through this single function so that we can do this kind of update,
and (2) the author of the patch was competent to spot this single
point that needs to be updated.

Nice.

> +	status = run_command(&cp);
> +	if (status)
> +		exit(status);

run_command()'s return value comes from either start_command() or
finish_command().  These signal failure by returning a non-zero
value, and in practice they are small negative integers.  Feeding
a negative value to exit() is not quite kosher.  Perhaps exit(128),
mimicking what die() would do, is better.
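
As a rough sketch of that suggestion (the helper names below are
made up for illustration and are not part of git), the mapping could
be as simple as this.  The second helper only shows why a negative
argument is a problem: on POSIX systems a waiting parent sees just
the low 8 bits of the value passed to exit().

```c
/*
 * Hypothetical helper: turn a run_command()-style status into
 * something safe to hand to exit().  Any failure becomes 128,
 * the code die() exits with.
 */
static int exit_code_for(int status)
{
	return status ? 128 : 0;
}

/*
 * Illustration only: what a waiting parent would observe for a
 * given exit() argument, i.e. the value masked to 8 bits.
 */
static int as_seen_by_parent(int status)
{
	return status & 0xff;
}
```

So exit(-1) would show up as 255 in the parent, which is probably
not what the caller meant to report.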

If your primary interest is to support the "find in the working tree
files that are tracked, recursively in submodules" grep, I think
this "when we hit a submodule, spawn a separate ls-files in there"
is sufficient and a solid base to build on.

On the other hand, if you are more ambitious and "grep" is merely an
example of things that can be helped by having a list of paths
across module boundaries, we may want to "libify" ls-files in such a
way that a single process can instantiate one or more instances of
"ls-files machinery", each of which takes the repository to work in
and other arguments that specify which paths to report and, instead of
always showing the result to the standard output, makes repeated
calls to a callback function to report the discovered path and other
attributes associated with the path that were asked for (the object
name, values of tag_*, etc.), without spawning a separate "ls-files"
process.

The latter would be a much bigger task and I do not necessarily think
it is needed, but that is one possible future direction to keep in
mind.
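
To make the callback-driven shape a bit more concrete, here is a toy
sketch.  Everything in it is hypothetical: struct fake_repo,
ls_files_each() and ls_files_cb are invented names, and a flat array
of strings stands in for reading a real index.  It only shows the
control flow of "report each path through a callback instead of
printing it".

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical callback: called once per path; non-zero stops the walk. */
typedef int (*ls_files_cb)(const char *path, void *cb_data);

/* Hypothetical stand-in for one repository instance and its index. */
struct fake_repo {
	const char *prefix;	/* path from the superproject, "" at the top */
	const char **paths;	/* tracked paths, relative to this repository */
	size_t nr;
};

/* Report every tracked path, prefixed, without writing to stdout. */
static int ls_files_each(const struct fake_repo *repo,
			 ls_files_cb cb, void *cb_data)
{
	char full[4096];
	size_t i;

	for (i = 0; i < repo->nr; i++) {
		int ret;

		snprintf(full, sizeof(full), "%s%s",
			 repo->prefix, repo->paths[i]);
		ret = cb(full, cb_data);
		if (ret)
			return ret;
	}
	return 0;
}

/* Example consumer: count the paths instead of printing them. */
static int count_paths(const char *path, void *cb_data)
{
	(void)path;
	++*(int *)cb_data;
	return 0;
}
```

A caller would run ls_files_each() once for the superproject and once
per submodule, in the same process, passing whatever callback it
needs (a grep driver, a counter, a printer).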

Thanks, will queue with a minimum fix.