Stefan Beller <sbeller@xxxxxxxxxx> writes:

> This runs a command on each submodule in parallel and should eventually
> replace `git submodule foreach`.
>
> There is a new option -j/--jobs (inspired by make) to specify the number
> of parallel threads.
>
> The jobs=1 case needs to be special cased to exactly replicate the
> current default behavior of `git submodule foreach`, such as working
> stdin input.

"git submodule foreach-parallel -j1" feels like a roundabout way to
say "git submodule foreach": "I want you to go parallel. Oh, not
really, I want you to do one thing at a time."

I do not think these two have to be equivalent---users who need the
traditional one-by-one "foreach" behaviour (including the standard
input and other aspects that are unreasonable to expect
"foreach-parallel -jN" to replicate) can and should choose to use
"foreach", not "foreach-parallel -j1".

In any case, I do not think I saw any special casing of the -j1 case
that attempts to leave the standard input operational.  Did I miss
something, or is the above describing what is left to do?

> For more than one job no input is possible, and both the stdout and
> stderr of the command are sent to stderr in an ordered fashion,
> i.e. the tasks do not intermingle their output in races.

To rephrase what I said earlier, "for the parallel version, the above
things happen, even with numthreads==1" is perfectly fine.

> +	cp->no_stdin = 1;
> +	cp->out = 0;
> +	cp->err = -1;
> +	cp->dir = args->path;
> +	cp->stdout_to_stderr = 1;

So the standard error and the standard output are mixed into a single
pipe ...

> +	cp->use_shell = 1;
> +
> +	if (start_command(cp)) {
> +		die("Could not start command");
> +		for (i = 0; cp->args.argv; i++)
> +			fprintf(stderr, "%s\n", cp->args.argv[i]);
> +	}
> +
> +	while (1) {
> +		ssize_t len = xread(cp->err, buf, sizeof(buf));
> +		if (len < 0)
> +			die("Read from child failed");
> +		else if (len == 0)
> +			break;
> +		else {
> +			strbuf_add(&out, buf, len);
> +		}
... and the whole thing is accumulated in core???

> +	}
> +	if (finish_command(cp))
> +		die("command died with error");
> +
> +	sem_wait(args->mutex);
> +	fputs(out.buf, stderr);
> +	sem_post(args->mutex);

... and emitted to standard error?

I would have expected that the standard error would be left alone
(i.e. letting warnings from multiple jobs be mixed together simply
because everybody writes to the same file descriptor), while the
standard output would be line-buffered, perhaps captured by the above
loop and then emitted under mutex, or something.

I think I said this earlier, but latency to the first output counts
as an activity feedback mechanism.

> +	if (module_list_compute(0, nullargv, NULL, &pathspec) < 0)
> +		return 1;
> +
> +	gitmodules_config();
> +
> +	aq = create_task_queue(number_threads);
> +
> +	for (i = 0; i < ce_used; i++) {
> +		const struct submodule *sub;
> +		const struct cache_entry *ce = ce_entries[i];
> +		struct submodule_args *args = malloc(sizeof(*args));
> +
> +		if (ce_stage(ce))
> +			args->sha1 = xstrdup(sha1_to_hex(null_sha1));
> +		else
> +			args->sha1 = xstrdup(sha1_to_hex(ce->sha1));
> +
> +		strbuf_reset(&sb);
> +		strbuf_addf(&sb, "%s/.git", ce->name);
> +		if (!file_exists(sb.buf))
> +			continue;
> +
> +		args->path = ce->name;
> +		sub = submodule_from_path(null_sha1, args->path);
> +		if (!sub)
> +			die("No submodule mapping found in .gitmodules for path '%s'", args->path);
> +
> +		args->name = sub->name;
> +		args->toplevel = xstrdup(xgetcwd());
> +		args->cmd = argv;
> +		args->mutex = mutex;
> +		args->prefix = alternative_path;
> +		add_task(aq, run_cmd_submodule, args);
> +	}
> +
> +	finish_task_queue(aq, NULL);

This is very nice.  Declare a task queue with N workers, throw a
bunch of tasks at it, and then wait for all of them to complete.
Things can't be simpler than that ;-).

One thing that other callers of the mechanism might want may be the
ability to plug and unplug the task queue, but that can wait until
the need arises.