Re: [RFC/PATCH v3 01/16] Add initial external odb support

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Christian Couder <christian.couder@xxxxxxxxx> writes:

> From: Jeff King <peff@xxxxxxxx>
>
> Signed-off-by: Christian Couder <chriscool@xxxxxxxxxxxxx>
> ---

By the time the series loses RFC prefix, we'd need to see the above
three lines straightened out a bit more, e.g. a real message and a
more proper sign-off sequence.

> +static struct odb_helper *find_or_create_helper(const char *name, int len)
> +{
> +	struct odb_helper *o;
> +
> +	for (o = helpers; o; o = o->next)
> +		if (!strncmp(o->name, name, len) && !o->name[len])
> +			return o;
> +
> +	o = odb_helper_new(name, len);
> +	*helpers_tail = o;
> +	helpers_tail = &o->next;
> +
> +	return o;
> +}
> +
> +static int external_odb_config(const char *var, const char *value, void *data)
> +{
> +	struct odb_helper *o;
> +	const char *key, *dot;
> +
> +	if (!skip_prefix(var, "odb.", &key))
> +		return 0;
> +	dot = strrchr(key, '.');
> +	if (!dot)
> +		return 0;

Is this something Peff wrote long time ago?  I find it surprising
that he would write this without using parse_config_key().

> +struct odb_helper_cmd {
> +	struct argv_array argv;
> +	struct child_process child;
> +};
> +
> +static void prepare_helper_command(struct argv_array *argv, const char *cmd,
> +				   const char *fmt, va_list ap)
> +{
> +	struct strbuf buf = STRBUF_INIT;
> +
> +	strbuf_addstr(&buf, cmd);
> +	strbuf_addch(&buf, ' ');
> +	strbuf_vaddf(&buf, fmt, ap);
> +
> +	argv_array_push(argv, buf.buf);
> +	strbuf_release(&buf);

This concatenates the cmdname (like "get") and its parameters into a
single string (e.g. "get 454cb6bd52a4de614a3633e4f547af03d5c3b640")
and then places the resulting string as the sole element in argv[]
array, which I find somewhat unusual.  It at least deserves a
comment in front of the function that the callers are responsible to
ensure that the result of vaddf(fmt, ap) is properly shell-quoted.
Otherwise it is inviting quoting errors in the future (even if there
is no such errors in the current code and command set, i.e. a 40-hex
object name that "get" subcommand takes happens to require no
special shell-quoting consideration).

> +int odb_helper_fetch_object(struct odb_helper *o, const unsigned char *sha1,
> +			    int fd)
> +{
> +	struct odb_helper_object *obj;
> +	struct odb_helper_cmd cmd;
> +	unsigned long total_got;
> +	git_zstream stream;
> +	int zret = Z_STREAM_END;
> +	git_SHA_CTX hash;
> +	unsigned char real_sha1[20];
> +
> +	obj = odb_helper_lookup(o, sha1);
> +	if (!obj)
> +		return -1;
> +
> +	if (odb_helper_start(o, &cmd, "get %s", sha1_to_hex(sha1)) < 0)
> +		return -1;
> +
> +	memset(&stream, 0, sizeof(stream));
> +	git_inflate_init(&stream);
> +	git_SHA1_Init(&hash);
> +	total_got = 0;
> +
> +	for (;;) {
> +		unsigned char buf[4096];
> +		int r;
> +
> +		r = xread(cmd.child.out, buf, sizeof(buf));
> +		if (r < 0) {
> +			error("unable to read from odb helper '%s': %s",
> +			      o->name, strerror(errno));
> +			close(cmd.child.out);
> +			odb_helper_finish(o, &cmd);
> +			git_inflate_end(&stream);
> +			return -1;
> +		}
> +		if (r == 0)
> +			break;
> +
> +		write_or_die(fd, buf, r);
> +
> +		stream.next_in = buf;
> +		stream.avail_in = r;
> +		do {
> +			unsigned char inflated[4096];
> +			unsigned long got;
> +
> +			stream.next_out = inflated;
> +			stream.avail_out = sizeof(inflated);
> +			zret = git_inflate(&stream, Z_SYNC_FLUSH);
> +			got = sizeof(inflated) - stream.avail_out;
> +
> +			git_SHA1_Update(&hash, inflated, got);
> +			/* skip header when counting size */
> +			if (!total_got) {
> +				const unsigned char *p = memchr(inflated, '\0', got);

Does this assume that a single xread() that can result in a
short-read would not return in the middle of "header", and if so, is
that a safe assumption to make?

> +				if (p)
> +					got -= p - inflated + 1;
> +				else
> +					got = 0;
> +			}
> +			total_got += got;
> +		} while (stream.avail_in && zret == Z_OK);
> +	}
> +
> +	close(cmd.child.out);
> +	git_inflate_end(&stream);
> +	git_SHA1_Final(real_sha1, &hash);
> +	if (odb_helper_finish(o, &cmd))
> +		return -1;
> +	if (zret != Z_STREAM_END) {
> +		warning("bad zlib data from odb helper '%s' for %s",
> +			o->name, sha1_to_hex(sha1));
> +		return -1;
> +	}
> +	if (total_got != obj->size) {
> +		warning("size mismatch from odb helper '%s' for %s (%lu != %lu)",
> +			o->name, sha1_to_hex(sha1), total_got, obj->size);
> +		return -1;
> +	}
> +	if (hashcmp(real_sha1, sha1)) {
> +		warning("sha1 mismatch from odb helper '%s' for %s (got %s)",
> +			o->name, sha1_to_hex(sha1), sha1_to_hex(real_sha1));
> +		return -1;
> +	}
> +
> +	return 0;
> +}

OK.  So we call the external helper with "get" command, expecting
that the helper returns the data as a zlib deflated stream, and
validate that the data matches the expected hash and the expected
size, while also saving the contents of the deflated stream to fd.
Presumably the caller opened a fd to write into a loose object?  I
do not see this code actually validating that the loose object
header, i.e. "<type> <len>\0", matches what the caller wanted to see
in obj->size and obj->type.  Shouldn't there be a check for that,
too?

I am tempted to debate myself if it is a sensible design to require
"get" to return a loose object representation, but cannot decide
without seeing the remainder of the series.  An obvious alternative
is to define the "get" request to interface with us via the raw
contents (not even deflated) and leave the deflating to us, i.e. Git
sitting on the receiving end, which would allow us to choose to
store it differently (e.g. we may want to try streaming it into its
own pack using the streaming.h API, for example).




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux