Christian Couder <christian.couder@xxxxxxxxx> writes: > From: Jeff King <peff@xxxxxxxx> > > Signed-off-by: Christian Couder <chriscool@xxxxxxxxxxxxx> > --- By the time the series loses RFC prefix, we'd need to see the above three lines straightened out a bit more, e.g. a real message and a more proper sign-off sequence. > +static struct odb_helper *find_or_create_helper(const char *name, int len) > +{ > + struct odb_helper *o; > + > + for (o = helpers; o; o = o->next) > + if (!strncmp(o->name, name, len) && !o->name[len]) > + return o; > + > + o = odb_helper_new(name, len); > + *helpers_tail = o; > + helpers_tail = &o->next; > + > + return o; > +} > + > +static int external_odb_config(const char *var, const char *value, void *data) > +{ > + struct odb_helper *o; > + const char *key, *dot; > + > + if (!skip_prefix(var, "odb.", &key)) > + return 0; > + dot = strrchr(key, '.'); > + if (!dot) > + return 0; Is this something Peff wrote long time ago? I find it surprising that he would write this without using parse_config_key(). > +struct odb_helper_cmd { > + struct argv_array argv; > + struct child_process child; > +}; > + > +static void prepare_helper_command(struct argv_array *argv, const char *cmd, > + const char *fmt, va_list ap) > +{ > + struct strbuf buf = STRBUF_INIT; > + > + strbuf_addstr(&buf, cmd); > + strbuf_addch(&buf, ' '); > + strbuf_vaddf(&buf, fmt, ap); > + > + argv_array_push(argv, buf.buf); > + strbuf_release(&buf); This concatenates the cmdname (like "get") and its parameters into a single string (e.g. "get 454cb6bd52a4de614a3633e4f547af03d5c3b640") and then places the resulting string as the sole element in argv[] array, which I find somewhat unusual. It at least deserves a comment in front of the function that the callers are responsible to ensure that the result of vaddf(fmt, ap) is properly shell-quoted. Otherwise it is inviting quoting errors in the future (even if there is no such errors in the current code and command set, i.e. a 40-hex object name that "get" subcommand takes happens to require no special shell-quoting consideration). > +int odb_helper_fetch_object(struct odb_helper *o, const unsigned char *sha1, > + int fd) > +{ > + struct odb_helper_object *obj; > + struct odb_helper_cmd cmd; > + unsigned long total_got; > + git_zstream stream; > + int zret = Z_STREAM_END; > + git_SHA_CTX hash; > + unsigned char real_sha1[20]; > + > + obj = odb_helper_lookup(o, sha1); > + if (!obj) > + return -1; > + > + if (odb_helper_start(o, &cmd, "get %s", sha1_to_hex(sha1)) < 0) > + return -1; > + > + memset(&stream, 0, sizeof(stream)); > + git_inflate_init(&stream); > + git_SHA1_Init(&hash); > + total_got = 0; > + > + for (;;) { > + unsigned char buf[4096]; > + int r; > + > + r = xread(cmd.child.out, buf, sizeof(buf)); > + if (r < 0) { > + error("unable to read from odb helper '%s': %s", > + o->name, strerror(errno)); > + close(cmd.child.out); > + odb_helper_finish(o, &cmd); > + git_inflate_end(&stream); > + return -1; > + } > + if (r == 0) > + break; > + > + write_or_die(fd, buf, r); > + > + stream.next_in = buf; > + stream.avail_in = r; > + do { > + unsigned char inflated[4096]; > + unsigned long got; > + > + stream.next_out = inflated; > + stream.avail_out = sizeof(inflated); > + zret = git_inflate(&stream, Z_SYNC_FLUSH); > + got = sizeof(inflated) - stream.avail_out; > + > + git_SHA1_Update(&hash, inflated, got); > + /* skip header when counting size */ > + if (!total_got) { > + const unsigned char *p = memchr(inflated, '\0', got); Does this assume that a single xread() that can result in a short-read would not return in the middle of "header", and if so, is that a safe assumption to make? > + if (p) > + got -= p - inflated + 1; > + else > + got = 0; > + } > + total_got += got; > + } while (stream.avail_in && zret == Z_OK); > + } > + > + close(cmd.child.out); > + git_inflate_end(&stream); > + git_SHA1_Final(real_sha1, &hash); > + if (odb_helper_finish(o, &cmd)) > + return -1; > + if (zret != Z_STREAM_END) { > + warning("bad zlib data from odb helper '%s' for %s", > + o->name, sha1_to_hex(sha1)); > + return -1; > + } > + if (total_got != obj->size) { > + warning("size mismatch from odb helper '%s' for %s (%lu != %lu)", > + o->name, sha1_to_hex(sha1), total_got, obj->size); > + return -1; > + } > + if (hashcmp(real_sha1, sha1)) { > + warning("sha1 mismatch from odb helper '%s' for %s (got %s)", > + o->name, sha1_to_hex(sha1), sha1_to_hex(real_sha1)); > + return -1; > + } > + > + return 0; > +} OK. So we call the external helper with "get" command, expecting that the helper returns the data as a zlib deflated stream, and validate that the data matches the expected hash and the expected size, while also saving the contents of the deflated stream to fd. Presumably the caller opened a fd to write into a loose object? I do not see this code actually validating that the loose object header, i.e. "<type> <len>\0", matches what the caller wanted to see in obj->size and obj->type. Shouldn't there be a check for that, too? I am tempted to debate myself if it is a sensible design to require "get" to return a loose object representation, but cannot decide without seeing the remainder of the series. An obvious alternative is to define the "get" request to interface with us via the raw contents (not even deflated) and leave the deflating to us, i.e. Git sitting on the receiving end, which would allow us to choose to store it differently (e.g. we may want to try streaming it into its own pack using the streaming.h API, for example).