(I realized that I didn't answer this email about the v3 version. Sorry about this late answer.) On Thu, Dec 1, 2016 at 12:30 AM, Junio C Hamano <gitster@xxxxxxxxx> wrote: > Christian Couder <christian.couder@xxxxxxxxx> writes: > >> From: Jeff King <peff@xxxxxxxx> >> >> Signed-off-by: Christian Couder <chriscool@xxxxxxxxxxxxx> >> --- > > By the time the series loses RFC prefix, we'd need to see the above > three lines straightened out a bit more, e.g. a real message and a > more proper sign-off sequence. Yeah, it is better in the v5 series I will send really soon now. >> +static struct odb_helper *find_or_create_helper(const char *name, int len) >> +{ >> + struct odb_helper *o; >> + >> + for (o = helpers; o; o = o->next) >> + if (!strncmp(o->name, name, len) && !o->name[len]) >> + return o; >> + >> + o = odb_helper_new(name, len); >> + *helpers_tail = o; >> + helpers_tail = &o->next; >> + >> + return o; >> +} >> + >> +static int external_odb_config(const char *var, const char *value, void *data) >> +{ >> + struct odb_helper *o; >> + const char *key, *dot; >> + >> + if (!skip_prefix(var, "odb.", &key)) >> + return 0; >> + dot = strrchr(key, '.'); >> + if (!dot) >> + return 0; > > Is this something Peff wrote long time ago? I find it surprising > that he would write this without using parse_config_key(). parse_config_key() is used now. >> +struct odb_helper_cmd { >> + struct argv_array argv; >> + struct child_process child; >> +}; >> + >> +static void prepare_helper_command(struct argv_array *argv, const char *cmd, >> + const char *fmt, va_list ap) >> +{ >> + struct strbuf buf = STRBUF_INIT; >> + >> + strbuf_addstr(&buf, cmd); >> + strbuf_addch(&buf, ' '); >> + strbuf_vaddf(&buf, fmt, ap); >> + >> + argv_array_push(argv, buf.buf); >> + strbuf_release(&buf); > > This concatenates the cmdname (like "get") and its parameters into a > single string (e.g. "get 454cb6bd52a4de614a3633e4f547af03d5c3b640") > and then places the resulting string as the sole element in argv[] > array, which I find somewhat unusual. It at least deserves a > comment in front of the function that the callers are responsible to > ensure that the result of vaddf(fmt, ap) is properly shell-quoted. I added a comment. > Otherwise it is inviting quoting errors in the future (even if there > is no such errors in the current code and command set, i.e. a 40-hex > object name that "get" subcommand takes happens to require no > special shell-quoting consideration). Yeah, but I am not sure how this could be done better. >> +int odb_helper_fetch_object(struct odb_helper *o, const unsigned char *sha1, >> + int fd) >> +{ >> + struct odb_helper_object *obj; >> + struct odb_helper_cmd cmd; >> + unsigned long total_got; >> + git_zstream stream; >> + int zret = Z_STREAM_END; >> + git_SHA_CTX hash; >> + unsigned char real_sha1[20]; >> + >> + obj = odb_helper_lookup(o, sha1); >> + if (!obj) >> + return -1; >> + >> + if (odb_helper_start(o, &cmd, "get %s", sha1_to_hex(sha1)) < 0) >> + return -1; >> + >> + memset(&stream, 0, sizeof(stream)); >> + git_inflate_init(&stream); >> + git_SHA1_Init(&hash); >> + total_got = 0; >> + >> + for (;;) { >> + unsigned char buf[4096]; >> + int r; >> + >> + r = xread(cmd.child.out, buf, sizeof(buf)); >> + if (r < 0) { >> + error("unable to read from odb helper '%s': %s", >> + o->name, strerror(errno)); >> + close(cmd.child.out); >> + odb_helper_finish(o, &cmd); >> + git_inflate_end(&stream); >> + return -1; >> + } >> + if (r == 0) >> + break; >> + >> + write_or_die(fd, buf, r); >> + >> + stream.next_in = buf; >> + stream.avail_in = r; >> + do { >> + unsigned char inflated[4096]; >> + unsigned long got; >> + >> + stream.next_out = inflated; >> + stream.avail_out = sizeof(inflated); >> + zret = git_inflate(&stream, Z_SYNC_FLUSH); >> + got = sizeof(inflated) - stream.avail_out; >> + >> + git_SHA1_Update(&hash, inflated, got); >> + /* skip header when counting size */ >> + if (!total_got) { >> + const unsigned char *p = memchr(inflated, '\0', got); > > Does this assume that a single xread() that can result in a > short-read would not return in the middle of "header", and if so, is > that a safe assumption to make? I am not sure what would go wrong in case of a short read. My guess is that as long as we test that p is not NULL below we should be fine. As Peff wrote this code, he could probably answer much better than me. >> + if (p) >> + got -= p - inflated + 1; >> + else >> + got = 0; >> + } >> + total_got += got; >> + } while (stream.avail_in && zret == Z_OK); >> + } >> + >> + close(cmd.child.out); >> + git_inflate_end(&stream); >> + git_SHA1_Final(real_sha1, &hash); >> + if (odb_helper_finish(o, &cmd)) >> + return -1; >> + if (zret != Z_STREAM_END) { >> + warning("bad zlib data from odb helper '%s' for %s", >> + o->name, sha1_to_hex(sha1)); >> + return -1; >> + } >> + if (total_got != obj->size) { >> + warning("size mismatch from odb helper '%s' for %s (%lu != %lu)", >> + o->name, sha1_to_hex(sha1), total_got, obj->size); >> + return -1; >> + } >> + if (hashcmp(real_sha1, sha1)) { >> + warning("sha1 mismatch from odb helper '%s' for %s (got %s)", >> + o->name, sha1_to_hex(sha1), sha1_to_hex(real_sha1)); >> + return -1; >> + } >> + >> + return 0; >> +} > > OK. So we call the external helper with "get" command, expecting > that the helper returns the data as a zlib deflated stream, and > validate that the data matches the expected hash and the expected > size, while also saving the contents of the deflated stream to fd. > Presumably the caller opened a fd to write into a loose object? I > do not see this code actually validating that the loose object > header, i.e. "<type> <len>\0", matches what the caller wanted to see > in obj->size and obj->type. Shouldn't there be a check for that, > too? Yeah, it would be better with a check. It also looks like checking that must take into account possible short-read that could happen (as discussed above), so for now I will see if we need to fix the header reading, before taking a look at this. > I am tempted to debate myself if it is a sensible design to require > "get" to return a loose object representation, but cannot decide > without seeing the remainder of the series. An obvious alternative > is to define the "get" request to interface with us via the raw > contents (not even deflated) and leave the deflating to us, i.e. Git > sitting on the receiving end, which would allow us to choose to > store it differently (e.g. we may want to try streaming it into its > own pack using the streaming.h API, for example). There is now both a get_raw_obj and a get_git_obj to handle both cases.