On 23 Jul 2016, at 01:19, Ramsay Jones <ramsay@xxxxxxxxxxxxxxxxxxxx> wrote: > On 22/07/16 16:49, larsxschneider@xxxxxxxxx wrote: >> From: Lars Schneider <larsxschneider@xxxxxxxxx> >> >> Git's clean/smudge mechanism invokes an external filter process for every >> single blob that is affected by a filter. If Git filters a lot of blobs >> then the startup time of the external filter processes can become a >> significant part of the overall Git execution time. >> >> This patch adds the filter.<driver>.useProtocol option which, if enabled, >> keeps the external filter process running and processes all blobs with >> the following protocol over stdin/stdout. >> >> 1. Git starts the filter on first usage and expects a welcome message >> with protocol version number: >> Git <-- Filter: "git-filter-protocol\n" >> Git <-- Filter: "version 1" > > Hmm, I was a bit surprised to see a 'filter' talk first (but so long as the > interaction is fully defined, I guess it doesn't matter). > > [If you wanted to check for a version, you could add a "version" command > instead, just like "clean" and "smudge".] It was a conscious decision to have the `filter` talk first. My reasoning was: (1) I want a reliable way to distinguish the existing filter protocol ("single-shot invocation") from the new one ("long running"). I don't think there would be a situation where the existing protocol would talk first. Therefore the users would not accidentally mix them with a possibly half working, undetermined, outcome. (2) In the future we could extend the pipe protocol (see $gmane/297994, it's very interesting). A filter could check Git's version and then pick the most appropriate filter protocol on startup. > [...] >> +static struct cmd2process *start_protocol_filter(const char *cmd) >> +{ >> + int ret = 1; >> + struct cmd2process *entry = NULL; >> + struct child_process *process = NULL; >> + struct strbuf nbuf = STRBUF_INIT; >> + struct string_list split = STRING_LIST_INIT_NODUP; >> + const char *argv[] = { NULL, NULL }; >> + const char *header = "git-filter-protocol\nversion"; >> + >> + entry = xmalloc(sizeof(*entry)); >> + hashmap_entry_init(entry, strhash(cmd)); >> + entry->cmd = cmd; >> + process = &entry->process; >> + >> + child_process_init(process); >> + argv[0] = cmd; >> + process->argv = argv; >> + process->use_shell = 1; >> + process->in = -1; >> + process->out = -1; >> + >> + if (start_command(process)) { >> + error("cannot fork to run external persistent filter '%s'", cmd); >> + return NULL; >> + } >> + strbuf_reset(&nbuf); >> + >> + sigchain_push(SIGPIPE, SIG_IGN); >> + ret &= strbuf_read_once(&nbuf, process->out, 0) > 0; > > Hmm, how much will be read into nbuf by this single call? > Since strbuf_read_once() makes a single call to xread(), with > a len argument that will probably be 8192, you can not really > tell how much it will read, in general. (xread() does not > guarantee how many bytes it will read.) > > In particular, it could be less than strlen(header). As mentioned to Torsten in $gmane/300156, I will add a newline and then read until I find the second newline. That should solve the problem, right? (You wrote in $gmane/300119 that I should ignore your email but I think you have a valid point here ;-) >> [...] >> + sigchain_push(SIGPIPE, SIG_IGN); >> + switch (entry->protocol) { >> + case 1: >> + if (fd >= 0 && !src) { >> + ret &= fstat(fd, &fileStat) != -1; >> + len = fileStat.st_size; >> + } >> + strbuf_reset(&nbuf); >> + strbuf_addf(&nbuf, "%s\n%s\n%zu\n", filter_type, path, len); >> + ret &= write_str_in_full(process->in, nbuf.buf) > 1; > > why not write_in_full(process->in, nbuf.buf, nbuf.len) ? OK, this would save a "strlen" call. Do you think such a function could be of general use? If yes, then I would add: static inline ssize_t write_strbuf_in_full(int fd, struct strbuf *str) { return write_in_full(fd, str->buf, str->len); } >> + if (len > 0) { >> + if (src) >> + ret &= write_in_full(process->in, src, len) == len; >> + else if (fd >= 0) >> + ret &= copy_fd(fd, process->in) == 0; >> + else >> + ret &= 0; >> + } >> + >> + strbuf_reset(&nbuf); >> + while (xread(process->out, &c, 1) == 1 && c != '\n') >> + strbuf_addchars(&nbuf, c, 1); >> + nbuf_len = (size_t)strtol(nbuf.buf, &strtol_end, 10); >> + ret &= (strtol_end != nbuf.buf && errno != ERANGE); >> + strbuf_reset(&nbuf); >> + if (nbuf_len > 0) >> + ret &= strbuf_read_once(&nbuf, process->out, nbuf_len) == nbuf_len; > > Again, how many bytes will be read? > Note, that in the default configuration, a _maximum_ of > MAX_IO_SIZE (8MB or SSIZE_MAX, whichever is smaller) bytes > will be read. Would something like this be more appropriate? strbuf_reset(&nbuf); if (nbuf_len > 0) { strbuf_grow(&nbuf, nbuf_len); ret &= read_in_full(process->out, nbuf.buf, nbuf_len) == nbuf_len; } Thanks for the review, Lars -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html