On 24/07/16 18:16, Lars Schneider wrote: > > On 23 Jul 2016, at 01:19, Ramsay Jones <ramsay@xxxxxxxxxxxxxxxxxxxx> wrote: > >> On 22/07/16 16:49, larsxschneider@xxxxxxxxx wrote: >>> From: Lars Schneider <larsxschneider@xxxxxxxxx> >>> >>> Git's clean/smudge mechanism invokes an external filter process for every >>> single blob that is affected by a filter. If Git filters a lot of blobs >>> then the startup time of the external filter processes can become a >>> significant part of the overall Git execution time. >>> >>> This patch adds the filter.<driver>.useProtocol option which, if enabled, >>> keeps the external filter process running and processes all blobs with >>> the following protocol over stdin/stdout. >>> >>> 1. Git starts the filter on first usage and expects a welcome message >>> with protocol version number: >>> Git <-- Filter: "git-filter-protocol\n" >>> Git <-- Filter: "version 1" >> >> Hmm, I was a bit surprised to see a 'filter' talk first (but so long as the >> interaction is fully defined, I guess it doesn't matter). >> >> [If you wanted to check for a version, you could add a "version" command >> instead, just like "clean" and "smudge".] > > It was a conscious decision to have the `filter` talk first. My reasoning was: > > (1) I want a reliable way to distinguish the existing filter protocol ("single-shot > invocation") from the new one ("long running"). I don't think there would be a > situation where the existing protocol would talk first. Therefore the users would > not accidentally mix them with a possibly half working, undetermined, outcome. If an 'single-shot' filter were incorrectly configured, instead of a new one, then the interaction could last a little while - since it would result in deadlock! ;-) [If Git talks first instead, configuring a 'single-shot' filter _may_ still result in a deadlock - depending on pipe size, etc.] > > (2) In the future we could extend the pipe protocol (see $gmane/297994, it's very > interesting). A filter could check Git's version and then pick the most appropriate > filter protocol on startup. > > >> [...] >>> +static struct cmd2process *start_protocol_filter(const char *cmd) >>> +{ >>> + int ret = 1; >>> + struct cmd2process *entry = NULL; >>> + struct child_process *process = NULL; >>> + struct strbuf nbuf = STRBUF_INIT; >>> + struct string_list split = STRING_LIST_INIT_NODUP; >>> + const char *argv[] = { NULL, NULL }; >>> + const char *header = "git-filter-protocol\nversion"; >>> + >>> + entry = xmalloc(sizeof(*entry)); >>> + hashmap_entry_init(entry, strhash(cmd)); >>> + entry->cmd = cmd; >>> + process = &entry->process; >>> + >>> + child_process_init(process); >>> + argv[0] = cmd; >>> + process->argv = argv; >>> + process->use_shell = 1; >>> + process->in = -1; >>> + process->out = -1; >>> + >>> + if (start_command(process)) { >>> + error("cannot fork to run external persistent filter '%s'", cmd); >>> + return NULL; >>> + } >>> + strbuf_reset(&nbuf); >>> + >>> + sigchain_push(SIGPIPE, SIG_IGN); >>> + ret &= strbuf_read_once(&nbuf, process->out, 0) > 0; >> >> Hmm, how much will be read into nbuf by this single call? >> Since strbuf_read_once() makes a single call to xread(), with >> a len argument that will probably be 8192, you can not really >> tell how much it will read, in general. (xread() does not >> guarantee how many bytes it will read.) >> >> In particular, it could be less than strlen(header). > > As mentioned to Torsten in $gmane/300156, I will add a newline > and then read until I find the second newline. That should solve > the problem, right? > > (You wrote in $gmane/300119 that I should ignore your email but > I think you have a valid point here ;-) Heh, as I said, it was late and I was trying to do several things at once. (I am updating 3 installations of Linux Mint 17.3 to Linux Mint 18 - I decided to do a complete re-install, since I needed to change partition sizes anyway. I have only just got email back up ...) I stopped commenting on the patch early but, after sending the first email, I decided to scan the rest of your patch before going to bed and noticed something which would invalidate my comments ... > > >>> [...] >>> + sigchain_push(SIGPIPE, SIG_IGN); >>> + switch (entry->protocol) { >>> + case 1: >>> + if (fd >= 0 && !src) { >>> + ret &= fstat(fd, &fileStat) != -1; >>> + len = fileStat.st_size; >>> + } >>> + strbuf_reset(&nbuf); >>> + strbuf_addf(&nbuf, "%s\n%s\n%zu\n", filter_type, path, len); >>> + ret &= write_str_in_full(process->in, nbuf.buf) > 1; >> >> why not write_in_full(process->in, nbuf.buf, nbuf.len) ? > OK, this would save a "strlen" call. Do you think such a function could be of general > use? If yes, then I would add: > > static inline ssize_t write_strbuf_in_full(int fd, struct strbuf *str) > { > return write_in_full(fd, str->buf, str->len); > } [I don't have strong feelings either way (but I suspect it's not worth it).] > > >>> + if (len > 0) { >>> + if (src) >>> + ret &= write_in_full(process->in, src, len) == len; >>> + else if (fd >= 0) >>> + ret &= copy_fd(fd, process->in) == 0; >>> + else >>> + ret &= 0; >>> + } >>> + >>> + strbuf_reset(&nbuf); >>> + while (xread(process->out, &c, 1) == 1 && c != '\n') >>> + strbuf_addchars(&nbuf, c, 1); >>> + nbuf_len = (size_t)strtol(nbuf.buf, &strtol_end, 10); >>> + ret &= (strtol_end != nbuf.buf && errno != ERANGE); >>> + strbuf_reset(&nbuf); >>> + if (nbuf_len > 0) >>> + ret &= strbuf_read_once(&nbuf, process->out, nbuf_len) == nbuf_len; >> >> Again, how many bytes will be read? >> Note, that in the default configuration, a _maximum_ of >> MAX_IO_SIZE (8MB or SSIZE_MAX, whichever is smaller) bytes >> will be read. ... In particular, your 2GB test case should not have worked, so I assumed that I had missed a loop somewhere ... > Would something like this be more appropriate? > > strbuf_reset(&nbuf); > if (nbuf_len > 0) { > strbuf_grow(&nbuf, nbuf_len); > ret &= read_in_full(process->out, nbuf.buf, nbuf_len) == nbuf_len; > } ... and this looks better. [Note: this comment would apply equally to the version message.] [Hmm, now can I remember which packages I need to install ...] ATB, Ramsay Jones -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html