> On 10 Sep 2016, at 17:40, Torsten Bögershausen <tboegi@xxxxxx> wrote: > > [] > > One general question up here, more comments inline. > The current order for a clean-filter is like this, I removed the error handling: > > int convert_to_git() > { > ret |= apply_filter(path, src, len, -1, dst, filter); > if (ret && dst) { > src = dst->buf; > len = dst->len; > } > ret |= crlf_to_git(path, src, len, dst, ca.crlf_action, checksafe); > return ret | ident_to_git(path, src, len, dst, ca.ident); > } > > The first step is the clean filter, the CRLF-LF conversion (if needed), > then ident. > The current implementation streams the whole file content to the filter, > (STDIN of the filter) and reads back STDOUT from the filter into a STRBUF. > This is to use the UNIX-like STDIN--STDOUT method for writing a filter. > > However, int would_convert_to_git_filter_fd() and convert_to_git_filter_fd() > offer a sort of short-cut: > The filter reads from the file directly, and the output of the filter is > read into a STRBUF. Are you sure? As far as I understand the code the filter does not read from the file in any case today. The functions would_convert_to_git_filter_fd() and convert_to_git_filter_fd() just avoid avoid mapping the file in Git. The content is still streamed via pipes: https://github.com/git/git/commit/9035d75a2be9d80d82676504d69553245017f6d4 > It looks as if the multi-filter approach can use this in a similar way: > Give the pathname to the filter, the filter opens the file for reading > and stream the result via the pkt-line protocol into Git. > This needs some more changes, and may be very well go into a separate patch > series. (and should). > > What I am asking for: > When a multi-filter is used, the content is handled to the filter via pkt-line, > and the result is given to Git via pkt-line ? > Nothing wrong with it, I just wonder, if it should be mentioned somewhere. That is most certainly a good idea and the main reason I added "capabilities" to the protocol. Joey Hess worked on this topic (not merged, yet) and I would like to make this available to the long-running filter protocol as soon as the feature is available: http://public-inbox.org/git/1468277112-9909-1-git-send-email-joeyh@xxxxxxxxxx/ >> +sub packet_read { >> + my $buffer; >> + my $bytes_read = read STDIN, $buffer, 4; >> + if ( $bytes_read == 0 ) { >> + >> + # EOF - Git stopped talking to us! >> + exit(); >> + } >> + elsif ( $bytes_read != 4 ) { >> + die "invalid packet size '$bytes_read' field"; >> + } > > This is half-kosher, I would say, > (And I really. really would like to see an implementation in C ;-) Would you be willing to contribute a patch? :-) > A read function may look like this: > > ret = read(0, &buffer, 4); > if (ret < 0) { > /* Error, nothing we can do */ > exit(1); > } else if (ret == 0) { > /* EOF */ > exit(0); > } else if (ret < 4) { > /* > * Go and read more, until we have 4 bytes or EOF or Error */ > } else { > /* Good case, see below */ > } I see. However, my intention was to provide an absolute minimal example to teach a reader how the protocol works. I consider all proper error handling an exercise for the reader ;-) >> +#define CAP_CLEAN (1u<<0) >> +#define CAP_SMUDGE (1u<<1) > > Is CAP_ too generic, and GIT_FILTER_CAP (or so) less calling for trouble ? I had something like that but Junio suggested these names in V4: http://public-inbox.org/git/xmqq8twd8uld.fsf@xxxxxxxxxxxxxxxxxxxxxxxxxxx/ >> + >> + err = (strlen(filter_type) > PKTLINE_DATA_MAXLEN); > > Extra () needed ? > More () in the code... I thought it might improve readability, but I will remove them if you think this would be more consistent with existing Git code. Thanks, Lars