Re: [PATCH v7 10/10] convert: add filter.<driver>.process option

Lars Schneider <larsxschneider@xxxxxxxxx> · Tue, 13 Sep 2016 23:04:49 +0100

> On 10 Sep 2016, at 17:40, Torsten Bögershausen <tboegi@xxxxxx> wrote:
> 
> []
> 
> One general question up here, more comments inline.
> The current order for a clean-filter is like this, I removed the error handling:
> 
> int convert_to_git()
> {
> 	ret |= apply_filter(path, src, len, -1, dst, filter);
> 	if (ret && dst) {
> 		src = dst->buf;
> 		len = dst->len;
> 	}
> 	ret |= crlf_to_git(path, src, len, dst, ca.crlf_action, checksafe);
> 	return ret | ident_to_git(path, src, len, dst, ca.ident);
> }
> 
> The first step is the clean filter, the CRLF-LF conversion (if needed),
> then ident.
> The current implementation streams the whole file content to the filter,
> (STDIN of the filter) and reads back STDOUT from the filter into a STRBUF.
> This is to use the UNIX-like STDIN--STDOUT method for writing a filter.
> 
> However, int would_convert_to_git_filter_fd() and convert_to_git_filter_fd()
> offer a sort of short-cut:
> The filter reads from the file directly, and the output of the filter is
> read into a STRBUF.

Are you sure? As far as I understand the code the filter does not read from 
the file in any case today.  The functions would_convert_to_git_filter_fd() and 
convert_to_git_filter_fd() just avoid avoid mapping the file in Git. The content 
is still streamed via pipes:
https://github.com/git/git/commit/9035d75a2be9d80d82676504d69553245017f6d4

> It looks as if the multi-filter approach can use this in a similar way:
> Give the pathname to the filter, the filter opens the file for reading
> and stream the result via the pkt-line protocol into Git.
> This needs some more changes, and may be very well go into a separate patch
> series. (and should).
> 
> What I am asking for:
> When a multi-filter is used, the content is handled to the filter via pkt-line,
> and the result is given to Git via pkt-line ?
> Nothing wrong with it, I just wonder, if it should be mentioned somewhere.

That is most certainly a good idea and the main reason I added "capabilities"
to the protocol. Joey Hess worked on this topic (not merged, yet) and I would 
like to make this available to the long-running filter protocol as soon as the
feature is available:
http://public-inbox.org/git/1468277112-9909-1-git-send-email-joeyh@xxxxxxxxxx/

>> +sub packet_read {
>> +    my $buffer;
>> +    my $bytes_read = read STDIN, $buffer, 4;
>> +    if ( $bytes_read == 0 ) {
>> +
>> +        # EOF - Git stopped talking to us!
>> +        exit();
>> +    }
>> +    elsif ( $bytes_read != 4 ) {
>> +        die "invalid packet size '$bytes_read' field";
>> +    }
> 
> This is half-kosher, I would say,
> (And I really. really would like to see an implementation in C ;-)

Would you be willing to contribute a patch? :-)

> A read function may look like this:
> 
>   ret = read(0, &buffer, 4);
>   if (ret < 0) {
>     /* Error, nothing we can do */
>     exit(1);
>   } else if (ret == 0) {
>     /* EOF  */
>     exit(0);
>   } else if (ret < 4) {
>     /* 
>      * Go and read more, until we have 4 bytes or EOF or Error */
>   } else {
>     /* Good case, see below */
>   }

I see. However, my intention was to provide an absolute minimal
example to teach a reader how the protocol works. I consider
all proper error handling an exercise for the reader ;-)

>> +#define CAP_CLEAN    (1u<<0)
>> +#define CAP_SMUDGE   (1u<<1)
> 
> Is CAP_ too generic, and GIT_FILTER_CAP (or so) less calling for trouble ?

I had something like that but Junio suggested these names in V4:
http://public-inbox.org/git/xmqq8twd8uld.fsf@xxxxxxxxxxxxxxxxxxxxxxxxxxx/

>> +
>> +	err = (strlen(filter_type) > PKTLINE_DATA_MAXLEN);
> 
> Extra () needed ?
> More () in the code...

I thought it might improve readability, but I will remove them
if you think this would be more consistent with existing Git code.

Thanks,
Lars