Re: [PATCH v3 10/10] convert: add filter.<driver>.process option

Jakub Narębski <jnareb@xxxxxxxxx> · Thu, 4 Aug 2016 12:18:33 +0200

[Some of this answer might have been invalidated by v4;
 I might be away from computer for a few days, so I won't be reviewing]

W dniu 03.08.2016 o 15:10, Lars Schneider pisze:
> On 01 Aug 2016, at 00:19, Jakub Narębski <jnareb@xxxxxxxxx> wrote:
>> W dniu 30.07.2016 o 01:38, larsxschneider@xxxxxxxxx pisze:
[...]

>> Could this whole "send single file" be put in a separate function?
>> Or is it not worth it?
> 
> This function would have almost the same signature as apply_protocol2_filter
> and therefore I would say it's not worth it since the function is not
> crazy long.

All right.  Though I would say that if it makes the function more
readable, then it might be worth it.

[...]
>>> +
>>> +	sigchain_push(SIGPIPE, SIG_IGN);
>>
>> Hmmm... ignoring SIGPIPE was good for one-shot filters.  Is it still
>> O.K. for per-command persistent ones?
> 
> Very good question. You are right... we don't want to ignore any errors
> during the protocol... I will remove it.

I was actually just wondering.

Actually the default behavior if SIGPIPE is not ignored (or if the
SIGPIPE signal is not blocked / masked out) is to *terminate* the
writing program, which we do not want.

The correct solution is to check for error during write, and check
if errno is set to EPIPE.  This means that reader (filter driver
process) has closed pipe, usually due to crash, and we need to handle
that sanely, either restarting or quitting while providing sane
information about error to the user.

Well, we might want to set a signal handler for SIGPIPE, not just
simply ignore it (especially for streaming case; stop streaming
if filter driver crashed); though signal handlers are quite limited
about what might be done in them.  But that's for the future.

Read from closed pipe returns EOF; write to closed pipe results in
SIGPIPE and returns -1 (setting errno to EPIPE).

>>
>>> +
>>> +	packet_buf_write(&nbuf, "%s\n", filter_type);
>>> +	ret &= !direct_packet_write(process->in, nbuf.buf, nbuf.len, 1);
>>> +
>>> +	if (ret) {
>>> +		strbuf_reset(&nbuf);
>>> +		packet_buf_write(&nbuf, "filename=%s\n", path);
>>> +		ret = !direct_packet_write(process->in, nbuf.buf, nbuf.len, 1);
>>> +	}
>>
>> Perhaps a better solution would be
>>
>>        if (err)
>>        	goto fin_error;
>>
>> rather than this.
> 
> OK, I change it to goto error handling style.

Well, at least try it and check if it makes code more readable.

>>> +	if (ret) {
>>> +		strbuf_reset(&nbuf);
>>> +		packet_buf_write(&nbuf, "size=%"PRIuMAX"\n", (uintmax_t)len);
>>> +		ret = !direct_packet_write(process->in, nbuf.buf, nbuf.len, 1);
>>> +	}
>>
>> Or maybe extract writing the header for a file into a separate function?
>> This one gets a bit long...
> 
> Maybe... but I think that would make it harder to understand the protocol. I
> think I would prefer to have all the communication in one function layer.

I don't understand your reasoning here ("make it harder to understand the
protocol").  If you choose good names for function writing header, then
the main function would be the high-level view of protocol, e.g.

   git> <command>
   git> <header>
   git> <contents>
   git> <flush>

   git< <command accepted>
   git< <contents>
   git< <flush>
   git< <sent status>

[...]
>>> +
>>> +	if (ret) {
>>> +		filter_result = packet_read_line(process->out, NULL);
>>> +		ret = !strcmp(filter_result, "success");
>>> +	}
>>> +
>>> +	sigchain_pop(SIGPIPE);
>>> +
>>> +	if (ret) {
>>> +		strbuf_swap(dst, &nbuf);
>>> +	} else {
>>> +		if (!filter_result || strcmp(filter_result, "reject")) {
>>> +			// Something went wrong with the protocol filter. Force shutdown!

Don't use C++ one-line comments (that's C99-ism).

>>> +			error("external filter '%s' failed", cmd);
>>> +			kill_protocol2_filter(&cmd_process_map, entry);
>>> +		}
>>> +	}
>>
>> So if Git gets finish signal "success" from filter, it accepts the output.
>> If Git gets finish signal "reject" from filter, it restarts filter (and
>> reject the output - user can retry the command himself / herself).
>> If Git gets any other finish signal, for example "error" (but this is not
>> standarized), then it rejects the output, keeping the unfiltered result,
>> but keeps filtering.
>>
>> I think it is not described in this detail in the documentation of the
>> new protocol.
> 
> Agreed, will add!

That would be nice.

>>> -	return apply_filter(path, NULL, 0, -1, NULL, ca.drv->clean);
>>> +	if (!ca.drv->clean && ca.drv->process)
>>> +		return apply_protocol2_filter(
>>> +			path, NULL, 0, -1, NULL, ca.drv->process, FILTER_CAPABILITIES_CLEAN
>>> +		);
>>> +	else
>>> +		return apply_filter(path, NULL, 0, -1, NULL, ca.drv->clean);
>>
>> Could we augment apply_filter() instead, so that the invocation is
>>
>>        return apply_filter(path, NULL, 0, -1, NULL, ca.drv, FILTER_CLEAN);
>>
>> Though I am not sure if moving this conditional to apply_filter would
>> be a good idea; maybe wrapper around augmented apply_filter_do()?
> 
> Yes, a wrapper makes it way cleaner!

That's good, because we have quite a few of those constructs. 
And I think the compiler would inline it, so there is no penalty.

>>> diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
[...]
>>> +		git branch empty &&
>>> +
>>> +		cat ../test.o >test.r &&
>>
>> Err, the above is just copying file, isn't it?
>> Maybe it was copied from other tests, I have not checked.
> 
> It was created in the "setup" test.

What I meant here (among other things) is that you uselessly use
'cat' to copy files:

    +		cp ../test.o test.r &&

>>> +		echo "test22" >test2.r &&
>>> +		mkdir testsubdir &&
>>> +		echo "test333" >testsubdir/test3.r &&
>>
>> All right, we test text file, we test binary file (I assume), we test
>> file in a subdirectory.  What about testing empty file?  Or large file
>> which would not fit in the stdin/stdout buffer (as EXPENSIVE test)?
> 
> No binary file. The main reason for this test is to check multiple files.
> I'll add a empty file. A large file is tested in the next test.

I assume that this large file is binary file; what matters is that it
includes NUL character ("\0"), i.e. zero byte, checking that there is
no error that would terminate it at NUL.

I'll end here for now.

-- 
Jakub Narębski

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html