Re: [PATCH v1 0/3] Git filter protocol

Lars Schneider <larsxschneider@xxxxxxxxx> · Sun, 24 Jul 2016 13:24:29 +0200

On 22 Jul 2016, at 23:39, Junio C Hamano <gitster@xxxxxxxxx> wrote:

> larsxschneider@xxxxxxxxx writes:
> 
>> The first two patches are cleanup patches which are not really necessary
>> for the feature.
> 
> These two looked trivially good.
Thanks!

> I think I can agree with what 3/3 wants to do in principle, but
> 
> * "protocol" is not quite the right word.  The current way to
>   interact with clean and smudge filters can be considered using a
>   different "protocol", that conveys the data and the options via
>   the command line and pipe.  The most distinguishing feature that
>   differentiates the old way and the new style this change allows
>   is that it allows you to have a single instance of the process
>   running that can be reused?
I agree that the name is not ideal. When I started working on the
feature I called it "streaming", but then I read your comment in
$gmane/299863 and realized that this would be a misleading name.
Afterwards I called it "persistent"/"long running", but then I thought
this term could trick people into thinking that this is some kind of
daemon. Somehow I want to convey that the filter is persistent for
one Git invocation only.

What if we kept the config option "protocol" and made it an "int"?
Undefined or version "1" would describe the existing clean/smudge
protocol via command line and pipe. Version "2" would be the new protocol.
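
To illustrate the idea, here is a minimal sketch of how such an integer
value could be interpreted. It is written in Python only for brevity; the
function name and the error handling are made up for this example and are
not taken from the patch.

```python
def filter_protocol_version(value):
    """Map a (hypothetical) "protocol" config value to a protocol version.

    value is the raw config string, or None if the option is not set.
    """
    if value is None:
        # Undefined behaves like version 1: the existing clean/smudge
        # interface via command line and pipe.
        return 1
    version = int(value)
    if version not in (1, 2):
        raise ValueError("unsupported filter protocol version: %d" % version)
    return version
```

An unset option and "1" both select the existing behavior, so existing
configurations would keep working unchanged.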

> * I am not sure what's the pros-and-cons in forcing people writing
>   a single program that can do both cleaning and smudging.  You
>   cannot have only "smudge" side that uses the long-running process
>   while "clean" side that runs single-shot invocation with this
>   design, which I'd imagine would be a downside.  If you are going
>   to use a long-running process interface for both sides, this
>   design allows you to do it with fewer number of processes, which
>   may be an upside.
We could define the protocol for clean and smudge individually. However,
once you have implemented the more complicated long-running protocol for
one filter, you could reuse that code for the other filter, too. As far
as I can see, the long-running protocol is always more efficient (assuming
you have source code access to both filters). Another argument could be
that we don't define the "required" flag for the filters individually either.

> * The way the serialized access to these long-running processes
>   work in 3/3 would make it harder or impossible to later
>   parallelize conversion?  I am imagining a far future where we
>   would run "git checkout ." using (say) two threads, one
>   responsible for active_cache[0..active_nr/2] and the other
>   responsible for the remainder.
I hope this future is not too far away :-)
I don't think that would be a problem, though: we could start one
long-running process per checkout thread, no?
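
A rough sketch of what I mean, again in Python for brevity. Everything
here is an assumption made for illustration: the stand-in child process
(it just upper-cases lines), the line-based framing, and the names
convert_slice/parallel_convert. None of this is the wire format or code
from 3/3. The only point is that each thread owns its own filter process,
so access to the pipes needs no serialization across threads.

```python
import subprocess
import sys
import threading

# Stand-in for a long-running filter: reads a line, writes it back
# upper-cased, and keeps running until stdin is closed.
CHILD = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    sys.stdout.write(line.upper())\n"
    "    sys.stdout.flush()\n"
)

def convert_slice(entries, results, offset):
    # One filter process per thread; it is reused for every entry
    # in this thread's slice.
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
    )
    try:
        for i, entry in enumerate(entries):
            proc.stdin.write(entry + "\n")
            proc.stdin.flush()
            results[offset + i] = proc.stdout.readline().rstrip("\n")
    finally:
        proc.stdin.close()   # EOF tells the child to exit
        proc.wait()
        proc.stdout.close()

def parallel_convert(entries, nr_threads=2):
    # Split the entries into contiguous slices, one per thread.
    results = [None] * len(entries)
    chunk = (len(entries) + nr_threads - 1) // nr_threads
    threads = []
    for t in range(nr_threads):
        start = t * chunk
        part = entries[start:start + chunk]
        if not part:
            continue
        th = threading.Thread(target=convert_slice,
                              args=(part, results, start))
        th.start()
        threads.append(th)
    for th in threads:
        th.join()
    return results
```

With two threads this starts two filter processes instead of one, which
is still far fewer than one process per file.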

Thank you,
Lars