larsxschneider@xxxxxxxxx writes:

> +In case the filter cannot or does not want to process the content,
> +it is expected to respond with an "error" status. Depending on the
> +`filter.<driver>.required` flag Git will interpret that as error
> +but it will not stop or restart the filter process.
> +------------------------
> +packet: git< status=error\n
> +packet: git< 0000
> +------------------------
> +
> +In case the filter cannot or does not want to process the content
> +as well as any future content for the lifetime of the Git process,
> +it is expected to respond with an "error-all" status. Depending on
> +the `filter.<driver>.required` flag Git will interpret that as error
> +but it will not stop or restart the filter process.
> +------------------------
> +packet: git< status=error-all\n
> +packet: git< 0000
> +------------------------

This part of the document is well-written to help filter-writers.

One thing that was unclear from the above to me, when read as a
potential filter-writer, is when I am supposed to exit(2).  After I
tell Git with error-all (I would have called it "abort", but that's
OK) that I desire no further communication, am I free to go?  Or do
I wait until Git somehow disconnects (perhaps by closing the packet
stream I have been reading)?

> +If the filter dies during the communication or does not adhere to
> +the protocol then Git will stop the filter process and restart it
> +with the next file that needs to be processed.

Hmph, is there a reason not to retry a half-converted-and-failed
blob with the fresh process?  Note that this is not "you must do it
that way", and it is not even "I think doing so may be a better
idea".  I merely want to know the reason behind this decision.

> +After the filter has processed a blob it is expected to wait for
> +the next "key=value" list containing a command. When the Git process
> +terminates, it will send a kill signal to the filter in that stage.

The "kill" may not be very nice.
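For concreteness, here is a minimal sketch (Python, not from the patch;
the `pkt_line`/`error_response` helper names are my own) of how a filter
would frame the "error" response shown in the quoted trace, i.e. a
pkt-line whose 4-byte hex header counts itself, followed by a flush:

```python
def pkt_line(payload: bytes) -> bytes:
    """Encode one pkt-line: 4 hex digits of total length (header included),
    then the payload itself."""
    return b"%04x" % (len(payload) + 4) + payload

def flush_pkt() -> bytes:
    """A flush packet ("0000") terminates a pkt-line sequence."""
    return b"0000"

def error_response() -> bytes:
    """What a filter writes to tell Git it cannot process this blob."""
    return pkt_line(b"status=error\n") + flush_pkt()
```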
As Git side _knows_ that the filter is waiting for the next command,
having an explicit "shutdown" command would give the filter a chance
to implement a clean exit--it may have some housekeeping tasks it
wants to perform once it is done.  The "explicit shutdown" could
just be "the pipe gets closed", so from the implementation point of
view there may not be anything you need to further add to this patch
(after all, when we exit, the pipes to them would be closed), but
the shutdown protocol and the expectation on the behaviour of filter
processes would need to be documented.

> +If a `filter.<driver>.clean` or `filter.<driver>.smudge` command
> +is configured then these commands always take precedence over
> +a configured `filter.<driver>.process` command.

It may make more sense to give precedence to the .process (which is
a late-comer) if defined, ignoring .clean and .smudge, than the
other way around.

> +Please note that you cannot use an existing `filter.<driver>.clean`
> +or `filter.<driver>.smudge` command with `filter.<driver>.process`
> +because the former two use a different inter process communication
> +protocol than the latter one.

Would it be a useful sample program we can ship in contrib/ if you
created a "filter adapter" that reads these two configuration
variables and act as a filter.<driver>.process?

During an imaginary session of "git add .", I think I found where
you start THE filter process upon the first path that needs to be
filtered with one for the configured <driver>, and I think the same
place is where you reuse THE filter process, but I am not sure where
you are cleaning up by killing the filter once all paths are added.
Wouldn't you need some hooks at strategic places after such bulk
operation to tell the multi-file-filter machinery to walk all the
entries in cmd_process_map and tell the remaining filter processes
that they have no more tasks, or something?
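If such an adapter were attempted, its core would merely run the
configured one-shot clean or smudge command once per blob, blob on
stdin and result on stdout; the pkt-line framing around it is
omitted, and `run_oneshot_filter` is a made-up name for this sketch:

```python
import subprocess

def run_oneshot_filter(cmd: str, data: bytes) -> bytes:
    """Invoke a classic one-shot clean/smudge command the way Git does:
    feed the blob on stdin, collect the converted blob from stdout."""
    proc = subprocess.run(cmd, shell=True, input=data,
                          stdout=subprocess.PIPE, check=True)
    return proc.stdout
```

A real adapter would additionally speak the process protocol on its
own stdin/stdout and map each incoming command to one such invocation.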
Are you relying on these processes to exit upon a read failure after
we exit and the pipe going to the filter is severed?

Thanks.