[PATCH v11 00/14] Git filter protocol

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



From: Lars Schneider <larsxschneider@xxxxxxxxx>

The goal of this series is to avoid launching a new clean/smudge filter
process for each file that is filtered.

A short summary about v1 to v5 can be found here:
https://git.github.io/rev_news/2016/08/17/edition-18/

This series is also published on web:
https://github.com/larsxschneider/git/pull/15

Patches 1 and 2 are cleanups and not strictly necessary for the series.
Patches 3 to 12 are required preparation. Patch 13 is the main patch.
Patch 14 adds an example how to use the Git filter protocol in contrib.

Thanks a lot to
 Jakub, Junio, Ramsay, Dscho, Torsten and Peff
for very helpful reviews,
Lars

## Changes since v10
  * clarify documentation (Jakub)
  * change "<capability>=true" to "capability=<capability>" in protocol (Jakub)
  * make stop_multi_file_filter() static (Ramsay)
  * add flush packet after version negotiation for consistency (Jakub)
  * fix pid_t translation error on Windows (Dscho)
  * fix Unquoted references in tests (Junio)
  * remove smudge invocation count in tests (Junio)


Lars Schneider (14):
  convert: quote filter names in error messages
  convert: modernize tests
  run-command: move check_pipe() from write_or_die to run_command
  run-command: add clean_on_exit_handler
  pkt-line: rename packet_write() to packet_write_fmt()
  pkt-line: extract set_packet_header()
  pkt-line: add packet_write_fmt_gently()
  pkt-line: add packet_flush_gently()
  pkt-line: add packet_write_gently()
  pkt-line: add functions to read/write flush terminated packet streams
  convert: make apply_filter() adhere to standard Git error handling
  convert: prepare filter.<driver>.process option
  convert: add filter.<driver>.process option
  contrib/long-running-filter: add long running filter example

 Documentation/gitattributes.txt        | 157 ++++++++++-
 builtin/archive.c                      |   4 +-
 builtin/receive-pack.c                 |   4 +-
 builtin/remote-ext.c                   |   4 +-
 builtin/upload-archive.c               |   4 +-
 connect.c                              |   2 +-
 contrib/long-running-filter/example.pl | 128 +++++++++
 convert.c                              | 375 +++++++++++++++++++++---
 daemon.c                               |   2 +-
 http-backend.c                         |   2 +-
 pkt-line.c                             | 152 +++++++++-
 pkt-line.h                             |  12 +-
 run-command.c                          |  39 ++-
 run-command.h                          |   4 +-
 shallow.c                              |   2 +-
 t/t0021-conversion.sh                  | 502 ++++++++++++++++++++++++++++++---
 t/t0021/rot13-filter.pl                | 192 +++++++++++++
 upload-pack.c                          |  30 +-
 write_or_die.c                         |  13 -
 19 files changed, 1495 insertions(+), 133 deletions(-)
 create mode 100755 contrib/long-running-filter/example.pl
 create mode 100755 t/t0021/rot13-filter.pl



## Interdiff (v10..v11)

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index a182ef2..976243a 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -297,9 +297,11 @@ upon checkin. By default these commands process only a single
 blob and terminate. If a long running `process` filter is used
 in place of `clean` and/or `smudge` filters, then Git can process
 all blobs with a single filter command invocation for the entire
-life of a single Git command, for example `git add --all`.  See
-section below for the description of the protocol used to
-communicate with a `process` filter.
+life of a single Git command, for example `git add --all`. If a
+long running `process` filter is configured then it always takes
+precedence over a configured single blob filter. See section
+below for the description of the protocol used to communicate with
+a `process` filter.

 One use of the content filtering is to massage the content into a shape
 that is more convenient for the platform, filesystem, and the user to use.
@@ -393,10 +395,10 @@ text and therefore are terminated by a LF.

 Git starts the filter when it encounters the first file
 that needs to be cleaned or smudged. After the filter started
-Git sends a welcome message ("git-filter-client"), a list of
-supported protocol version numbers, and a flush packet. Git expects
-to read a welcome response message ("git-filter-server") and exactly
-one protocol version number from the previously sent list. All further
+Git sends a welcome message ("git-filter-client"), a list of supported
+protocol version numbers, and a flush packet. Git expects to read a welcome
+response message ("git-filter-server"), exactly one protocol version number
+from the previously sent list, and a flush packet. All further
 communication will be based on the selected version. The remaining
 protocol description below documents "version=2". Please note that
 "version=42" in the example below does not exist and is only there
@@ -414,12 +416,13 @@ packet:          git> version=42
 packet:          git> 0000
 packet:          git< git-filter-server
 packet:          git< version=2
-packet:          git> clean=true
-packet:          git> smudge=true
-packet:          git> not-yet-invented=true
+packet:          git< 0000
+packet:          git> capability=clean
+packet:          git> capability=smudge
+packet:          git> capability=not-yet-invented
 packet:          git> 0000
-packet:          git< clean=true
-packet:          git< smudge=true
+packet:          git< capability=clean
+packet:          git< capability=smudge
 packet:          git< 0000
 ------------------------
 Supported filter capabilities in version 2 are "clean" and
@@ -428,7 +431,7 @@ Supported filter capabilities in version 2 are "clean" and
 Afterwards Git sends a list of "key=value" pairs terminated with
 a flush packet. The list will contain at least the filter command
 (based on the supported capabilities) and the pathname of the file
-to filter relative to the repository root. Right after these packets
+to filter relative to the repository root. Right after the flush packet
 Git sends the content split in zero or more pkt-line packets and a
 flush packet to terminate content. Please note, that the filter
 must not send any response before it received the content and the
@@ -447,7 +450,10 @@ problems then the list must contain a "success" status. Right after
 these packets the filter is expected to send the content in zero
 or more pkt-line packets and a flush packet at the end. Finally, a
 second list of "key=value" pairs terminated with a flush packet
-is expected. The filter can change the status in the second list.
+is expected. The filter can change the status in the second list
+or keep the status as is with an empty list. Please note that the
+empty list must be terminated with a flush packet regardless.
+
 ------------------------
 packet:          git< status=success
 packet:          git< 0000
@@ -457,7 +463,7 @@ packet:          git< 0000  # empty list, keep "status=success" unchanged!
 ------------------------

 If the result content is empty then the filter is expected to respond
-with a "success" status and an empty list.
+with a "success" status and a flush packet to signal the empty content.
 ------------------------
 packet:          git< status=success
 packet:          git< 0000
@@ -466,9 +472,7 @@ packet:          git< 0000  # empty list, keep "status=success" unchanged!
 ------------------------

 In case the filter cannot or does not want to process the content,
-it is expected to respond with an "error" status. Depending on the
-`filter.<driver>.required` flag Git will interpret that as error
-but it will not stop or restart the filter process.
+it is expected to respond with an "error" status.
 ------------------------
 packet:          git< status=error
 packet:          git< 0000
@@ -476,9 +480,7 @@ packet:          git< 0000

 If the filter experiences an error during processing, then it can
 send the status "error" after the content was (partially or
-completely) sent. Depending on the `filter.<driver>.required` flag
-Git will interpret that as error but it will not stop or restart the
-filter process.
+completely) sent.
 ------------------------
 packet:          git< status=success
 packet:          git< 0000
@@ -488,31 +490,31 @@ packet:          git< status=error
 packet:          git< 0000
 ------------------------

-If the filter dies during the communication or does not adhere to
-the protocol then Git will stop the filter process and restart it
-with the next file that needs to be processed. Depending on the
-`filter.<driver>.required` flag Git will interpret that as error.
-
-The error handling for all cases above mimic the behavior of
-the `filter.<driver>.clean` / `filter.<driver>.smudge` error
-handling.
-
 In case the filter cannot or does not want to process the content
 as well as any future content for the lifetime of the Git process,
-it is expected to respond with an "abort" status at any point in
-the protocol. Depending on the `filter.<driver>.required` flag Git
-will interpret that as error for the content as well as any future
-content for the lifetime of the Git process but it will not stop or
-restart the filter process.
+then it is expected to respond with an "abort" status at any point
+in the protocol.
 ------------------------
 packet:          git< status=abort
 packet:          git< 0000
 ------------------------

+Git neither stops nor restarts the filter process in case the
+"error"/"abort" status is set. However, Git sets its exit code
+according to the `filter.<driver>.required` flag, mimicking the
+behavior of the `filter.<driver>.clean` / `filter.<driver>.smudge`
+mechanism.
+
+If the filter dies during the communication or does not adhere to
+the protocol then Git will stop the filter process and restart it
+with the next file that needs to be processed. Depending on the
+`filter.<driver>.required` flag Git will interpret that as error.
+
 After the filter has processed a blob it is expected to wait for
 the next "key=value" list containing a command. Git will close
 the command pipe on exit. The filter is expected to detect EOF
-and exit gracefully on its own.
+and exit gracefully on its own. Git will wait until the filter
+process has stopped.

 A long running filter demo implementation can be found in
 `contrib/long-running-filter/example.pl` located in the Git
@@ -520,10 +522,6 @@ core repository. If you develop your own long running filter
 process then the `GIT_TRACE_PACKET` environment variables can be
 very helpful for debugging (see linkgit:git[1]).

-If a `filter.<driver>.process` command is configured then it
-always takes precedence over a configured `filter.<driver>.clean`
-or `filter.<driver>.smudge` command.
-
 Please note that you cannot use an existing `filter.<driver>.clean`
 or `filter.<driver>.smudge` command with `filter.<driver>.process`
 because the former two use a different inter process communication
diff --git a/contrib/long-running-filter/example.pl b/contrib/long-running-filter/example.pl
index f4102d2..3945705 100755
--- a/contrib/long-running-filter/example.pl
+++ b/contrib/long-running-filter/example.pl
@@ -70,13 +70,14 @@ sub packet_flush {

 packet_txt_write("git-filter-server");
 packet_txt_write("version=2");
+packet_flush();

-( packet_txt_read() eq ( 0, "clean=true" ) )  || die "bad capability";
-( packet_txt_read() eq ( 0, "smudge=true" ) ) || die "bad capability";
+( packet_txt_read() eq ( 0, "capability=clean" ) )  || die "bad capability";
+( packet_txt_read() eq ( 0, "capability=smudge" ) ) || die "bad capability";
 ( packet_bin_read() eq ( 1, "" ) )                  || die "bad capability end";

-packet_txt_write("clean=true");
-packet_txt_write("smudge=true");
+packet_txt_write("capability=clean");
+packet_txt_write("capability=smudge");
 packet_flush();

 while (1) {
diff --git a/convert.c b/convert.c
index 1d89632..9d2aa68 100644
--- a/convert.c
+++ b/convert.c
@@ -567,7 +567,7 @@ static void kill_multi_file_filter(struct hashmap *hashmap, struct cmd2process *
 	free(entry);
 }

-void stop_multi_file_filter(struct child_process *process)
+static void stop_multi_file_filter(struct child_process *process)
 {
 	sigchain_push(SIGPIPE, SIG_IGN);
 	/* Closing the pipe signals the filter to initiate a shutdown. */
@@ -622,8 +622,11 @@ static struct cmd2process *start_multi_file_filter(struct hashmap *hashmap, cons
 	err = strcmp(packet_read_line(process->out, NULL), "version=2");
 	if (err)
 		goto done;
+	err = packet_read_line(process->out, NULL) != NULL;
+	if (err)
+		goto done;

-	err = packet_write_list(process->in, "clean=true", "smudge=true", NULL);
+	err = packet_write_list(process->in, "capability=clean", "capability=smudge", NULL);

 	for (;;) {
 		cap_buf = packet_read_line(process->out, NULL);
@@ -631,10 +634,10 @@ static struct cmd2process *start_multi_file_filter(struct hashmap *hashmap, cons
 			break;
 		string_list_split_in_place(&cap_list, cap_buf, '=', 1);

-		if (cap_list.nr != 2 || strcmp(cap_list.items[1].string, "true"))
+		if (cap_list.nr != 2 || strcmp(cap_list.items[0].string, "capability"))
 			continue;

-		cap_name = cap_list.items[0].string;
+		cap_name = cap_list.items[1].string;
 		if (!strcmp(cap_name, "clean")) {
 			entry->supported_capabilities |= CAP_CLEAN;
 		} else if (!strcmp(cap_name, "smudge")) {
diff --git a/run-command.c b/run-command.c
index e5fd6ff..ca905a9 100644
--- a/run-command.c
+++ b/run-command.c
@@ -36,7 +36,10 @@ static void cleanup_children(int sig, int in_signal)
 		if (p->process && !in_signal) {
 			struct child_process *process = p->process;
 			if (process->clean_on_exit_handler) {
-				trace_printf("trace: run_command: running exit handler for pid %d", p->pid);
+				trace_printf(
+					"trace: run_command: running exit handler for pid %"
+					PRIuMAX, (uintmax_t)p->pid
+				);
 				process->clean_on_exit_handler(process);
 			}
 		}
diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
index 9f892c0..a20b9f5 100755
--- a/t/t0021-conversion.sh
+++ b/t/t0021-conversion.sh
@@ -31,38 +31,35 @@ filter_git () {
 	rm -f git-stderr.log
 }

-# Count unique lines in two files and compare them.
-test_cmp_count () {
-	for FILE in $@
-	do
-		sort $FILE | uniq -c | sed "s/^[ ]*//" >$FILE.tmp
-		cat $FILE.tmp >$FILE
-	done &&
-	test_cmp $@
-}
-
-# Count unique lines except clean invocations in two files and compare
-# them. Clean invocations are not counted because their number can vary.
+# Compare two files and ensure that `clean` and `smudge` respectively are
+# called at least once if specified in the `expect` file. The actual
+# invocation count is not relevant because their number can vary.
 # c.f. http://public-inbox.org/git/xmqqshv18i8i.fsf@xxxxxxxxxxxxxxxxxxxxxxxxxxx/
-test_cmp_count_except_clean () {
-	for FILE in $@
+test_cmp_count () {
+	expect=$1
+	actual=$2
+	for FILE in "$expect" "$actual"
 	do
-		sort $FILE | uniq -c | sed "s/^[ ]*//" |
-			sed "s/^\([0-9]\) IN: clean/x IN: clean/" >$FILE.tmp
-		cat $FILE.tmp >$FILE
+		sort "$FILE" | uniq -c | sed "s/^[ ]*//" |
+			sed "s/^\([0-9]\) IN: clean/x IN: clean/" |
+			sed "s/^\([0-9]\) IN: smudge/x IN: smudge/" >"$FILE.tmp" &&
+		mv "$FILE.tmp" "$FILE"
 	done &&
-	test_cmp $@
+	test_cmp "$expect" "$actual"
 }

-# Compare two files but exclude clean invocations because they can vary.
+# Compare two files but exclude all `clean` invocations because Git can
+# call `clean` zero or more times.
 # c.f. http://public-inbox.org/git/xmqqshv18i8i.fsf@xxxxxxxxxxxxxxxxxxxxxxxxxxx/
 test_cmp_exclude_clean () {
-	for FILE in $@
+	expect=$1
+	actual=$2
+	for FILE in "$expect" "$actual"
 	do
-		grep -v "IN: clean" $FILE >$FILE.tmp
-		cat $FILE.tmp >$FILE
+		grep -v "IN: clean" "$FILE" >"$FILE.tmp" &&
+		mv "$FILE.tmp" "$FILE"
 	done &&
-	test_cmp $@
+	test_cmp "$expect" "$actual"
 }

 # Check that the contents of two files are equal and that their rot13 version
@@ -395,7 +392,7 @@ test_expect_success PERL 'required process filter should filter data' '
 			IN: clean testsubdir/test3 '\''sq'\'',\$x.r $S3 [OK] -- OUT: $S3 . [OK]
 			STOP
 		EOF
-		test_cmp_count_except_clean expected.log rot13-filter.log &&
+		test_cmp_count expected.log rot13-filter.log &&

 		rm -f test2.r "testsubdir/test3 '\''sq'\'',\$x.r" &&

diff --git a/t/t0021/rot13-filter.pl b/t/t0021/rot13-filter.pl
index 1a6959c..ae4c50f 100755
--- a/t/t0021/rot13-filter.pl
+++ b/t/t0021/rot13-filter.pl
@@ -95,13 +95,14 @@ $debug->flush();

 packet_txt_write("git-filter-server");
 packet_txt_write("version=2");
+packet_flush();

-( packet_txt_read() eq ( 0, "clean=true" ) )  || die "bad capability";
-( packet_txt_read() eq ( 0, "smudge=true" ) ) || die "bad capability";
+( packet_txt_read() eq ( 0, "capability=clean" ) )  || die "bad capability";
+( packet_txt_read() eq ( 0, "capability=smudge" ) ) || die "bad capability";
 ( packet_bin_read() eq ( 1, "" ) )                  || die "bad capability end";

 foreach (@capabilities) {
-	packet_txt_write( $_ . "=true" );
+	packet_txt_write( "capability=" . $_ );
 }
 packet_flush();
 print $debug "init handshake complete\n";

--
2.10.0




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]