Re: Consider escaping special characters like 'less' does

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sun, Oct 15, 2017 at 11:37:04PM +0200, Joris Valette wrote:

> 2017-10-15 22:06 GMT+02:00 Jeff King <peff@xxxxxxxx>:
> > Git's diff generation will never do such escaping by default, because it
> > means creating a patch that cannot be applied to get back the original
> > content.
> 
> Indeed this would be a problem. That's where my knowledge of git's
> source code ends, but in that case, can't the output be discriminated
> against the command that was executed?
> Command that outputs an applicable patch -> don't escape
> Command that outputs a diff to see changes -> escape

Speaking in a general sense, people use "git diff" for both purposes,
and we can't necessarily tell which[1]. As a matter of fact, that's a
potential problem with textconv filters as well (which are enabled by
default for git-diff, but not for plumbing like diff-tree, format-patch,
etc). I think the feature isn't widely used enough for people to run
into problems (and they also use it only on things that they don't
_expect_ to be able to make patches for, since they're binaries).

[1] Of course we can come up with heuristics, like "is stdout going to a
    a terminal"? But in such a case we already kick in the pager, and it
    does the exact escaping you're asking for. :)

For a command like add--interactive, it knows which invocations are for
showing to the user and which are for applying (and it already uses
"--color" selectively, for instance). So if there were an "escape this"
option in git's diff generation, we could selectively pass it. But
again, I think the right solution there is not building the escaping
into Git, but just passing it through another filter.

> > It doesn't seem out of the question to me to have an out-of-the-box
> > default for interactive.diffFilter which does some basic escaping (we
> > could even implement it inside the perl script for efficiency).
> 
> Yes I read this thread, but I was left unsatisfied because I would
> like something out-of-the-box.
> Your suggestion might be the best solution then: implement some
> default escaping for interactive.diffFilter.

Alternatively, I suppose we could just always escape in
add--interactive. I'm trying to think of a case where somebody would
really want their diffFilter to see the raw bytes (or vice versa, to
show raw bytes produced by their filter), but I'm having trouble coming
up with one.

I.e., something like this would probably help your case without hurting
anybody:

diff --git a/git-add--interactive.perl b/git-add--interactive.perl
index 28b325d754..d44e5ea459 100755
--- a/git-add--interactive.perl
+++ b/git-add--interactive.perl
@@ -714,6 +714,16 @@ sub parse_diff {
 		push @{$hunk[-1]{DISPLAY}},
 			(@colored ? $colored[$i] : $diff[$i]);
 	}
+
+	foreach my $hunk (@hunk) {
+		foreach my $line (@{$hunk->{DISPLAY}}) {
+			# all control chars minus newline and ESC (for color)
+			if ($line =~ s/[\000-\011\013-\032\034-\037]/?/g) {
+				$hunk->{CONTROLCHARS} = 1;
+			}
+		}
+	}
+
 	return @hunk;
 }
 
@@ -1407,6 +1417,9 @@ sub patch_update_file {
 		if ($hunk[$ix]{TYPE} eq 'hunk') {
 			$other .= ',e';
 		}
+		if ($hunk[$ix]->{CONTROLCHARS}) {
+			print "warning: control characters in hunk have been replaced by '?'\n";
+		}
 		for (@{$hunk[$ix]{DISPLAY}}) {
 			print;
 		}

I can't help but feel this is the tip of a larger iceberg, though. E.g.,
what about characters outside of the terminal's correct encoding? Or
broken UTF-8 characters?

-Peff



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux