Re: [PATCH 1/1] rebase -r: let `label` generate safer labels

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Matt

This is definitely worth fixing, I've got a couple of comments below

On 02/09/2019 15:01, Matt R via GitGitGadget wrote:
From: Matt R <mattr94@xxxxxxxxx>

The `label` todo command in interactive rebases creates temporary refs
in the `refs/rewritten/` namespace. These refs are stored as loose refs,
i.e. as files in `.git/refs/rewritten/`, therefore they have to conform
with file name limitations on the current filesystem.

This poses a problem in particular on NTFS/FAT, where e.g. the colon
character is not a valid part of a file name.

Being picking I'll point out that ':' is not a valid in refs either. Looking at https://docs.microsoft.com/en-us/windows/win32/fileio/naming-a-file I think only " and | are not allowed on NTFS/FAT but are valid in refs (see the man page for git check-ref-format for all the details). So the main limitation is actually what git allows in refs.

Let's safeguard against this by replacing not only white-space
characters by dashes, but all non-alpha-numeric ones.

However, we exempt non-ASCII UTF-8 characters from that, as it should be
quite possible to reflect branch names such as `↯↯↯` in refs/file names.

Signed-off-by: Matthew Rogers <mattr94@xxxxxxxxx>
Signed-off-by: Johannes Schindelin <johannes.schindelin@xxxxxx>
---
  sequencer.c              | 12 +++++++++++-
  t/t3430-rebase-merges.sh |  5 +++++
  2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/sequencer.c b/sequencer.c
index 34ebf8ed94..23f4a0876a 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -4635,8 +4635,18 @@ static int make_script_with_merges(struct pretty_print_context *pp,
  		else
  			strbuf_addbuf(&label, &oneline);
+ /*
+		 * Sanitize labels by replacing non-alpha-numeric characters
+		 * (including white-space ones) by dashes, as they might be
+		 * illegal in file names (and hence in ref names).
+		 *
+		 * Note that we retain non-ASCII UTF-8 characters (identified
+		 * via the most significant bit). They should be all acceptable
+		 * in file names. We do not validate the UTF-8 here, that's not
+		 * the job of this function.
+		 */
  		for (p1 = label.buf; *p1; p1++)
-			if (isspace(*p1))
+			if (!(*p1 & 0x80) && !isalnum(*p1))
  				*(char *)p1 = '-';

I'm sightly concerned that this opens the possibility for unexpected effects if two different labels get sanitized to the same string. I suspect it's unlikely to happen in practice but doing something like percent encoding non-alphanumeric characters would avoid the problem entirely.

Best Wishes

Phillip

  		strbuf_reset(&buf);
diff --git a/t/t3430-rebase-merges.sh b/t/t3430-rebase-merges.sh
index 7b6c4847ad..737396f944 100755
--- a/t/t3430-rebase-merges.sh
+++ b/t/t3430-rebase-merges.sh
@@ -441,4 +441,9 @@ test_expect_success '--continue after resolving conflicts after a merge' '
  	test_path_is_missing .git/MERGE_HEAD
  '
+test_expect_success '--rebase-merges with commit that can generate bad characters for filename' '
+	git checkout -b colon-in-label E &&
+	git merge -m "colon: this should work" G &&
+	git rebase --rebase-merges --force-rebase E
+'
  test_done




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux