Hi Matt
This is definitely worth fixing, I've got a couple of comments below
On 02/09/2019 15:01, Matt R via GitGitGadget wrote:
From: Matt R <mattr94@xxxxxxxxx>
The `label` todo command in interactive rebases creates temporary refs
in the `refs/rewritten/` namespace. These refs are stored as loose refs,
i.e. as files in `.git/refs/rewritten/`, therefore they have to conform
with file name limitations on the current filesystem.
This poses a problem in particular on NTFS/FAT, where e.g. the colon
character is not a valid part of a file name.
Being picking I'll point out that ':' is not a valid in refs either.
Looking at
https://docs.microsoft.com/en-us/windows/win32/fileio/naming-a-file I
think only " and | are not allowed on NTFS/FAT but are valid in refs
(see the man page for git check-ref-format for all the details). So the
main limitation is actually what git allows in refs.
Let's safeguard against this by replacing not only white-space
characters by dashes, but all non-alpha-numeric ones.
However, we exempt non-ASCII UTF-8 characters from that, as it should be
quite possible to reflect branch names such as `↯↯↯` in refs/file names.
Signed-off-by: Matthew Rogers <mattr94@xxxxxxxxx>
Signed-off-by: Johannes Schindelin <johannes.schindelin@xxxxxx>
---
sequencer.c | 12 +++++++++++-
t/t3430-rebase-merges.sh | 5 +++++
2 files changed, 16 insertions(+), 1 deletion(-)
diff --git a/sequencer.c b/sequencer.c
index 34ebf8ed94..23f4a0876a 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -4635,8 +4635,18 @@ static int make_script_with_merges(struct pretty_print_context *pp,
else
strbuf_addbuf(&label, &oneline);
+ /*
+ * Sanitize labels by replacing non-alpha-numeric characters
+ * (including white-space ones) by dashes, as they might be
+ * illegal in file names (and hence in ref names).
+ *
+ * Note that we retain non-ASCII UTF-8 characters (identified
+ * via the most significant bit). They should be all acceptable
+ * in file names. We do not validate the UTF-8 here, that's not
+ * the job of this function.
+ */
for (p1 = label.buf; *p1; p1++)
- if (isspace(*p1))
+ if (!(*p1 & 0x80) && !isalnum(*p1))
*(char *)p1 = '-';
I'm sightly concerned that this opens the possibility for unexpected
effects if two different labels get sanitized to the same string. I
suspect it's unlikely to happen in practice but doing something like
percent encoding non-alphanumeric characters would avoid the problem
entirely.
Best Wishes
Phillip
strbuf_reset(&buf);
diff --git a/t/t3430-rebase-merges.sh b/t/t3430-rebase-merges.sh
index 7b6c4847ad..737396f944 100755
--- a/t/t3430-rebase-merges.sh
+++ b/t/t3430-rebase-merges.sh
@@ -441,4 +441,9 @@ test_expect_success '--continue after resolving conflicts after a merge' '
test_path_is_missing .git/MERGE_HEAD
'
+test_expect_success '--rebase-merges with commit that can generate bad characters for filename' '
+ git checkout -b colon-in-label E &&
+ git merge -m "colon: this should work" G &&
+ git rebase --rebase-merges --force-rebase E
+'
test_done