Hi Phillip, On Mon, 2 Sep 2019, Phillip Wood wrote: > This is definitely worth fixing, I've got a couple of comments below > > On 02/09/2019 15:01, Matt R via GitGitGadget wrote: > > From: Matt R <mattr94@xxxxxxxxx> I just noticed that the surname is abbreviated. The full name of the author is "Matt Rogers". (Matt, Git uses the Signed-off-by: lines as some sort of legally-binding assurance that you are free to submit these changes under the GPLv2, so your full name is kinda required.) > > The `label` todo command in interactive rebases creates temporary refs > > in the `refs/rewritten/` namespace. These refs are stored as loose refs, > > i.e. as files in `.git/refs/rewritten/`, therefore they have to conform > > with file name limitations on the current filesystem. > > > > This poses a problem in particular on NTFS/FAT, where e.g. the colon > > character is not a valid part of a file name. > > Being picking I'll point out that ':' is not a valid in refs either. True, but that was not the primary concern here ;-) > Looking at > https://docs.microsoft.com/en-us/windows/win32/fileio/naming-a-file I > think only " and | are not allowed on NTFS/FAT but are valid in refs > (see the man page for git check-ref-format for all the details). So > the main limitation is actually what git allows in refs. Right. And this example shows that we really need to be a bit more conservative than just disallowing characters that would not be allowed in refs. I think it is more important to keep in mind the vagaries of various current and future filesystems to justify the change in this patch. > > Let's safeguard against this by replacing not only white-space > > characters by dashes, but all non-alpha-numeric ones. > > > > However, we exempt non-ASCII UTF-8 characters from that, as it should be > > quite possible to reflect branch names such as `↯↯↯` in refs/file names. > > > > Signed-off-by: Matthew Rogers <mattr94@xxxxxxxxx> > > Signed-off-by: Johannes Schindelin <johannes.schindelin@xxxxxx> > > --- > > sequencer.c | 12 +++++++++++- > > t/t3430-rebase-merges.sh | 5 +++++ > > 2 files changed, 16 insertions(+), 1 deletion(-) > > > > diff --git a/sequencer.c b/sequencer.c > > index 34ebf8ed94..23f4a0876a 100644 > > --- a/sequencer.c > > +++ b/sequencer.c > > @@ -4635,8 +4635,18 @@ static int make_script_with_merges(struct > > pretty_print_context *pp, > > else > > strbuf_addbuf(&label, &oneline); > > + /* > > + * Sanitize labels by replacing non-alpha-numeric characters > > + * (including white-space ones) by dashes, as they might be > > + * illegal in file names (and hence in ref names). > > + * > > + * Note that we retain non-ASCII UTF-8 characters (identified > > + * via the most significant bit). They should be all > > acceptable > > + * in file names. We do not validate the UTF-8 here, that's > > not > > + * the job of this function. > > + */ > > for (p1 = label.buf; *p1; p1++) > > - if (isspace(*p1)) > > + if (!(*p1 & 0x80) && !isalnum(*p1)) > > *(char *)p1 = '-'; > > I'm sightly concerned that this opens the possibility for unexpected effects > if two different labels get sanitized to the same string. I suspect it's > unlikely to happen in practice but doing something like percent encoding > non-alphanumeric characters would avoid the problem entirely. Oh, but we make sure that the labels are unique, via the `label_oid()` function! Otherwise, we would not be able to label more than one merge parent ;-) Ciao, Dscho