Hi Junio, On Mon, 2 Sep 2019, Junio C Hamano wrote: > Phillip Wood <phillip.wood123@xxxxxxxxx> writes: > > >> for (p1 = label.buf; *p1; p1++) > >> - if (isspace(*p1)) > >> + if (!(*p1 & 0x80) && !isalnum(*p1)) > >> *(char *)p1 = '-'; > > > > I'm sightly concerned that this opens the possibility for unexpected > > effects if two different labels get sanitized to the same string. I > > suspect it's unlikely to happen in practice but doing something like > > percent encoding non-alphanumeric characters would avoid the problem > > entirely. > > I'd rather see 'x' used instead of '-' (double-or-more dashes and > leading dash in refnames may currently be allowed but double-or-more > exes and leading ex would be much more likely to stay valid) if we > just want to redact invalid characters. Hmm. Let's take a concrete example from the VFS for Git fork: Merge pull request #160: trace2:gvfs:experiment Add experimental regions and data events to help diagnose checkout and reset perf problems (Yes, we have some quite verbose merge commits, with very, very long onelines. Not a good practice, we stopped doing it, but it was well within what Git allows.) And now use dashes to encode all white-space: Merge-pull-request-#160:-trace2:gvfs:experiment-Add-experimental-regions-and-data-events-to-help-diagnose-checkout-and-reset-perf-problems Pretty long, but looks okay. Of course, it does not work, because colons. So here is the label with Matt's patch: Merge-pull-request--160--trace2-gvfs-experiment-Add-experimental-regions-and-data-events-to-help-diagnose-checkout-and-reset-perf-problems And here is the label with your proposed xs. Mergexpullxrequestxx160xxtrace2xgvfsxexperimentxAddxexperimentalxregionsxandxdataxeventsxtoxhelpxdiagnosexcheckoutxandxresetxperfxproblems I cannot speak for you, of course, but I can speak for myself: this is not only way too reminiscent of xoxoxothxbye, but it is also really, totally unreadable. If you care deeply about double dashes and leading dashes, how about this instead? char *from, *to; for (from = to = label.buf; *from; from++) if ((*from & 0x80) || isalnum(*from)) *(to++) = *from; /* avoid leading dash and double-dashes */ else if (to != label.buf && to[-1] != '-') *(to++) = '-'; strbuf_setlen(&label, to - label.buf); That would result in Merge-pull-request-160-trace2-gvfs-experiment-Add-experimental-regions-and-data-events-to-help-diagnose-checkout-and-reset-perf-problems > I see there are "lets make sure it is unique by suffixing "-%d" in > other codepaths; would that help if this piece of code yields a > label that is not unique? I'm way ahead of you. The sequencer already goes out of its way to guarantee the uniqueness of the labels (it's part of the design, as applied in 1644c73c6d4 (rebase-helper --make-script: introduce a flag to rebase merges, 2018-04-25)). The patch you are looking at in this thread is only concerned about the initial phase, `label_oid()` does a lot more. Not only does it make the label unique (case-insensitively!), it also prevents it from looking like a full 40-hex digit SHA-1, so that we can guarantee that unique abbreviations of commit hashes will work as labels, too. Ciao, Dscho