On Sat, Feb 3, 2018 at 2:32 PM, Elijah Newren <newren@xxxxxxxxx> wrote:
> On Fri, Feb 2, 2018 at 5:02 PM, Stefan Beller <sbeller@xxxxxxxxxx> wrote:
>> On Tue, Jan 30, 2018 at 3:25 PM, Elijah Newren <newren@xxxxxxxxx> wrote:
>>> +       while (*--end_of_new == *--end_of_old &&
>>> +              end_of_old != old_path &&
>>> +              end_of_new != new_path)
>>> +               ; /* Do nothing; all in the while loop */
>>
>> We have to compare manually as we'd want to find
>> the first non-equal and there doesn't seem to be a good
>> library function for that.
>>
>> Assuming many repos are UTF8 (including in their paths),
>> how does this work with display characters longer than one char?
>> It should be fine as we cut at the slash?
>
> Oh, UTF-8.  Ugh.
> Can UTF-8 characters, other than '/', have a byte whose value matches
> (unsigned char)('/')?  If so, then I'll need to figure out how to do
> utf-8 character parsing.  Anyone have pointers?

Well, after digging around for a while, I found this claim on the
Wikipedia page for UTF-8:

    Since ASCII bytes do not occur when encoding non-ASCII code points
    into UTF-8, UTF-8 is safe to use within most programming and
    document languages that interpret certain ASCII characters in a
    special way, such as "/" in filenames, "\" in escape sequences, and
    "%" in printf.

So, unless I'm reading something wrong here, I think that means this
code is just fine as it is.
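
To make the reasoning concrete, here is a small standalone sketch. It is
not the code from the patch: the helper names (utf8_is_multibyte_part,
common_suffix_len, common_suffix_at_slash) are made up for illustration,
and it assumes the paths are valid UTF-8. It shows that a byte-wise
backward comparison can stop in the middle of a multi-byte character,
but that cutting the result at a '/' byte always lands on a real slash,
because every byte of a multi-byte sequence has its high bit set.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * In UTF-8, every byte of a multi-byte sequence is >= 0x80, while '/'
 * is 0x2F. So a byte equal to '/' is never part of another character.
 */
static int utf8_is_multibyte_part(unsigned char c)
{
	return c >= 0x80;
}

/*
 * Hypothetical helper: length in bytes of the longest common suffix of
 * two paths, compared byte by byte from the end.
 */
static size_t common_suffix_len(const char *old_path, const char *new_path)
{
	size_t old_len, new_len, n = 0;

	assert(old_path && new_path);
	old_len = strlen(old_path);
	new_len = strlen(new_path);

	while (n < old_len && n < new_len &&
	       old_path[old_len - 1 - n] == new_path[new_len - 1 - n])
		n++;
	return n;
}

/*
 * Hypothetical helper: shrink the common suffix until it starts at a
 * '/' byte (or becomes empty). Returns the trimmed length in bytes.
 */
static size_t common_suffix_at_slash(const char *old_path,
				     const char *new_path)
{
	size_t n = common_suffix_len(old_path, new_path);
	const char *suffix = new_path + strlen(new_path) - n;

	/* Drop leading bytes until the suffix begins with a slash. */
	while (n > 0 && *suffix != '/') {
		suffix++;
		n--;
	}
	return n;
}
```

For example, "x/\xc3\xa9/f" and "x/\xc4\xa9/f" (two different two-byte
characters that happen to share their trailing byte 0xa9) have a raw
common byte suffix of three bytes that begins mid-character; trimming to
the slash leaves just "/f".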