On 28-Mar-2021, at 23:36, Junio C Hamano <gitster@xxxxxxxxx> wrote:
>
> Atharva Raykar <raykar.ath@xxxxxxxxx> writes:
>
>>>> +	 "(\\.|[^][)(\\}\\{ ])+"),
>>>
>>> One or more "dot or anything other than SP or parentheses"?  But
>>> a dot "." is neither a space or any {bra-ce} letter, so would the
>>> above be equivalent to
>>>
>>> 	"[^][()\\{\\} \t]+"
>>>
>>> I wonder...
>>
>> A backslash is allowed in scheme identifiers, and I erroneously thought that
>> the first part handles the case for identifiers such as `component\new` or
>> `\"id-with-quotes\"`. (I tested it with a regex engine that behaves differently
>> than the one git is using, my bad.)
>
> Ah, perhaps you didn't have enough backslashes.  A half of the
> doubled one before the dot is eaten by the C compiler, so the regexp
> engine is seeing only a single backslash before the dot, which means
> "literally a single dot".  If you meant "literally a single
> backslash, followed by any single char", you probably would write 4
> backslashes and a dot---half of the backslashes would be eaten by
> the compiler, so you'd be passing two backslashes and a dot, which
> is probably what you meant.
>
> Having said that, two further points.
>
>  - the "anything but whitespaces and various forms of parentheses"
>    set would include backslash, so 'component\new' would be taken as
>    a single word with "[^][()\\{\\} \t]+", wouldn't it?
>
>  - how common is the use of backslashes in identifiers?  I am trying
>    to see if the additional complexity needed to support them is
>    worth the benefit.

I have refined the regex, and now it is much simpler and does all of
what I want it to:

	"([^][)(}{[:space:]])+"

I did not have to escape the various parentheses, so I avoided the need
to handle backslashes separately. The "\\t" was causing problems as well
because it took it as a '\' followed by a 't' (Thanks to j416 on
#git-devel for helping me out on this).

>> Yes, this is exactly what I was trying to express. All words should be
>> delimited by either whitespace or a parenthesis, and all other special
>> characters should be accepted as part of the word.
>
> That sentence after "All words should be..." would be a good comment
> to replace what you wrote in the original, then ;-).

Yes, that should make it a lot more clear.
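
To make the escaping levels discussed above concrete, here is a minimal sketch that feeds the patterns to the POSIX regcomp()/regexec() API as extended regular expressions. This is an assumption made for illustration: git may be built against a different regex implementation, and the helper leading_match_len() and the macro names are hypothetical, not part of git.

```c
#include <regex.h>
#include <stddef.h>

/* C source "\\." reaches the regex engine as \. : a literal dot. */
#define LITERAL_DOT    "\\."
/* C source "\\\\." reaches the engine as \\. : a backslash, then any char. */
#define BACKSLASH_ANY  "\\\\."
/*
 * Inside a POSIX bracket expression a backslash is an ordinary character,
 * so C source "[^ \\t]" excludes space, '\' and 't', not a tab.
 */
#define NOT_SPACE_BS_T "[^ \\t]+"
/* The refined word pattern from this mail: no backslashes needed at all. */
#define WORD_REGEX     "([^][)(}{[:space:]])+"

/*
 * Hypothetical helper: return the length of the match of `pattern`
 * (POSIX ERE) in `text` if the match starts at offset 0, otherwise -1.
 * Also returns -1 if the pattern fails to compile.
 */
int leading_match_len(const char *pattern, const char *text)
{
	regex_t re;
	regmatch_t m;
	int len = -1;

	if (regcomp(&re, pattern, REG_EXTENDED) != 0)
		return -1;
	if (regexec(&re, text, 1, &m, 0) == 0 && m.rm_so == 0)
		len = (int)(m.rm_eo - m.rm_so);
	regfree(&re);
	return len;
}
```

Under these assumptions, leading_match_len(WORD_REGEX, "component\\new)") should cover all of `component\new` and stop at the closing parenthesis, while leading_match_len(WORD_REGEX, "foo\tbar") should stop at the tab, because [:space:] handles whitespace without any backslash escapes in the C string.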