On 30.05.19 at 20:59, Ævar Arnfjörð Bjarmason wrote:
>
> On Thu, May 30 2019, Johannes Sixt wrote:
>
>> - Do not enforce (but assume) syntactic correctness of language
>>   constructs that go into hunk headers: we only want to ensure that
>>   the keywords actually are words and not just the initial part of
>>   some identifier.
>>
>> - In the word regex, match numbers only when they begin with a digit,
>>   but then be liberal in what follows, assuming that the text that is
>>   matched is syntactically correct.
>
> I don't know if this is possible for Rust (but very much suspect so...),
> but I think that in general we should aim to be more forgiving than not
> with these patterns.

The C/C++ hunk header pattern is actually very forgiving: it takes
every line that begins with an un-indented letter. That works very well
in C because C does not have nested functions and function definition
lines are typically not indented. But that breaks down with C++:
indented function definitions are very common; they occur inside class
and namespace definitions. Such functions are not picked up, and we
live with that so far (at least, I do).

> Because, as the history of userdiff.c shows, new keywords get introduced
> into these languages, and old git versions survive for a long time. If
> the syntax is otherwise fairly regular perhaps we don't need to hardcode
> the list of existing keywords?

We are talking about (1) hunk header lines (not something really
important) and (2) programming languages: new keywords don't pop up
every month. Granted, inventing new languages is en vogue these days.
But really, I mean, WTH?

Having keywords available to recognize hunk header candidates helps a
lot. I thought for a long time about a possible pattern for C++, but I
gave up, because the language is so rich and there are no suitable
keywords.

-- Hannes
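
To illustrate the point about un-indented lines, here is a minimal Python sketch. It assumes a simplified stand-in pattern, `^[A-Za-z_].*$`, and is not the actual expression from userdiff.c. The function name `hunk_header_candidates` and both sample snippets are made up for illustration.

```python
import re

# ASSUMPTION: simplified stand-in for the idea described above, not the
# real pattern in git's userdiff.c. A line counts as a hunk header
# candidate if it starts with a letter or underscore in column 0.
UNINDENTED = re.compile(r"^[A-Za-z_].*$")


def hunk_header_candidates(source):
    """Return the lines that the simplified pattern accepts."""
    return [line for line in source.splitlines() if UNINDENTED.match(line)]


# Typical C layout: the function definition line is not indented.
C_SOURCE = """\
static int add(int a, int b)
{
    return a + b;
}
"""

# Typical C++ layout: the member function is indented inside a class
# that is itself inside a namespace.
CPP_SOURCE = """\
namespace util {
    class Counter {
        int next()
        {
            return ++n;
        }
        int n = 0;
    };
}
"""

if __name__ == "__main__":
    print(hunk_header_candidates(C_SOURCE))
    print(hunk_header_candidates(CPP_SOURCE))
```

For the C snippet, only the function definition line is accepted. For the C++ snippet, only `namespace util {` is accepted, and the indented `int next()` is missed, which is the limitation described above.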