On 08.10.21 at 22:07, Ævar Arnfjörð Bjarmason wrote:
> * I wonder if it isn't time to split up "cpp" into a "c" driver,
>   e.g. git.git's .gitattributes has "cpp" for *.[ch] files, but C++
>   keeps adding more syntax sugar.
>
>   So e.g. if you use "<=>" after this series we'll tokenize it
>   differently in *.c files, but it's a C++-only operator; on the
>   other hand, probably nobody cares that much...

Yes, that is the point: <=> won't appear in a correct C file (outside of comments), so no one will care. As far as tokenization is concerned, C is a subset of C++. I don't think we need to separate the drivers.

> * I found myself back-porting some of your tests (manually, mostly);
>   maybe you disagree, but in cases like 123'123, <=>, etc. I'd find it
>   easier to follow if we first added the test data and then the
>   changed behavior.
>
>   Because, after all, we're going to change how we highlight existing
>   data, so testing for that would be informative.

Good point. I'll work a bit more on that.

> * This pre-dates your much improved tests, but these test files could
>   really use some test comments, as in:
>
>   /* Now that we're going to understand the "'" character somehow,
>      will any of this change? */
>   /* We haven't written code like this since the 1960s... */
>   /* Run & free */
>
>   I.e. we don't just highlight code the compiler likes to eat, but
>   also comments. So particularly for smaller tokens that also occur
>   in natural language, like "'" and "&", are we getting the expected
>   results?

Comments are free text. Anything can happen. There is no such thing as "correct tokenization" in comments. Not interested.

Thank you for the review.

-- 
Hannes