William Duclot <william.duclot@xxxxxxxxxxxxxxxxxxxxxxx> writes: > CSS is widely used, motivating it being included as a built-in pattern. > > It must be noted that the word_regex for CSS (i.e. the regex defining > what is a word in the language) does not consider '.' and '#' characters > (in CSS selectors) to be part of the word. This behavior is documented > by the test t/t4018/css-rule. > The logic behind this behavior is the following: identifiers in CSS > selectors are identifiers in a HTML/XML document. Therefore, the '.'/'#' > character are not part of the identifier, but an indicator of the nature > of the identifier in HTML/XML (class or id). Diffing ".class1" and > ".class2" must show that the class name is changed, but we still are > selecting a class. > > Signed-off-by: William Duclot <william.duclot@xxxxxxxxxxxxxxxxxxxxxxx> > Signed-off-by: Matthieu Moy <matthieu.moy@xxxxxxxxxxxxxxx> > --- > changes since v1: > Fix a typo in the word_regex ("A-F" => "A-Z"). > Clearer comment about ISO 10646 characters. It is not a big deal for a small single-patch topic like this, but it often is hard to reviewers if you do not respond to comments you received and instead just send a new version of the patch with "changes since..." comment. Please make it a habit to do both. I can see in the above "changes since v1" comment, you took the A-F/A-Z thing, but I cannot tell if you thought about PATTERNS vs IPATTERN and rejected IPATTERN with a good reason or if you simply missed it when reading the review comments you received, for example. Three remaining issues are: - Have you considered using IPATTERN()? PATTERNS() that defaults case sensitive match is suitable for real languages with fixed case keywords, but the pattern you are defining for CSS does not do anything special for any set of fixed-case built-in keywords, and appears to be better served by IPATTERN(). - In our codebase, we format multi-line comments in a particular way, namely /* * A multi-line comment begins with slash asterisk * on its own line, and its closing asterisk slash * also is on its own line. */ - Try not to write overlong lines. If your expression needs to become long and there is no good place to fold lines, that is one thing, but an overlong comment is unexcuable, as you can fold lines anywhere between words. Thanks. > diff --git a/userdiff.c b/userdiff.c > index 6bf2505..9273969 100644 > --- a/userdiff.c > +++ b/userdiff.c > @@ -148,6 +148,14 @@ PATTERNS("csharp", > "[a-zA-Z_][a-zA-Z0-9_]*" > "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?" > "|[-+*/<>%&^|=!]=|--|\\+\\+|<<=?|>>=?|&&|\\|\\||::|->"), > +PATTERNS("css", > + "^([^,{}]+)((,[^}]*\\{)|([ \t]*\\{))$", > + /* -- */ > + /* This regex comes from W3C CSS specs. Should theoretically also allow ISO 10646 characters U+00A0 and higher, > + * but they are not handled with this regex. */ > + "-?[_a-zA-Z][-_a-zA-Z0-9]*" /* identifiers */ > + "|-?[0-9]+|\\#[0-9a-fA-F]+" /* numbers */ > +), > { "default", NULL, -1, { NULL, 0 } }, > }; > #undef PATTERNS -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html