On Wed, 13 Oct 2021 16:02:14 -0500
J Leslie Turriff <jlturriff@xxxxxxxx> wrote:

> On 2021-10-13 13:07:13 E. Liddell wrote:
> > On Wed, 13 Oct 2021 16:46:20 +0000
> >
> > That being said, test 9 is a raw grep being performed on an XML file. This
> > means that it could easily be latching onto something in a comment, because
> > following the full XML spec for determining whether a given line is inside
> > a comment or not using a simple text-matching tool is . . . well, let's say
> > it isn't something I'd want to try, and I deal in regexes a fair amount in
> > my day job. It really needs to be run through a full parser that constructs
> > a DOM tree.
>
> Filter to throw away comments first, then filter for what it should look for.

Correctly throwing away comments isn't as simple as tossing away everything
between a start marker and an end marker, though, because if the comment
marker is inside a CDATA section, it doesn't actually affect whether or not
the text is a comment. I suspect a comment marker found between quotes in a
text-format attribute value doesn't count either, but I'd have to check the
spec to be sure. And there may be more quirks that I've forgotten.

(Oh, and you could *easily* embed the value the grep expression is looking
for in the file without triggering the grep by using CDATA, now that I think
about it.)

There's a reason that man perlfaq6 contains the following:

    How do I match XML, HTML, or other nasty, ugly things with a regex?

    Do not use regexes. Use a module and forget about the regular
    expressions.

E. Liddell
____________________________________________________
tde-users mailing list -- users@xxxxxxxxxxxxxxxxxx
To unsubscribe send an email to users-leave@xxxxxxxxxxxxxxxxxx
Web mail archive available at https://mail.trinitydesktop.org/mailman3/hyperkitty/list/users@xxxxxxxxxxxxxxxxxx
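
To make the comment and CDATA pitfalls above concrete, here is a minimal
sketch in Python (standard library only). The sample file, the element name
"option" and the attribute value "autologin" are all made up for
illustration; they are not what test 9 actually looks for. It compares a raw
grep, a naive "strip comments, then grep" filter, and a real parser on the
same input.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample file; element and attribute names are invented.
DOC = """<config>
  <!-- <option name="autologin">true</option> -->
  <note><![CDATA[comment markers look like <!-- this]]></note>
  <option name="autologin">false</option>
  <!-- end of options -->
</config>
"""


def grep(pattern, text):
    """Return the (stripped) lines of text that match pattern, like grep."""
    return [line.strip() for line in text.splitlines()
            if re.search(pattern, line)]


# 1. Raw grep: it matches the commented-out line, a false positive.
raw_hits = grep(r'name="autologin">true', DOC)

# 2. Naive filter: drop everything between "<!--" and the next "-->".
#    The "<!--" inside the CDATA section is treated as a comment start,
#    so the real <option> element is thrown away with it.
stripped = re.sub(r"<!--.*?-->", "", DOC, flags=re.DOTALL)
filtered_hits = grep(r"autologin", stripped)

# 3. Parser: ElementTree skips real comments by default and returns
#    CDATA content as ordinary text.
root = ET.fromstring(DOC)
parsed = {opt.get("name"): opt.text for opt in root.iter("option")}
note_text = root.find("note").text

print(raw_hits)       # the commented-out line
print(filtered_hits)  # empty: the live setting was lost
print(parsed)         # the one live option and its value
print(note_text)      # CDATA text, comment marker included
```

In this sketch the raw grep reports a setting that is commented out, the
naive filter loses the setting that is actually live, and only the parser
reports the live value. ElementTree is just one convenient choice here; any
conforming XML parser should behave the same way on this input.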