Re: Oh, my -- this is no fun at all

On Wed, 13 Oct 2021 16:02:14 -0500
J Leslie Turriff <jlturriff@xxxxxxxx> wrote:

> On 2021-10-13 13:07:13 E. Liddell wrote:
> > On Wed, 13 Oct 2021 16:46:20 +0000
> >
> > That being said, test 9 is a raw grep being performed on an XML file.  This
> > means that it could easily be latching onto something in a comment, because
> > following the full XML spec for determining whether a given line is inside
> > a comment or not using a simple text-matching tool is . . . well, let's say
> > it isn't something I'd want to try, and I deal in regexes a fair amount in
> > my day job. It really needs to be run through a full parser that constructs
> > a DOM tree.
> 
> 	Filter to throw away comments first, then filter for what it should look for.

Correctly throwing away comments isn't as simple as discarding everything
between a start marker and an end marker, though: a comment marker inside
a CDATA section is just character data, so it doesn't start or end a comment.
I suspect a comment marker found between quotes in an attribute value
doesn't count either, but I'd have to check the spec to be sure.  And there
may be more quirks that I've forgotten.  (Oh, and now that I think about it,
you could *easily* embed the value the grep expression is looking for in the
file without triggering the grep by using CDATA.)
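
To make the CDATA problem concrete, here is a rough sketch in Python.  The
document, the element names and the "autologin" value are all made up for
illustration (I don't know what test 9 actually greps for), and
xml.etree.ElementTree is only standing in for "a real parser"; any proper
XML library should behave the same way on these inputs.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical file contents: one real comment, plus comment markers that
# appear inside CDATA sections and are therefore only character data.
DOC = """<config>
  <!-- <option name="autologin">true</option> -->
  <note><![CDATA[a literal <!-- is just text here]]></note>
  <option name="autologin">false</option>
  <note><![CDATA[and so is this --> marker]]></note>
</config>"""

# Hypothetical element whose text is split by a CDATA section.
HIDDEN = "<value>auto<![CDATA[login]]></value>"


def naive_strip_comments(text):
    """Drop everything between <!-- and -->, ignoring CDATA (the flawed filter)."""
    return re.sub(r"<!--.*?-->", "", text, flags=re.DOTALL)


def option_values(text, name):
    """Parse the XML and return the text of each <option> with the given name."""
    root = ET.fromstring(text)  # the default parser discards comments
    return [el.text for el in root.iter("option") if el.get("name") == name]


if __name__ == "__main__":
    # The naive filter treats the markers inside CDATA as a second comment
    # and swallows the live <option> element that sits between them.
    print("autologin" in naive_strip_comments(DOC))
    # The parser skips the real comment and keeps the live element.
    print(option_values(DOC, "autologin"))
    # A plain substring search misses the split value; the parser joins it.
    print("autologin" in HIDDEN, ET.fromstring(HIDDEN).text)
```

The filter-then-grep approach fails in both directions here: it deletes
live configuration along with the comment, and it cannot see a value that
has been split with CDATA.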

There's a reason that man perlfaq6 contains the following:

How do I match XML, HTML, or other nasty, ugly things with a regex?
       Do not use regexes. Use a module and forget about the regular expressions.

E. Liddell


