Re: Riddle me this: grep / regx experts

Jon LaBadie <jonfu@xxxxxxxxxx> · Fri, 2 Feb 2018 11:26:02 -0500

On Fri, Feb 02, 2018 at 11:04:01AM -0500, R. G. Newbury wrote:
> A bug in regx handling???
> 
> I am cleaning up some html code, using sed to standardize the formatting. I
> was searching for specific instances of code to amend using grep.
> I was looking for instances like  <a name="s1s1">
> Example text in a file: ( here named, quite originally, temp )
> <p class="section-f"></a><a name="s8"></a>8.</b></a>
> 
> And # grep -h '[0-9]s[0-9]*">' temp
> Returns nothing  (which is the expected result: there are no [0-9]s[0-9]">
> instances).
> 
> BUT!!!
> # grep -h '[0-9]*s[0-9]*">' temp
> Returns the example line with the 's[0-9]">' highlighted.
> 
> Note that the character before the 's' is either " or #
> 
> Can anyone explain what is happening? This isn't politics so the group
> [0-9] should not equal [0-9"#]. Or even [0-9\"\#].

You are misunderstanding the "*".  It means any sequence of the
preceding item (here the bracket expression [0-9]), including a
ZERO length sequence.

So [0-9]*s matches at "s (actually just the s), because there is a
zero length sequence of digits followed by an s.  When you grep for
[0-9]s, there must be at least one digit before the s (any extra
digits are not part of the match).  The sequence [0-9][0-9]*s is
useful to say "one or more digits before the s".
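
If you want to see the difference for yourself, here is a rough sketch
using the example line from your message.  The variable names are mine,
purely for illustration, and the -o option (print only the matched
text) is available in GNU and BSD grep but is not required by POSIX.

```shell
# Example line copied from the question above.
line='<p class="section-f"></a><a name="s8"></a>8.</b></a>'

# [0-9]s requires one digit directly before the s.
# grep -c prints the number of matching lines; "|| true" only hides
# grep's non-zero exit status when nothing matches.
one_digit=$(printf '%s\n' "$line" | grep -c '[0-9]s[0-9]*">' || true)

# [0-9]*s allows ZERO digits before the s, so the match can start at
# the s itself.  -o prints just the matched part of the line.
zero_or_more=$(printf '%s\n' "$line" | grep -o '[0-9]*s[0-9]*">' || true)

# [0-9][0-9]*s means "one or more digits before the s".
one_or_more=$(printf '%s\n' "$line" | grep -c '[0-9][0-9]*s[0-9]*">' || true)

printf 'one digit before s:        %s matching line(s)\n' "$one_digit"
printf 'zero or more before s:     matched text is %s\n' "$zero_or_more"
printf 'one or more before s:      %s matching line(s)\n' "$one_or_more"
```

On that line the first and third patterns report 0 matching lines,
while the second one matches s8"> with no digit in front of the s.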

jl
-- 
Jon H. LaBadie                  jonfu@xxxxxxxxxx