Re: Riddle me this: grep / regx experts

"R. G. Newbury" <newbury@xxxxxxxxxxxx> · Fri, 2 Feb 2018 12:32:22 -0500

On Fri, Feb 02, 2018 at 11:04:01AM -0500, R. G. Newbury wrote:
A bug in regx handling???

I am cleaning up some html code
.....
# grep -h '[0-9]*s[0-9]*">' temp
>> Returns the example line with the 's[0-9]">' highlighted.

Can anyone explain what is happening? This isn't politics, so the group
[0-9] should not equal [0-9"#]. Or even [0-9\"\#].

Fri, 2 Feb 2018 10:14:37 -0600 From: Chris Adams <linux@xxxxxxxxxxx> 

A * in a regex is "0 or more of the previous", so basically you are just
matching 's[0-9]*">' (because there will always be at least 0 of the
[0-9] part at the start).

If you really mean "1 or more", you can use an extended regex (the -E
argument to grep/sed) and use + instead of *, so '[0-9]+s[0-9]*">'.
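
To see the difference concretely, here is a small sketch. The sample line is made up (the original example line was trimmed from the post), and it assumes a grep that supports -o (GNU and BSD grep do) to print only the matched text.

```shell
# Hypothetical sample line; the real one was elided from the original post.
line='<a href="#s5">note</a>'

# Basic regex: [0-9]* is allowed to match zero digits, so the match
# simply starts at the "s".
star=$(printf '%s\n' "$line" | grep -o '[0-9]*s[0-9]*">')

# Extended regex: [0-9]+ requires at least one digit before the "s",
# so this line does not match at all (grep exits non-zero, hence || true).
plus=$(printf '%s\n' "$line" | grep -E -o '[0-9]+s[0-9]*">' || true)

printf 'star matched: [%s]\n' "$star"
printf 'plus matched: [%s]\n' "$plus"
```

With the * version the matched text begins at the s; with the + version the line is not selected because no digit precedes the s.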

Fri, 02 Feb 2018 16:15:37 +0000 From: Patrick O'Callaghan 
In grep, * matches any number of instances, including 0. You want to
use + rather than * to guarantee at least one digit.
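
One caveat worth a quick sketch: in plain grep (basic regex) a bare + is an ordinary character, so the + form only does what you want with -E. The sample line below is invented for illustration.

```shell
# Hypothetical sample line with digits directly before the "s".
line='<td id="12s">'

# Basic regex: "+" is a literal plus sign here, and the line contains
# no "+", so the count of matching lines is 0.
bre_count=$(printf '%s\n' "$line" | grep -c '[0-9]+s' || true)

# Extended regex: "+" means "one or more of the previous item",
# so the line matches and the count is 1.
ere_count=$(printf '%s\n' "$line" | grep -E -c '[0-9]+s')

printf 'basic: %s  extended: %s\n' "$bre_count" "$ere_count"
```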

Fri, 2 Feb 2018 11:26:02 -0500 From: Jon LaBadie <jonfu@xxxxxxxxxx>

You are misunderstanding the "*".  It means any sequence of the
associated character including a ZERO length sequence.

So [0-9]*s matches "s (actually just the s), as there is a zero-length
sequence of digits followed by an s.  When you grep for [0-9]s, there
must be at least one digit before the s (but any extra digits are not
part of the match).  Sometimes the sequence [0-9][0-9]*s is useful to
say "one or more digits before the s".
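
A rough illustration of those cases, using an invented sample line and assuming grep -o is available (GNU and BSD grep have it). The interval form \{1,\} is added here as another basic-regex spelling of "one or more"; it was not part of the thread.

```shell
# Hypothetical sample line with two digits before the "s".
line='<td width="12s">'

# A single [0-9] before the s: only the last digit is part of the match.
one=$(printf '%s\n' "$line" | grep -o '[0-9]s')

# One digit followed by zero or more digits: all digits are included.
many=$(printf '%s\n' "$line" | grep -o '[0-9][0-9]*s')

# Same meaning written with a basic-regex interval expression.
interval=$(printf '%s\n' "$line" | grep -o '[0-9]\{1,\}s')

printf 'one: %s  many: %s  interval: %s\n' "$one" "$many" "$interval"
```

All three patterns work without -E, which can be handy when the same expression has to be reused in plain sed.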

jl
Thanks to all for the quick responses. I *tried* to RTFM but that was 
not clear, even on a re-read.  I took [0-9]* as multiple instances of 
[0-9] but NOT zero instances.

Geoff