Re: Regular expression to find from start of string to first space

On Tue, August 8, 2006 4:21 am, Dave M G wrote:
> Shouldn't this regular expression select everything from the start of
> the string to the first space character:
>
> $firstWord = preg_match('#^*(.*) #iU', $word);
>
> It doesn't, so clearly I'm wrong, but here's why I thought it would:
>
> The enclosing hash marks, "#", I *think* just enclose the expression.
> I was told to use them before, but I can't find them here:
> http://jp2.php.net/manual/en/reference.pcre.pattern.syntax.php

The # can be almost any character you want; pick whatever is convenient.
Convenient generally means "not likely to be needed within the pattern".
The #, or whatever you choose, marks the beginning and end of the pattern.
It has to be non-alphanumeric, and it can't be a backslash or whitespace.
There are also paired ones like < and > (or ( ), [ ], { }) that you can
use.

> The caret, "^", says to start at the beginning of the line.

Yes.

> The first asterisk, "*" after the caret says to use any starting
> character.

No.

* means "0 or more of the preceding thingie"

* by itself, with nothing preceding it, has nothing to repeat.

The ^ doesn't count as a "thingie" here: it's an anchor, a position in
the string rather than a character, so there's nothing for the * to
multiply. As far as I know, PCRE refuses to compile a pattern like that
("nothing to repeat"), and preg_match() hands you back false plus a
warning instead of a match.

Even if it did apply to the anchor, the whole point of ^ is to require
a start, and the whole point of * is to not require anything at all, so
that would be an oxymoron.
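
To make the "0 or more of the preceding thingie" rule concrete, here is a
small sketch. The subject strings and variable names are made up for
illustration.

```php
<?php
// "a*" means zero or more "a" characters: the star quantifies the "a" before it.
preg_match('#^a*#', 'aaab', $m1);   // $m1[0] is 'aaa'

// Zero repetitions also count, so this still matches, but the match is empty.
preg_match('#^a*#', 'bbb', $m2);    // $m2[0] is ''
```

The second call is the reason a bare * can surprise you: "matched" does not
have to mean "matched something".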

> The space just before the second "#" is the closing character of my
> search.

No.

The # is the closing of your pattern, because you CHOSE # as the
beginning of your pattern. The space before it is just a literal space
that the pattern has to match.

These are all the same:
#^(.*)\s#
|^(.*)\s|
/^(.*)\s/

(I wrote \s, which matches any whitespace character; your literal space
matches only a space.)

A letter like Z won't work as a delimiter: it has to be
non-alphanumeric, and you might need a Z in your pattern anyway.
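
A quick way to convince yourself that the delimiter is only packaging is to
run the same pattern under several delimiters. This is a sketch with a
made-up subject string; the < > pair is one of the bracket-style delimiters
PHP accepts.

```php
<?php
$subject = 'hello world';

// Same pattern each time; only the delimiter characters differ.
$patterns = array(
    '#^(.*)\s#',
    '|^(.*)\s|',
    '/^(.*)\s/',
    '<^(.*)\s>',
);

$results = array();
foreach ($patterns as $pattern) {
    preg_match($pattern, $subject, $m);
    $results[] = $m[1];   // text captured by the parentheses
}
// Every entry in $results is 'hello'.
```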

> The "(.*)" in the middle says to take anything in between the
> beginning of the line and the space.
>
> "iU" says, "be case insensitive, and don't be greedy".

Yes.

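Without the U, the .* is greedy: it grabs as much as it can, spaces
included, and only the LAST whitespace on the line would "count" for
the trailing space (or \s). The i does no harm, but there are no letters
in the pattern for it to affect.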
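
Here is a small sketch of the greedy versus non-greedy difference, using a
made-up three-word string. The third call uses the inline .*? form, which
asks for the same lazy behaviour without the U modifier.

```php
<?php
$line = 'one two three';

// Greedy: .* runs to the end, then backs up to the LAST space.
preg_match('#^(.*) #', $line, $greedy);    // $greedy[1] is 'one two'

// U flips the default, so .* stops at the FIRST space.
preg_match('#^(.*) #U', $line, $lazy);     // $lazy[1] is 'one'

// Same effect with a lazy quantifier and no U.
preg_match('#^(.*?) #', $line, $lazyQ);    // $lazyQ[1] is 'one'
```

Don't combine U with .*? in one pattern: U inverts the meaning of the ?, so
the quantifier ends up greedy again.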

> So, it should start at the beginning of the line and get everything up
> to the first space. But it doesn't work.

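I generally find .* to be problematic, and do more like:

#^([^\s]*)#

This way, I'm saying to anchor at the beginning, and then look for
NON-whitespace, and use * to get as many as possible.

Note that the * goes inside the parentheses, so the group captures the
whole word and not just its last character, and that there is no U
here: U would make the * lazy, and a lazy * right at the end of the
pattern is happy to match nothing at all.

The ^ *inside* the [] means "not".

Also, preg_match() returns the number of matches (0 or 1), not the
matched text, so $firstWord would never hold the word itself. Pass a
third argument and read the captured text from that array.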
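
Putting it together, a minimal sketch of grabbing the first word with a
negated character class might look like this. The function name first_word
is hypothetical, and I've used + where the discussion uses * so that an
empty match counts as "no word"; treat it as one way to do it, not the only
one.

```php
<?php
// Hypothetical helper: returns the leading run of non-whitespace, or null.
function first_word($text)
{
    // preg_match() returns 1 on a match and fills $matches with the captures.
    if (preg_match('#^([^\s]+)#', $text, $matches) === 1) {
        return $matches[1];
    }
    return null;   // empty string, or the string starts with whitespace
}

$a = first_word('hello world');   // 'hello'
$b = first_word('single');        // 'single' (no space needed)
$c = first_word(' leading');      // null
```

Unlike a pattern that insists on a trailing space, this one still works when
the string is a single word with no space after it.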

-- 
Like Music?
http://l-i-e.com/artists.htm
