Re: Dividing, and keeping, text from the first space

On Wed, 2006-08-09 at 19:07 +0900, Dave M G wrote:
> Robert,
> 
> Thank you for your quick response and helpful advice.
> >> Use preg_match() and pay special attention to the manual as it refers to
> >> the third parameter :) The expression you need follows:
> >>    "#^([^\\s]*)\\s(.*)$#U"
> 
> This works perfectly.
> 
> I now see that the preg_match() function returns an array with the 
> original text, the selected text, and the discarded text. That wasn't 
> clear to me before when I wasn't looking for that kind of behavior. But 
> now that you point it out I see how it works.
> 
> But I am still confused about the expression you used. I can't quite 
> break it down:
> The opening "^" says to start at the beginning of the line.
> The brackets indicate a sub-expression.
> The square brackets indicate a character class (?).
> The "^" inside the square brackets means "not".
> 
> First question, why is there an extra backslash before the space marker 
> "\s"? Isn't that an escape character, so wouldn't that turn the 
> following space marker into a literal backslash followed by an "s"?
> 
> The "*" says to select everything matching the preceding conditions.
> 
> There's that double backslash and "s" again.
> 
> Hmm... does the (.*) after the second "\s" mean to match all the 
> whitespace found? For example if there happened to be two space 
> characters instead of just one?
> 
> The PHP manual says the "$" means to "assert end of subject". Which I 
> think means "stop looking for any more matches".
> 
> So basically I'm confused about the extra escape slashes.

The extra slashes are there to properly escape the backslash character,
since you are using double quotes. While it is true that the expression
also works with single backslashes, that is only because \s is not an
escape sequence in a PHP double-quoted string, so PHP leaves it alone;
you can't rely on that being true indefinitely. You are relying on a
side effect of an unrecognized escape sequence. In all honesty, I should
also have escaped the $ character, since it introduces a variable name
in double quotes. In case you didn't know, double quotes indicate
interpolated strings and single quotes indicate literal strings. This is
why you can produce newline, tab, and other special characters in double
quotes, but not in single quotes. Since you do not intend to perform any
interpolation, the best way to write your string is the following:

    '#^([^\s]*)\s(.*)$#'
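A quick way to convince yourself of the quoting rules is to compare the
strings directly. The small sketch below needs nothing beyond a stock PHP
interpreter: it checks that the double-quoted pattern with doubled
backslashes and the single-quoted pattern contain the same characters,
that "\s" only survives in double quotes by accident, and that the
unescaped $ in the original pattern happens to be harmless because # cannot
start a variable name.

```php
<?php
// Double quotes: "\\" produces one backslash and "\$" a literal dollar sign.
$doubleQuoted = "#^([^\\s]*)\\s(.*)\$#";

// Single quotes: no interpolation, and \s stays as backslash + "s".
$singleQuoted = '#^([^\s]*)\s(.*)$#';

// Both literals hold exactly the same characters.
var_dump($doubleQuoted === $singleQuoted); // bool(true)

// "\s" in double quotes is left alone only because PHP has no \s escape;
// this is the side effect you should not rely on.
var_dump("\s" === '\s'); // bool(true)

// "\n" IS an escape sequence, so the two quoting styles differ here.
var_dump(strlen("\n"), strlen('\n')); // prints int(1) and int(2) on separate lines

// An unescaped $ followed by # is not a variable, so PHP keeps it literally.
var_dump("#^([^\\s]*)\\s(.*)$#U" === '#^([^\s]*)\s(.*)$#U'); // bool(true)
```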

As for the meaning of the individual parts of the pattern...

The first ^ anchors the match to the beginning of the string being
matched.

The opening square bracket starts a character class: a set of
characters to match, which can itself include shorthand classes such as
\s for whitespace. The ^ at the start of the class negates it, so as you
say... match any character that is NOT in the enclosed set. The *
following the closing square bracket says that zero or more such
characters can be matched. (This means that if the value being matched
has a leading space, you get an empty first match... you may actually
want + here instead of *, but I think you're trimming anyway and I
followed your original.)

The following \s then matches the first whitespace character after the
initial block. You might actually want to put a ? after it so that if
the end of the value is reached without any whitespace, you still match
the initial portion.

The .* portion says match zero or more characters of any value (any
character except a newline, unless you add the s modifier), and the
trailing $ anchors the match to the end of the value, forcing .* to
match every character from just after that whitespace to the end of the
string. So if there happen to be two spaces, only the first is consumed
by \s and the second becomes the first character of the second match.
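To see what actually lands in the array that preg_match() fills in, here
is a small illustrative sketch. The subject strings are made-up examples;
the patterns are the one above plus, for comparison, a + variant and an
s-modifier variant. Note that preg_match() itself returns 1 or 0 (false on
error); the captured pieces go into the array passed as the third argument.

```php
<?php
$pattern = '#^([^\s]*)\s(.*)$#';

// [0] => whole match, [1] => first group, [2] => second group.
$n = preg_match($pattern, 'Hello brave new world', $m1);
// $n is 1; $m1 is ['Hello brave new world', 'Hello', 'brave new world']

// \s consumes only the FIRST whitespace character; any extra spaces
// end up at the start of the second group.
preg_match($pattern, 'Hello  world', $m2);
// $m2[1] is 'Hello', $m2[2] is ' world'

// With *, a leading space produces an empty first group...
preg_match($pattern, ' Hello world', $m3);
// $m3[1] is '', $m3[2] is 'Hello world'

// ...while + demands at least one non-space character at the start,
// so the same subject does not match at all.
$plus = preg_match('#^([^\s]+)\s(.*)$#', ' Hello world', $m4);
// $plus is 0

// By default . does not match "\n", so a newline in the remainder makes
// the match fail; the s modifier lets . match newlines as well.
$noS   = preg_match($pattern, "Hello big\nworld", $m5);       // 0
$withS = preg_match($pattern . 's', "Hello big\nworld", $m6); // 1
// $m6[2] is "big\nworld"
```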

As I said, you may want the following pattern for cases where no
whitespace exists:

    '#^([^\s]*)\s?(.*)$#'

In this usage the ? makes the \s optional: it matches one whitespace
character if one exists, and nothing otherwise. It's the same as
\s{0,1}, but obviously shorter :)
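If you end up wrapping the optional-whitespace pattern in a function, it
might look something like the sketch below. The function name
splitAtFirstSpace() is hypothetical, and the s modifier is an added
assumption (it is not in the pattern above) so that a newline in the
remainder does not make the match fail. It assumes PHP 7.1 or later for
the type declarations and the [$a, $b] = ... syntax.

```php
<?php
// Hypothetical helper: returns [first word, remainder], where the
// remainder is '' when the text contains no whitespace.
function splitAtFirstSpace(string $text): array
{
    // Optional-whitespace pattern with an ADDED s modifier so . also
    // matches newlines. Since [^\s]*, \s? and (.*) can all match nothing,
    // this pattern matches any string.
    preg_match('#^([^\s]*)\s?(.*)$#s', $text, $m);
    return [$m[1], $m[2]];
}

[$word, $rest] = splitAtFirstSpace('Hello brave new world');
// $word is 'Hello', $rest is 'brave new world'

[$word2, $rest2] = splitAtFirstSpace('Hello');
// $word2 is 'Hello', $rest2 is ''

// For comparison: without the ?, a subject with no whitespace fails.
$strict = preg_match('#^([^\s]*)\s(.*)$#', 'Hello'); // 0

// \s{0,1} behaves exactly like \s?.
preg_match('#^([^\s]*)\s{0,1}(.*)$#', 'Hello world', $mb);
// $mb[1] is 'Hello', $mb[2] is 'world'
```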

Cheers,
Rob.
-- 
.------------------------------------------------------------.
| InterJinn Application Framework - http://www.interjinn.com |
:------------------------------------------------------------:
| An application and templating framework for PHP. Boasting  |
| a powerful, scalable system for accessing system services  |
| such as forms, properties, sessions, and caches. InterJinn |
| also provides an extremely flexible architecture for       |
| creating re-usable components quickly and easily.          |
`------------------------------------------------------------'

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
