RE: Regex Problem

> -----Original Message-----
> From: MikeP [mailto:mpeloso@xxxxxxxxxxxxx]
> Sent: Thursday, December 18, 2008 8:43 AM
> To: php-general@xxxxxxxxxxxxx
> Subject:  Regex Problem
> 
> Hello,
> I have a quirky behavior I'm trying to resolve.
> I have a REGEX that will find a function definition in a PHP file:
> .....function InsertQuery($table,$fields,$values).....
> the REGEX is:
> $regex='/function [a-z]* *([$a-zA-Z]*)/';
> The problem is that:
> 1. A slash is automatically put in front of the $. This is good, but I
> don't know how it gets there.
> 2. A slash is NOT put in front of the parenthesis. That's bad.
> 3. If I try to escape the parenthesis with a \ , I get \\.
> Help

Mike,

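Certain characters are considered "special" in RegEx. Outside a
character set, the $ means "end of the line," so it must be escaped
there to match a literal dollar sign. Inside a character set [] it is
already literal, so it does not need the backslash (adding one is
harmless, though). Parentheses are different: unescaped, they open and
close a group, which is why yours never match the actual "(" in the
code. Try this: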

$regex = '/function\s+[-_a-z0-9]+\s*\((\s*\$?[-_a-z0-9]+\s*,?)*\s*\)/i';

The word "function" is followed by 1 or more whitespace characters. The
function name [-_a-z0-9]+ can be a combination of alpha-numeric
characters, underscore, and dash (PHP itself does not allow a dash in a
name, so the set is a little more permissive than it needs to be). Then,
there is optional whitespace between the name of the function and its
parameters. The opening parenthesis "(" for parameters has been escaped
(as has the closing parenthesis), because an unescaped parenthesis
starts or ends a group instead of matching the character itself. Then,
in a repeatable capture group, each parameter is matched: optional
whitespace, an optional $ (because maybe you're not using a variable,
eh?), one or more alpha-numeric, underscore, or dash characters,
followed by optional whitespace and an optional comma (if there are more
arguments). Keep in mind that a repeated capture group only hands back
its last repetition, so you will get the final parameter rather than all
of them. After any number of instances of the capture group, the regex
continues by looking for optional whitespace followed by the closing
parenthesis for the function text. The "i" switch at the end simply
means that this regex pattern will be treated as case-insensitive
('APPLE' == 'apple').
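
For what it's worth, here is a minimal sketch of how that pattern could
be used with preg_match(), run against the sample line from your
message. The variable names are my own, and the second pattern ($regex2),
which captures the name and the whole parameter list, is an assumption on
my part rather than anything from the original code. It is there because
a repeated capture group only returns its last repetition.

```php
<?php
// Sample line taken from the original question.
$line = 'function InsertQuery($table,$fields,$values)';

// The pattern from above: the repeated group matches each parameter in turn.
$regex = '/function\s+[-_a-z0-9]+\s*\((\s*\$?[-_a-z0-9]+\s*,?)*\s*\)/i';

$matched = preg_match($regex, $line, $m);
// $m[0] is the whole declaration; $m[1] holds only the LAST repetition
// of the group, because each repetition overwrites the previous one.
$whole = $m[0];
$last  = $m[1];

// Assumed variation: capture the name and the whole parameter list,
// then split the list on commas.
$regex2 = '/function\s+([-_a-z0-9]+)\s*\(([^)]*)\)/i';
preg_match($regex2, $line, $m2);
$name   = $m2[1];
$params = array_map('trim', explode(',', $m2[2]));

print_r($params);
```

Splitting on commas is only good enough for simple parameter lists; a
default value such as array(1, 2) would need something smarter.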

If you're not worried about actually splitting up the function
parameters into capture groups, then you can just use a greedy match to
grab everything up till the LAST parenthesis on the line.

$regex = '/function\s+[-_a-z0-9]+\s*\(.*\)/i';

The greedy ".*" runs to the end of the line and then backs up to the
last ")" it can find, so no look-ahead is needed. If you're willing to
trust the text you'll be searching through, you can probably avoid that
"last parenthesis" rule altogether:

$regex = '/function\s+[-_a-z0-9]+\s*\(.*/i';

Once you get to the opening parenthesis for the function parameters,
that last regex assumes that the rest of the line is also part of the
function declaration, and just grabs everything left. The ".*" has to
be greedy here: a lazy ".*?" at the very end of a pattern is satisfied
by matching nothing at all. If you are using a regex setup where the dot
can also consume newline or carriage return characters (the "s" flag),
make the dot lazy, throw a "$" at the end of the regex (before the flags
part), and add the "m" flag so that "$" matches at the end of each line
rather than only at the end of the whole string:

$regex = '/function\s+[-_a-z0-9]+\s*\(.*?$/ims';
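
To see how a greedy and a lazy dot behave at the end of a pattern, here
is a small sketch you can run. The two-line $source string is a made-up
sample (the second declaration is hypothetical, added only to get a
nested parenthesis), and the variable names are mine.

```php
<?php
// Hypothetical two-line sample; the second declaration has a nested ")".
$source = 'function InsertQuery($table,$fields,$values)' . "\n"
        . 'function Other($a, $b = array()) {';

// Lazy dot as the last item in the pattern: satisfied by zero characters.
preg_match('/function\s+[-_a-z0-9]+\s*\(.*?/i', $source, $lazy);

// Greedy dot: runs to the end of the line (no "s" flag, so it stops at "\n").
preg_match('/function\s+[-_a-z0-9]+\s*\(.*/i', $source, $greedy);

// Greedy dot followed by "\)": backs up to the last ")" on each line.
preg_match_all('/function\s+[-_a-z0-9]+\s*\(.*\)/i', $source, $toParen);

// With "s" the dot crosses newlines, so a lazy dot plus "$" under "m"
// is what ends each match at the end of its own line.
preg_match_all('/function\s+[-_a-z0-9]+\s*\(.*?$/ims', $source, $perLine);

print_r(array($lazy[0], $greedy[0], $toParen[0], $perLine[0]));
```

The lazy version stops right after the "(", which is why a trailing
".*?" needs either an anchor or something else after it to be useful.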

These are all untested, but hopefully I've given you a nudge in the
right direction. If you are still getting strange behavior out of your
PCRE engine, then perhaps you have a different version installed than
what I'm used to--all of the above should work (perhaps with some very
minor changes) in PHP.
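
On the backslash question itself, a quick sketch that may help narrow
things down. It uses only preg_match() and preg_quote(), which are
standard PHP functions; the sample strings and variable names are made
up for illustration. I can't tell from your message what is adding the
backslashes you see, so treat this as a way to check what the regex
engine itself expects, not as a diagnosis.

```php
<?php
// In a single-quoted PHP string "$" is not interpolated, and inside a
// character set [] PCRE treats it as a literal dollar sign.
preg_match('/[$a-z]+/', '$table', $m1);

// An unescaped "(" opens a capture group; it never matches a literal "(".
// Here the group matches an empty string, because the next character in
// the subject is "(" and that is not in the set.
preg_match('/Query([a-z$]*)/i', 'InsertQuery($table)', $m2);

// Escaped, "\(" matches the literal parenthesis and the group gets the rest.
preg_match('/Query\(([a-z$]*)/i', 'InsertQuery($table)', $m3);

// preg_quote() shows which characters PCRE wants escaped in literal text.
$quoted = preg_quote('InsertQuery($table)');

print_r(array($m1[0], $m2, $m3, $quoted));
```

If the pattern you end up with contains "\\(" rather than "\(", some
other layer is escaping the backslash again before preg_match() sees it.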

HTH,


// Todd

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php


