> -----Original Message-----
> From: MikeP [mailto:mpeloso@xxxxxxxxxxxxx]
> Sent: Thursday, December 18, 2008 8:43 AM
> To: php-general@xxxxxxxxxxxxx
> Subject: Regex Problem
>
> Hello,
> I have a quirky behavior I'm trying to resolve.
> I have a REGEX that will find a function definition in a php file:
> .....function InsertQuery($table,$fields,$values).....
> the REGEX is:
> $regex='/function [a-z]* *([$a-zA-Z]*)/';
> the problem is that:
> 1. a slash is automatically put in front of the $. This is good but I
> don't know how it gets there.
> 2. a slash is NOT put in front of the parenthesis. That's bad
> 3. If I try to escape the parenthesis with a \ , I get \\.
> Help

Mike,

Certain characters are considered "special" in RegEx. Outside a character set, the $ means "end of the line," so it must be escaped to match a literal dollar sign. Inside a character set [] it is already literal, so no escape is needed there. The parentheses are special too: an unescaped "(" opens a capture group instead of matching a literal parenthesis, so your pattern never matches the "(" that starts the parameter list.

Try this:

    $regex = '/function\s+[-_a-z0-9]+\s*\((\s*\$?[-_a-z0-9]+\s*,?)*\s*\)/i';

The word "function" is followed by one or more whitespace characters (spaces or tabs). The function name [-_a-z0-9]+ can be a combination of alphanumeric characters, underscore, and dash. (PHP identifiers cannot actually contain a dash, so you can drop it from the set if you like.) Then there is optional whitespace between the name of the function and its parameters. The opening parenthesis "(" for the parameters has been escaped, as has the closing parenthesis.

Then, in a repeatable capture group, each parameter is matched: optional whitespace, an optional $ (because maybe you're not using a variable, eh?), one or more alphanumeric, underscore, or dash characters, followed by optional whitespace and an optional comma (if there are more arguments). Note that when a capture group repeats, only its last repetition is kept, so this group ends up holding the final parameter rather than all of them. After any number of instances of the capture group, the regex continues by looking for optional whitespace followed by the closing parenthesis of the parameter list.

The "i" switch at the end simply means that this regex pattern will be treated as case-insensitive ('APPLE' == 'apple').
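Here is a minimal sketch of how that pattern might be used with preg_match(). The sample line comes from your message. The second pattern ($listRegex) and the variable names are my own assumptions for illustration, not something from your code. It shows one possible way to collect every parameter, since a repeated capture group only retains its last repetition.

```php
<?php
// Sample line, based on the function signature quoted in the question.
$line = 'function InsertQuery($table,$fields,$values)';

// Single quotes: PHP does not interpolate the $ and leaves \s, \( and \$ intact.
$regex = '/function\s+[-_a-z0-9]+\s*\((\s*\$?[-_a-z0-9]+\s*,?)*\s*\)/i';

if (preg_match($regex, $line, $m)) {
    echo $m[0], "\n"; // function InsertQuery($table,$fields,$values)
    echo $m[1], "\n"; // $values (only the last repetition of the group)
}

// Assumed alternative: capture the whole parameter list in one group,
// then pull the individual names out of it in a second pass.
$listRegex = '/function\s+[-_a-z0-9]+\s*\(([^)]*)\)/i';
$params = [];
if (preg_match($listRegex, $line, $m2)) {
    preg_match_all('/\$?[-_a-z0-9]+/i', $m2[1], $found);
    $params = $found[0];
}
print_r($params);
```

The two-pass approach trades one clever pattern for two simple ones, which tends to be easier to debug.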
If you're not worried about actually splitting up the function parameters into capture groups, then you can just use a look-ahead to ensure that you grab everything up till the LAST parenthesis on the line.

    $regex = '/function\s+[-_a-z0-9]+\s*\(.*?\)(?=.*\)[^)]*)/i';

That one needs to be tweaked in order to actually grab the last parenthesis (instead of just checking for its existence). As written, the look-ahead demands another ")" somewhere after the one just matched, so a declaration with only one closing parenthesis will not match at all. A greedy quantifier is the simpler route, because ".*" backtracks to the last ")" on the line by itself:

    $regex = '/function\s+[-_a-z0-9]+\s*\(.*\)/i';

If you're willing to trust the text you'll be searching through, you can probably avoid that "last parenthesis" rule altogether, and make a greedy regex:

    $regex = '/function\s+[-_a-z0-9]+\s*\(.*/i';

Once you get to the opening parenthesis for the function parameters, that last regex assumes that the rest of the line belongs to the function declaration, and just grabs everything left. The quantifier has to be greedy here: a lazy ".*?" at the very end of a pattern matches as little as possible, which is nothing.

If you are using a regex setup where the dot can also consume newline or carriage return characters (the "s" modifier), make the quantifier lazy and throw a "$" at the end of the regex (before the flags part) to tell it to grab characters only until it reaches the end of the line. Add the "m" modifier as well, so that "$" matches at the end of every line rather than only at the end of the whole subject:

    $regex = '/function\s+[-_a-z0-9]+\s*\(.*?$/ims';

These are all untested, but hopefully I've given you a nudge in the right direction. If you are still getting strange behavior out of your PCRE engine, then perhaps you have a different version installed than what I'm used to. All of the above should work (perhaps with some very minor changes) in PHP.

HTH,

// Todd

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php