On 30/10/2007, Stijn Verholen <stijn@xxxxxxxxxxxxx> wrote:
> Hey list,
>
> I'm having problems with grouped alternative patterns.
> The regex I would like to use, is the following:
>
> /\s*(`?.+`?)\s*int\s*(\(([0-9]+)\))?\s*(unsigned)?\s*(((auto_increment)?\s*(primary\s*key)?)|((not\s*null)?\s*(default\s*(`.*`|[0-9]*)?)?))\s*/i
>
> It matches this statement:
>
> `id` INT(11) UNSIGNED AUTO_INCREMENT PRIMARY KEY
>
> But not this:
>
> `test4` INT(11) UNSIGNED NOT NULL DEFAULT 5
>
> However, if I switch the alternatives, the first statement doesn't
> match, but the second does.
> FYI: In both cases, the column name and data type are matched, as expected.
> It appears to be doing lazy evaluation on the pattern, even though every
> resource I can find states that every alternative is tried in turn until
> a match is found.

It's not lazy. Given alternative subpatterns, the PCRE engine chooses the
leftmost alternative that matches, not the longest. For instance:

    <?php
    preg_match("/a|ab/", "abbot", $matches);
    print_r($matches);
    ?>

    Array
    (
        [0] => a
    )

This isn't what you'd expect if you were familiar with POSIX regular
expressions, but it matches Perl's behaviour.

Because each of your subpatterns can match an empty string, the lefthand
subpattern always matches and the righthand subpattern might as well not
be there. Nothing after the group has to match either (the trailing \s*
is happy with nothing), so the engine never needs to backtrack into the
righthand alternative.

The simplest solution, if you don't want to completely rethink your
regexp, might be to replace \s with [[:space:]], remove the delimiters
and the i modifier, and just use eregi(), like so:

    // The two statements from your mail
    $column1 = '`id` INT(11) UNSIGNED AUTO_INCREMENT PRIMARY KEY';
    $column2 = '`test4` INT(11) UNSIGNED NOT NULL DEFAULT 5';

    $pattern = '[[:space:]]*(`?.+`?)[[:space:]]*int[[:space:]]*(\(([0-9]+)\))?[[:space:]]*(unsigned)?[[:space:]]*(((auto_increment)?[[:space:]]*(primary[[:space:]]*key)?)|((not[[:space:]]*null)?[[:space:]]*(default[[:space:]]*(`.*`|[0-9]*)?)?))[[:space:]]*';

    eregi($pattern, $column1, $matches);
    print_r($matches); // match

    eregi($pattern, $column2, $matches);
    print_r($matches); // match

-robin
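
A footnote on the advice above: eregi() was deprecated in PHP 5.3 and removed in PHP 7, so it is not available on current PHP versions. What follows is a minimal sketch of another way to get the righthand alternative considered while staying with preg_match(). It anchors the pattern with `$`, so that an empty match from the lefthand alternative can no longer end the match early. The two short patterns and the `$column` string are hypothetical, simplified stand-ins for illustration. They are not the full pattern from the thread, and this is not necessarily how the original regexp is best restructured.

```php
<?php
// Leftmost alternative wins: "a" matches first, so "ab" is never tried.
preg_match('/a|ab/', 'abbot', $m);
print_r($m);

// Hypothetical, simplified stand-ins for the pattern in the thread.
// Both alternatives are optional, so each can match the empty string.
$unanchored = '/int\s*((auto_increment)?|(not\s*null)?)/i';
$anchored   = '/int\s*((auto_increment)?|(not\s*null)?)\s*$/i';

$column = '`test4` INT NOT NULL';

// Without an anchor the left alternative matches empty and the engine stops.
preg_match($unanchored, $column, $loose);
print_r($loose);

// With "$" the empty match fails to reach the end of the string, so the
// engine backtracks and tries the right alternative.
preg_match($anchored, $column, $strict);
print_r($strict);
```

In the unanchored case group 3 is never set, which mirrors the symptom described in the original question. In the anchored case `$strict[3]` holds "NOT NULL". The same idea should carry over to the full pattern by appending `$` after its final `\s*`, assuming each column definition is matched as a complete string.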