> So, what does (.*?) mean? Well, simply said "any character, occurring 0 or more times" occurring 0 or 1 times.

I don't think so. ((.*)?) would mean that, but in (.*?), the '?' means "make the preceding pattern non-greedy; that is, make it match the minimum number of times." And as the minimum number of matches of (.*) is zero, on its own it ends up meaning "match no character at all". So it will always succeed, wherever it occurs in the subject string.

For instance,

$ php -a
Interactive shell

php > $test = "aabbcc";
php > $re = '/.+?(bb?).*/';
php > preg_match($re, $test, $match);
php > print_r($match);
Array
(
    [0] => aabbcc
    [1] => bb
)

Note here that the initial pattern piece '.+?' is limited to the minimum match. The minimum match is a single character, but that is overruled by the need to match the capturing sub-expression '(bb?)', so it in fact matches 'aa'. Note that in this regexp, (bb?) means "...a 'b' char followed by zero or one 'b' chars."

Now change that initial sub-expression.

php > $re = '/.*?(bb?).*/';
php > preg_match($re, $test, $match);
php > print_r($match);
Array
(
    [0] => aabbcc
    [1] => bb
)

The minimum match here is zero characters. But again, that minimum must be extended to accommodate '(bb?)'.

Now remove the minimising constraint.

php > $re = '/.*(bb?).*/';
php > preg_match($re, $test, $match);
php > print_r($match);
Array
(
    [0] => aabbcc
    [1] => b
)

Only one 'b'! Which 'b' is matched? It's the second 'b'. The now-greedy initial sub-expression first gobbles up the whole string, then gives characters back one at a time until (bb?) can match. The first position (counting back from the end) where it can is the second 'b', and because that 'b' is followed by 'c', (bb?) settles for its minimum: a single 'b' followed by zero 'b's. So the initial sub-expression keeps all of the characters up to that second 'b'. Don't believe me?

php > $re = '/(.*)(bb?).*/';
php > preg_match($re, $test, $match);
php > print_r($match);
Array
(
    [0] => aabbcc
    [1] => aab
    [2] => b
)

Let's back up.
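For anyone who wants to rerun these experiments outside php -a, here is a minimal sketch as a plain script. captures() is a hypothetical helper introduced only for this illustration (it just wraps preg_match() and returns the match array). The second loop also runs the three patterns the quoted message calls equivalent, each on its own, against the same "aabbcc": (.*?) captures the empty string, (.*) captures the whole string and (.+?) captures just "a", so they are not interchangeable.

```php
<?php
// Replays the php -a experiments above as a plain script.
// captures() is a hypothetical helper for this illustration only:
// it wraps preg_match() and returns the array of matched groups.
function captures(string $re, string $subject): array
{
    preg_match($re, $subject, $match);
    return $match;
}

$test = "aabbcc";

// Different prefix quantifiers in front of the same (bb?) group.
$patterns = [
    '/.+?(bb?).*/',   // lazy "one or more": prefix grows from 1 char until (bb?) can match
    '/.*?(bb?).*/',   // lazy "zero or more": prefix grows from 0 chars
    '/.*(bb?).*/',    // greedy: prefix takes everything, then backtracks until (bb?) fits
    '/(.*)(bb?).*/',  // same as above, with the prefix captured
    '/(.*?)(bb?).*/', // lazy prefix, captured
];

foreach ($patterns as $re) {
    echo $re, ' => ', json_encode(captures($re, $test)), PHP_EOL;
}

// The three patterns the quoted message calls equivalent, each used on its own.
foreach (['/(.*?)/', '/(.*)/', '/(.+?)/'] as $re) {
    echo $re, ' captures ', json_encode(captures($re, $test)[1]), PHP_EOL;
}
```

The helper returns the whole array rather than printing it, so the same results can be compared directly instead of eyeballing print_r() output.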
php > $re = '/(.*?)(bb?).*/';
php > preg_match($re, $test, $match);
php > print_r($match);
Array
(
    [0] => aabbcc
    [1] => aa
    [2] => bb
)

As before, with a non-greedy initial sub-expression, except that we now capture that initial sub-expression. (bb?) means "...a 'b' followed by zero or one 'b's, greedily." Can we force that to be non-greedy?

php > $re = '/(.*?)(bb??)(.*)/';
php > preg_match($re, $test, $match);
php > print_r($match);
Array
(
    [0] => aabbcc
    [1] => aa
    [2] => b
    [3] => bcc
)

Yes we can, by appending a moderating '?', which curbs the appetite of the optional 'b' in the capturing sub-expression: (bb??). Now 'b??' tries zero 'b's first; since the rest of the pattern can still match from there, the group captures a single 'b' and the trailing (.*) picks up 'bcc'.

Peter West
"...and behold, something greater than Jonah is here."

> On 23 Feb 2015, at 10:48 pm, Maciek Sokolewicz <maciek.sokolewicz@xxxxxxxxx> wrote:
>
> Secondly, the above two regexp rules are slightly bloated. What they actually mean is:
> ( = start new catchable pattern
> . = any character
> * = 0 or more of the previous pattern
> ? = 0 or 1 of the previous pattern
> ) = end catchable pattern
> \s = any whitespace character
>
> So, what does (.*?) mean? Well, simply said "any character, occurring 0 or more times" occurring 0 or 1 times. But since the any character pattern already occurs 0 or more times, the pattern as a whole will either be matched (1 time) or not (0 times). Making the ? metacharacter useless. Now if it were (.+?) then it would state "one or more of any character, with the entire pattern optional."
> So in practice, the following patterns are equal in what they represent: (.*?), (.*), (.+?)
>

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php