On 28/08/07, Brian Rue <brianrue@xxxxxxxxx> wrote:
> Sure, I'll break it apart a little:

Er, wow, thanks. Lots of material here...

> '{(?=<p(?:>|\s)(?!.*<p(?:>|\s)))}is'
>
> $regex = '{' .          // opening delimiter
>   '(?=' .               // positive lookahead: match a position that is
>                         // immediately followed by the following pattern:
>   '<p' .                // first part of an opening <p> tag
>   '(?:' .               // non-capturing parentheses (same as normal
>                         // parentheses, but a bit faster since we don't
>                         // need to capture what they match for later use)
>   '>|\s' .              // match a closing > or a whitespace character
>   ')' .                 // end non-capturing parentheses
>   '(?!' .               // negative lookahead: the match will fail if the
>                         // following pattern matches from the current position
>   '.*' .                // match until the end of the string
>   '<p(?:>|\s)' .        // same as above: look for another <p> tag
>   ')' .                 // end negative lookahead
>   ')' .                 // end positive lookahead
>   '}is';                // closing delimiter, and use modifiers i and s

It was the negative lookahead that confused me, I see. The rest seems
pretty straightforward. Difficult, but straightforward.

>
> About the modifiers: i makes it case-insensitive, and s turns on
> dot-matches-all mode (including newlines); otherwise, the . would only
> match until the next newline.

Yes, this I know.

> The regex has two parts: matching a <p> tag, and then making sure there
> aren't any more <p> tags in the string following it. The positive lookahead
> is (hopefully) pretty straightforward. The negative lookahead works by using
> a greedy (regular) .*, which forces the regex engine to match all the way to
> the end of the haystack. Then it encounters the <p(?:>|\s) part, forcing it
> to backtrack until it finds a <p> tag. If it doesn't find one before
> returning to the 'current' position (directly after the <p> tag we just
> matched), then we know we have found the last <p> tag.

Nice. Very nice.
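A minimal PHP sketch of how the regex above might be used, assuming the goal is to insert something just before the last <p> tag with preg_replace(). The sample $html string and the '<!-- inserted -->' marker are hypothetical values chosen for illustration: the uppercase <P ...> exercises the i modifier, the newlines exercise the s modifier, and <pre> shows that the (?:>|\s) part keeps other tags starting with "p" from counting.

```php
<?php
// Sketch only: the regex from the thread, written on one line.
$regex = '{(?=<p(?:>|\s)(?!.*<p(?:>|\s)))}is';

// Hypothetical sample input (not from the original thread).
$html = "<p>One</p>\n<P class=\"x\">Two</p>\n<pre>code</pre>";

// The whole pattern is zero-width, so the match is an empty string;
// PREG_OFFSET_CAPTURE reports the position where it occurs.
preg_match($regex, $html, $m, PREG_OFFSET_CAPTURE);
$offset = $m[0][1];
echo "last <p> starts at offset $offset\n"; // offset 11

// Nothing is consumed, so preg_replace() inserts the replacement at that
// position without removing any characters from the subject.
$result = preg_replace($regex, '<!-- inserted -->', $html);
echo $result, "\n";

// Without /s, '.' stops at newlines, so the negative lookahead cannot "see"
// the later <P> tag and the first <p> also looks like the last one.
$withS    = preg_match_all($regex, $html);
$withoutS = preg_match_all('{(?=<p(?:>|\s)(?!.*<p(?:>|\s)))}i', $html);
echo "matches with /s: $withS, without /s: $withoutS\n"; // 1 and 2
```

Because only one position in the string satisfies both lookaheads, preg_replace() makes a single insertion even though it is not given a limit.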
> The positive and negative lookaheads are 'zero-width' assertions, which
> means they don't advance the regex engine's pointer in the haystack string.
> Since the entire regex is zero-width, the replacement string gets inserted
> at the matched position.

Hmm.

> I hope that made at least a little bit of sense :) If you're doing a lot of
> regex work, I would strongly recommend reading the book Mastering Regular
> Expressions by Jeffrey Friedl... it's very well written and very helpful.

I don't do a lot, but it's a great tool to know when one needs it! Thank
you for the patient explanations.

Just a general note, both these addresses are 404 right now:
http://il.php.net/manual/en/pcre.pattern.modifiers.php
http://uk.php.net/manual/en/pcre.pattern.syntax.php

Dotan Cohen

http://lyricslist.com/
http://what-is-what.com/

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php