RE: Adding text before last paragraph

"Brian Rue" <brianrue@xxxxxxxxx> · Mon, 27 Aug 2007 17:15:13 -0700

Sure, I'll break it apart a little:

'{(?=<p(?:>|\s)(?!.*<p(?:>|\s)))}is'

$regex = '{' .              // opening delimiter
         '(?=' .            // positive lookahead: succeed only at a position
                            // where the following pattern matches:
             '<p' .         // first part of an opening <p> tag
             '(?:' .        // non-capturing parentheses (same as normal
                            // parentheses, but a bit faster since we don't
                            // need to capture what they match for use later)
                 '>|\s' .   // match a closing > or a whitespace character
             ')' .          // end non-capturing parentheses
             '(?!' .        // negative lookahead: the match will fail if the
                            // following pattern matches from the current
                            // position
                 '.*' .     // match until the end of the string
                 '<p(?:>|\s)' . // same as above - look for another <p> tag
             ')' .          // end negative lookahead
         ')' .              // end positive lookahead
         '}is';             // ending delimiter, and use modifiers s and i

About the modifiers: i makes it case-insensitive, and s turns on
dot-matches-all-mode (including newlines)--otherwise, the . would only match
until the next newline.
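
To see what each modifier changes, here is a small sketch. The sample string
and the variable names are made up for illustration; the last paragraph
deliberately uses an uppercase <P> and sits on its own line.

```php
<?php
// Hypothetical sample: two paragraphs on separate lines, the second one
// written with an uppercase tag.
$sample = "<p>First</p>\n<P>Last</P>";

$pattern_is = '{(?=<p(?:>|\s)(?!.*<p(?:>|\s)))}is'; // pattern from this thread
$pattern_i  = '{(?=<p(?:>|\s)(?!.*<p(?:>|\s)))}i';  // same, without s
$pattern_s  = '{(?=<p(?:>|\s)(?!.*<p(?:>|\s)))}s';  // same, without i

// With both modifiers, only the last paragraph matches.
$count_is = preg_match_all($pattern_is, $sample);

// Without s, the .* in the negative lookahead stops at the newline, so
// every <p> that is the last one on its own line matches.
$count_i = preg_match_all($pattern_i, $sample);

// Without i, the uppercase <P> is not seen as a tag at all, so the
// lowercase <p> at the start of the string is treated as the last one.
preg_match($pattern_s, $sample, $m, PREG_OFFSET_CAPTURE);
$offset_s = $m[0][1];

echo "$count_is $count_i $offset_s\n";
```

So dropping s gives too many matches, and dropping i can put the match in
the wrong place if the markup mixes upper and lower case.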

The regex has two parts: matching a <p> tag, and then making sure there
aren't any more <p> tags in the string following it. The positive lookahead
is (hopefully) pretty straightforward. The negative lookahead works by using
a greedy (regular) .*, which forces the regex engine to match all the way to
the end of the haystack. Then it encounters the <p(?:>|\s) part, forcing it
to backtrack until it finds a <p> tag. If it doesn't find one before
returning to the 'current' position (directly after the <p> tag we just
matched), then the negative lookahead succeeds and we know we have found
the last <p> tag.
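
One way to check this is to list every opening <p> tag and compare it with
where the full pattern matches. This is only a sketch: $text is the sample
from the earlier message, and the other variable names are made up here.

```php
<?php
$text = "<p>First paragraph</p>\n<p>More text</p>\n"
      . "<p>Some more text</p>\n<p>End of story</p>";

// Every opening <p> tag is a candidate position for the positive lookahead.
preg_match_all('{<p(?:>|\s)}is', $text, $all, PREG_OFFSET_CAPTURE);
$candidates = array_column($all[0], 1);   // byte offset of each opening tag

// The full pattern: a candidate that is not followed by another candidate.
$regex = '{(?=<p(?:>|\s)(?!.*<p(?:>|\s)))}is';
preg_match($regex, $text, $m, PREG_OFFSET_CAPTURE);
$match_offset = $m[0][1];

// The negative lookahead rejects every candidate except the final one.
$last_candidate = $candidates[count($candidates) - 1];
var_dump($match_offset === $last_candidate);
```

The matched text itself ($m[0][0]) is an empty string; only the offset
carries information.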

The positive and negative lookaheads are 'zero-width' requirements, which
means they don't advance the regex engine's pointer in the haystack string.
Since the entire regex is zero-width, the match consumes no characters, so
nothing is removed and the replacement string simply gets inserted at the
matched position.
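
As a rough illustration of the zero-width point, the preg_replace call
should behave like splicing the new text in at the matched offset with a
replacement length of 0. The short sample string and the variable names
below are assumptions for the example.

```php
<?php
$text   = "<p>First</p>\n<p>Last</p>";   // hypothetical sample
$insert = "<p>new</p>\n";
$regex  = '{(?=<p(?:>|\s)(?!.*<p(?:>|\s)))}is';

// Nothing in $text is consumed by the match, so nothing is overwritten.
$story = preg_replace($regex, $insert, $text);

// Same result by hand: find the offset of the (empty) match, then insert
// at that offset while replacing 0 characters.
preg_match($regex, $text, $m, PREG_OFFSET_CAPTURE);
$spliced = substr_replace($text, $insert, $m[0][1], 0);

echo $story;
```

Both $story and $spliced end up as the original text with the new paragraph
placed directly before the last <p> tag.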

I hope that made at least a little bit of sense :) If you're doing a lot of
regex work, I would strongly recommend reading the book Mastering Regular
Expressions by Jeffrey Friedl... it's very well written and very helpful.

-Brian

-----Original Message-----
From: Dotan Cohen [mailto:dotancohen@xxxxxxxxx] 
Sent: Monday, August 27, 2007 3:45 PM
To: Brian Rue
Cc: php-general@xxxxxxxxxxxxx
Subject: Re:  Adding text before last paragraph

On 27/08/07, Brian Rue <brianrue@xxxxxxxxx> wrote:
> Dotan, try this:
>
> $text="<p>First paragraph</p>\n<p>More text</p>\n<p>Some more text</p>\n<p>End of story</p>";
>
> $story = preg_replace('{(?=<p(?:>|\s)(?!.*<p(?:>|\s)))}is',
>     "<p>new paragraph goes here</p>\n", $text);
>
> This matches a position that has an opening <p> tag (with or without
> parameters), which is NOT followed anywhere in $text by another opening
> <p> tag. The replacement string will be inserted at the matched position,
> which will be directly before the last <p> tag. Not sure if this is the
> most efficient regex, but it should get the job done. Let me know how it
> goes... I'd also be interested to hear any comments on that regex's
> efficiency.
>
> -Brian Rue
>

Thank you Brian. This most certainly works. I'm having a very hard
time deciphering your regex, as I'd like to learn from it. I'm going
over PCRE again, but I think that I may hit Google soon. Thank you
very, very much for the working code. As usual, I have another night
of regex waiting for me...

Dotan Cohen

http://lyricslist.com/
http://what-is-what.com/

-- 
PHP General Mailing List (http://www.php.net/)