RE: Adding text before last paragraph

Sure, I'll break it apart a little:

'{(?=<p(?:>|\s)(?!.*<p(?:>|\s)))}is'

$regex = '{' .     // opening delimiter
         '(?=' .   // positive lookahead: match a position that is
                   // immediately followed by this pattern:
             '<p' .    // first part of an opening <p> tag
             '(?:' .   // non-capturing parentheses (same as normal
                       // parentheses, but a bit faster since we don't
                       // need to capture what they match for use later)
             '>|\s' .  // match a closing > or a whitespace character
             ')' .     // end non-capturing parentheses
             '(?!' .   // negative lookahead: the match will fail if the
                       // following pattern matches from the current position
                 '.*' .          // match until the end of the string
                 '<p(?:>|\s)' .  // same as above - look for another <p> tag
             ')' .     // end negative lookahead
         ')' .     // end positive lookahead
         '}is';    // closing delimiter, plus modifiers i and s
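
As an aside, the same breakdown can probably be kept inside the pattern itself by adding the x modifier, which tells PCRE to ignore whitespace and #-comments in the pattern. This is only a sketch of that alternative, not what the original one-liner uses; the variable names and the [HERE] marker are made up for illustration.

```php
<?php
// The compact pattern from above.
$regexCompact = '{(?=<p(?:>|\s)(?!.*<p(?:>|\s)))}is';

// The same pattern with the x modifier added. Whitespace and
// #-comments inside the pattern are ignored by PCRE. The comments
// avoid braces on purpose, because braces are the delimiters here.
$regexVerbose = '{
    (?=                 # positive lookahead: an opening p tag starts here
        <p(?:>|\s)      # "<p" followed by ">" or whitespace
        (?!             # negative lookahead: fail if ...
            .*          # ... anywhere later in the string ...
            <p(?:>|\s)  # ... there is another opening p tag
        )
    )
}isx';

$text = "<p>one</p>\n<p>two</p>\n<p>three</p>";

// [HERE] is just a visible marker for the insertion point.
$a = preg_replace($regexCompact, '[HERE]', $text);
$b = preg_replace($regexVerbose, '[HERE]', $text);

echo $a, "\n";
var_dump($a === $b); // both patterns should insert at the same place
```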

About the modifiers: i makes it case-insensitive, and s turns on
dot-matches-all-mode (including newlines)--otherwise, the . would only match
until the next newline.
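
To see what each modifier contributes, here is a small sketch that runs the same pattern with both modifiers, with only i, and with only s. The sample text (with an upper-case last tag) and the [HERE] marker are invented for the demonstration.

```php
<?php
// Two paragraphs on separate lines; the last tag is upper-case.
$text = "<p>first</p>\n<P>second</P>";

$pattern = '(?=<p(?:>|\s)(?!.*<p(?:>|\s)))';

// Both modifiers: only the position before the last tag matches.
$both = preg_replace('{' . $pattern . '}is', '[HERE]', $text);

// Without s: .* stops at the newline, so the first tag "sees" no
// later tag on its own line and matches as well.
$noS = preg_replace('{' . $pattern . '}i', '[HERE]', $text);

// Without i: the upper-case <P> is not recognised as a tag at all,
// so the first tag is treated as the last one.
$noI = preg_replace('{' . $pattern . '}s', '[HERE]', $text);

echo $both, "\n---\n", $noS, "\n---\n", $noI, "\n";
```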

The regex has two parts: matching a <p> tag, and then making sure there
aren't any more <p> tags in the string following it. The positive lookahead
is (hopefully) pretty straightforward. The negative lookahead works by using
a greedy (regular) .*, which forces the regex engine to match all the way to
the end of the haystack. Then it encounters the <p(?:>|\s) part, forcing it
to backtrack until it finds a <p> tag. If it doesn't find one before
returning to the 'current' position (directly after the <p> tag we just
matched), then we know we have found the last <p> tag.
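
One way to check this is to ask PCRE where the pattern matches and compare that with the offset of the last <p> found by a plain string search. This is a rough sketch: the sample text is a shortened version of the one in the thread, and strripos() only looks for a literal "<p>", so the comparison assumes tags without attributes.

```php
<?php
$text = "<p>First paragraph</p>\n<p>More text</p>\n<p>End of story</p>";

// PREG_OFFSET_CAPTURE makes each match an array of
// [matched text, byte offset]. The matched text is empty here,
// because the whole pattern is made of lookaheads.
preg_match_all('{(?=<p(?:>|\s)(?!.*<p(?:>|\s)))}is', $text, $m, PREG_OFFSET_CAPTURE);
$offsets = array_column($m[0], 1);

// Offset of the last literal "<p>" (case-insensitive), for comparison.
$lastP = strripos($text, '<p>');

var_dump(count($offsets));        // number of positions that matched
var_dump($offsets[0] === $lastP); // does the match sit on the last <p>?
```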

The positive and negative lookaheads are 'zero-width' requirements, which
means they don't advance the regex engine's pointer in the haystack string.
Since the entire regex is zero-width, the match consumes no characters: the
replacement string gets inserted at the matched position, and nothing from
the original text is removed.
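
Here is a short sketch of that insertion behaviour, based on the example from earlier in the thread. The class="end" attribute is added only to exercise the \s branch of the pattern, and the $new variable name is made up.

```php
<?php
$text = "<p>First paragraph</p>\n<p>More text</p>\n<p class=\"end\">End of story</p>";
$new  = "<p>new paragraph goes here</p>\n";

// The pattern matches an empty string just before the last opening
// <p> tag, so preg_replace() inserts $new there without removing anything.
$story = preg_replace('{(?=<p(?:>|\s)(?!.*<p(?:>|\s)))}is', $new, $text);

echo $story, "\n";

// Taking the inserted text back out should give the original string,
// which shows that the match itself consumed no characters.
var_dump(str_replace($new, '', $story) === $text);
```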

I hope that made at least a little bit of sense :) If you're doing a lot of
regex work, I would strongly recommend reading the book Mastering Regular
Expressions by Jeffrey Friedl... it's very well written and very helpful.

-Brian

-----Original Message-----
From: Dotan Cohen [mailto:dotancohen@xxxxxxxxx] 
Sent: Monday, August 27, 2007 3:45 PM
To: Brian Rue
Cc: php-general@xxxxxxxxxxxxx
Subject: Re:  Adding text before last paragraph

On 27/08/07, Brian Rue <brianrue@xxxxxxxxx> wrote:
> Dotan, try this:
>
> $text = "<p>First paragraph</p>\n<p>More text</p>\n"
>       . "<p>Some more text</p>\n<p>End of story</p>";
>
> $story = preg_replace('{(?=<p(?:>|\s)(?!.*<p(?:>|\s)))}is',
>                       "<p>new paragraph goes here</p>\n", $text);
>
> This matches a position that has an opening <p> tag (with or without
> parameters), which is NOT followed anywhere in $text by another opening
> <p> tag. The replacement string will be inserted at the matched position,
> which will be directly before the last <p> tag. Not sure if this is the
> most efficient regex, but it should get the job done. Let me know how it
> goes... I'd also be interested to hear any comments on that regex's
> efficiency.
>
> -Brian Rue
>

Thank you, Brian. This most certainly works. I'm having a very hard
time deciphering your regex, as I'd like to learn from it. I'm going
over PCRE again, but I think that I may hit Google soon. Thank you
very, very much for the working code. As usual, I have another night
of regex waiting for me...

Dotan Cohen

http://lyricslist.com/
http://what-is-what.com/

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

