Re: regular expressions


 



On 20/11/06, Paul Novitski <paul@xxxxxxxxxxxxxxxxxxx> wrote:
>
> Børge,
>
> Here's how I would think this one through:
>
> First, I'm having to make several guesses at the nature of your text content:
>
> - You use the single word "topic" but I'll assume this can be multiple
>   words and spaces.
>
> - Your source string includes a space after "rest of the text " while
>   your marked-up result doesn't.  However I will assume that you really
>   do mean the rest of the text until end-of-string.
>
> - Your source string also includes a space before the initial <c> but
>   your regexp pattern doesn't.  I'll assume that both beginning and
>   ending spaces are unintentional.
>
>
> Your source string:
>
>          "<c> FFFFFF topic <c> 999999 rest of the text"
>
> consists of these parts:
>
> 1) [start-of-string]
> 2) "<c> "
> 3) "FFFFFF"     (color code 1)
> 4) " "
> 5) "topic"      (text 1)
> 6) " <c> "
> 7) "999999"     (color code 2)
> 8) " "
> 9) "rest of the text"   (text 2)
> 10) [end-of-string]
>
> i.e.:
>
> 1) [start-of-string]
> 2) <c> + whitespace
> 3) color code 1
> 4) whitespace
> 5) one or more characters
> 6) whitespace + <c> + whitespace
> 7) color code 2
> 8) whitespace
> 9) one or more characters
> 10) [end-of-string]
>
> This suggests the regexp pattern:
>
> 1) ^
> 2) <c>\s
> 3) ([0-9A-F]{6})
> 4) \s
> 5) (.+)
> 6) \s<c>\s
> 7) ([0-9A-F]{6})
> 8) \s
> 9) (.+)
> 10) $
>
> /^<c>\s([0-9A-F]{6})\s(.+)\s<c>\s([0-9A-F]{6})\s(.+)$/i
>
> Everything in the source string that you need to retain needs to be in
> parentheses so regexp can grab it.
>
> In 5) I can let the pattern be greedy, safe in the knowledge that there
> WILL be a \s<c> to terminate the character-grab.
>
> I end with the pattern modifier /i so it will work with lowercase
> letters in the RGB color codes.
>
> PHP:
>
> $sText = '<c> FFFFFF topic <c> 999999 rest of the text';
> $sPattern = '/^<c>\s([0-9A-F]{6})\s(.+)\s<c>\s([0-9A-F]{6})\s(.+)$/i';
> preg_match($sPattern, $sText, $aMatches);
> print_r($aMatches);
>
> result:
>
> Array
> (
>      [0] => <c> FFFFFF topic <c> 999999 rest of the text
>      [1] => FFFFFF
>      [2] => topic
>      [3] => 999999
>      [4] => rest of the text
> )
>
> This isolates the four substrings you want in regexp references $1
> through $4.
>
> Replacement:
>
> [Tangentially, I'd like to comment that font tags are passé.  I urge
> you to use spans with styling instead.  I normally dislike using inline
> styles (style details mixed with the HTML), but in this case (as far as
> I know) you don't have any choice.  If you can, I suggest you replace
> the literal color codes with style names and define the precise colors
> in your stylesheet, not your database.
>
> [What this further suggests is that you ought to have two discrete
> database fields, `topic` and `description`, if you can, rather than
> combining them into one field that needs to be parsed.  Then you can
> output something like:
>
>          <span class="topic">TOPIC</span> <span class="desc">DESCRIPTION</span>
>
> and leave the RGB color codes out of this layer of your application
> altogether.]
>
>
> However, working with the data you've been dealt:
>
> $sTagBegin = '<span style="color:#';
> $sTagEnd = ';">';
> $sCloseTag = '</span>';
>
> $sReplacement = $sTagBegin . '$1' . $sTagEnd . '$2' . $sCloseTag . ' ' .
>                  $sTagBegin . '$3' . $sTagEnd . '$4' . $sCloseTag;
>
> echo preg_replace($sPattern, $sReplacement, $sText);
>
> result:
>
> <span style="color:#FFFFFF;">topic</span> <span style="color:#999999;">rest of the text</span>
>
> ____________________________
>
> It's tempting to write the pattern more succinctly to take advantage of
> the repeating pattern of the source text:
>
>          <c> COLORCODE text
>
> The regexp pattern might be:
>
> 1) \s*
> 2) <c>\s
> 3) ([0-9A-F]{6})
> 4) \s
> 5) ([^<]+)
>
> 1) optional whitespace
> 2) <c> + whitespace
> 3) color code
> 4) whitespace
> 5) one or more characters until the next <
>
> $sText = '<c> FFFFFF topic <c> 999999 rest of the text';
>
> $sPattern = '/\s*<c>\s([0-9A-F]{6})\s([^<]+)/i';
>
> preg_match_all($sPattern, $sText, $aMatches);
> print_r($aMatches);
>
> result:
>
> Array
> (
>      [0] => Array
>          (
>              [0] => <c> FFFFFF topic
>              [1] => <c> 999999 rest of the text
>          )
>
>      [1] => Array
>          (
>              [0] => FFFFFF
>              [1] => 999999
>          )
>
>      [2] => Array
>          (
>              [0] => topic
>              [1] => rest of the text
>          )
>
> )
>
> In this case, we need to specify the tag pattern only once:
>
> $sReplacement = $sTagBegin . '$1' . $sTagEnd . '$2' . $sCloseTag;
>
> echo preg_replace($sPattern, $sReplacement, $sText);
>
> result:
>
> <span style="color:#FFFFFF;">topic </span><span style="color:#999999;">rest of the text</span>
>
> Notice that this leaves whitespace after the topic string, because
> [^<]+ also captures the space that precedes the next <c>.  Someone more
> knowledgeable in regular expressions can probably tell you how to
> eliminate that, perhaps by using a regexp assertion:
> http://php.net/manual/en/reference.pcre.pattern.syntax.php#regexp.reference.assertions
>
> Regards,
> Paul
> __________________________
>
> Paul Novitski
> Juniper Webcraft Ltd.
> http://juniperwebcraft.com
>

Paul, I just got around to reading this thread. The post of yours that I quote above has got to be one of the best posts that I've read in the 5 years that I've been on and off the php list. The way you break that regex down taught me things that have eluded me for half a decade. Although I have nothing to do with the OP, I really want to say thanks for that bit of information.

Dotan Cohen
http://lyricslist.com/
http://what-is-what.com/
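
One possible answer to the open question at the end of Paul's message (how to keep the trailing whitespace out of the topic match) is sketched below. It is not from the thread: it makes the text group lazy and adds a lookahead assertion of the kind Paul links to. The name $sPatternTrimmed is made up for illustration, and the replacement is written as a single string rather than built from Paul's $sTagBegin/$sTagEnd pieces.

```php
<?php
// Sketch only: $sPatternTrimmed is a hypothetical name, not from the original post.
$sText = '<c> FFFFFF topic <c> 999999 rest of the text';

// ([^<]+?) is lazy, so it stops at the first position where the lookahead succeeds.
// (?=\s*(?:<c>|$)) requires optional whitespace followed by the next <c> or
// end-of-string, but does not consume it, so that whitespace stays out of $2.
$sPatternTrimmed = '/<c>\s([0-9A-F]{6})\s([^<]+?)(?=\s*(?:<c>|$))/i';

// Same markup as in the quoted post, with $1 = color code and $2 = text.
$sReplacement = '<span style="color:#$1;">$2</span>';

$sResult = preg_replace($sPatternTrimmed, $sReplacement, $sText);
echo $sResult, "\n";
// prints: <span style="color:#FFFFFF;">topic</span> <span style="color:#999999;">rest of the text</span>
```

Because the lookahead does not consume anything, the single space between the two segments is left untouched and ends up between the two spans rather than inside the first one.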
