Re: regular expressions

At 11/18/2006 05:46 AM, Børge Holen wrote:
["desc"] = " <c> FFFFFF topic <c> 999999 rest of the text ",

$string = preg_replace("/<c>\s\w[0-9A-F]+/","",$string);

prints out: topic rest of the text ( with double spaces :(, I thought
\s would fix that )

however how would I go on this:

<font color="colorcode">topic</font>
<font color="colorcode">rest of thetext</font>


Børge,

Here's how I would think this one through:

First, I'm having to make several guesses at the nature of your text content:

- You use the single word "topic" but I'll assume this can be multiple words and spaces.

- Your source string includes a space after "rest of the text " while your marked-up result doesn't. However, I will assume that you really do mean the rest of the text until end-of-string.

- Your source string also includes a space before the initial <c> but your regexp pattern doesn't. I'll assume that both beginning and ending spaces are unintentional.


Your source string:

        "<c> FFFFFF topic <c> 999999 rest of the text"

consists of these parts:

1) [start-of-string]
2) "<c> "
3) "FFFFFF"     (color code 1)
4) " "
5) "topic"      (text 1)
6) " <c> "
7) "999999"     (color code 2)
8) " "
9) "rest of the text"   (text 2)
10) [end-of-string]

i.e.:

1) [start-of-string]
2) <c> + whitespace
3) color code 1
4) whitespace
5) one or more characters
6) whitespace + <c> + whitespace
7) color code 2
8) whitespace
9) one or more characters
10) [end-of-string]

This suggests the regexp pattern:

1) ^
2) <c>\s
3) ([0-9A-F]{6})
4) \s
5) (.+)
6) \s<c>\s
7) ([0-9A-F]{6})
8) \s
9) (.+)
10) $

/^<c>\s([0-9A-F]{6})\s(.+)\s<c>\s([0-9A-F]{6})\s(.+)$/i

Everything in the source string that you need to retain needs to be in parentheses so regexp can grab it.

In 5) I can let the pattern be greedy, safe in the knowledge that there WILL be a \s<c> to terminate the character-grab.

I end with the pattern modifier /i so it will work with lowercase letters in the RGB color codes.

PHP:

$sText = '<c> FFFFFF topic <c> 999999 rest of the text';
$sPattern = '/^<c>\s([0-9A-F]{6})\s(.+)\s<c>\s([0-9A-F]{6})\s(.+)$/i';
preg_match($sPattern, $sText, $aMatches);
print_r($aMatches);

result:

Array
(
    [0] => <c> FFFFFF topic <c> 999999 rest of the text
    [1] => FFFFFF
    [2] => topic
    [3] => 999999
    [4] => rest of the text
)

This isolates the four substrings you want in regexp references $1 through $4.
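
The ten numbered parts listed earlier map directly onto PCRE's extended mode (the x modifier), which ignores whitespace inside the pattern and allows # comments. This is only a sketch using the sample string from this message; the names $sPatternX and $aParts are made up for the example:

```php
<?php
// Sample string from this thread, with the leading and trailing spaces removed.
$sText = '<c> FFFFFF topic <c> 999999 rest of the text';

// Same pattern as above, one numbered part per line.
// x = ignore whitespace and allow # comments inside the pattern
// i = case-insensitive, so lowercase hex digits also match
$sPatternX = '/
    ^                   # 1) start of string
    <c>\s               # 2) <c> + whitespace
    ([0-9A-F]{6})       # 3) color code 1
    \s                  # 4) whitespace
    (.+)                # 5) text 1 (greedy)
    \s<c>\s             # 6) whitespace + <c> + whitespace
    ([0-9A-F]{6})       # 7) color code 2
    \s                  # 8) whitespace
    (.+)                # 9) text 2
    $                   # 10) end of string
/xi';

// preg_match() returns 1 on a match, 0 on no match, false on error.
if (preg_match($sPatternX, $sText, $aParts) === 1) {
    echo $aParts[1], ' | ', $aParts[2], ' | ', $aParts[3], ' | ', $aParts[4], "\n";
} else {
    echo "no match\n";
}
```

Checking the return value this way also makes it obvious when a string does not have exactly two <c> segments, since the anchored pattern simply will not match.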

Replacement:

[Tangentially, I'd like to comment that font tags are passé. I urge you to use spans with styling instead. I normally dislike using inline styles (style details mixed with the HTML), but in this case (as far as I know) you don't have any choice. If you can, I suggest you replace the literal color codes with style names and define the precise colors in your stylesheet, not your database.

[What this further suggests is that you ought to have two discrete database fields, `topic` and `description`, if you can, rather than combining them into one field that needs to be parsed. Then you can output something like:

        <span class="topic">TOPIC</span> <span class="desc">DESCRIPTION</span>

and leave the RGB color codes out of this layer of your application altogether.]
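
If the stored data can't be changed, one way to move in that direction is to translate each color code into a class name at output time. A rough sketch, assuming a hypothetical lookup table: the class names "topic" and "desc", the fallback "plain", and the helper wrapInSpan() are all invented for illustration and are not part of your application:

```php
<?php
// Hypothetical lookup table: color code (uppercase) => CSS class name.
$aClassForColor = array(
    'FFFFFF' => 'topic',
    '999999' => 'desc',
);

$sText    = '<c> FFFFFF topic <c> 999999 rest of the text';
$sPattern = '/^<c>\s([0-9A-F]{6})\s(.+)\s<c>\s([0-9A-F]{6})\s(.+)$/i';

// Wrap one piece of text in a span, using 'plain' for unknown color codes.
function wrapInSpan($sColor, $sBody, $aClassForColor)
{
    $sKey   = strtoupper($sColor);
    $sClass = isset($aClassForColor[$sKey]) ? $aClassForColor[$sKey] : 'plain';
    // Escape the text, since it is being placed inside HTML.
    return '<span class="' . $sClass . '">' . htmlspecialchars($sBody) . '</span>';
}

if (preg_match($sPattern, $sText, $aMatches) === 1) {
    $sHtml = wrapInSpan($aMatches[1], $aMatches[2], $aClassForColor)
           . ' '
           . wrapInSpan($aMatches[3], $aMatches[4], $aClassForColor);
    echo $sHtml, "\n";
}
```

For the sample string this prints:

<span class="topic">topic</span> <span class="desc">rest of the text</span>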


However, working with the data you've been dealt:

$sTagBegin = '<span style="color:#';
$sTagEnd = ';">';
$sCloseTag = '</span>';

$sReplacement = $sTagBegin . '$1' . $sTagEnd . '$2' . $sCloseTag . ' ' .
                $sTagBegin . '$3' . $sTagEnd . '$4' . $sCloseTag;

echo preg_replace($sPattern, $sReplacement, $sText);

result:

<span style="color:#FFFFFF;">topic</span> <span style="color:#999999;">rest of the text</span>

____________________________

It's tempting to write the pattern more succinctly to take advantage of the repeating pattern of the source text:

        <c> COLORCODE text

The regexp pattern might be:

1) \s*
2) <c>\s
3) ([0-9A-F]{6})
4) \s
5) ([^<]+)

1) optional whitespace
2) <c> + whitespace
3) color code
4) whitespace
5) one or more characters until the next <

$sText = '<c> FFFFFF topic <c> 999999 rest of the text';

$sPattern = '/\s*<c>\s([0-9A-F]{6})\s([^<]+)/i';

preg_match_all($sPattern, $sText, $aMatches);
print_r($aMatches);

result:

Array
(
    [0] => Array
        (
            [0] => <c> FFFFFF topic
            [1] => <c> 999999 rest of the text
        )

    [1] => Array
        (
            [0] => FFFFFF
            [1] => 999999
        )

    [2] => Array
        (
            [0] => topic
            [1] => rest of the text
        )

)
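
By default preg_match_all() groups its results by subpattern, as shown above. If it is easier to work with one array per match, the PREG_SET_ORDER flag regroups them. A small sketch with the same pattern and sample string; $aSets and $aSet are just illustrative names:

```php
<?php
$sText    = '<c> FFFFFF topic <c> 999999 rest of the text';
$sPattern = '/\s*<c>\s([0-9A-F]{6})\s([^<]+)/i';

// PREG_SET_ORDER: $aSets[0] holds the first match and its groups,
// $aSets[1] the second, and so on.
preg_match_all($sPattern, $sText, $aSets, PREG_SET_ORDER);

foreach ($aSets as $aSet) {
    // trim() drops the trailing space that ([^<]+) picks up before the next <c>
    echo $aSet[1], ' => "', trim($aSet[2]), '"', "\n";
}
```

This prints:

FFFFFF => "topic"
999999 => "rest of the text"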

In this case, we need to specify the tag pattern only once:

$sReplacement = $sTagBegin . '$1' . $sTagEnd . '$2' . $sCloseTag;

echo preg_replace($sPattern, $sReplacement, $sText);

result:

<span style="color:#FFFFFF;">topic </span><span style="color:#999999;">rest of the text</span>

Notice that this leaves whitespace after the topic string, inside the first span, because ([^<]+) captures everything up to the next <. Someone more knowledgeable in regular expressions can probably tell you how to eliminate that, perhaps by using a regexp assertion:
http://php.net/manual/en/reference.pcre.pattern.syntax.php#regexp.reference.assertions
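
One possible way to do that, offered as a sketch rather than a definitive answer: make the text group lazy, then follow it with optional whitespace and a lookahead assertion that requires either the next <c> or the end of the string. The name $sPatternTrim is made up for this example:

```php
<?php
$sText = '<c> FFFFFF topic <c> 999999 rest of the text';

// ([^<]+?)       lazy: take as few characters as possible
// \s*            swallow any whitespace after the text, outside the group
// (?=<c>|$)      lookahead: the next thing must be <c> or end of string
$sPatternTrim = '/\s*<c>\s([0-9A-F]{6})\s([^<]+?)\s*(?=<c>|$)/i';

$sReplacement = '<span style="color:#$1;">$2</span>';

// The matches are adjacent, so the spans come out with nothing between them.
$sHtml = preg_replace($sPatternTrim, $sReplacement, $sText);
echo $sHtml, "\n";
```

For the sample string this prints:

<span style="color:#FFFFFF;">topic</span><span style="color:#999999;">rest of the text</span>

If you want a space between the spans, one option is to end the replacement string with a space and rtrim() the final result.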

Regards,
Paul
__________________________

Paul Novitski
Juniper Webcraft Ltd.
http://juniperwebcraft.com

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php