Re: regular expressions


 



Thanks to you too for all the information.

I'm taking Christmas leave soon to decode whatever exists in the cryptic 
understanding of regexp, so this walkthrough of the how, when, why and what is 
greatly appreciated. =D
I also got a mail from Oliver Block with another approach to this specific 
subject, and comparing the two tells me I should get good at one 
way of thinking about and writing regexps, but understand them all.
The operations he gave me were quite simple; what I need to learn is what 
to use when, and why these freakin' characters don't fit together and 
play nice. Modifier error my ass. [\|/]



On Monday 20 November 2006 10:47, Paul Novitski wrote:
> At 11/18/2006 05:46 AM, Børge Holen wrote:
> >["desc"] = " <c> FFFFFF topic <c> 999999 rest of the text ",
> >
> >$string = preg_replace("/<c>\s\w[0-9A-F]+/","",$string);
> >
> >prints out:     topic  rest of the text     (
> >with double spaces :(, I thought
> >\s would fix that )
> >
> >however how would I go on this:
> >
> ><font color="colorcode">topic</font>
> ><font color="colorcode">rest of thetext</font>
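
A quick sketch of why the double spaces show up in the attempt quoted above: the pattern removes "<c> FFFFFF" but leaves both the space in front of <c> and the space after the color code. The second pattern here is my own assumption about a fix (a trailing \s* plus trim()), not something from the thread:

```php
<?php
$sText = ' <c> FFFFFF topic <c> 999999 rest of the text ';

// Original attempt: the spaces on both sides of each match survive.
$sOriginal = preg_replace('/<c>\s\w[0-9A-F]+/', '', $sText);

// Assumed fix: also consume the whitespace that follows the color code,
// then trim what is left at the ends of the string.
$sFixed = trim(preg_replace('/<c>\s[0-9A-F]{6}\s*/i', '', $sText));

var_dump($sOriginal, $sFixed);
```

The first value still has two spaces before and after "topic"; the second comes out as plain "topic rest of the text".
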
>
> Børge,
>
> Here's how I would think this one through:
>
> First, I'm having to make several guesses at the nature of your text
> content:
>
> - You use the single word "topic" but I'll assume
> this can be multiple words and spaces.
>
> - Your source string includes a space after "rest
> of the text " while your marked-up result
> doesn't.  However I will assume that you really
> do mean the rest of the text until end-of-string.
>
> - Your source string also includes a space before
> the initial <c> but your regexp pattern
> doesn't.  I'll assume that both beginning and ending spaces are
> unintentional.
>
>
> Your source string:
>
>          "<c> FFFFFF topic <c> 999999 rest of the text"
>
> consists of these parts:
>
> 1) [start-of-string]
> 2) "<c> "
> 3) "FFFFFF"     (color code 1)
> 4) " "
> 5) "topic"      (text 1)
> 6) " <c> "
> 7) "999999"     (color code 2)
> 8) " "
> 9) "rest of the text"   (text 2)
> 10) [end-of-string]
>
> i.e.:
>
> 1) [start-of-string]
> 2) <c> + whitespace
> 3) color code 1
> 4) whitespace
> 5) one or more characters
> 6) whitespace + <c> + whitespace
> 7) color code 2
> 8) whitespace
> 9) one or more characters
> 10) [end-of-string]
>
> This suggests the regexp pattern:
>
> 1) ^
> 2) <c>\s
> 3) ([0-9A-F]{6})
> 4) \s
> 5) (.+)
> 6) \s<c>\s
> 7) ([0-9A-F]{6})
> 8) \s
> 9) (.+)
> 10) $
>
> /^<c>\s([0-9A-F]{6})\s(.+)\s<c>\s([0-9A-F]{6})\s(.+)$/i
>
> Everything in the source string that you need to
> retain needs to be in parentheses so regexp can grab it.
>
> In 5) I can let the pattern be greedy, safe in
> the knowledge that there WILL be a \s<c> to terminate the character-grab.
>
> I end with the pattern modifier /i so it will
> work with lowercase letters in the RGB color codes.
>
> PHP:
>
> $sText = '<c> FFFFFF topic <c> 999999 rest of the text';
> $sPattern = '/^<c>\s([0-9A-F]{6})\s(.+)\s<c>\s([0-9A-F]{6})\s(.+)$/i';
> preg_match($sPattern, $sText, $aMatches);
> print_r($aMatches);
>
> result:
>
> Array
> (
>      [0] => <c> FFFFFF topic <c> 999999 rest of the text
>      [1] => FFFFFF
>      [2] => topic
>      [3] => 999999
>      [4] => rest of the text
> )
>
> This isolates the four substrings you want in regexp references $1 through
> $4.
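
Since the pattern is anchored with ^ and $, it only matches strings that have exactly this two-block shape. A minimal sketch of checking the return value of preg_match() before touching the matches; the second sample string is made up for illustration:

```php
<?php
$sPattern = '/^<c>\s([0-9A-F]{6})\s(.+)\s<c>\s([0-9A-F]{6})\s(.+)$/i';

$aSamples = [
    '<c> FFFFFF topic <c> 999999 rest of the text', // well-formed
    '<c> FFFFFF topic only',                        // hypothetical: second block missing
];

$aResults = [];
foreach ($aSamples as $sText) {
    // preg_match() returns 1 on a match, 0 on no match, false on error.
    if (preg_match($sPattern, $sText, $aMatches) === 1) {
        $aResults[] = [$aMatches[1], $aMatches[2], $aMatches[3], $aMatches[4]];
    } else {
        $aResults[] = null; // leave malformed input alone
    }
}
print_r($aResults);
```
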
>
> Replacement:
>
> [Tangentially, I'd like to comment that font tags
> are passe.  I urge you to use spans with styling
> instead.  I normally dislike using inline styles
> (style details mixed with the HTML), but in this
> case (as far as I know) you don't have any
> choice.  If you can, I suggest you replace the
> literal color codes with style names and define
> the precise colors in your stylesheet, not your database.]
>
> [What this further suggests is that you ought to
> have two discrete database fields, `topic` and
> `description`, if you can, rather than combining
> them into one field that needs to be
> parsed.  Then you can output something like:
>
>          <span class="topic">TOPIC</span> <span
> class="desc">DESCRIPTION</span>
>
> and leave the RGB color codes out of this layer
> of your application altogether.]
>
>
> However, working with the data you've been dealt:
>
> $sTagBegin = '<span style="color:#';
> $sTagEnd = ';">';
> $sCloseTag = '</span>';
>
> $sReplacement = $sTagBegin . '$1' . $sTagEnd . '$2' . $sCloseTag . ' ' .
>                  $sTagBegin . '$3' . $sTagEnd . '$4' . $sCloseTag;
>
> echo preg_replace($sPattern, $sReplacement, $sText);
>
> result:
>
> <span style="color:#FFFFFF;">topic</span> <span
> style="color:#999999;">rest of the text</span>
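
If the topic or description can contain characters such as & or <, it may be worth escaping them on the way out. This is only a sketch using preg_replace_callback(); the helper colorSpan() and the "Q&A" sample text are my own inventions, not part of Paul's code:

```php
<?php
$sText = '<c> FFFFFF Q&A <c> 999999 rest of the text';
$sPattern = '/^<c>\s([0-9A-F]{6})\s(.+)\s<c>\s([0-9A-F]{6})\s(.+)$/i';

// Hypothetical helper: wrap escaped text in a colored span.
function colorSpan($sColor, $sBody)
{
    return '<span style="color:#' . $sColor . ';">'
         . htmlspecialchars($sBody, ENT_QUOTES, 'UTF-8')
         . '</span>';
}

// The callback receives the same match array that preg_match() would fill.
$sHtml = preg_replace_callback(
    $sPattern,
    function ($aMatches) {
        return colorSpan($aMatches[1], $aMatches[2]) . ' '
             . colorSpan($aMatches[3], $aMatches[4]);
    },
    $sText
);

echo $sHtml, "\n";
```
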
>
> ____________________________
>
> It's tempting to write the pattern more
> succinctly to take advantage of the repeating pattern of the source text:
>
>          <c> COLORCODE text
>
> The regexp pattern might be:
>
> 1) \s*
> 2) <c>\s
> 3) ([0-9A-F]{6})
> 4) \s
> 5) ([^<]+)
>
> 1) optional whitespace
> 2) <c> + whitespace
> 3) color code
> 4) whitespace
> 5) one or more characters until the next <
>
> $sText = '<c> FFFFFF topic <c> 999999 rest of the text';
>
> $sPattern = '/\s*<c>\s([0-9A-F]{6})\s([^<]+)/i';
>
> preg_match_all($sPattern, $sText, $aMatches);
>
> result:
>
> Array
> (
>      [0] => Array
>          (
>              [0] => <c> FFFFFF topic
>              [1] => <c> 999999 rest of the text
>          )
>
>      [1] => Array
>          (
>              [0] => FFFFFF
>              [1] => 999999
>          )
>
>      [2] => Array
>          (
>              [0] => topic
>              [1] => rest of the text
>          )
>
> )
>
> In this case, we need to specify the tag pattern only once:
>
> $sReplacement = $sTagBegin . '$1' . $sTagEnd . '$2' . $sCloseTag;
>
> echo preg_replace($sPattern, $sReplacement, $sText);
>
> result:
>
> <span style="color:#FFFFFF;">topic </span><span
> style="color:#999999;">rest of the text</span>
>
> Notice that this results in whitespace after
> the topic string.  Someone more knowledgeable in
> regular expressions can probably tell you how to
> eliminate that, perhaps by using a regexp assertion:
> http://php.net/manual/en/reference.pcre.pattern.syntax.php#regexp.reference.assertions
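
One possible way to drop that trailing whitespace, sketched with a lookahead assertion. The pattern is my own guess at what would work, not something from Paul or the manual: the text group is made lazy, \s* swallows the whitespace, and (?=<|$) insists that the next "<" or end-of-string follows without consuming it.

```php
<?php
$sText = '<c> FFFFFF topic <c> 999999 rest of the text ';

// Lazy text group, optional trailing whitespace, then a lookahead for
// the next "<" or the end of the string.
$sPattern = '/\s*<c>\s([0-9A-F]{6})\s([^<]+?)\s*(?=<|$)/i';

// PREG_SET_ORDER gives one array per <c> block: [full match, color, text].
preg_match_all($sPattern, $sText, $aMatches, PREG_SET_ORDER);

$aSpans = [];
foreach ($aMatches as $aMatch) {
    $aSpans[] = '<span style="color:#' . $aMatch[1] . ';">' . $aMatch[2] . '</span>';
}

// Join with a single space so the separator is under our control.
$sHtml = implode(' ', $aSpans);
echo $sHtml, "\n";
```

Building the spans from preg_match_all() instead of preg_replace() keeps the spacing between them explicit rather than dependent on what the pattern happened to consume.
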
>
> Regards,
> Paul
> __________________________
>
> Paul Novitski
> Juniper Webcraft Ltd.
> http://juniperwebcraft.com

-- 
---
Børge
Kennel Arivene 
http://www.arivene.net
---



