Regex question: replacing occurrences of a character when not enclosed within HTML tags?

Hi All,

I have content that contains several lengthy hyphenated sentences, such as:

This-is-a-sentence-in-which-all-the-words-are-hyphenated.

I've noticed that some (maybe all?) browsers, particularly Firefox, will not
wrap long strings of hyphenated words when they are contained in a DIV tag
-- instead, the sentence simply runs out across the right border of the DIV
and overlaps any content to the right.

After some experimentation, I discovered a hack that seems to behave
well in Firefox, Opera, and IE6: replace every occurrence of the hyphen
("-") character with "-<span style='font-size:0px;'> </span>". This
effectively creates a zero-width space after each hyphen and
allows the sentence to wrap as desired.
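
As a minimal sketch, the naive form of the hack looks something like this.
The function name add_hyphen_breaks is only illustrative, not something
from my real code:

```php
<?php
// Hypothetical helper name; the spacer markup is the one quoted above.
function add_hyphen_breaks($text)
{
    $spacer = "<span style='font-size:0px;'> </span>";
    // Naive version: every hyphen gets a spacer, including hyphens inside tags.
    return str_replace("-", "-" . $spacer, $text);
}

echo add_hyphen_breaks("This-is-a-sentence.");
```

str_replace() makes a single pass over the input, so the hyphen in
"font-size" inside the spacer itself is not replaced again.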

My dilemma is that while I want to replace any hyphens with the above hack
when and where they appear within normal sentences, I don't want to replace
them when and where they appear within HTML tags.

So, imagine I have a string that contains:

<div style='text-decoration: none;
font-size:11px;'>This-is-a-sentence-that-contains-hyphenation. This is a
sentence that doesn't contain hyphenation. This is a <a
href='http://www.nowhere.com/-4484784/index.html'>link</a> that will take
you <span style='font-weight: bold;'>nowhere</span>.</div>

I managed to build a replacement regular expression that would ignore any
hyphenated strings that were terminated by a colon (":") character (which
effectively leaves inline style attributes in DIV and SPAN blocks
untouched).

<?php
// Skip hyphens followed by word characters and a colon (CSS property names).
$thingy = preg_replace("/(-)(?![\w-]+?:)/i",
    "\$1<span style='font-size:0px;'> </span>", $whatever);
?>

This seemed to work well in most circumstances. In testing, however, I
discovered several <a href=> URLs that also contain hyphens
(in particular, some that lead to Amazon.com).
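
To illustrate the failure, here is a small sketch that runs the same
pattern over the link from the example string. The variable names are
only for illustration:

```php
<?php
$spacer = "<span style='font-size:0px;'> </span>";
$link   = "<a href='http://www.nowhere.com/-4484784/index.html'>link</a>";

// Same pattern as before: a hyphen is skipped only when it is followed
// by word characters and then a colon.
$result = preg_replace('/(-)(?![\w-]+?:)/i', '$1' . $spacer, $link);

// The hyphen in the URL is followed by digits and a slash, not a colon,
// so the lookahead does not protect it and the spacer lands inside href.
echo $result;
```

The href attribute ends up with the spacer markup in the middle of the
URL, which breaks the link.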

So, thinking about it a little more, I decided that what I was looking for
was a regular expression that would replace any hyphen that is not
contained within a tag (i.e., not contained between "<" and ">").

And this is where things have ground to a halt.

I'm wondering if anyone can give me some help with this? I've tried to find
a solution via Google, but most regex examples dealing with HTML tags are
concerned with finding tags and their contents, not with finding specific
instances of characters not contained within tags.
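
To make the goal concrete, here is a rough sketch of the behaviour I am
after, done by splitting the string into tags and text rather than with a
single replacement regex. The function name is hypothetical, and the
sketch assumes that attribute values never contain a literal ">" and that
any "<" in the content starts a tag:

```php
<?php
// Hypothetical helper; assumes no ">" inside attribute values and that
// every "<" in the input begins a tag.
function add_hyphen_breaks_outside_tags($html)
{
    $spacer = "<span style='font-size:0px;'> </span>";
    // Split into alternating text and tag pieces; the capture group
    // makes preg_split() return the tags as well as the text.
    $parts = preg_split('/(<[^>]*>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
    foreach ($parts as $i => $part) {
        // Even indexes hold text between tags; odd indexes hold the tags.
        if ($i % 2 === 0) {
            $parts[$i] = str_replace("-", "-" . $spacer, $part);
        }
    }
    return implode("", $parts);
}

echo add_hyphen_breaks_outside_tags("<a href='x-y.html'>a-b</a>");
```

Hyphens in the href stay as they are and only the hyphen in the link text
gets the spacer. A regex that does the same thing in one pass is what I
haven't been able to work out.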

Any help appreciated!

Much warmth,

Murray

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

