Re: Re: Filtering URLs problem..

Jochem Maas wrote:
Al wrote:

I didn't fully test this; but it should get you started.


fully? more like not at all.

point 1:

"%<a\040href\040*=['"]$types://((www.)*[\w/\.]+)['"]>.+</a>%i";
                    ^-- double quotes are not escaped == parse error

point 2:

"%<a\040href\040*=['"]$types://((www.)*[\w/\.]+)['"]>.+</a>%i";
                      ^-- this will inject the string 'Array' into the regexp string
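
For what it's worth, points 1 and 2 can both be sidestepped by joining the array into an alternation before it goes into the pattern, and by building the pattern in a single-quoted string. This is only a rough sketch, not something from the thread: the `$alternation` variable, the `\s` whitespace handling, the `-` added to the character class and the non-greedy `.+?` are my own assumptions.

```php
<?php
// Assumption: the protocol list stays an array, as in the quoted code.
$types = array('http', 'ftp', 'https', 'mms', 'irc');

// Join the entries into a regex alternation. preg_quote() guards against an
// entry that contains a regex metacharacter or the '%' delimiter used below.
$alternation = implode('|', array_map(function ($t) {
    return preg_quote($t, '%');
}, $types));

// Single-quoted string: PHP only processes \\ and \' here, so the regex
// escapes (\s, \w) reach PCRE untouched and the double quote needs no escaping.
$pattern = '%<a\s+href\s*=\s*[\'"](' . $alternation . ')://([\w/.-]+)[\'"]\s*>.+?</a>%i';

$URL_str = "<a href='http://www.server.tld/page.html'>click here</a>";

if (preg_match($pattern, $URL_str, $match)) {
    echo $match[1], "\n"; // the protocol
    echo $match[2], "\n"; // the address after "://"
}
```

Concatenating the alternation instead of interpolating `$types` directly is what avoids the literal string 'Array' ending up in the pattern.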


point 3:

the regexp does not take into account that HTML tag attributes can
occur in any order e.g:

<a class="mine" id="abc123" target="_top" href="www.bla.com"    >
testing
</a>
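
One way to cope with attributes in any order, as a rough sketch only: let the pattern skip over whatever precedes `href` inside the tag. The pattern below is my own assumption, not code from the thread, and a real HTML parser would be more robust against odd markup (an attribute such as `data-href` would fool it, for example).

```php
<?php
$html = "<a class=\"mine\" id=\"abc123\" target=\"_top\" href=\"www.bla.com\"    >\ntesting\n</a>";

// [^>]*? skips any attributes before href; (["']) ... \1 requires the same
// quote character on both sides of the value; the "s" modifier lets "." span
// newlines so link text on its own line is still captured.
$pattern = '%<a\b[^>]*?\bhref\s*=\s*(["\'])(.*?)\1[^>]*>(.*?)</a>%is';

if (preg_match($pattern, $html, $m)) {
    $href = $m[2];       // the href value, whatever its position in the tag
    $text = trim($m[3]); // the link text, surrounding whitespace removed
    echo $href, "\n", $text, "\n";
}
```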

point 4:

what happens when the url does not have a protocol specified?
granted the OP did not actually specify if strings like:

    "www.google.com"

should also be considered as a url, so this is not really a valid point.


$types= array('http', 'ftp', 'https', 'mms', 'irc');

$pattern= "%<a\040href\040*=['"]$types://((www.)*[\w/\.]+)['"]>.+</a>%i"; // the "i" makes it non case sensitive

if(preg_match($pattern, $URL_str, $match)){

    $URL= match[1];
}

else{

    User did not enter a complete link; do the simple thing
}



Anders Norrbring wrote:


I'm writing a filter/parsing function for texts entered by users, and I've run into a problem. What I'm trying to do is parse URLs of different sorts (ftp, http, mms, irc, etc.) and format them as links. That part was really easy.

The hard part is when a user has already entered a complete link.
In short:

http://www.server.tld/page.html
should be converted to:
<a href='http://www.server.tld/page.html'>http://www.server.tld/page.html</a>

That part works fine, but if the user enters:

<a href='http://www.server.tld/page.html'>click here</a>

it all becomes a mess...  Can somebody please make a suggestion on this?



Jochem's correct. I was in too big a hurry trying to help. It was obvious that Anders was not getting much useful help. His points 3 and 4 are valid and I was not addressing them because they require more work than I have time to devote.

Here is corrected code. It works with the "Regex Coach". I did not try it with a PHP script.

$types= '(http|ftp|https|mms|irc)';  // quoted, so it interpolates as a regex alternation

$pattern= "%<a\040href\040*=['\"]$types://((www\.)*[\w/\.]+)['\"]>.+</a>%i";  // the "i" makes it non case sensitive

if(preg_match($pattern, $URL_str, $match)){

    $URL= $match[2];  // $match[1] is the protocol, $match[2] is the address after "://"
}

else{

    // User did not enter a complete link; do the simple thing
}
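
For Anders's original problem, another option might be to avoid testing for a complete link at all: split the text around any existing `<a>...</a>` elements and only linkify the pieces in between. A minimal sketch under that assumption follows. The function name `linkify_text` and both patterns are hypothetical, not from the thread, and the URL pattern is deliberately simple (it would swallow a trailing period, for instance).

```php
<?php
// Hypothetical helper; the name and patterns are illustrative assumptions.
function linkify_text($text)
{
    // Split so that each complete <a ...>...</a> element becomes its own chunk.
    // PREG_SPLIT_DELIM_CAPTURE keeps the captured anchors in the result.
    $chunks = preg_split('%(<a\b[^>]*>.*?</a>)%is', $text, -1, PREG_SPLIT_DELIM_CAPTURE);

    foreach ($chunks as $i => $chunk) {
        // With a single capture group, odd-numbered chunks are the existing
        // anchors; leave those untouched.
        if ($i % 2 === 1) {
            continue;
        }
        // Wrap bare URLs found in the plain-text chunks.
        $chunks[$i] = preg_replace(
            '%\b(?:http|ftp|https|mms|irc)://[\w/.-]+%i',
            '<a href=\'$0\'>$0</a>',
            $chunk
        );
    }

    return implode('', $chunks);
}

$input = "see http://www.server.tld/page.html and "
       . "<a href='http://www.server.tld/page.html'>click here</a>";
$output = linkify_text($input);
echo $output, "\n";
```

The bare URL gets wrapped while the link the user typed in full passes through unchanged.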

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

