Re: Re: How to recognise url in a block of text

On Tue, 17 Oct 2006 17:26:42 +0100, Robin Vickery wrote:

> On 17/10/06, Al <news@xxxxxxxxxxxxx> wrote:
>> AYSERVE.NET wrote:
>> > Please, I need help on how to recognise a URL in a block of text being
>> > retrieved from a database and present it as a link within that text.
>> >
>> > I will appreciate any help.
>> > Regards,
>> > Bunmi
>> Show us some examples of URL substrings, with any variations, you want to handle.
>>
>> Most likely a regex function will do the job.
> 
> In 6 easy steps:
> 
> Step 1: Pinch a regexp from Perl...
> 
>   perl -e 'use Regexp::Common; print $RE{URI}{HTTP}, "\n";'
> 
> Step 2: Double up all backslashes
> 
>   M-x replace-string \ \\
> 
> Step 3: Escape single quote-marks
> 
>   M-x replace-string ' \'
> 
> Step 4: Modify slightly to cope with the https scheme by adding an
> optional 's' to the http scheme.
> 
> Step 5: Add angle brackets as delimiters
> 
> Step 6: Use in a preg_replace()
> 
> <?php
> 
> $textString = 'Lorem ipsum dolor sit amet, consectetuer adipiscing
> elit. Proin et urna. Duis quam. Suspendisse potenti. Etiam sem tortor,
> ultricies nec,  http://example.com  imperdiet nec, tempus ac, purus.
> Suspendisse id lectus. Nam vitae quam. Aliquam ligula nisl, vestibulum
> vulputate, tempor nec, https://www.example.com  tincidunt sit amet,
> libero. Suspendisse a justo. Cum sociis natoque penatibus et.';
> 
> $url_regexp = '<(?:(?:https?)://(?:(?:(?:(?:(?:(?:[a-zA-Z0-9][-a-zA-Z0-9]*)?[a-zA-Z0-9])[.])*(?:[a-zA-Z][-a-zA-Z0-9]*[a-zA-Z0-9]|[a-zA-Z])[.]?)|(?:[0-9]+[.][0-9]+[.][0-9]+[.][0-9]+)))(?::(?:(?:[0-9]*)))?(?:/(?:(?:(?:(?:(?:(?:[a-zA-Z0-9\\-_.!~*\'():@&=+$,]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*)(?:;(?:(?:[a-zA-Z0-9\\-_.!~*\'():@&=+$,]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*))*)(?:/(?:(?:(?:[a-zA-Z0-9\\-_.!~*\'():@&=+$,]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*)(?:;(?:(?:[a-zA-Z0-9\\-_.!~*\'():@&=+$,]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*))*))*))(?:[?](?:(?:(?:[;/?:@&=+$,a-zA-Z0-9\\-_.!~*\'()]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*)))?))?)>';
> 
> $output = preg_replace($url_regexp, '<a href="$0">$0</a>', $textString);
> 
> print $output;
> ?>
> 
> If http and https aren't enough for you, there's another more general
> regexp but... well, it's 8.5 KB long.
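
Steps 2 and 3 of the quoted recipe could probably also be done in PHP
itself rather than in an editor. A rough sketch, where $raw is a made-up
short stand-in for the step 1 output (not the real pattern), and the
delimiters from steps 4 and 5 still have to be added by hand:

```php
<?php
// Hypothetical stand-in for the regexp printed in step 1.
// The string holds these characters: [a-z\-_.!~*'()]+
$raw = "[a-z\\-_.!~*'()]+";

// var_export() returns a single-quoted PHP literal: backslashes are
// doubled and single quotes are escaped, which is what steps 2 and 3
// do by hand.
$literal = var_export($raw, true);

// Emit a line that can be pasted into a script.
print '$url_regexp = ' . $literal . ";\n";
```

Evaluating the printed literal gives back the original string, so nothing
is lost in the round trip.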

Holy ****!!!

I've used regexps for quite a while now, but won't even begin to read
that. I use:

/^(ht|f)tps?:\/\/([0-9]{1,3}(\.[0-9]{1,3}){3}|([0-9a-z][-0-9a-z]*[0-9a-z]\.)+[a-z]{2,4})\/?[%&=#0-9a-z\/._+-]*\??.*$/i

to match a full URL with a domain name or IP address, and:

/((ht|f)tps?:\/\/([0-9]{1,3}(\.[0-9]{1,3}){3}|([0-9a-z][-0-9a-z]*[0-9a-z]\.)+[a-z]{2,4})\/?[%&=#0-9a-z\/._+-]*\??[^[:space:]]+)/i

to replace a space-delimited URL with preg_replace.
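
For what it's worth, here is a minimal sketch of how the two patterns
might be used. The helper names is_full_url() and linkify_urls() and the
sample strings are made up for illustration, and the callback form assumes
a PHP version with anonymous functions. The matched URL is passed through
htmlspecialchars() before it goes into the tag; the rest of the text is
left untouched.

```php
<?php
// The two patterns from above, unchanged.
$full_url_pattern = '/^(ht|f)tps?:\/\/([0-9]{1,3}(\.[0-9]{1,3}){3}|([0-9a-z][-0-9a-z]*[0-9a-z]\.)+[a-z]{2,4})\/?[%&=#0-9a-z\/._+-]*\??.*$/i';
$url_in_text_pattern = '/((ht|f)tps?:\/\/([0-9]{1,3}(\.[0-9]{1,3}){3}|([0-9a-z][-0-9a-z]*[0-9a-z]\.)+[a-z]{2,4})\/?[%&=#0-9a-z\/._+-]*\??[^[:space:]]+)/i';

// Hypothetical helper: true if the whole string looks like a URL.
function is_full_url($pattern, $candidate)
{
    return preg_match($pattern, $candidate) === 1;
}

// Hypothetical helper: wrap every URL found in $text in an anchor tag.
function linkify_urls($pattern, $text)
{
    return preg_replace_callback($pattern, function ($m) {
        // Escape the URL so characters such as & or " cannot break the markup.
        $url = htmlspecialchars($m[0], ENT_QUOTES);
        return '<a href="' . $url . '">' . $url . '</a>';
    }, $text);
}

var_dump(is_full_url($full_url_pattern, 'ftp://192.168.0.1/pub'));
var_dump(is_full_url($full_url_pattern, 'not a url'));

$text = 'See http://example.com/docs or https://www.example.com for details.';
print linkify_urls($url_in_text_pattern, $text) . "\n";
```

One thing to watch: the second pattern takes everything up to the next
whitespace, so a full stop or comma typed directly after a URL ends up
inside the link.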

It has worked fine for me, but I just can't read your regexp, so I can't
see why it's better.

Ivo

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

