Re: Re: How to recognise url in a block of text

On 18/10/06, Ivo F.A.C. Fokkema <I.F.A.C.Fokkema@xxxxxxx> wrote:
On Tue, 17 Oct 2006 17:26:42 +0100, Robin Vickery wrote:

> On 17/10/06, Al <news@xxxxxxxxxxxxx> wrote:
>> AYSERVE.NET wrote:
>> > Please, I need help on how to recognise url in a block of text being
>> > retrieved from a database and present it as a link within that text.
>> >
>> > I will appreciate any help.
>> > Regards,
>> > Bunmi
>> Show us some examples of URL substrings, with any variations, you want to handle.
>>
>> Most likely a regex function will do the job.
>
> In 6 easy steps:
>
> Step 1: Pinch a regexp from perl...
>
>   perl -e 'use Regexp::Common; print $RE{URI}{HTTP}, "\n";'
>
> Step 2: Double up all backslashes
>
>   M-x replace-string \ \\
>
> Step 3: Escape single quote-marks
>
>   M-x replace-string ' \'
>
> Step 4: Modify slightly to cope with the https scheme by adding an
> optional 's' to the http scheme.
>
> Step 5: Add angle-brackets as delimiters
>
> Step 6: Use in a preg_replace()
>
> <?php
>
> $textString = 'Lorem ipsum dolor sit amet, consectetuer adipiscing
> elit. Proin et urna. Duis quam. Suspendisse potenti. Etiam sem tortor,
> ultricies nec,  http://example.com  imperdiet nec, tempus ac, purus.
> Suspendisse id lectus. Nam vitae quam. Aliquam ligula nisl, vestibulum
> vulputate, tempor nec, https://www.example.com  tincidunt sit amet,
> libero. Suspendisse a justo. Cum sociis natoque penatibus et.';
>
> $url_regexp = '<(?:(?:https?)://(?:(?:(?:(?:(?:(?:[a-zA-Z0-9][-a-zA-Z0-9]*)?[a-zA-Z0-9])[.])*(?:[a-zA-Z][-a-zA-Z0-9]*[a-zA-Z0-9]|[a-zA-Z])[.]?)|(?:[0-9]+[.][0-9]+[.][0-9]+[.][0-9]+)))(?::(?:(?:[0-9]*)))?(?:/(?:(?:(?:(?:(?:(?:[a-zA-Z0-9\\-_.!~*\'():@&=+$,]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*)(?:;(?:(?:[a-zA-Z0-9\\-_.!~*\'():@&=+$,]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*))*)(?:/(?:(?:(?:[a-zA-Z0-9\\-_.!~*\'():@&=+$,]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*)(?:;(?:(?:[a-zA-Z0-9\\-_.!~*\'():@&=+$,]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*))*))*))(?:[?](?:(?:(?:[;/?:@&=+$,a-zA-Z0-9\\-_.!~*\'()]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*)))?))?)>';
>
> $output = preg_replace($url_regexp, '<a href="$0">$0</a>', $textString);
>
> print $output;
> ?>
>
> If http and https aren't enough for you, there's another more general
> regexp but... well, it's 8.5 KB long.

Holy ****!!!

I've used regexps for quite a while now, but won't even begin to read
that. I use:

/^(ht|f)tps?:\/\/([0-9]{1,3}(\.[0-9]{1,3}){3}|([0-9a-z][-0-9a-z]*[0-9a-z]\.)+[a-z]{2,4})\/?[%&=#0-9a-z\/._+-]*\??.*$/i

to match a full URL with domain name or IP address, and:

/((ht|f)tps?:\/\/([0-9]{1,3}(\.[0-9]{1,3}){3}|([0-9a-z][-0-9a-z]*[0-9a-z]\.)+[a-z]{2,4})\/?[%&=#0-9a-z\/._+-]*\??[^[:space:]]+)/i

to replace a space-delimited URL with preg_replace.

It has worked fine for me, but I just can't read your regexp, so I can't
see why it's better.
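
For readers following along, here is a minimal sketch of how the second
pattern above might be used with preg_replace. The pattern is copied
unchanged from the message; the sample text and the variable names
($pattern, $text, $linked) are assumptions made for illustration only.

```php
<?php
// Second pattern from the message above, copied unchanged.
$pattern = '/((ht|f)tps?:\/\/([0-9]{1,3}(\.[0-9]{1,3}){3}|([0-9a-z][-0-9a-z]*[0-9a-z]\.)+[a-z]{2,4})\/?[%&=#0-9a-z\/._+-]*\??[^[:space:]]+)/i';

// Hypothetical sample text, not taken from the thread.
$text = 'See http://www.example.com/docs/index.html for details.';

// $1 is the outermost capture group, i.e. the whole URL as matched.
$linked = preg_replace($pattern, '<a href="$1">$1</a>', $text);

print $linked . "\n";
```

Because the pattern ends in [^[:space:]]+, the match runs up to the next
whitespace character, so the URL has to be delimited by spaces for the
link to come out clean.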

Depends what you want it for - the Regexp::Common expression is
precise - it matches a well-formed http or https URI, nothing more or
less. It's also used in thousands of perl applications through
Regexp::Common and has been extremely thoroughly tested.

Yours is an approximation - it probably works fine for this kind of
job, but it will make mistakes at times.

For instance - if the OP is trying to display html source with the links live:

$text='&lt;a href="http://www.example.com"&gt;this is a link&lt;/a&gt;';

The Regexp::Common expression just matches the URI, whereas yours
matches all the way through to the end of the word "this".
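
A small sketch that reproduces the over-match described above, using the
space-delimited pattern quoted earlier in this thread on the same $text.
The variable names $pattern and $m are assumptions for illustration; the
Regexp::Common expression is not repeated here because of its length.

```php
<?php
// The space-delimited replacement pattern quoted earlier, copied unchanged.
$pattern = '/((ht|f)tps?:\/\/([0-9]{1,3}(\.[0-9]{1,3}){3}|([0-9a-z][-0-9a-z]*[0-9a-z]\.)+[a-z]{2,4})\/?[%&=#0-9a-z\/._+-]*\??[^[:space:]]+)/i';

// The html-source example from this message.
$text = '&lt;a href="http://www.example.com"&gt;this is a link&lt;/a&gt;';

if (preg_match($pattern, $text, $m)) {
    // $m[0] holds the full match, which continues past the hostname
    // until the first whitespace character.
    print $m[0] . "\n";   // prints: http://www.example.com"&gt;this
}
```

The trailing [^[:space:]]+ is what pulls in the quote mark, the entity and
the word "this", since none of them is whitespace.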

-robin

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

