On 18/10/06, Ivo F.A.C. Fokkema <I.F.A.C.Fokkema@xxxxxxx> wrote:
On Tue, 17 Oct 2006 17:26:42 +0100, Robin Vickery wrote:
> On 17/10/06, Al <news@xxxxxxxxxxxxx> wrote:
>> AYSERVE.NET wrote:
>> > Please, I need help on how to recognise url in a block of text being
>> > retrieved from a database and present it as a link within that text.
>> >
>> > I will appreciate any help.
>> > Regards,
>> > Bunmi
>> Show us some examples of URL substrings, with any variations, you want to handle.
>>
>> Most likely a regex function will do the job.
>
> In 6 easy steps:
>
> Step 1: Pinch a regexp from perl...
>
>   perl -e 'use Regexp::Common; print $RE{URI}{HTTP}, "\n";'
>
> Step 2: Double up all backslashes
>
>   M-x replace-string \ \\
>
> Step 3: Escape single quote-marks
>
>   M-x replace-string ' \'
>
> Step 4. modify slightly to cope with the https scheme by adding an
> optional 's' to the http scheme.
>
> Step 5. add angle-brackets as delimiters
>
> Step 6. use in a preg_replace()
>
> <?php
>
> $textString = 'Lorem ipsum dolor sit amet, consectetuer adipiscing
> elit. Proin et urna. Duis quam. Suspendisse potenti. Etiam sem tortor,
> ultricies nec, http://example.com imperdiet nec, tempus ac, purus.
> Suspendisse id lectus. Nam vitae quam. Aliquam ligula nisl, vestibulum
> vulputate, tempor nec, https://www.example.com tincidunt sit amet,
> libero. Suspendisse a justo. Cum sociis natoque penatibus et.';
>
> $url_regexp = '<(?:(?:https?)://(?:(?:(?:(?:(?:(?:[a-zA-Z0-9][-a-zA-Z0-9]*)?[a-zA-Z0-9])[.])*(?:[a-zA-Z][-a-zA-Z0-9]*[a-zA-Z0-9]|[a-zA-Z])[.]?)|(?:[0-9]+[.][0-9]+[.][0-9]+[.][0-9]+)))(?::(?:(?:[0-9]*)))?(?:/(?:(?:(?:(?:(?:(?:[a-zA-Z0-9\\-_.!~*\'():@&=+$,]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*)(?:;(?:(?:[a-zA-Z0-9\\-_.!~*\'():@&=+$,]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*))*)(?:/(?:(?:(?:[a-zA-Z0-9\\-_.!~*\'():@&=+$,]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*)(?:;(?:(?:[a-zA-Z0-9\\-_.!~*\'():@&=+$,]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*))*))*))(?:[?](?:(?:(?:[;/?:@&=+$,a-zA-Z0-9\\-_.!~*\'()]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*)))?))?)>';
>
> $output = preg_replace($url_regexp, '<a href="$0">$0</a>', $textString);
>
> print $output;
> ?>
>
> If http and https isn't enough for you, there's another more general
> regexp but... well, it's 8.5Kb long.

Holy ****!!! I've used regexps for quite a while now, but won't even
begin to read that. I use:

  /^(ht|f)tps?:\/\/([0-9]{1,3}(\.[0-9]{1,3}){3}|([0-9a-z][-0-9a-z]*[0-9a-z]\.)+[a-z]{2,4})\/?[%&=#0-9a-z\/._+-]*\??.*$/i

to match a full URL with domain name or IP address, and:

  /((ht|f)tps?:\/\/([0-9]{1,3}(\.[0-9]{1,3}){3}|([0-9a-z][-0-9a-z]*[0-9a-z]\.)+[a-z]{2,4})\/?[%&=#0-9a-z\/._+-]*\??[^[:space:]]+)/i

to replace a space-delimited URL with preg_replace. It has worked fine
for me, but I just can't read your regexp, so I can't see why it's
better.
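
For reference, a minimal sketch of how the second, shorter pattern above might be used with preg_replace() on space-delimited text. The sample text and the variable names ($simple_regexp, $text, $linked) are assumptions made up for illustration; they are not from the thread.

```php
<?php
// The shorter "replace" pattern quoted above, copied verbatim.
$simple_regexp = '/((ht|f)tps?:\/\/([0-9]{1,3}(\.[0-9]{1,3}){3}|([0-9a-z][-0-9a-z]*[0-9a-z]\.)+[a-z]{2,4})\/?[%&=#0-9a-z\/._+-]*\??[^[:space:]]+)/i';

// Hypothetical sample text: each URL is followed by whitespace.
$text = 'See http://example.com/docs and ftp://192.168.0.1/pub for details.';

// $1 is the outermost capture group, i.e. the whole URL.
$linked = preg_replace($simple_regexp, '<a href="$1">$1</a>', $text);

print $linked . "\n";
```

Because the pattern ends in [^[:space:]]+, it relies on whitespace to decide where a URL stops, so punctuation directly after a URL would end up inside the link.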
Depends what you want it for - the Regexp::Common expression is precise -
it matches a well-formed http or https URI, nothing more or less. It's
also used in thousands of perl applications through Regexp::Common and
has been extremely thoroughly tested.

Yours is an approximation - it probably works fine for this kind of job,
but it will make mistakes at times. For instance - if the OP is trying to
display html source with the links live:

  $text='<a href="http://www.example.com">this is a link</a>';

The Regexp::Common expression just matches the URI, whereas yours matches
all the way through to the end of the word "this".

 -robin

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
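
The over-match described in the reply can be checked with a short sketch like the one below. It only exercises the shorter pattern from the quoted message; the long Regexp::Common expression is not repeated here, and the variable names ($simple_regexp, $matches) are assumptions chosen for illustration.

```php
<?php
// The shorter "replace" pattern from the quoted message, copied verbatim.
$simple_regexp = '/((ht|f)tps?:\/\/([0-9]{1,3}(\.[0-9]{1,3}){3}|([0-9a-z][-0-9a-z]*[0-9a-z]\.)+[a-z]{2,4})\/?[%&=#0-9a-z\/._+-]*\??[^[:space:]]+)/i';

// The HTML example from the reply.
$text = '<a href="http://www.example.com">this is a link</a>';

// The trailing [^[:space:]]+ keeps consuming until the first whitespace,
// so the closing quote, the ">" and the word "this" are swallowed too.
preg_match($simple_regexp, $text, $matches);

print $matches[0] . "\n";  // prints: http://www.example.com">this
```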