Philip Thompson wrote:
> On Sep 21, 2009, at 6:20 PM, Jim Lucas wrote:
>
>> Jim Lucas wrote:
>>> Jônatas Zechim wrote:
>>>> Hi there, i've the following strings:
>>>>
>>>> $string1 = 'Lorem ipsum dolor http://site.com sit amet';
>>>> $string2 = 'Lorem ipsum dolor http://www.site.com/ sit amet';
>>>> $string3 = 'Lorem ipsum dolor http://www.site.net sit amet';
>>>>
>>>> How can I extract the URL from these strings?
>>>> They can be [http:// + url] or [www. + url].
>>>>
>>>> Zechim
>>>>
>>>
>>> Something like this should work for you.
>>>
>>> <plaintext><?php
>>>
>>> $urls[] = 'Lorem ipsum dolor http://site.com sit amet';
>>> $urls[] = 'Lorem ipsum dolor https://www.site.com/ sit amet';
>>> $urls[] = 'Lorem ipsum dolor www.site1.net sit amet';
>>> $urls[] = 'Lorem ipsum dolor www site2.net sit amet';
>>>
>>> foreach ( $urls AS $url ) {
>>>     if ( preg_match('%((https?://|www\.)[^\s]+)%', $url, $m) ) {
>>>         print_r($m);
>>>     }
>>> }
>>>
>>> ?>
>>>
>>
>> Actually, try this.  It seems to work a little better.
>>
>> <plaintext><?php
>>
>> $urls[] = 'Lorem ipsum dolor http://site.com sit amet';
>> $urls[] = 'Lorem ipsum dolor https://www.site.com/ or http://www.site2.com/';
>> $urls[] = 'Lorem ipsum dolor www.site1.net sit amet';
>> $urls[] = 'Lorem ipsum dolor www site2.net sit amet';
>>
>> foreach ( $urls AS $url ) {
>>     if ( preg_match_all( '%(https?://[^\s]+|www\.[^\s]+)%',
>>                          $url,
>>                          $m,
>>                          (PREG_SET_ORDER ^ PREG_OFFSET_CAPTURE)
>>     ) ) {
>>         print_r($m);
>>     }
>> }
>>
>> ?>
>
> What if the sub domain was not 'www'?
>
> http://no-www.org/
>

Well, if it had the http:// at the beginning, then it would be found.

But somedomain.no-www.org would not work. And if they only had no-www.org, it would only find www.org.

So, I guess it needs to look at the characters before the www\. part and include them in the URL as well.

This should work. Note the character class ([^\/\s]) placed before the www\. portion.
    if ( preg_match_all( '%(https?://[^\s]+|[^\/\s]*www\.[^\s]+)%',

This should catch example.www.org and no-www.org now. The leading class uses the * quantifier so that a bare www.site1.net, with nothing in front of the www., still matches.

You could get into the business of trying to match the TLD, but that would be a PITA to keep updated.

> Cheers,
> ~Philip
>

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
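For anyone who wants to try the revised pattern without piecing the thread together, here is a minimal self-contained sketch. The function name extract_urls() and the extra sample strings are my own, added only for illustration; they are not from the posts above. It also assumes a zero-or-more quantifier on the leading character class, so a plain www.site1.net with nothing in front of it is still picked up.

```php
<?php

// Hypothetical helper (name is an assumption, not from the thread).
// Returns every URL-like token found in $text.
function extract_urls(string $text): array
{
    // Alternative 1: http:// or https:// followed by non-whitespace.
    // Alternative 2: a token containing "www.", optionally preceded by
    //                characters that are neither a slash nor whitespace,
    //                so no-www.org and example.www.org match as a whole.
    $pattern = '%https?://[^\s]+|[^/\s]*www\.[^\s]+%';

    preg_match_all($pattern, $text, $matches);

    // With the default PREG_PATTERN_ORDER, index 0 holds the full matches.
    return $matches[0];
}

$samples = [
    'Lorem ipsum dolor http://site.com sit amet',
    'Lorem ipsum dolor https://www.site.com/ or http://www.site2.com/',
    'Lorem ipsum dolor www.site1.net sit amet',
    'Lorem ipsum dolor www site2.net sit amet',   // no dot after www: no match
    'Lorem ipsum dolor no-www.org sit amet',
    'Lorem ipsum dolor example.www.org sit amet',
];

foreach ($samples as $sample) {
    echo $sample, ' => ', implode(', ', extract_urls($sample)), PHP_EOL;
}
```

This is still only a rough match on whitespace boundaries: trailing punctuation such as a sentence-ending period ends up inside the match, and a bare domain with neither http:// nor www. in it (plain site.com, say) is not found at all.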