Re: regex pattern for extracting URLs

On Fri, Oct 23, 2009 at 1:54 PM, Israel Ekpo <israelekpo@xxxxxxxxx> wrote:
>
>
> On Fri, Oct 23, 2009 at 1:48 PM, Ashley Sheridan <ash@xxxxxxxxxxxxxxxxxxxx>
> wrote:
>>
>> On Fri, 2009-10-23 at 13:45 -0400, Brad Fuller wrote:
>>
>> > On Fri, Oct 23, 2009 at 1:28 PM, Ashley Sheridan
>> > <ash@xxxxxxxxxxxxxxxxxxxx>wrote:
>> >
>> > >  On Fri, 2009-10-23 at 13:23 -0400, Brad Fuller wrote:
>> > >
>> > > I'm looking for a regular expression to accomplish a specific task.
>> > >
>> > > I'm hoping someone who's really good at regex patterns can lend a
>> > > quick hand.
>> > >
>> > > I need a regex pattern that will grab URLs out of HTML that have a
>> > > certain link text. (i.e. the word "Continue")
>> > >
>> > > This is what I have so far but it does not work properly (If there are
>> > > other attributes in the <a> tag it returns them as part of the URL.)
>> > >
>> > >
>> > > preg_match_all('#<a[\s]+[^>]*href\s*=\s*([\"\']+)([^>]+?)(\1|>)>Continue</a>#i',
>> > > $html, $matches);
>> > >
>> > > It needs to be able to extract the URL and disregard arbitrary
>> > > attributes in the HTML tag
>> > >
>> > > Test it with the following examples:
>> > >
>> > > <a href=/path/to/url.html>Continue</a>
>> > > <a href='/path/to/url.html'>Continue</a>
>> > > <a href="http://example.com/path/to/url.html"
>> > > class="link">Continue</a>
>> > > <a style="font-size: 12px" href="http://example.com/path/to/url.html"
>> > > onclick="someFunction('foo','bar')">Continue</a>
>> > >
>> > > Please reply
>> > >
>> > > Your help is much appreciated.
>> > >
>> > > Thanks in advance,
>> > > Brad F.
>> > >
>> > >
>> > >
>> > >
>> > > preg_match_all('#<a[\s]+[^>]*href\s*=\s*[\"\']+([^\"\']+?).+?>Continue</a>#i',
>> > > $html, $matches);
>> > >
>> > > I just changed your regex a bit. What your regex was previously doing
>> > > was matching everything from the first quote after the href= right up
>> > > until the first > it found, which would usually be the one that closes
>> > > the opening tag. You could make it a bit more intelligent if you wished
>> > > with backreferencing to make sure it matches against the same type of
>> > > quotation character it matched as the start of the href's value.
>> > >
>> > >   Thanks,
>> > > Ash
>> > > http://www.ashleysheridan.co.uk
>> > >
>> > >
>> > >
>> >
>> > I appreciate the help.  However, when I try this I only get the first
>> > character of the URL.  Can you double-check it, please?
>> >
>> > Thanks again
>>
>>
>> I think it's probably the first ? in ([^\"\']+?)
>>
>> Remove that and it should do the trick
>>
>> Thanks,
>> Ash
>> http://www.ashleysheridan.co.uk
>>
>>
>
> Hi Brad,
>
> I agree with Jim.
>
> Take a look at this. It might help.
>
> <?php
>
> $xml_string = <<<TEXT_BOUNDARY
> <html>
>     <head>
>         <title></title>
>     </head>
>     <body>
>         <div>
>             <a href="http://example.com/path/to/urlA.html">Continue</a>
>             <a href="http://example.com/path/to/url2.html">Brad Fuller</a>
>             <a href="http://example.com/path/to/urlB.html">Continue</a>
>             <a href="http://example.com/path/to/url4.html">PHP.net</a>
>             <a href="http://example.com/path/to/urlC.html"
> class="link">Continue</a>
>             <a style="font-size: 12px"
> href="http://example.com/path/to/urlD.html"
> onclick="someFunction('foo','bar')">Continue</a>
>         </div>
>     </body>
> </html>
> TEXT_BOUNDARY;
>
> $xml = simplexml_load_string($xml_string);
>
> $continue_hrefs = $xml->xpath("//a[text() = 'Continue']/@href");
>
> print_r($continue_hrefs);
>
> ?>
>

Thanks, I'm sure I will use this at some point in the future :)
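
For anyone landing on this thread for the regex itself: below is a minimal sketch of the backreference idea Ash describes above, not a pattern posted by anyone in the thread. The function name extract_continue_urls() is made up for illustration. It assumes that no attribute value contains a literal ">" and that the link text is just the word "Continue" (matched case-insensitively, as in the original pattern); markup that breaks either assumption will not match.

```php
<?php
// Hypothetical helper (not from the thread): collect the href of every
// <a> element whose link text is "Continue" (case-insensitive).
function extract_continue_urls($html)
{
    $pattern = '#<a\s+(?:[^>]*?\s)?href\s*=\s*'
             . '(?:(["\'])((?:(?!\1)[^>])*)\1'  // quoted value; \1 is the quote that opened it
             . '|([^\s>"\']+))'                 // or an unquoted value
             . '[^>]*>\s*Continue\s*</a>#i';    // skip other attributes, then the link text

    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $urls = array();
    foreach ($matches as $m) {
        // Group 2 holds a quoted value, group 3 an unquoted one.
        $urls[] = !empty($m[3]) ? $m[3] : $m[2];
    }
    return $urls;
}

// Usage with two of the test cases from the original question.
$html = "<a href=/path/to/url.html>Continue</a>\n"
      . "<a style=\"font-size: 12px\" href=\"http://example.com/path/to/url.html\"\n"
      . "onclick=\"someFunction('foo','bar')\">Continue</a>\n";
print_r(extract_continue_urls($html));
```

The value group is kept from running past its own tag by excluding ">", which is what stops it from swallowing the other attributes. Ash's pattern fails for a different reason: the lazy +? is followed by .+?, so the capture is satisfied after a single character.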

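Israel's SimpleXML example needs the input to be well-formed XML, which real-world HTML often is not (the unquoted href in Brad's first test case, for instance). A possible variation, sketched here as an assumption rather than something proposed in the thread, is to use DOMDocument::loadHTML() with DOMXPath instead. The function name continue_hrefs_from_html() is hypothetical.

```php
<?php
// Hypothetical helper (not from the thread): same XPath idea as the
// SimpleXML example, but parsed with the more forgiving HTML parser.
function continue_hrefs_from_html($html)
{
    if (trim($html) === '') {
        return array(); // loadHTML() rejects empty input
    }

    $doc = new DOMDocument();

    // Collect parser warnings internally instead of emitting them,
    // then restore the previous libxml error setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $hrefs = array();

    // normalize-space() ignores whitespace around the link text.
    foreach ($xpath->query("//a[normalize-space(.) = 'Continue']/@href") as $attr) {
        $hrefs[] = $attr->value;
    }
    return $hrefs;
}
```

Unlike the regex sketch, this comparison is case-sensitive, so it only picks up links whose text is exactly "Continue".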
-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php


