Re: grabbing source of a URL

What about just using the file() function and then looping through the array it returns?

I do this to scrape sites for content (pics, MIDI files, fonts) by getting the
links from within the HTML code and using the wwwcopy function from the PHP
docs.

I am sure there is a better way to do the pattern recognition but this works
for me. Perhaps someone can suggest a more streamlined method.

    function getPicInfo($strSiteName, $strPartial)
    {
        if ($strSiteName != "")
        {
            $strURL = "http://".$strSiteName."/".$strPartial."/";
            $strMatch = "/gallery/";
            // file() returns the page as an array of lines, or FALSE on failure.
            $arrBase = file($strURL);
            if ($arrBase === FALSE)
            {
                return;
            }
            foreach ($arrBase as $intLine => $strVal)
            {
                // preg_grep() needs an array, so wrap the lower-cased line in one.
                $arrTemp = array();
                $strLine = strtolower($strVal);
                array_push($arrTemp, $strLine);
                if (preg_grep($strMatch, $arrTemp))
                {
                    // extract the href and do the copy here.
                }
            }
        }
    }
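
For the "extract the href" step, something like the following might do as a
rough sketch. The helper name extractHrefs() is made up for illustration, and
it assumes the href values are quoted; a regex like this will miss unquoted
attributes and is not a substitute for a real HTML parser.

```php
<?php
// Hypothetical helper (not part of the code above): returns every quoted
// href value found in one line of HTML.
function extractHrefs($strLine)
{
    $arrMatches = array();
    // Group 1 captures the opening quote, group 2 the value up to the same
    // closing quote; the /i flag makes "href" case-insensitive.
    preg_match_all('/href\s*=\s*(["\'])(.*?)\1/i', $strLine, $arrMatches);
    return $arrMatches[2];
}

$strSample = '<a href="/gallery/pic1.jpg">one</a> <A HREF=\'/gallery/pic2.jpg\'>two</A>';
print_r(extractHrefs($strSample));
```

Each value returned could then be handed to whatever copy routine you use.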

So getPicInfo() looks for the string "gallery" on each line of the remote HTML
file. If you want everything between that match and another one, you could set
a flag that copies the lines into a second array:

    function getPicInfo($strSiteName, $strPartial)
    {
        $blnOutput = FALSE;
        $arrOutput = array();
        if ($strSiteName != "")
        {
            $strURL = "http://".$strSiteName."/".$strPartial."/";
            $strMatch = "/gallery/";     // start marker
            $strMatch2 = "/completed/";  // end marker
            $arrBase = file($strURL);
            if ($arrBase === FALSE)
            {
                return $arrOutput;
            }
            foreach ($arrBase as $intLine => $strVal)
            {
                $arrTemp = array();
                $strLine = strtolower($strVal);
                array_push($arrTemp, $strLine);
                if (preg_grep($strMatch, $arrTemp))
                {
                    // start marker found: begin collecting lines.
                    $blnOutput = TRUE;
                }
                else if (preg_grep($strMatch2, $arrTemp))
                {
                    // end marker found: stop collecting lines.
                    $blnOutput = FALSE;
                }
                if ($blnOutput)
                {
                    array_push($arrOutput, $strVal);
                }
            }
        }
        return $arrOutput;
    }

It's probably not very nice code, but it will do the job.
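
Turning the same flag idea around for the original question (throw the block
away and keep the rest) might look roughly like this. stripBlock() is a
made-up name, and the sketch assumes the start and end comments each sit on
their own line, as in the page.html example quoted below.

```php
<?php
// Hypothetical helper: drops every line from the one containing $strStart
// through the one containing $strEnd (both inclusive) and returns the rest.
function stripBlock($arrLines, $strStart, $strEnd)
{
    $blnSkip = FALSE;
    $arrKeep = array();
    foreach ($arrLines as $strVal)
    {
        if (!$blnSkip && strpos($strVal, $strStart) !== FALSE)
        {
            $blnSkip = TRUE;        // start marker: begin discarding
            continue;
        }
        if ($blnSkip)
        {
            if (strpos($strVal, $strEnd) !== FALSE)
            {
                $blnSkip = FALSE;   // end marker: keep lines after this one
            }
            continue;
        }
        $arrKeep[] = $strVal;
    }
    return $arrKeep;
}

// In real use $arrPage would come from file($strURL).
$arrPage = array(
    "<body>\n",
    "<!-- line a -->\n",
    "this is line a\n",
    "<!-- end line a -->\n",
    "<!-- line b -->\n",
    "this is line b\n",
    "<!-- end line b -->\n",
    "</body>\n",
);
echo implode("", stripBlock($arrPage, "<!-- line a -->", "<!-- end line a -->"));
```

The plain strpos() checks are case-sensitive, so the markers have to be passed
exactly as they appear in the page.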

Can someone PLEASE help me with my encryption problems?!?!?!

Darren

"Warren Vail" <Warren.Vail@xxxxxxxxxx> wrote in message
news:72138202E59CD6118E960002A52CD9D2178E2CB8@xxxxxxxxxxxxxxxxxxxxxxxxx
> Oops missed part of your question;
>
> > know what function to use to grab the page.  for the string
>
> http://us2.php.net/manual/en/function.fopen.php
>
> There are some good samples on the page
>
>        $dh = fopen("$url",'r');
>        $result = fread($dh,8192);
>
> Hope this is what you need.
>
> Warren Vail
>
>
> > -----Original Message-----
> > From: Adam Williams [mailto:awilliam@xxxxxxxxxxxxxxxx]
> > Sent: Friday, December 10, 2004 9:56 AM
> > To: php-general@xxxxxxxxxxxxx
> > Subject:  grabbing source of a URL
> >
> >
> > Hi, I don't know what functions to use so maybe someone can
> > help me out.
> > I want to grab a URL's source (all the code from a link) and
> > then cut out
> > a block of text from it, throw it away, and then show the page.
> >
> > For example, if I have page.html with 3 lines:
> >
> > <html><head><title>hi</title></head>
> > <body>
> > <!-- line a -->
> > this is line a
> > <!-- end line a -->
> > <!-- line b -->
> > this is line b
> > <!-- end line b -->
> > <!-- line c -->
> > this is line c
> > <!-- end line c -->
> > </body></html>
> >
> > i want my php script to grab the source of page.html, strip out:
> >
> > <!-- line a -->
> > this is line a
> > <!-- end line a -->
> >
> > and then display what is left, how would I go about doing
> > this?  I don't
> > know what function to use to grab the page.  for the string
> > to remove, I
> > know I can probably do a str_replace and replace the known code with
> > nothing.
> >
> > --
> > PHP General Mailing List (http://www.php.net/)
> > To unsubscribe, visit: http://www.php.net/unsub.php
> >

