Re: Re: Extract specific div element from page

Oops, I accidentally sent this directly to CK, my apologies.

Thank you for your replies. I didn't explore the JS route because this
will be running in the background, and I didn't want to have to visit the
page in any way. I went looking for an easy way to do this in PHP, but
because of malformed HTML on some sites (not WordPress, as far as I am
aware) it wasn't going to be so easy. Someone in ##php on
irc.freenode.net pointed me to BeautifulSoup, a Python module for
scraping pages even when they have bad HTML. Within a minute I had a script
that grabbed the parts I wanted and even removed the parts I didn't (such
as comments). Now I have a Python script that runs whenever I update the
docs on my Palm: it grabs the page(s), strips out the unimportant stuff,
and saves the result to a local directory. I then have Sunrise parse that
into Plucker document format.
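
For anyone who finds this thread in the archives, here is a rough sketch of
the general idea. It is not the BeautifulSoup script described above: it
uses only html.parser from the Python standard library, and it does the div
counting mentioned in the original question. The names DivExtractor and
extract_div are made up for illustration. It assumes the target div and any
divs nested inside it are properly closed, so badly broken pages may still
call for a more forgiving parser such as BeautifulSoup.

```python
from html.parser import HTMLParser


class DivExtractor(HTMLParser):
    """Collect the inner HTML of the first <div> whose id matches target_id."""

    def __init__(self, target_id):
        # Keep entities such as &amp; as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.target_id = target_id
        self.depth = 0      # 0 means we are outside the target div
        self.done = False   # set once the target div has been closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth == 0:
            # Still looking for the opening tag of the target div.
            if tag == "div" and dict(attrs).get("id") == self.target_id:
                self.depth = 1
            return
        self.parts.append(self.get_starttag_text())
        if tag == "div":
            self.depth += 1  # nested div: remember to skip its closing tag

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> never change the nesting depth.
        if self.depth > 0:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.depth == 0:
            return
        if tag == "div":
            self.depth -= 1
            if self.depth == 0:
                self.done = True  # this was the target div's own closing tag
                return
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self.depth > 0:
            self.parts.append(data)

    def handle_entityref(self, name):
        if self.depth > 0:
            self.parts.append(f"&{name};")

    def handle_charref(self, name):
        if self.depth > 0:
            self.parts.append(f"&#{name};")

    # handle_comment is not overridden, so HTML comments are dropped.


def extract_div(html, target_id="content"):
    """Return the inner HTML of the div with the given id, or '' if absent."""
    parser = DivExtractor(target_id)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)
```

Calling extract_div(page_source) on a page that has already been downloaded
returns just the markup inside <div id="content">, which can then be written
to a local file for conversion.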

Once again, thank you for the responses.



On 6/15/07, Dan <frozendice@xxxxxxxxx> wrote:

Or you could just use JavaScript combined with PHP. In JavaScript it's
something like document.getElementById('tagId').innerHTML, which will
give you the HTML (contents) of the <div> tag you specify. Then just do
something like document.form.value =
document.getElementById('tagId').innerHTML. Basically you're setting a
hidden form element to have the value of the div; then, when you submit
the page, you have the content as $_POST['formYouSetTo']. You could have
the JS execute on the submit button's onclick.

It should be relatively easy if you look up the exact syntax of the
JavaScript.

- Daniel

"Anthony Hiscox" <distatica@xxxxxxxxxxxxx> wrote in message
news:6dfcba5e0706151440p19d81dccrddda1633339827e5@xxxxxxxxxxxxxxxxx
> Hey folks,
>
> I need to pull the contents inside of a specific div out of a page, and
> write it to a separate file. In this instance I am taking everything
> inside of <div id="content"></div> tags from a wordpress blog, this will
> give me only the content and not the menus, or other stuff. I need to do
> this because the final document will be converted for viewing on a palm
> pilot.
>
> Is anyone aware of a simple solution to this problem, short of parsing
> the entire page and starting when I hit that div opening tag, and
> stopping when I hit the closing tag? One problem I can see with this
> method is that I would have to count divs inside of that div, otherwise
> I would end too early on.
>
> Any advice would be greatly appreciated.
>
> Peace and Love,
> distatica.
>
> --
> ---------------------------------
> Anthony Hiscox
>
> Video Watch Group
> Public Site Currently Under Development
> Group Members Site Fully Operational
> ---------------------------------
>

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php




--
---------------------------------
Anthony Hiscox

Video Watch Group
Public Site Currently Under Development
Group Members Site Fully Operational
---------------------------------
