Re: Using Curl to replicate a site

Ashley Sheridan wrote:
On Thu, 2009-12-10 at 11:10 -0500, Robert Cummings wrote:
Ashley Sheridan wrote:
> Hi,
>
> I need to replicate a site on another domain, and in this case, an
> iframe won't really do, as I need to remove some of the graphics, etc.,
> around the content. The owner of the site I'm needing to copy has asked
> for the site to be duplicated, and unfortunately in this case, because
> of the CMS he's used (which is owned by the hosting he uses), I need a
> way to have the site replicated on an already existing domain as a
> microsite, but in a way that it is always up-to-date.
>
> I'm fine using Curl to grab the site, and even alter the content that is
> returned, but I was thinking about a caching mechanism. Has anyone any
> suggestions on this?

Sounds like you're creating a proxy with post-processing/caching on the forwarded content. It should be fairly straightforward to direct page requests to your proxy app, make the remote request, post-process and cache the result, then send it to the browser. The only gotcha will be forms if you do caching.
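The request flow described above (direct, fetch, post-process, cache, send) can be sketched in a few lines. This is a minimal, hypothetical illustration written in Python rather than PHP: the name `proxy_request`, the in-memory dict cache and the 300-second `ttl` are assumptions made for the example, and the `fetch` callable stands in for the real remote request (for instance a cURL call). Caching only GET responses is one simple way to sidestep the forms gotcha.

```python
import time
from typing import Callable, Dict, Optional, Tuple

# Hypothetical in-memory cache: request path -> (time fetched, post-processed HTML).
PageCache = Dict[str, Tuple[float, str]]


def proxy_request(method: str, path: str,
                  fetch: Callable[[str], str],
                  post_process: Callable[[str], str],
                  cache: PageCache,
                  ttl: float = 300.0,
                  now: Optional[float] = None) -> str:
    """Serve one proxied page: from the cache if still fresh, otherwise
    fetch it from the origin, post-process it, store it and return it."""
    now = time.time() if now is None else now
    # Assumption: only plain GETs are cached; form submissions always go to the origin.
    cacheable = method.upper() == "GET"
    if cacheable and path in cache:
        fetched_at, body = cache[path]
        if now - fetched_at < ttl:
            return body  # fresh cache hit, no remote request made
    body = post_process(fetch(path))  # remote request, then strip/re-layout
    if cacheable:
        cache[path] = (now, body)
    return body
```

Passing `fetch` and `post_process` in as functions keeps the caching logic independent of how the page is retrieved or rewritten, so the same function works whether the fetch is cURL, a socket, or a stub in a test.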

Cheers,
Rob.
--
http://www.interjinn.com
Application and Templating Framework for PHP


The only forms are processed on another site, so there's nothing I can really do about that, as they return to the original site.

How would I go about doing what you suggested, though? I'd assumed I would use Curl, but your email suggests not to?

Nope, I wasn't suggesting not to. You can use many techniques, but cURL is probably the most robust. The best way to do this, IMHO, is to have a rewrite rule that directs all traffic for the proxy site to your application, and then rewrite the REQUEST_URI to point to the corresponding page on the real domain. Next, check your cache for that content. If it isn't there, use cURL to retrieve it, apply your post-processing (to strip out what you don't want and apply a new page layout or whatever), and cache the result (this can be a simple database table holding the request URI, the content and a timestamp). Finally, output the content.
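A rough sketch of those steps, again in Python for illustration only. Several things are swapped in or assumed here: `urllib.request` stands in for PHP's cURL, SQLite (`sqlite3`) stands in for "a simple database table", and `ORIGIN`, `MAX_AGE`, the `page_cache` table and the `<!--chrome-->` markers used by `post_process` are all hypothetical. The rewrite rule itself is not shown; `serve` simply receives the REQUEST_URI that such a rule would pass through.

```python
import re
import sqlite3
import time
import urllib.request
from typing import Callable, Optional

ORIGIN = "http://www.example.com"  # hypothetical: the real site being mirrored
MAX_AGE = 600                      # hypothetical: seconds before a page is refetched


def origin_url(request_uri: str) -> str:
    """Map the proxy's REQUEST_URI onto the same page on the real domain."""
    return ORIGIN + request_uri


def fetch_with_urllib(url: str) -> str:
    # Stand-in for cURL: a plain GET with a timeout, decoded to text.
    with urllib.request.urlopen(url, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def post_process(html: str) -> str:
    # Hypothetical markers: drop everything between <!--chrome--> and <!--/chrome-->.
    return re.sub(r"<!--chrome-->.*?<!--/chrome-->", "", html, flags=re.S)


def open_cache(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS page_cache ("
               "request_uri TEXT PRIMARY KEY, "
               "fetched_at REAL NOT NULL, "
               "content TEXT NOT NULL)")
    return db


def serve(db: sqlite3.Connection, request_uri: str,
          fetch: Callable[[str], str] = fetch_with_urllib,
          now: Optional[float] = None) -> str:
    """Return cached content if fresh; otherwise fetch, post-process and cache it."""
    now = time.time() if now is None else now
    row = db.execute("SELECT fetched_at, content FROM page_cache WHERE request_uri = ?",
                     (request_uri,)).fetchone()
    if row is not None and now - row[0] < MAX_AGE:
        return row[1]  # cache hit
    content = post_process(fetch(origin_url(request_uri)))
    db.execute("INSERT OR REPLACE INTO page_cache (request_uri, fetched_at, content) "
               "VALUES (?, ?, ?)", (request_uri, now, content))
    db.commit()
    return content
```

Storing a timestamp per URI, rather than only a cached/not-cached flag, is what lets stale pages be refetched, which keeps the microsite reasonably up to date without hitting the origin on every request.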

Cheers,
Rob.

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

