Guillaume Smet wrote:
On 3/23/07, Amos Jeffries <squid3@xxxxxxxxxxxxx> wrote:
Looks like a case for something like this that prevents the group
'robots' from retrieving data not already in the cache:
acl robots <....>
always_direct deny robots
No, that's not what I want. It's not a problem for us that robots
index all the content of our website. I just want them to not put
garbage into our cache.
So they should be able to access every page of the site, whether it is
served from the cache or not, but the pages generated for them should
not be stored, so that they don't pollute the cache.
Still, I would put a question to you:
if people find and visit your pages through a search engine, how
can they find useful pages that nobody else has visited recently?
I agree. That's why it's not what I'm asking for :).
Thanks for your help.
--
Guillaume
Ah, now I understand.
This is a problem for your web server configuration then. Your cache and
others around the world can be expected to cache any content that they
are allowed to.
The best way to prevent this content from being cached is for the
originating web server to mark it as non-cacheable by sending
"Pragma: no-cache" and "Cache-Control: no-cache" in its responses.
You can find info on them here:
http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9
http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.32
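
As a rough illustration of the advice above, here is a minimal sketch of an origin server that sends both headers on every generated page. It uses Python's standard http.server only for the sake of a self-contained example; the names GeneratedPageHandler, NO_CACHE_HEADERS and fetch_headers_once are hypothetical and nothing in this thread says the site in question runs Python. A real site would set the same headers in its own web server or application.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# The two headers named in the message above (assumed to be sent as-is).
NO_CACHE_HEADERS = {
    "Pragma": "no-cache",
    "Cache-Control": "no-cache",
}


class GeneratedPageHandler(BaseHTTPRequestHandler):
    """Hypothetical origin handler that marks each response non-cacheable."""

    def do_GET(self):
        body = b"generated page\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        for name, value in NO_CACHE_HEADERS.items():
            self.send_header(name, value)
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Keep the example quiet; the default writes each request to stderr.
        pass


def fetch_headers_once():
    """Start the server on a free local port, fetch one page, return its headers."""
    server = HTTPServer(("127.0.0.1", 0), GeneratedPageHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        url = "http://127.0.0.1:%d/" % server.server_address[1]
        # Bypass any proxy configured in the environment for this local check.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(url) as response:
            return dict(response.headers.items())
    finally:
        server.shutdown()
        server.server_close()
```

Calling fetch_headers_once() returns the response headers, so you can check that both are present before putting a cache in front of the server. Section 14.9.2 of the RFC linked above also defines "no-store", which may be worth a look if a cache still keeps the pages.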
Amos