On 10/09/10 04:48, Guy Bashkansky wrote:
Amos, Matus,
Some websites embed arbitrary redundant information in query terms.
It is irrelevant to content distribution, but prevents effective
caching by giving the same object a different URL each time.
For such websites (recognized by regex ACLs), stripping those
redundant cache-unfriendly query terms before storing provides a way of
caching effectively without hurting the web functionality.
Guy
I'm well aware of this. Removing sections of URLs is a local-instance
hack that does little to solve the problem.
The last claim of it not hurting the functionality is false. It DOES
hurt the web functionality; what it doesn't hurt is your users' view of it.
By "some websites" you are referring to Facebook and YouTube and their
like, right? The YouTube storeurl_rewrite script provided in the Squid
wiki needs regular updates to continue storing content without screwing
things up. That is for a site which apparently is conservative to the
point of paranoia with their changes.
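For anyone following along, the kind of rewrite being discussed boils
down to something like the Python sketch below. It is only an
illustration, not the wiki script: the parameter names in NOISY_PARAMS
are made up, and a real site needs its own list, which is exactly the
part that goes stale whenever the site changes.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical query terms that change per request but do not change
# the object returned. These names are assumptions for illustration.
NOISY_PARAMS = {"sessionid", "rand", "ts"}


def store_key(url, noisy=NOISY_PARAMS):
    """Return the URL with noisy query terms removed.

    The result is meant to be used as a cache storage key only; the
    request sent to the origin server keeps the original URL.
    """
    parts = urlsplit(url)
    # Keep every query term whose name is not in the noisy set,
    # preserving the original order and any blank values.
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name.lower() not in noisy
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )
```

Two requests that differ only in the noisy terms then map to the same
key. If the site ever starts using one of those terms to select
content, the cache silently serves the wrong object.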
WARNING: rant follows.
A real solution has to be multi-pronged:
** education for the designers of such systems about the benefits
caching provides and how to use the cache-controls in HTTP.
Unfortunately this effort is constantly undermined by administrators
everywhere trusting to "override" hacks to force caching of objects.
Every time a small mistake is made by these admins it provides stronger
incentives for the website designers to force their sites to be un-cacheable.
You need only look at the extreme obsessive settings sent out by
Facebook and similar sites to see where that arms race leads (Pragma,
no-cache, no-store, private, stale-0, maxage-0, expired cookies,
redirects, POST instead of GET, PUT instead of POST, WebSockets, CONNECT
tunnels, fake auth headers, expire years old, date years old, modified
decades old). ALL of it designed and implemented site-wide to prevent
the odd little truly dynamic reply amidst the static stuff being stored.
** making use of the users' experience headspace. Pass on the complaints!
Users have this preference for a good time as I'm sure you know. You
as an ISP and Facebook etc. as providers both want two sides of the same
goal: a great user experience at the website. Just because the complaint
arrives at your inbox does not mean it needs to stay there and ruin your
day. The users don't know who to complain to so they pick any email in
sight. Pass it on to someone who can fix the problem properly.
I've had personal experiences with people complaining to HR
departments because their website login failed through an ISP proxy that
blocked cookies.
Both you and the rest of the Internet will benefit from the website
working even slightly better with caches. The website's designers really
are the ONLY authority on what can and can't be stored. If the website
fails to say so, they alone are the cause of their demise.
** And finally but most crucially, convincing other admins to trust
the website designer to know their own website. Right or wrong, it's their
fault. Let them learn from the experience.
Grumbling away in the background while preventing website designers
from getting/seeing the users' complaints is not going to help solve
anything.
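To make the first point above concrete, this is roughly all a website
has to do to tell caches what can and can't be stored. It is a minimal
Python sketch under my own assumptions: cache_headers() is a made-up
helper, and the one-day lifetime is an arbitrary example value, not a
recommendation.

```python
def cache_headers(is_static, max_age=86400):
    """Pick a Cache-Control header for a reply.

    is_static: True for objects that are the same for every user
               (images, scripts, video segments).
    max_age:   freshness lifetime in seconds for static objects;
               the default of one day is only an example.
    """
    if is_static:
        # Shared caches may store this and reuse it until it expires.
        return {"Cache-Control": f"public, max-age={max_age}"}
    # Truly dynamic, per-user replies: keep them out of every cache.
    return {"Cache-Control": "private, no-store"}
```

Marking only the dynamic replies this way, instead of the whole site,
lets the static bulk be cached without anyone rewriting URLs.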
Sorry for the rant. I've been a network admin for 12 years, webmaster
for 8, and a caching guy for the last three. So I've seen a lot of this
from all sides. It started when the marketing guys of the '90s latched
onto webserver hits being a measure of a site's success and they
strangled the web experience with it.
Amos
--
Please be using
Current Stable Squid 2.7.STABLE9 or 3.1.8
Beta testers wanted for 3.2.0.2