
Re: Re: How to ignore query terms for store key?


 



On 10/09/10 04:48, Guy Bashkansky wrote:
Amos, Matus,

Some websites embed arbitrary, redundant information in their query
terms. It is irrelevant to content distribution, but it prevents
effective caching by giving the same object a different URL each time.

For such websites (recognized by regex ACLs), stripping those
redundant, cache-unfriendly query terms from the key used to store the
object allows effective caching without hurting the web functionality.

Guy


I'm well aware of this. Removing sections of URLs is a local-instance hack that does little to solve the problem.

The last claim, that it does not hurt functionality, is false. It DOES hurt the web functionality; what it doesn't hurt is your users' view of it.

By "some websites" you are referring to Facebook and YouTube and their like, right? The YouTube storeurl_rewrite script provided on the Squid wiki needs regular updates to keep storing content without breaking things. That is for a site which is apparently conservative to the point of paranoia with its changes.
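For readers unfamiliar with the hack under discussion, the Squid 2.7 storeurl_rewrite helper is a small program that reads URLs on stdin and replies with the URL to use as the cache key. A minimal sketch follows; the host pattern, the squid.conf lines in the docstring, and the parameter names (sessionid, timestamp, rand) are illustrative assumptions, not any particular site's real scheme:

```python
#!/usr/bin/env python3
"""Sketch of a Squid 2.7 storeurl_rewrite helper.

Assumed squid.conf wiring (adjust to taste):

  storeurl_rewrite_program /usr/local/bin/store_rewrite.py
  acl store_rewrite_sites dstdomain .example-cdn.com
  storeurl_access allow store_rewrite_sites
  storeurl_access deny all
"""
import sys
from urllib.parse import urlsplit, parse_qsl, urlencode, urlunsplit

# Query parameters assumed to carry no content-distinguishing
# information (hypothetical names for illustration only).
JUNK_PARAMS = {"sessionid", "timestamp", "rand"}

def store_key(url: str) -> str:
    """Return the URL with the junk query terms stripped out."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k.lower() not in JUNK_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(kept), parts.fragment))

if __name__ == "__main__":
    # Squid writes one request per line, URL first; answer with the
    # store key and flush so Squid is not left waiting.
    for line in sys.stdin:
        url = line.split()[0]
        sys.stdout.write(store_key(url) + "\n")
        sys.stdout.flush()
```

Note this is exactly the fragile part: the moment the site renames a parameter or starts putting meaningful data in one you strip, the helper silently serves wrong objects until someone notices and updates the list.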


WARNING: rant follows.


A real solution has to be multi-pronged:

** education for the designers of such systems about the benefits caching provides and how to use the cache-controls in HTTP.

Unfortunately this effort is constantly undermined by administrators everywhere trusting "override" hacks to force caching of objects. Every time one of these admins makes a small mistake, it gives website designers a stronger incentive to make their sites un-cacheable.

You need only look at the extreme, obsessive settings sent out by Facebook and similar sites to see where that arms race leads (Pragma, no-cache, no-store, private, stale-0, maxage-0, expired cookies, redirects, POST instead of GET, PUT instead of POST, WebSockets, CONNECT tunnels, fake auth headers, Expires years in the past, Date years in the past, Last-Modified decades in the past). ALL of it designed and implemented site-wide to prevent the occasional genuinely dynamic reply amidst the static content from being stored.
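By contrast, cooperating with caches takes only a couple of honest headers. A hypothetical static-object response might carry something like:

```
HTTP/1.1 200 OK
Content-Type: video/mp4
Cache-Control: public, max-age=86400
Last-Modified: Tue, 07 Sep 2010 12:00:00 GMT
ETag: "v1-abc123"
```

while reserving Cache-Control: no-store (or private) for the few replies that are genuinely dynamic or per-user. The header values above are illustrative, not taken from any real site.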


 ** making use of the users' experience. Pass on the complaints!

Users have a preference for a good time, as I'm sure you know. You as an ISP and Facebook etc. as providers both want two sides of the same goal: a great user experience at the website. Just because the complaint arrives in your inbox does not mean it needs to stay there and ruin your day. The users don't know whom to complain to, so they pick any email address in sight; pass it on to someone who can fix the problem properly.

I've had personal experiences with people complaining to HR departments because their website login failed through an ISP proxy that blocked cookies.

Both you and the rest of the Internet will benefit from the website working even slightly better with caches. The website operators really are the ONLY authority on what can and can't be stored. If they fail to declare this, they alone are the cause of their demise.


** And finally, but most crucially, convincing other admins to trust the website designers to know their own websites. Right or wrong, it's their fault. Let them learn from the experience.

Grumbling away in the background while preventing website designers from receiving or seeing the users' complaints is not going to help solve anything.


Sorry for the rant. I've been a network admin for 12 years, a webmaster for 8, and a caching guy for the last three, so I've seen a lot of this from all sides. It started when the marketing guys of the '90s latched onto webserver hits as a measure of a site's success and strangled the web experience with it.

Amos
--
Please be using
  Current Stable Squid 2.7.STABLE9 or 3.1.8
  Beta testers wanted for 3.2.0.2


