On Sat, 5 Mar 2005, H Matik wrote:
Recently all of us have been having problems with Squid no longer serving certain pages/objects.
Examples please.
We know that Squid most probably detects correct or incorrect HTML code and reports it via its error messages.
But I am not so sure if this should be a squid task.
It isn't, and Squid does nothing of the kind. Squid could not care less about what is HTML. To Squid an HTML page is just a sequence of characters with no meaning to Squid.
As Reuben said, Squid only cares about the validity of the HTTP protocol, and the things it cares about are there for good reasons (mostly security). It is known that there are several quite broken web sites out there which will not work via 2.5.STABLE9, and due to the nature of the bugs in these sites it is unlikely they will work with any later Squid release until the site administrators fix their critical server bugs.
Squid IMO should cache and serve what it gets from the server.
And this is what Squid does. The server must, however, speak the HTTP protocol in a somewhat meaningful dialect for Squid to understand what the server says and not reject it as a hacking attempt or other malicious activity.
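To illustrate the distinction between checking the HTTP framing and ignoring the content, here is a minimal Python sketch. It is not Squid's parser: the function `parse_response`, the exception `ProtocolError` and the small set of checks are hypothetical, chosen only to show that a malformed status line or header line is rejected while a body full of broken HTML is passed through untouched.

```python
"""Hypothetical sketch: validate the HTTP framing of a response while
treating the body as opaque bytes. This is NOT Squid's parser; the checks
are a small, simplified subset chosen for illustration only."""

import re

# Status line: "HTTP/x.y NNN reason" (reason phrase may be absent).
STATUS_LINE = re.compile(rb"^HTTP/(\d)\.(\d) (\d{3})(?: (.*))?$")
# Header line: a token field name immediately followed by ':', then the
# value with surrounding spaces/tabs stripped.
HEADER_LINE = re.compile(rb"^([!#$%&'*+.^_`|~0-9A-Za-z-]+):[ \t]*(.*?)[ \t]*$")


class ProtocolError(Exception):
    """Raised when the response is not recognizable HTTP."""


def parse_response(raw: bytes):
    """Split a raw response into (status, headers, body).

    Only the status line and header block are inspected; the body is
    returned unchanged whatever it contains (HTML, broken HTML, binary).
    """
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ProtocolError("no end of header block")
    lines = head.split(b"\r\n")
    m = STATUS_LINE.match(lines[0])
    if not m:
        raise ProtocolError(f"bad status line: {lines[0]!r}")
    status = int(m.group(3))
    headers = []
    for line in lines[1:]:
        h = HEADER_LINE.match(line)
        if not h:
            raise ProtocolError(f"bad header line: {line!r}")
        headers.append((h.group(1).decode("ascii"), h.group(2)))
    return status, headers, body


if __name__ == "__main__":
    # Broken HTML inside a valid HTTP response: accepted as-is.
    good = (b"HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n\r\n"
            b"<html><p>unclosed <b>tags")
    print(parse_response(good))
    # Valid HTML inside a header line with no colon: rejected.
    bad = b"HTTP/1.0 200 OK\r\nContent-Type text/html\r\n\r\n<html></html>"
    try:
        parse_response(bad)
    except ProtocolError as exc:
        print("rejected:", exc)
```

The body is never looked at, which is the point being made: whether the HTML is well-formed is left entirely to the browser.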
The code check should be done by the browser: incorrect code is a browser problem or a web server problem, so it should be reported by the browser, not by anything in the middle.
And this is exactly how it is.
We use transparent Squid on lots of sites, and as soon as someone complains about this kind of problem we rewrite our forwarding rules so that the traffic no longer goes through Squid.
You complain about what a proxy should or should not do, and yet you intentionally and forcibly violate the fundamentals of TCP/IP by hijacking your users' requests? Transparent interception violates Internet Standard #3, "Requirements for Internet Hosts", and also the general spirit of the design of TCP/IP.
IMO it might be better if Squid did not check code.
There are certain things Squid must check in the HTTP protocol used for transferring the HTML code. But Squid absolutely does NOT care about the HTML or other contents of the requested site.
Customers say: "Without your cache I can access the site, with your cache I cannot. I do not want to know about it, and if you do not resolve this problem for me I will stop using your service and use another one where it works."
Unfortunately the world is not so unambiguous.
It may be worth mentioning that many of the sites failing with Squid 2.5.STABLE9 are likely to start failing with newer browsers as well, for the same reasons Squid pukes on these sites.
So first "I" lose my customer, and second, they do not use Squid anymore. I believe this is worth thinking about.
I believe the 2.5.STABLE9 release strikes a very good balance here.
Sure, there may still be a few buggy web servers out there where Squid could safely work around the server bugs, but each of these has to be analyzed very carefully on an individual basis.
In addition, the only way to get this done is to spend some time identifying why Squid rejects the responses from a certain site, and then open a discussion here on squid-users about how Squid might work around that broken web server.
If you cannot or will not investigate why problems arise but still expect everything to work, then you should have a support contract, either for Squid from one of the Squid support providers or for a commercial proxy/cache if you prefer.
Just complaining without any information won't get you anywhere, except perhaps blacklisted by some of the subscribers here.
I'd like to add that we have been using Squid since 97/98, and what I wrote here is in no way meant as offensive criticism of the developers but as a point to think about. So what do you think about this?
And believe me, we think very carefully about these things.
If we did not, then Squid-2.5.STABLE8 would have been released with the HTTP parser in its very strictest setting, i.e. the equivalent of 2.5.STABLE9 configured with "relaxed_header_parser off", and in addition yelling a screenful of complaints per request into cache.log for each malfunctioning web server seen.
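The idea behind the relaxed_header_parser settings can be sketched roughly as follows, assuming the three values on (tolerate an unambiguous error and normalize it), warn (tolerate it but log a warning) and off (reject the message). This is a hypothetical Python illustration, not Squid's code: `parse_field` and `HeaderError` are invented names, and whitespace between a header name and its colon is just one example malformation picked for the sketch, not a list of what Squid actually accepts.

```python
"""Hypothetical sketch of a strict/warn/relaxed header parser, loosely
modelled on the on/warn/off values of relaxed_header_parser. NOT Squid's
code; the single malformation handled (whitespace before the colon) is
an illustrative example only."""

import logging
import re

log = logging.getLogger("header-parser")

# Field name, optional (illegal) whitespace, colon, value.
FIELD = re.compile(r"^([!#$%&'*+.^_`|~0-9A-Za-z-]+)([ \t]*):[ \t]*(.*?)[ \t]*$")


class HeaderError(ValueError):
    """Raised when a header line is rejected."""


def parse_field(line: str, mode: str = "on") -> tuple[str, str]:
    """Parse one header line; mode is "on", "warn" or "off"."""
    if mode not in ("on", "warn", "off"):
        raise ValueError(f"unknown mode {mode!r}")
    m = FIELD.match(line)
    if not m:
        # Ambiguous in every mode: we cannot tell what was meant.
        raise HeaderError(f"unparseable header: {line!r}")
    name, gap, value = m.groups()
    if gap:
        if mode == "off":
            # Strictest setting: refuse the non-compliant message.
            raise HeaderError(f"whitespace before colon: {line!r}")
        if mode == "warn":
            # Accept, but complain (the cache.log analogue here).
            log.warning("whitespace before colon in %r, normalizing", line)
    # The returned pair is the normalized form that would be forwarded.
    return name, value


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    for mode in ("on", "warn", "off"):
        try:
            print(mode, parse_field("Content-Type : text/html", mode))
        except HeaderError as exc:
            print(mode, "rejected:", exc)
```

Only errors whose intended meaning is clear are candidates for tolerance; a line that cannot be interpreted at all is refused whatever the setting.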
Regards Henrik