Hi,
H Matik wrote:
Recently we have all been having problems with squid no longer serving certain pages/objects.
We understand that squid most probably detects correct or incorrect HTML code and reports it via its error messages.
But I am not sure this should be squid's task.
Squid, IMO, should cache and serve whatever it gets from the server.
Code checking should be done by the browser; incorrect code is a browser problem or a web server problem, so it should be reported by the browser, not by anything in the middle.
Even if the page's code is buggy, the page may still contain cacheable objects, and caching them is what squid should do.
I say this because whoever runs squid is an ISP or the system admin of some kind of network. It should not become that person's problem if somebody codes his server's HTML pages incorrectly; he and his squid only serve his customers or the people on his network.
IMO this strict HTML code checking complicates network support for end customers, which was already not so easy at times.
We use transparent squid at many sites, and as soon as someone complains about this kind of problem we rewrite our forwarding rules so that the traffic no longer goes through squid.
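As a rough sketch of what that rewrite amounts to, assuming a Linux router doing iptables-based interception (192.0.2.10 stands in for the problem site's address, a made-up example):

    # Exception inserted ahead of the usual redirect, so this one
    # destination bypasses the proxy entirely:
    iptables -t nat -I PREROUTING -d 192.0.2.10 -p tcp --dport 80 -j ACCEPT
    # Everything else still hits the normal interception rule:
    # iptables -t nat -A PREROUTING -p tcp --dport 80 -j REDIRECT --to-port 3128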
Even though we know the remote site owner has no interest in people being unable to access his site, we do not have the time to talk to him. It is really not our problem, and we are not an HTML coding school teaching people how to correct their errors. So we simply give up and bypass squid for such sites.
IMO it might be better for squid not to check the code at all.
Customers say: "Without your cache I can access the site; with your cache I cannot. I do not want to know the details, and if you do not resolve this problem for me I will stop using your service and go to another one where it works."
So first "I" lose my customer, and second, they stop using squid. I believe this is worth thinking about.
I would like to add that we have been using squid since 97/98, and what I wrote here is not in any way meant as offensive criticism of the developers, but as a point to think about. So, what do you think about this?
I think you've misunderstood something quite fundamental about how squid works:
Squid does not read, complain about, or validate HTML.
In other words, it does not check the content, nor does it care whether it is even HTML or a binary file. Squid only cares about the HTTP _headers_ that the remote server issues when squid requests a document. HTTP headers have nothing to do with HTML; they are generated by the HTTP server and administered by the server administrator, not by whoever writes the web pages on the server.
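To illustrate the distinction, a response from an origin server looks something like this (a made-up example); squid parses only the header lines above the blank line, and the HTML body below it passes through untouched:

    HTTP/1.0 200 OK
    Server: Apache/1.3.26
    Date: Mon, 07 Oct 2002 12:00:00 GMT
    Content-Type: text/html
    Content-Length: 62

    <html><body>This part is HTML; squid ignores it.</body></html>

It is malformed lines in the header section, not anything in the body, that strict parsing objects to.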
I suspect you mean to complain about a number of different things at once:
* Strict HTTP header parsing - implemented in the most recent STABLE releases of squid. You can turn this off via a squid.conf directive anyway, though it is useful to leave it on so that broken servers get logged; see the first sketch after this list.
* ECN enabled on Linux can cause 'zero sized reply' responses, although usually you'll get a timeout. I have ECN on on my system and very few sites fail because of it, but there are a small number. Read the squid FAQ for information on how to turn it off if it is a problem; the second sketch after this list shows the gist.
* NTLM authentication - some uninformed site admins require or request NTLM authentication. This is not supported, is not recommended by Microsoft for use across the internet, and will not work (you'll get an error message). Squid should not support things which are known to be broken and were never supposed to work!
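On the header-parsing point, the directive name varies between squid releases, so treat this as a sketch and check your own squid.conf.default; in later releases the control looks like this:

    # Hedged sketch; verify the directive name against your release.
    # 'on' silently accepts sloppy headers, 'warn' accepts them but
    # logs the offending reply to cache.log, 'off' enforces strict
    # parsing.
    relaxed_header_parser warn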
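On the ECN point, the fix the FAQ describes is a Linux kernel tunable, roughly:

    # Disable TCP ECN (takes effect immediately, lasts until reboot):
    echo 0 > /proc/sys/net/ipv4/tcp_ecn

    # or equivalently:
    sysctl -w net.ipv4.tcp_ecn=0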
Can you give some examples of the specific sites you need to bypass squid for, which you still cannot get to display after working through the items I mentioned above?
Reuben