Re: Identifying the source of Invalid-request (squid 3) -> error:transaction-end-before-headers (Squid 4)

Jester Purtteman <jester@xxxxxxxxxxx> · Sat, 15 Oct 2016 16:16:55 -0700

Thanks for the reply Alex, I've embedded some comments below, and I will 
get back to you with additional info after some testing.

On 10/15/2016 10:57 AM, Alex Rousskov wrote:
On 10/15/2016 07:36 AM, Jester Purtteman wrote:
I have been seeing lines in my access log like the following:

1476535967.570      0 xxx.xxx.xxx.xxx TAG_NONE/400 4538 NONE
error:invalid-request - HIER_NONE/- text/html

After some digging on this list I began to suspect websockets or other
non-http traffic coming across port 80.  I decided to try squid
4.0.15 with on-unsupported-protocol.  I get what I am guessing to be the
same result with new error text around it:

1476536369.742      0 xxx.xxx.xxx.xxx NONE/000 0 NONE
error:transaction-end-before-headers - HIER_NONE/- -

An interesting point to interject here is that my “Hits as % bytes sent”
in 3.5.x has always been in the 2 to 5% range, but there are periods
(sometimes long ones) where the inbound traffic to squid is much higher
than the outbound.  When I switch to 4.0.x, I am now running about -27%
(note, negative twenty-seven) as bytes, which makes me suspect it is
logging the higher inbound than outbound now.
That difference sounds potentially important to me. I encourage you to
figure out what causes it (which is exactly what you have started doing,
of course).

So, apparently, this
unsupported protocol is triggering some sort of large download, but does
not end up going to the client.
There might be some exceptions, but non-tunneled
error:transaction-end-before-headers are not supposed to trigger a
download. Squid does not know what to download because Squid cannot
parse the request...

When enabled, tunneled error:transaction-end-before-headers do download
data, of course, but you may be able to measure how much they download
then by finding the corresponding transactions in access log.

I would like to know a couple things, first: is there some debugging
level other than ALL,9 that might give me some illumination?
I am sure there is. Once you know what the problem/cause is, it is easy
to come up with the corresponding optimal debug_options settings to show
the cause. Before that? You can try various settings (debugging sections
are semi-documented in doc/debug-sections.txt), but it is often not
worth your time.

ALL,9
generates about 15 MB of debug log per second at my current load level,
and these errors aren’t real frequent, so I end up with ~ 400 MB of text
that needs to be sifted through.  As you can imagine, that can be a bit
brutal.
I do not quite understand the problem of a 400MB ALL,9 cache.log. IMHO,
it is not much more difficult to deal with than a 1MB ALL,9 cache.log:
Either you can navigate ALL,9 noise or you cannot; the total log size
does not really matter much beyond a few MB levels IMO (provided you
have enough disk space to store it and logging itself does not slow
Squid down too much to reproduce the problem).

Please note that I am not saying that you are doing something wrong or
even complaining about a non-problem. I am only saying that I do not see
a [solvable in the context of this email thread] problem with ALL,9 logs
so I cannot help you solve it.

I gotcha, I'll start the digging.  I was curious what it would take to 
get the a dump of what the request looked like, but its some non-HTTP, 
so that probably doesn't make sense anyway

So, I have a few questions I guess:

(1)    For one thing, what are the implications of
“on_unsupported_protocol tunnel all”?
In rough terms, everything that is not SSL or HTTP will be tunneled.
AFAIK, non-HTTP inside SSL will not be tunneled yet (there is an
important patch for that going through squid-dev review right now).

Okay, that i think is the desired behavoir in my case, I basically want 
squid to handle the requests that it knows how deal with, and ignore and 
pass along the ones it cannot.  The environment is pretty promiscuous, 
and i don't need to restrict clients from using non-http stuff.
I did it as a quick attempt to
see if that had any new and interesting impacts, but is it safe-ish?
I do not know what you mean by "safe", but, in a sense, it is more
"safe" than having no proxy at all because your access.log will show you
those tunnels.
Safe as in, not causing security holes.

Am I letting the bad-guys come pouring through with that?
I personally do not know -- in general, it depends on the bad guys in
your environment. Others here may have deployment-specific stories that
I lack.

I was concerned that by permitting connects through the proxy there may 
be security concerns, but i have the thing bolted down so that only our 
clients can get to it, and it doesn't have much access to anywhere else 
on the network, so I am thinking that is probably fine.
(2)    What debug levels should I be thinking about to try and figure
out what is happening.  Seems like we won’t get very far without
identifying what is throwing that error.
If you do not want to deal with ALL,9, I would recommend this combination:

* a packet capture (you can limit the captured packet size if needed)
* access log format that logs all IPs and all TCP ports so that you can
match an access log line with captured packets/connection.

I'll just suck it up and do an ALL,9 capture.  It will take me a couple 
tries to get a capture that includes the problem, but it isn't a huge 
problem.  The packet capture idea is a good one too, I'll do that as 
well.  Similar issue (sifting a small amount of info out of an ocean of 
data) but I think valuable.
(3)    Has anyone else seen this?  Right now, for example (after 10
minutes of typing an email) I’m actually running -61% Hits as Bytes!
(Negative!)  Ouch!
As I said earlier, I am not sure the negative byte hit ratio is actually
related to these errors, but it could be. Squid v4 fixed a few
size-related accounting bugs. It is possible that we screw something up
in the process or that your actual byte hit ratio was always bad (but
you did not know about it because Squid was lying to you).

Can you compare Squid-reported numbers with OS/interfaces-reported
numbers somehow? If OS/interface numbers confirm v3.5 report but not
v4.0 report, then there is a bug we need to fix.

I can, i have interface statistics that I should be able to get pretty 
close with.  It won't be perfect (my ssh session will throw it off a 
tad, etc) but it will be able to detect a 30%+ problem without issue.  
I'll do some experiments and share the result.
HTH,

Alex.

_______________________________________________
squid-users mailing list
squid-users@xxxxxxxxxxxxxxxxxxxxx
http://lists.squid-cache.org/listinfo/squid-users

_______________________________________________
squid-users mailing list
squid-users@xxxxxxxxxxxxxxxxxxxxx
http://lists.squid-cache.org/listinfo/squid-users