Re: FW: Encrypted browser-Squid connection errors

squid3@xxxxxxxxxxxxx · Mon, 31 Oct 2022 01:59:00 +1300

On 2022-10-23 06:10, Grant Taylor wrote:
On 10/21/22 11:30 PM, Amos Jeffries wrote:
Not just convention. AFAICT it was formally registered with W3C, before 
everyone moved to using the IETF for registrations.

Please elaborate on what was formally registered.  I've only seen 3128 
/ 3129 be the default for Squid (and a few things emulating squid).  
Other proxies of the time, namely Netscape's and Microsoft's 
counterparts, tended to use 8080.

I'd genuinely like to learn more about and understand the history / 
etymology / genesis of the 3128 / 3129.

Duane W. would be the best one to ask about the details.

What I know is that some 10-12 years ago I discovered a message by 
Duane mentioning that W3C had (given or accepted) port 3128 for Squid 
use. I've checked the squid-cache archives but am not seeing the message.

Right now it looks like the W3C has changed their systems and only tracks 
the standards documents, so I cannot reference their (outdated?) protocol 
registry :-{ . I also checked the squid-cache archives and am not finding 
it in the email history. Sorry.

FYI, discussion started ~30 years ago.

ACK

The problem:

For bandwidth savings, HTTP/1.0 defined different URL syntax for origin 
and relay/proxy requests. The form sent to an origin server lacks any 
information about the authority; that was expected to be known 
out-of-band by the origin itself.
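As a minimal Python sketch of the two syntaxes (the helper `request_line` is hypothetical, not part of any library), here is the same resource requested in origin-form, as sent to an origin server, and in absolute-form, as sent to a proxy. Only the absolute-form carries the authority; the origin-form is shorter, which was the bandwidth saving.

```python
from urllib.parse import urlsplit

def request_line(url: str, form: str, version: str = "HTTP/1.0") -> str:
    """Build a GET request line for `url` in "origin" or "absolute" form."""
    parts = urlsplit(url)
    if form == "origin":
        # Origin-form: path (plus query) only; scheme and authority are dropped.
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
    elif form == "absolute":
        # Absolute-form: the whole URI, authority included, as sent to a proxy.
        target = url
    else:
        raise ValueError(f"unsupported form: {form!r}")
    return f"GET {target} {version}"

print(request_line("http://example.com/index.html", "origin"))
print(request_line("http://example.com/index.html", "absolute"))
```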

HTTP/1.1 has attempted several different mechanisms to fix this over 
the years. None of them has been universally accepted, so the problem 
remains. The best we have is the mandatory Host header, which most (but 
sadly not all) clients and servers use.
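A rough sketch of what a recipient has to do with an origin-form request, loosely following the target-URI reconstruction rules in RFC 9112: take the authority from the Host header, and give up when there is none. The function name and the fixed default scheme are assumptions for illustration; a real server would take the scheme from the connection (plain-text or TLS).

```python
from typing import Mapping, Optional

def effective_url(target: str, headers: Mapping[str, str],
                  scheme: str = "http") -> Optional[str]:
    """Rebuild the full URL of a request from its request-target and headers.

    Returns None for an origin-form target without a Host header (as an
    HTTP/1.0 client may send), since the authority is then unknown.
    """
    if target.startswith("/"):
        # Origin-form: the authority can only come from the Host header.
        # Header field names are case-insensitive.
        host = next((v.strip() for k, v in headers.items()
                     if k.lower() == "host"), "")
        if not host:
            return None
        return f"{scheme}://{host}{target}"
    if "://" in target:
        # Absolute-form already names the origin; nothing to reconstruct.
        return target
    raise ValueError(f"not origin-form or absolute-form: {target!r}")
```

With `{"Host": "example.com"}` the target `/index.html` becomes `http://example.com/index.html`; with no Host header the result is `None`.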

HTTP/2 cements that design with the mandatory ":authority" pseudo-header 
field. So the problem is "fixed" for native HTTP/2+ traffic. But until 
HTTP/1.0 and broken HTTP/1.1 clients are all gone, the issue will still 
crop up.

I'm not entirely sure what you mean by "the authority".  I'm taking it 
to mean the identity of the service that you want content from. 
Your comment about the Host: header in HTTP/1.1 is what makes me think this.

I mean "authority" as used by the HTTP specification, which refers to 
https://www.rfc-editor.org/rfc/rfc3986#section-3.2
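For reference, Python's `urllib.parse` exposes the RFC 3986 authority component as `netloc` and splits out its parts; the URL below is only an example.

```python
from urllib.parse import urlsplit

# RFC 3986 section 3.2: authority = [ userinfo "@" ] host [ ":" port ]
parts = urlsplit("http://user@example.com:8080/index.html")

print(parts.netloc)    # user@example.com:8080  (the whole authority)
print(parts.username)  # user                   (userinfo)
print(parts.hostname)  # example.com            (host)
print(parts.port)      # 8080                   (port, as an int)
```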

My understanding is that neither HTTP/0.9 nor HTTP/1.0 had a Host: 
header and that it was assumed that the IP address you were connecting 
to conveyed the server that you were wanting to connect to.

Yes, exactly. That is the source of the problem, perpetuated by the need 
to retain on-wire byte/octet backward compatibility until HTTP/2 changed 
to a binary format.

Consider what the proxy has to do when (not if) the IP:port being 
connected to is that proxy's own (eg localhost:80) and the URL is only a 
path ("/") on an origin server somewhere else. Does "GET / HTTP/1.0" 
mean "http://example.com/" or "http://example.net/"?
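The ambiguity can be shown in a few lines. This sketch models a hypothetical HTTP/1.0 client that believes it is talking directly to the origin: two different intended URLs produce byte-for-byte identical requests, so nothing on the wire tells the proxy which one was meant.

```python
from urllib.parse import urlsplit

def http10_origin_request(url: str) -> bytes:
    """Bytes a (hypothetical) HTTP/1.0 client sends for `url` when it
    believes it is connected directly to the origin server."""
    path = urlsplit(url).path or "/"
    # Origin-form request line; HTTP/1.0 has no Host header, so the
    # scheme and authority of `url` are simply dropped.
    return f"GET {path} HTTP/1.0\r\n\r\n".encode("ascii")

a = http10_origin_request("http://example.com/")
b = http10_origin_request("http://example.net/")
print(a == b)  # True: the proxy on localhost:80 receives identical bytes
```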

More importantly, the proxy hostname:port the client is opening TCP 
connections to may be different from the authority-info specified in 
the HTTP request message (or lack thereof).

My working understanding of what the authority is seems to still work 
with this.

The key point is that the proxy host:port and the origin host:port are 
two different authorities, and only the origin's may be passed along in 
the URL (or URL+Host header). When the client uses ports 80 and 443 
thinking they are origin services, it is *required* (per 
https://www.rfc-editor.org/rfc/rfc9112.html#name-origin-form) to omit 
the real origin's info. Enter problems.
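RFC 9112 section 3.2 defines four request-target forms: origin-form, absolute-form, authority-form (CONNECT only) and asterisk-form (OPTIONS only). A simplified classifier might look like this; the function name is hypothetical, and it assumes reasonably well-formed input rather than doing full syntax validation.

```python
from urllib.parse import urlsplit

def request_target_form(method: str, target: str) -> str:
    """Classify an HTTP/1.1 request-target into one of the RFC 9112 forms.

    Simplified sketch: it does not fully validate the syntax.
    """
    if target.startswith("/"):
        return "origin-form"  # path only, authority omitted
    if target == "*" and method == "OPTIONS":
        return "asterisk-form"
    if method == "CONNECT":
        # authority-form: host ":" port, used only with CONNECT.
        parts = urlsplit("//" + target)
        if parts.hostname and parts.port is not None:
            return "authority-form"
        raise ValueError(f"bad CONNECT target: {target!r}")
    if urlsplit(target).scheme:
        return "absolute-form"  # full URI, authority included
    raise ValueError(f"unrecognized request-target: {target!r}")
```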

This crosses security boundaries and involves out-of-band information 
sources at all three endpoints involved in the transaction for the 
message semantics and protocol negotiations to work properly.

I feel like the nature of web traffic tends to frequently, but not 
always, cross security / administrative boundaries.  As such, I don't 
think that the existence of proxies in the communications path alters 
things much.

Please elaborate on what out-of-band information you are describing. 
The most predominant thing that comes to mind, particularly with 
HTTP/1.1 and HTTP/2 is name resolution -- ostensibly DNS -- to identify 
the IP address to connect to.

I refer to all the many ways clients may be explicitly or implicitly 
configured to be aware that they are talking to a proxy, such that they 
explicitly avoid sending the problematic origin-form URLs.

What that text does not say is that when they are omitted by the 
**user**, they are taken from configuration settings in the OS:

  * the environment variable name provides:
     - the protocol name ("http" or "HTTPS", aka plain-text or encrypted)
     - the expected protocol syntax/semantics ("proxy" aka forward-proxy)

  * the machine /etc/services configuration provides the default port for the named protocol.

Ergo the use of /default/ values when values are not specified.
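A minimal sketch of the default-filling described above, under stated assumptions: the variable name supplies the scheme when the value has none, and the port comes from the OS services database via `socket.getservbyname`, with a hardcoded fallback table (an assumption) in case `/etc/services` lacks the entry. Real clients differ in the details, so treat this as an illustration rather than any particular client's exact rules; `proxy_from_env` is a hypothetical name.

```python
import os
import socket
from typing import Mapping, Optional, Tuple
from urllib.parse import urlsplit

# Assumed fallback, used only if the OS services database (/etc/services)
# has no entry for the scheme.
FALLBACK_PORTS = {"http": 80, "https": 443}

def default_port(scheme: str) -> int:
    """Default TCP port for `scheme`, as listed in the OS services database."""
    try:
        return socket.getservbyname(scheme, "tcp")
    except OSError:
        return FALLBACK_PORTS[scheme]

def proxy_from_env(var: str,
                   environ: Mapping[str, str] = os.environ
                   ) -> Optional[Tuple[str, int]]:
    """Return (host, port) for a proxy variable such as http_proxy.

    If the value omits the scheme, it is taken from the variable name
    (http_proxy -> "http"); if it omits the port, the scheme's default is used.
    """
    value = environ.get(var)
    if not value:
        return None
    if "://" not in value:
        value = var.lower().split("_", 1)[0] + "://" + value
    parts = urlsplit(value)
    port = parts.port if parts.port is not None else default_port(parts.scheme)
    return parts.hostname, port
```

With `http_proxy=proxy.example.com` this yields port 80, a default tuned for origin servers rather than for a proxy listening on 3128.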

The defaults, though, are tuned for direct contact with an origin server 
(or reverse-proxy).
No browser I know supports 
"http-alt://proxy.example.com?http://origin.example.net/index.html" 
URLs.

I feel like this in a round about way supports my stance that the 
default ports are perfectly fine to use.

... "at your own risk", they technically might be, so long as you only 
receive one of the three types of syntax there, ports 80/443 being 
officially registered for origin / reverse-proxy syntax.

Attempting to use a reverse-proxy or origin-server port in such a 
configuration may work for some messages, but **will** fail due to 
syntax or semantic errors on others.
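The separate-ports approach can be sketched as a fixed table of which request-target forms each listening port accepts, instead of auto-detecting the syntax of every message. The port roles and tables below are hypothetical and are not Squid's actual implementation; the request's form is assumed to have been classified already (per RFC 9112 section 3.2).

```python
from typing import Dict, Optional, Set

# Hypothetical listener roles: each port expects one kind of traffic
# instead of auto-detecting the syntax of every request.
PORT_ROLES: Dict[int, str] = {
    80: "origin",
    443: "origin",
    3128: "forward-proxy",
}

# Request-target forms each role will accept.
ALLOWED_FORMS: Dict[str, Set[str]] = {
    # RFC 9112 requires servers to accept absolute-form as well.
    "origin": {"origin-form", "absolute-form", "asterisk-form"},
    # Origin-form is refused: it lacks the authority a proxy needs.
    "forward-proxy": {"absolute-form", "authority-form"},
}

def accept(port: int, form: str) -> bool:
    """Return True if a request in `form` is acceptable on `port`."""
    role: Optional[str] = PORT_ROLES.get(port)
    if role is None:
        return False  # unknown listener: refuse rather than guess
    return form in ALLOWED_FORMS[role]
```

The design choice is that an ambiguous message (origin-form on the proxy port) is rejected outright rather than guessed at.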

I question the veracity of that statement.

It is based on experience. Squid used to be a lot more lenient and for 
decades tried to do syntax auto-detection. The path from that to 
separate ports is littered with CVEs, most notably the curse that keeps 
on giving: CVE-2009-0801, which is just the trigger issue for a whole 
nest of bad side effects.

Amos
_______________________________________________
squid-users mailing list
squid-users@xxxxxxxxxxxxxxxxxxxxx
http://lists.squid-cache.org/listinfo/squid-users