This is a terrible idea. Details below. On 20/01/2013 9:44 a.m., carteriii wrote:
I put together the textual diagram below to help describe to someone what I am trying to do. I thought I'd include it here in case it gives someone else any ideas. The greater-than or less-than symbols indicate the direction that information is flowing.

These first two lines show what normally happens today. The client requests a specific page. That request goes to Squid (acting as a reverse proxy), which in turn passes that same request on to the appropriate server. The "Server" then responds with a 301/302 and the location that should be used instead. This response is passed through Squid to the client.

Client > > (req: http://page) > > Squid > > (req: http://page) > > > > Server
Client < < (resp: 301/location) < < Squid < < (resp: 301/location) < < < Server

This next diagram shows what I would like to have happen. The first line is exactly the same. The second & third lines show that when the 301 response from the server gets to Squid, Squid would turn around and request the new "location" from another server. Only when the response comes back from that server will Squid return that response to the client. The client is never aware that a 301 or 302 occurred.

Client > > (req: http://page) > > Squid > > (req: http://page) > > > > Server
                                  Squid > > (req: http://location) > > > Server2
Client < < (resp: 200/content) < < Squid < < (resp: 200/content) < < < Server2
Meaning that the client is not aware which URL it has actually been handed content from, and therefore which URL it needs to cache that response under. All the client has is the original request URL. So Squid has just performed a cache poisoning attack on the client, corrupting its future HTTP requests to that URL and any relative-URL snippets embedded in the object.
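To make the poisoning concrete, here is a rough sketch of the proposed proxy logic in Python (a toy in-memory cache and made-up URLs for illustration, not Squid internals):

  # Toy model of the proposed 30x-following proxy, to show the poisoning.
  CACHE = {}  # cache key (request URL) -> response body

  def origin_fetch(url):
      # Hypothetical origin responses, for illustration only.
      if url == "http://example.com/page":
          return (301, "http://evil.example.net/payload")  # status, Location
      if url == "http://evil.example.net/payload":
          return (200, "attacker-controlled content")
      return (404, "")

  def proxy_get(url):
      if url in CACHE:
          return CACHE[url]             # every later client hits this entry
      status, data = origin_fetch(url)
      if status in (301, 302):
          _, data = origin_fetch(data)  # follow the redirect server-side
      CACHE[url] = data                 # stored under the ORIGINAL url
      return data

  print(proxy_get("http://example.com/page"))  # attacker-controlled content
  print(proxy_get("http://example.com/page"))  # served straight from cache

Every later client asking for http://example.com/page now gets the attacker's body straight from cache, without the origin ever being contacted again.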
* In HTTP there is the likely possibility that the client is another proxy closer to the actual end-user, resulting in more than just one end-user receiving the corrupted cached responses. Free attack amplification from one visitor to an entire network.
* In HTTP there is the possibility that some third-party intermediary is the one that responded with the 30x. For example, a pay-as-you-go gateway which suddenly needs more payment will 30x redirect to the payment form even if http://google.com/ was requested. Innocent ISPs doing the Right Thing(tm) with a 30x to separate themselves from the originally requested site are suddenly seen as hijacking traffic in networks well beyond their gateway.
* In HTTP the server is influential in determining the caching time for its response. Meaning that when an actual attack takes place using a 30x to inject cache poison, the corruption will stick around in cache long after the actual attacker has disappeared (see the sketch just after this list). The effect is the same as replacing pages on some website with your own copies, but where the website host is completely unable to see the change or wipe the disk files back to uncorrupted copies.
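Because the responding server declares its own freshness lifetime, whoever supplied the 30x also decides how long that poison stays fresh. A rough sketch of the freshness arithmetic (toy values and a deliberately crude header parse, not Squid's actual refresh logic):

  import time

  # The attacker-supplied response declares its own freshness lifetime.
  response_headers = {"Cache-Control": "max-age=31536000"}  # one year

  def max_age(headers):
      # Crude parse of "Cache-Control: max-age=N", for illustration only.
      for part in headers.get("Cache-Control", "").split(","):
          part = part.strip()
          if part.startswith("max-age="):
              return int(part[len("max-age="):])
      return 0

  stored_at = time.time()

  def is_fresh(now):
      return (now - stored_at) < max_age(response_headers)

  print(is_fresh(time.time()))                # True today
  print(is_fresh(time.time() + 180 * 86400))  # still True six months on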
So as you should be able to see, it is extremely unsafe and a Bad Idea(tm) to do this anywhere outside the end-user's own browser cache.
Beyond the poisoning aspect, you are also injecting into the network towards the client all the other annoying problems which are documented for URL-rewrite. There is a fairly long laundry list of reasons why automatic 30x following and other forms of URL-rewrite should not be done by a shared proxy or intermediary device. The above poisoning consequences are just #1, the most important.
NP: The URL-rewrite and store-URL features in Squid are only possible because (a) there is only one request involved with the rewrite at any one time, and (b) the Squid administrator is directly in charge of the URL merging, not some third-party (possible attacker) supplied 30x information. Even so, they are still dangerous features and can lead to cache poisoning unless used sparingly and targeted with great precision.
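For contrast, an admin-controlled url_rewrite_program helper is about as small as the sketch below, written here assuming the classic line-based redirector protocol (first whitespace-separated token on each stdin line is the URL; reply with a replacement URL, or a blank line for no change):

  #!/usr/bin/env python
  # Minimal url_rewrite_program helper sketch.
  import sys

  # One narrowly targeted mapping chosen by the administrator
  # (hypothetical URLs, for illustration only).
  REWRITES = {
      "http://old.example.com/page": "http://new.example.com/page",
  }

  for line in sys.stdin:
      fields = line.split()
      url = fields[0] if fields else ""
      # A blank reply line tells Squid to leave the URL untouched.
      sys.stdout.write(REWRITES.get(url, "") + "\n")
      sys.stdout.flush()  # Squid reads each reply synchronously

The difference from automatic 30x following is exactly points (a) and (b) above: one request handled at a time, and a mapping the administrator wrote rather than one supplied over the wire.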
Amos