Thank you Amos for the detailed reply!

I'm only going to prefetch specific pages (using regexp matching for
URL patterns) that are pretty static, and I'll check squid-prefetch.

Best,
Jianshi

On Tue, May 6, 2014 at 5:04 PM, Amos Jeffries <squid3@xxxxxxxxxxxxx> wrote:
> On 6/05/2014 5:53 p.m., Jianshi Huang wrote:
>> Hi,
>>
>> I need to build a prefetching proxy to speed up page loading/clicks,
>> and I'm currently investigating Squid for a prototype. The websites I
>> want to speed up are all under HTTPS.
>>
>> I briefly scanned Squid's documentation and googled the keywords; it
>> looks like I need the following setup:
>>
>> 1) Use SSL-Bump (and install Squid's cert on the client's machine)
>
> Yes, this is the only way to get around the HTTPS "problem".
>
>> 2) A two-Squid setup: one runs the prefetching script, the other does
>> the caching.
>
> Any reason for that design?
> Prefetching only needs three components: the cache (Squid), the logic
> deciding what to fetch (a script), and possibly a database of past info
> to inform those decisions.
>
> Check out the squidclient tool we provide for making arbitrary web
> requests. It is the best tool around for scripting web requests, with
> similar levels of control to libcurl, which is (probably) the best for
> use in compiled code.
>
>
>> Does that make sense?
>
> Pre-fetching is a very old idea based around metrics from very old
> protocols such as HTTP/1.0, where the traffic was static and
> predictable and prefetching made a fair bit of sense.
>
> However, there are several popular features built into the HTTP/1.1
> protocol which greatly alter that balance. Dynamic content with
> variants makes the traffic far less static. Response negotiation makes
> the responses far more unpredictable. Persistent connections greatly
> reduce the lag times. Revalidation reduces the bandwidth costs.
> Together these all make prefetching in HTTP/1.1 a much less beneficial
> operation than most of the literature makes it seem.
>
> Whether it makes sense depends entirely on where it is being installed,
> what the traffic is like, and how and why the prefetching decisions are
> being made. Only you can really answer those, and it may actually take
> doing it to figure out whether it was a bad choice to begin with.
>
>
>> Is there a better solution?
>
> At the current point of Internet development I believe throwing effort
> into assisting us with HTTP/2 development would be more beneficial. But
> I am somewhat biased, being a member of the HTTPbis WG and seeking to
> get Squid HTTP/2 support off the ground.
>
>
>> Or has anybody done similar things?
>
> We get a fairly regular flow of questions from people wanting to do
> pre-fetching. They all hit the above issues eventually and drop out of
> sight.
>
> This thread summarizes the standard problems and answers:
> http://arstechnica.com/civis/viewtopic.php?f=16&t=1204579
> (see fandingo's answer near the bottom)
>
> I am aware of this tool being used by more than a few dozen
> installations, although its popularity does seem to be in decline:
> https://packages.debian.org/unstable/web/squid-prefetch
>
>
>> It would be great if someone could point out some
>> configuration/scripting files to me. Code speaks :)
>
> Everything in this regard is situation dependent. The above are likely
> to be the best you can find. People who actually get it going
> (apparently anyway) keep the secrets of how to avoid the HTTP/1.1
> issues pretty close.
>
> HTH
> Amos

--
Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
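
For the SSL-Bump setup confirmed above, a minimal squid.conf sketch in
Squid 3.4-era syntax. The listening port, CA cert/key paths, and helper
locations are placeholders for illustration, not anything taken from
this thread:

    # Bump CONNECT tunnels so HTTPS responses become cacheable.
    # proxyCA.pem/.key is the local CA whose cert gets installed on
    # every client machine (paths are placeholders).
    http_port 3128 ssl-bump \
        generate-host-certificates=on \
        dynamic_cert_mem_cache_size=4MB \
        cert=/etc/squid/proxyCA.pem \
        key=/etc/squid/proxyCA.key

    # Helper that mints per-host certificates signed by the CA above;
    # the binary path varies by distro.
    sslcrtd_program /usr/lib/squid/ssl_crtd -s /var/lib/squid/ssl_db -M 4MB

    # server-first: contact the origin server before forging the
    # certificate presented to the client.
    ssl_bump server-first all

Whether the bumped responses are actually stored still depends on the
origin's Cache-Control headers, per Amos's HTTP/1.1 caveats above.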
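
And a rough sketch of the fetch side using the squidclient tool Amos
recommends: a shell loop that pulls a URL list through the proxy purely
to warm the cache. The file name, regexp, and proxy address are all
assumptions standing in for site-specific choices:

    #!/bin/sh
    # prefetch.sh -- warm Squid's cache with the mostly-static pages.
    # candidates.txt holds one URL per line; the egrep pattern stands
    # in for whatever regexp selects the pages worth prefetching.
    egrep 'https://www\.example\.com/(docs|static)/' candidates.txt |
    while read url; do
        # Fetch through the local Squid so the response lands in its
        # cache; the body itself is discarded, only the caching side
        # effect matters.
        squidclient -h 127.0.0.1 -p 3128 -m GET "$url" >/dev/null
    done

Run from cron or after the "database of past info" Amos mentions has
produced a fresh candidate list; this keeps all three components (cache,
decision logic, history) in the single-Squid design he suggests.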