Re: Not all html objects are being cached

On Fri, 2017-01-27 at 17:58 +0600, Yuri wrote:
> 
> 27.01.2017 17:54, Garri Djavadyan wrote:
> > On Fri, 2017-01-27 at 15:47 +0600, Yuri wrote:
> > > --2017-01-27 15:29:54--  https://www.microsoft.com/ru-kz/
> > > Connecting to 127.0.0.1:3128... connected.
> > > Proxy request sent, awaiting response...
> > >     HTTP/1.1 200 OK
> > >     Cache-Control: no-cache, no-store
> > >     Pragma: no-cache
> > >     Content-Type: text/html
> > >     Expires: -1
> > >     Server: Microsoft-IIS/8.0
> > >     CorrelationVector: BzssVwiBIUaXqyOh.1.1
> > >     X-AspNet-Version: 4.0.30319
> > >     X-Powered-By: ASP.NET
> > >     Access-Control-Allow-Headers: Origin, X-Requested-With, Content-Type, Accept
> > >     Access-Control-Allow-Methods: GET, POST, PUT, DELETE, OPTIONS
> > >     Access-Control-Allow-Credentials: true
> > >     P3P: CP="ALL IND DSP COR ADM CONo CUR CUSo IVAo IVDo PSA PSD TAI TELo OUR SAMo CNT COM INT NAV ONL PHY PRE PUR UNI"
> > >     X-Frame-Options: SAMEORIGIN
> > >     Vary: Accept-Encoding
> > >     Content-Encoding: gzip
> > >     Date: Fri, 27 Jan 2017 09:29:56 GMT
> > >     Content-Length: 13322
> > >     Set-Cookie: MS-CV=BzssVwiBIUaXqyOh.1; domain=.microsoft.com; expires=Sat, 28-Jan-2017 09:29:56 GMT; path=/
> > >     Set-Cookie: MS-CV=BzssVwiBIUaXqyOh.2; domain=.microsoft.com; expires=Sat, 28-Jan-2017 09:29:56 GMT; path=/
> > >     Strict-Transport-Security: max-age=0; includeSubDomains
> > >     X-CCC: NL
> > >     X-CID: 2
> > >     X-Cache: MISS from khorne
> > >     X-Cache-Lookup: MISS from khorne:3128
> > >     Connection: keep-alive
> > > Length: 13322 (13K) [text/html]
> > > Saving to: 'index.html'
> > > 
> > > index.html          100%[==================>]  13.01K  --.-KB/s    in 0s
> > > 
> > > 2017-01-27 15:29:57 (32.2 MB/s) - 'index.html' saved [13322/13322]
> > > 
> > > Can you explain to me why this static index.html has the following:
> > > 
> > > Cache-Control: no-cache, no-store
> > > Pragma: no-cache
> > > 
> > > ?
> > > 
> > > What would break if CC were ignored for this page?
> > 
> > Hi Yuri,
> > 
> > 
> > Why do you think the page returned for the URL
> > https://www.microsoft.com/ru-kz/ is static and not a dynamically
> > generated one?
> 
> And what difference does that make to me? Does it change anything? In
> addition, it is easy to look at the page with your own eyes and,
> strangely enough, to open its source code. And? What do you see there?

I see the official Microsoft home page for the KZ region. The page is
full of JavaScript and product offers. It is reasonable to expect that
the page changes fairly often.


> > The index.html file is the default file name used by wget.
> 
> It is also the conventional name for a default home page on the web.
> Imagine that - I know the obvious things. But the question was about
> something else.
> > 
> > man wget:
> >    --default-page=name
> >         Use name as the default file name when it isn't known
> >         (i.e., for URLs that end in a slash), instead of index.html.
> > 
> > In fact, https://www.microsoft.com/ru-kz/index.html is a stub page
> > ("The page you requested cannot be found.").
> 
> You are living in the wrong region. This is a geo-dependent page,
> obviously, yes?

What I mean is that the pages https://www.microsoft.com/ru-kz/ and
https://www.microsoft.com/ru-kz/index.html are not the same. You can
easily confirm it, for example as shown below.
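A rough way to check (assuming, as in the test above, that the proxy
environment variables for wget are already set; the result will also
depend on your region) is to fetch both URLs and compare the bodies:

    wget -S -O page-root.html  https://www.microsoft.com/ru-kz/
    wget -S -O page-index.html https://www.microsoft.com/ru-kz/index.html
    diff page-root.html page-index.html

Here -S prints the server response headers, -O saves the body under the
given name, and diff then shows that the two bodies differ. The file
names page-root.html and page-index.html are arbitrary.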


> Again. What is the difference? I open it from different workstations,
> from different browsers - I see the same thing. The code is identical.
> So, can I cache it? Yes or no?

I'm a new member of the Squid community (about 1 year). While following
the community activity I have found that you do not grasp the advantages
of HTTP/1.1 over HTTP/1.0 for caching systems. In particular, its
ability to _safely_ cache and serve the same number of objects as
HTTP/1.0 compliant caches do (and, I believe, even more), while not
breaking the internet. The main tool of an HTTP/1.1 compliant proxy is
the _revalidation_ process. HTTP/1.1 compliant caches like Squid tend to
cache all possible objects and later use revalidation for dubious
requests. In fact, revalidation is not a costly process, especially
when conditional GET requests are used.
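As a rough illustration only (the validators below are hypothetical; the
microsoft.com response above carries neither an ETag nor a Last-Modified
header), a revalidation is just a conditional GET, and an unchanged
object costs the origin nothing more than a short 304 reply:

    GET /ru-kz/ HTTP/1.1
    Host: www.microsoft.com
    If-None-Match: "abc123"
    If-Modified-Since: Fri, 27 Jan 2017 09:29:56 GMT

    HTTP/1.1 304 Not Modified
    Date: Fri, 27 Jan 2017 10:00:00 GMT

If the object has changed, the origin instead answers with a full 200
response and the cache stores the new copy.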

I have found that most of your complaints on the mailing list and in
Bugzilla are related to the HTTPS scheme. FYI: the primary tool
(revalidation) does not work for the HTTPS scheme in any current Squid
branch at the moment. See bug 4648.

Try applying the proposed patch and update all related bug reports.

HTH


Garri
_______________________________________________
squid-users mailing list
squid-users@xxxxxxxxxxxxxxxxxxxxx
http://lists.squid-cache.org/listinfo/squid-users



