Re: Vary object loop returns

Heiler Bemerguy <heiler.bemerguy@xxxxxxxxxxxxxx> · Tue, 7 Jun 2016 19:27:49 -0300

    I changed the source to add some debug output and started a fresh, clean Squid.
    I accessed http://api.footballaddicts.com/favicon.ico
    and look at what it is trying to compare():
    vary =                                   
'accept="text%2Fhtml,application%2Fxhtml+xml,application%2Fxml%3Bq%3D0.9,*%2F*%3Bq%3D0.8",
      if-none-match="%225756ad27-47e%22",
      if-modified-since="Tue,%2007%20Jun%202016%2011%3A16%3A55%20GMT",
      accept-language="en-US,en%3Bq%3D0.8,pt-BR%3Bq%3D0.5,pt%3Bq%3D0.3",
      accept-encoding="none", x-client-locale,
user-agent="Mozilla%2F5.0%20(Windows%20NT%2010.0%3B%20WOW64%3B%20rv%3A46.0)%20Gecko%2F20100101%20Firefox%2F46.0",
      x-device'

      entry->mem_obj->vary_headers =          
'accept="text%2Fhtml,application%2Fxhtml+xml,application%2Fxml%3Bq%3D0.9,*%2F*%3Bq%3D0.8",
      if-none-match, if-modified-since,
      accept-language="en-US,en%3Bq%3D0.8,pt-BR%3Bq%3D0.5,pt%3Bq%3D0.3",
      accept-encoding="none", x-client-locale,
user-agent="Mozilla%2F5.0%20(Windows%20NT%2010.0%3B%20WOW64%3B%20rv%3A46.0)%20Gecko%2F20100101%20Firefox%2F46.0",
      x-device'
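    The two strings differ: the request's mark carries values for
    if-none-match and if-modified-since, while the stored one lists both
    without values. That's why it always gives "vary object loop".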
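To make the mismatch easier to spot, here is a minimal sketch that splits two such marks into fields and reports the ones that differ. It is not Squid code: `parse_mark` and `differing_fields` are hypothetical helper names, and the field syntax (a name, optionally followed by `="value"`) is only inferred from the debug output above.

```python
import re

# One field of a vary mark: a header name, optionally followed by ="value".
# Commas inside the quoted value belong to the value, they are not separators.
FIELD = re.compile(r'([a-z0-9-]+)(?:="([^"]*)")?')

# Sentinel for "this header name does not appear in the mark at all".
ABSENT = object()


def parse_mark(mark):
    """Return {header name: escaped value, or None when the field has no value}."""
    return {m.group(1): m.group(2) for m in FIELD.finditer(mark)}


def differing_fields(request_mark, stored_mark):
    """Sorted names whose presence or value differs between the two marks."""
    a, b = parse_mark(request_mark), parse_mark(stored_mark)
    return sorted(
        name
        for name in a.keys() | b.keys()
        if a.get(name, ABSENT) != b.get(name, ABSENT)
    )


if __name__ == "__main__":
    # Shortened versions of the two strings from the debug output.
    request = ('accept-encoding="none", if-none-match="%225756ad27-47e%22", '
               'if-modified-since="Tue,%2007%20Jun%202016%2011%3A16%3A55%20GMT", '
               'x-device')
    stored = 'accept-encoding="none", if-none-match, if-modified-since, x-device'
    print(differing_fields(request, stored))
```

Run on the shortened marks, it reports only the two conditional headers, which matches what the eye picks out of the full strings.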

    -- 
Best Regards,

Heiler Bemerguy
Network Manager - CINBESA
55 91 98151-4894/3184-1751
    On 07/06/2016 17:21, Heiler Bemerguy wrote:

      Some servers will reply like this, trying to avoid caching at
        any cost (I think):
      HTTP/1.1 200 OK
        Server: nginx
        Content-Type: image/x-icon
        Last-Modified: Tue, 07 Jun 2016 11:16:55 GMT
        ETag: "5756ad27-47e"
        Content-Length: 1150
        X-Suppressed-Cache-Control: max-age=600
        Cache-Control: private, max-age=0, must-revalidate
        X-Suppressed-Expires: Tue, 07 Jun 2016 20:07:36 GMT
        Expires: Thu, 01 Jan 1970 00:00:00 GMT
        Date: Tue, 07 Jun 2016 19:57:36 GMT
        X-Varnish: 510207311
        Vary: Accept,If-None-Match,If-Modified-Since,Accept-Language,Accept-Encoding,X-Client-Locale,User-Agent,X-Device
      Then our Squid will create a vary object with all that
        information, producing this bomb:
        httpMakeVaryMark:
        accept="image%2Fpng,image%2F*%3Bq%3D0.8,*%2F*%3Bq%3D0.5",
        if-none-match="%225756ad27-47e%22", if-modified-since,
        accept-language="en-US,en%3Bq%3D0.8,pt-BR%3Bq%3D0.5,pt%3Bq%3D0.3",
        accept-encoding="none", x-client-locale,
        user-agent="Mozilla%2F5.0%20(Windows%20NT%2010.0%3B%20WOW64%3B%20rv%3A46.0)%20Gecko%2F20100101%20Firefox%2F46.0",
        x-device
      It's Squid's "fault" for converting spaces and symbols to %-escaped
        values, and I think no sanity check is performed on the result. Still,
        I don't see the code where it checks whether all this info from the
        new client is identical to the stored one, and I don't know where
        the "loop" comes from...
      Now I think I'm confused... lol
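For readers who want to reproduce a string like the one above, here is a rough sketch of how such a mark can be assembled from the Vary header and the request headers. It is not Squid's httpMakeVaryMark: `make_vary_mark` is a hypothetical name, and the set of characters left unescaped is an assumption inferred from the logged output.

```python
from urllib.parse import quote


def make_vary_mark(vary_header, request_headers):
    """Build a mark in the format seen in the log (illustrative only)."""
    headers = {k.lower(): v for k, v in request_headers.items()}
    parts = []
    for name in vary_header.split(","):
        name = name.strip().lower()
        if not name:
            continue
        value = headers.get(name)
        if value is None:
            # Header absent from the request: the bare name is emitted.
            parts.append(name)
        else:
            # Assumption: "," "+" "*" "(" ")" stay literal, as in the log;
            # everything else outside the unreserved set is %-escaped.
            parts.append('%s="%s"' % (name, quote(value, safe=",+*()")))
    return ", ".join(parts)


if __name__ == "__main__":
    vary = "Accept,If-None-Match,X-Device"
    first = {"Accept": "image/png,image/*;q=0.8,*/*;q=0.5"}
    revalidation = dict(first, **{"If-None-Match": '"5756ad27-47e"'})
    print(make_vary_mark(vary, first))
    print(make_vary_mark(vary, revalidation))
```

The point of the two calls is that listing If-None-Match or If-Modified-Since in Vary makes a first request and a later revalidation of the same URL produce different marks.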

      On 07/06/2016 08:59, Yuri Voinov wrote:

        -----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

I want to give one example on the topic.

Here is a listing from one of my caches:

/data/cache/d2/00/02/000004C3   0   102502
http://www.openoffice.org/favicon.ico
/data/cache/d2/00/01/0000031D   0   161421
http://rgho.st.squidinternal/favicon.ico
/data/cache/d1/00/2E/00005C04   0    33274
http://www.tcpiputils.com/favicon.ico

Just take a look at the file sizes. These are only favicons. 100 kbytes for
a favicon alone! (I once saw a 470 kbyte favicon on a Microsoft site.)

When we look into access.log, we often see several URLs for the same favicon:

http://www.somesite.com/favicon.ico?v=1.44&id=41324134abcd123123123

Good site, isn't it? Loading 100 kbytes every time any client opens any
page of the site.

When I did some research, it became clear that in most cases these
favicons had one and the same content. As an example, would a client with
a smartphone like to download 100 kB every time, when this is only a small
portion of the page?

In most countries of the world, 100 kB of mobile data traffic costs decent
money.

Yes, it usually comes from the client's browser cache.

But what about the number of clients, and the access point, which pays for
terabytes of non-peering traffic?

I've seen the same tricks with User-Agent. With Vary.
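As an illustration of the favicon case above, here is a minimal sketch of a Store ID style normalization that maps every query-string variant of a favicon URL to one cache key. The function name `favicon_store_key` is hypothetical and this is not a Squid helper. It is only safe under the assumption stated above, that the content really is identical regardless of the query string.

```python
from urllib.parse import urlsplit, urlunsplit


def favicon_store_key(url):
    """Drop query string and fragment from URLs whose path ends in /favicon.ico.

    Any other URL is returned unchanged.
    """
    parts = urlsplit(url)
    if parts.path.lower().endswith("/favicon.ico"):
        return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    return url


if __name__ == "__main__":
    print(favicon_store_key(
        "http://www.somesite.com/favicon.ico?v=1.44&id=41324134abcd123123123"))
```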

07.06.2016 16:36, Amos Jeffries wrote:

          On 7/06/2016 8:48 p.m., Yuri Voinov wrote:

            07.06.2016 4:57, Amos Jeffries wrote:

              On 7/06/2016 5:55 a.m., Yuri Voinov wrote:

                So.

Squid DOES NOT and WILL NOT support gzip. The only way to do it is to use
eCAP plus the unsupported eCAP gzip adapter. Let's accept this. We can
support gzip. With restrictions. OK.

Any other compression: false. No. No way. Get out. And so on.

identity is the uncompressed type.

That's all, folks.

Finally. As Joe does, we can keep only gzip and identity in
Accept-Encoding and strip everything else.
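The proposal above can be sketched roughly as follows. This is not Joe's patch and not Squid code: `normalize_accept_encoding` is a hypothetical name, and falling back to "identity" when nothing is kept is an assumption made for the illustration.

```python
KEEP = ("gzip", "identity")


def normalize_accept_encoding(value):
    """Keep only gzip and identity codings; drop everything else.

    Codings marked q=0 (meaning "not acceptable") are dropped as well.
    Assumption: when nothing is left, fall back to "identity".
    """
    kept = []
    for item in value.split(","):
        token, _, params = item.strip().partition(";")
        token = token.strip().lower()
        q = 1.0
        for param in params.split(";"):
            key, _, val = param.strip().partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(val)
                except ValueError:
                    q = 0.0  # unparsable weight: treat as not acceptable
        if token in KEEP and q > 0 and token not in kept:
            kept.append(token)
    return ", ".join(kept) if kept else "identity"


if __name__ == "__main__":
    for header in ("gzip, deflate, sdch", "deflate", "gzip;q=0, identity"):
        print(repr(header), "->", repr(normalize_accept_encoding(header)))
```

Collapsing the header this way reduces the number of distinct values a Vary: Accept-Encoding response is keyed on, which is where the extra HITs come from.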

              That locks the entire Internet into using your personal choice of gzip
compression or none.

              gzip is the slowest and most resource-hungry type of compression there
is. deflate is actually faster for clients and just as widely supported.

            Unfortunately, Amos, no one has written a support module for any other
compression algorithm. We have to eat what we are given.

          Like I said, deflate is widely available. Heiler's recent info shows that
lzma is becoming more visible on the public web, which should help fix
the one issue deflate has.

And no one appears to be fixing the remaining issues in the Squid gzip
eCAP module.

There also seems to be a big pushback from browser and some server
vendors about compression in general. We had a fairly major fight in
IETF to get HTTP/2 to contain data compression at all. It is still only
in there as an optional extension that some are openly refusing to
implement.

                Without any problem. Moreover, this type of change can be pushed to all
branches of Squid without any problem, because it dramatically increases
the byte HIT ratio.

              Responding with a single object to all requests makes your HIT ratio
100% guaranteed. The clients won't like you though if all they ever see
is the same cat picture.

              It sounds ridiculous when put that way, but that is what these patches
are doing for an unknown number of those "gained" HITs. See my previous
post about how none of these patches are changing the request the server
gets.

            But no one asked the question: why does Squid in production installations
have such a low hit ratio

          Yes, that has been asked, even investigated. The reasons are many
complex details and small issues adding up to a big loss.

They range from protocol things like Vary not being fine-grained enough
(the Key header being developed fixes that), through client behaviour
(Chrome's sdch doubles the variant count, almost halving useful cache
space), to server behaviour (Apache changing the Vary header).

What your testing of Joe's patches is showing is that the sdch effect
Chrome has is probably way bigger than one would reasonably expect.
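The variant-doubling effect can be shown with a small sketch. It assumes one stored variant per distinct Accept-Encoding string for a response carrying Vary: Accept-Encoding. The function name `stored_variants` and the two header values are illustrative assumptions, not taken from any log in this thread.

```python
def stored_variants(requests):
    """Group requests into variants, one per distinct Accept-Encoding string.

    requests: iterable of (url, accept_encoding) pairs.
    Returns {url: set of Accept-Encoding values seen for that URL}.
    """
    store = {}
    for url, accept_encoding in requests:
        store.setdefault(url, set()).add(accept_encoding)
    return store


if __name__ == "__main__":
    url = "http://example.com/style.css"
    without_sdch = [(url, "gzip, deflate")] * 4
    mixed = [(url, "gzip, deflate"), (url, "gzip, deflate, sdch")] * 2
    print(len(stored_variants(without_sdch)[url]))  # one variant
    print(len(stored_variants(mixed)[url]))         # two variants
```

With the same number of requests, the mixed population stores the object twice, so the same cache space holds roughly half as many distinct URLs.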

            that it calls into question the expediency of
using a caching proxy at all. We do believe that this is a caching proxy?

              You are once again sweeping aside the critical requirement of content
integrity to achieve a high HIT ratio. Which is not something that I can
accept into Squid as a default action.

            I continue to believe that 20% is an unacceptably low cache hit ratio,
given the very aggressive settings and the active use of Store ID. Which
brings us back to the question of the feasibility of using Squid as a
whole.

          That kind of "unacceptable" statement simply cannot be made about cache
HIT ratio. It is what it is. One cannot change the speed of light
because it takes unacceptably long to travel through space.

Two properly working caches in series will have extremely different
caching ratios. The one with the most direct client connections trends
towards 50-100%, and the upstream one, closer to the servers, trends
towards zero. The total cacheable ratio is unchanged, but each cache
sees a different proportion of it and so shows a different HIT ratio
relative to its clients' portion.
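A toy model makes the serial-cache point concrete. It is a sketch under strong assumptions: both caches are unlimited in size, every object is cacheable and never expires, and there is a single client-facing cache in front of the upstream one. The name `hit_ratios` is hypothetical.

```python
def hit_ratios(requests):
    """Return (front HIT ratio, upstream HIT ratio) for two caches in series.

    requests: list of URLs in arrival order. The front cache sees every
    request; the upstream cache sees only the front cache's misses.
    """
    front, upstream = set(), set()
    front_hits = upstream_hits = upstream_requests = 0
    for url in requests:
        if url in front:
            front_hits += 1
            continue
        front.add(url)
        upstream_requests += 1  # a front miss is forwarded upstream
        if url in upstream:
            upstream_hits += 1
        else:
            upstream.add(url)
    front_ratio = front_hits / len(requests) if requests else 0.0
    upstream_ratio = upstream_hits / upstream_requests if upstream_requests else 0.0
    return front_ratio, upstream_ratio


if __name__ == "__main__":
    print(hit_ratios(["a", "b", "a", "c", "b", "a"]))
```

In this model the front cache absorbs every repeat, so the upstream cache only ever sees first-time requests. With several client-facing caches sharing one upstream the upstream figure would rise above zero, but each cache's ratio still describes only the traffic that reaches it.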

Also, don't forget that the disk space available to browser caches is
increasingly large as well. So those caches have been growing in size and
taking up a larger share of the total achievable HIT ratio in recent
years.

          Amos
_______________________________________________
squid-users mailing list
squid-users@xxxxxxxxxxxxxxxxxxxxx
http://lists.squid-cache.org/listinfo/squid-users

        -----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iQEcBAEBCAAGBQJXVrctAAoJENNXIZxhPexGl8gIALRSaB3nC6fUjKM8GGL+ep3m
NZganwbvtkLLLDHQFuTA3K9gvl/GWieQ/3jj+Pp45kgNIeVNsbwYF6IANOT1/olc
XIGpHK0LICSeTA5kpSHU6hkdfao6AWSUFLci5WXl/Ay7qvzWI4h/NqPhyhoaJUSq
LTmOePc98oALu4oZpmdmKy1D5yduLmjDy8cbIJTRc/SVha5tt4Sre7z8dI9geX9L
PlrXBxbtH+oGAYu5qiuifQR9UZCoYL0wL30KzWLyIqmZJdT/NIshIRA1wHVdy9lL
d0CNwheIPTvstnx8uKOMk4vN/Z5y+A6LnTHHoJgfRCyNwD1IayoPRY1CJffWVRk=
=40f2
-----END PGP SIGNATURE-----
