Re: a bit off topic. New user question


On 23/05/17 11:39, George Diaz wrote:

Hi

Sorry for this off-topic question ...

I want to pre-cache some objects from certain hosts of interest using wget.

My question is: I want wget to download the objects to /dev/null, but
I have not found a switch for this
(GNU Wget 1.5.3).

I tried this:
export http_proxy=http://mycache.com:8080/
wget -r http://sobredinero.com -P /dev/null -nH -nd -Y on -b -l5 -t1 -o /dev/null

but this creates a /dev/null directory :) and downloads the files into it.

Any suggestions?


Some advice before you get too far into this project:

Pre-caching was a good idea back in the days of HTTP/1.0 and static websites, where the URL was all that mattered. In today's HTTP/1.1 and HTTP/2 world, dynamic content and variants are much more common, and both make pre-caching pretty much useless.

Before you attempt it for any domain, I recommend passing a few of its URLs through the tool at <https://redbot.org>. If that tool indicates the site uses content negotiation or conditional HTTP features, then pre-caching is just going to cause problems.

For example, that sobredinero domain above produces these details:


     Content Negotiation

 * The response body is different when content negotiation happens.


     Caching

 * Vary: User-Agent can cause cache inefficiency.


This means that anything you pre-cache with wget will be ignored, and probably replaced, when any non-wget agent (i.e. a browser) is used to fetch through the proxy. So you just waste all the bandwidth, time, and storage space used pre-caching it.
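
As a rough command-line alternative to the redbot check, the response headers can be inspected directly. The sketch below is only an illustration: the helper name varies_on_user_agent is made up here, and it assumes the headers arrive on standard input, for example from curl -sI.

```shell
#!/bin/sh
# Hypothetical helper (not part of Squid or wget): reads HTTP response
# headers on stdin and succeeds (exit status 0) only if a Vary header
# lists User-Agent.
varies_on_user_agent() {
    # Strip carriage returns, keep Vary lines, then look for User-Agent.
    tr -d '\r' | grep -i '^vary:' | grep -qi 'user-agent'
}

# Example use against a live site (needs network access and curl):
#   curl -sI http://sobredinero.com/ | varies_on_user_agent \
#       && echo "responses vary on User-Agent"
```

If the helper succeeds for a site, objects fetched by wget are unlikely to be reused for browser requests.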


Vary: User-Agent is particularly bad, since any single-character difference in the User-Agent header will cause a different object to be referred to in the cache storage. If you wish to pre-cache these objects in any useful way, you have to know and mimic the *exact* User-Agent header values that will be used to fetch them. For example: two different versions of Chrome -> different User-Agent header. Internet Explorer with different Windows Updates applied -> different User-Agent header. As you can imagine, that is a very hard thing to predict.
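
To make the effect concrete, here is a toy model of a cache that keeps one variant per (URL, User-Agent) pair. It is an illustration only and is not Squid's real storage key algorithm; the variant_key function and the browser User-Agent strings are invented for the example.

```shell
#!/bin/sh
# Illustration only: NOT how Squid actually computes its store keys.
# Models a cache that stores one variant per (URL, User-Agent) pair.
variant_key() {
    # $1 = URL, $2 = User-Agent header value
    printf '%s\n%s\n' "$1" "$2" | cksum | cut -d' ' -f1
}

url="http://sobredinero.com/"

# Each distinct User-Agent string yields its own key, so the object
# stored for wget is not the one looked up for either browser.
variant_key "$url" "Wget/1.5.3"
variant_key "$url" "Mozilla/5.0 (example browser A)"
variant_key "$url" "Mozilla/5.0 (example browser B)"
```

Only a request carrying a byte-for-byte identical User-Agent maps back to the same key.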


Note: if you have come to this idea after seeing objects from that domain getting a lot of MISS records, the problem is very much that Vary header: it causes so many different objects to be needed that the objects being stored are often not the right one(s) for any later client request. Pre-caching will not solve this but make it worse, as wget is just another different User-Agent.
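
For a site that does not vary its responses, the original wget question still has a simple answer: rather than pointing -P at /dev/null, wget can delete each file once it has been fetched. A minimal sketch, assuming the installed wget supports --delete-after (the wget manual describes that option for pre-fetching through a proxy); the precache function name is made up for illustration, and the proxy and site URLs are the ones from the question.

```shell
#!/bin/sh
# Hypothetical wrapper: fetch a site recursively through a proxy and
# delete every file after download, so only the proxy keeps a copy.
precache() {
    # $1 = proxy URL, $2 = start URL
    # -r recursive, -l5 depth limit, -t1 one try, -nd no directories,
    # -q quiet, --delete-after remove each file once it is downloaded.
    http_proxy="$1" wget -r -l5 -t1 -nd -q --delete-after "$2"
}

# Example use (needs network access and a reachable proxy):
#   precache http://mycache.com:8080/ http://sobredinero.com/
```

The files still land briefly in the current directory before being removed, so run it from a scratch directory.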


Amos





