Actually, I did find another reason for the non-caching of this specific
URL: the maximum_object_size default is 4 MB, while the object is about
6 MB.

While experimenting, I stumbled upon an undocumented requirement:
maximum_object_size MUST be placed before cache_dir. Otherwise the
cachable check fails with "too big" for any object larger than the
default 4 MB, regardless of the maximum_object_size value.

I'm wondering if there are more undocumented precedence dependencies
like this that can materially impact cache effectiveness.

On Mon, Apr 21, 2014 at 3:35 PM, Amos Jeffries <squid3@xxxxxxxxxxxxx> wrote:
> On 21/04/2014 11:22 p.m., Timur Irmatov wrote:
>> On Mon, Apr 21, 2014 at 2:06 PM, Amos Jeffries <squid3@xxxxxxxxxxxxx> wrote:
>>> On 21/04/2014 6:56 p.m., Timur Irmatov wrote:
>>>> 2014/04/21 11:46:03.940 kid1| ctx: exit level 0
>>>> 2014/04/21 11:46:03.940 kid1| store.cc(1011) checkCachable:
>>>> StoreEntry::checkCachable: NO: not cachable
>>>>
>>>> So Squid considers the server's reply uncacheable. Why?
>>>>
>>>
>>> Something (unknown) has marked it to be discarded before it finished
>>> arriving. There is no sign of the store lookup logic looking up an
>>> existing entry either.
>>> An ALL,6 trace (very big) will probably be needed for that one.
>>
>> After clearing the cache and enabling an ALL,6 trace, I performed
>> several requests through my proxy.
>>
>> Now in cache.log I do see the line "SECURITY ALERT: Host header forgery
>> detected". Indeed, guard.cdnmail.ru sometimes resolves to different IP
>> addresses.
>>
>> What are my options now? Is it possible to disable host forgery detection?
>
> No. It is done to prevent your proxy being hijacked through malicious
> web bugs corrupting the cache with infected downloads.
>
> Imagine what would happen if one of your clients' browsers was delivered
> an "advert" which was actually a script that sent an HTTP request with
> the URL "http://google.com/" and fetched it directly from the IP of a
> server run by the attacker. If that response got cached, all your users
> fetching the Google home page from the proxy would get infected with
> anything the attacker wanted to deliver.
>
>>
>> Also, Traffic Server has an option to skip the DNS lookup and use the
>> remote IP address from the incoming client connection. Is it possible
>> to do the same? The idea is to skip the double DNS lookup, one by the
>> client and one by the proxy server.
>
> Squid does this by default. You can see it in the logs earlier:
>
> Found sources for 'http://guard.cdnmail.ru/GuardMailRu.exe'
> always_direct = DENIED
> never_direct = DENIED
> DIRECT = local=X.X.X.X remote=217.69.139.110:80 flags=25
>
> The remote= value is the IP the client was connecting to. There seems
> to be a small bug in the display; it should say ORIGINAL_DST instead
> of DIRECT.
>
>>
>>> There are two other obvious things to check.
>>>
>>> The first is that this request is arriving on the tproxy port and the
>>> domain name appears to be using different IPs in geographically based
>>> responses. Is the Squid box getting the same 217.69.139.110 destination
>>> as the client was contacting?
>>
>> Yes, as I stated above.
>>
>>> The second is the storeid helper. What is its output?
>>> debug option 84,9
>>
>> The storeid helper does not rewrite this request in any way (it
>> replies with ERR).
>>
>
> Okay.
>
> Amos
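
For anyone hitting the same ordering pitfall described at the top of this
message, a minimal squid.conf sketch follows. The 8 MB limit, cache size,
and path are illustrative assumptions, not values from this thread; the
likely reason for the ordering is that each cache_dir captures the
object-size limits in effect at the point it is parsed.

```
# Must come BEFORE any cache_dir line: a cache_dir appears to record the
# size limits in effect when it is parsed, so a maximum_object_size
# declared later is ignored and the 4 MB default applies.
maximum_object_size 8 MB

# aufs store: 10000 MB, 16 first-level / 256 second-level directories
# (path and sizes are placeholder values for illustration)
cache_dir aufs /var/spool/squid 10000 16 256
```

With the directives in this order the ~6 MB object fits under the limit;
with the order reversed the same object is rejected as "too big".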