Re: Time for cache synchronization between siblings

Sreenath BH <bhsreenath@xxxxxxxxx> · Thu, 17 Dec 2015 17:51:28 +0530

Hi,

Thanks for the detailed response. I really appreciate it.

Unfortunately the load balancer we use is not a squid load balancer
and for now I will have to use HTCP.

Please take a look at the following lines from access.log of one of
the three squid servers.
------------
1450351827.534      0 10.135.83.129 UDP_HIT/000 0 HTCP_TST
http://127.0.0.1:3128/media/stream/video/NDo3ZDY1OTRjOS02NjM4LTQyNDMtOGMyNi0zYTc3YmI1MzI3ZjAubXA0?size=xs&start=0.000&end=5.930&;
- HIER_NONE/- -

1450351827.562     20 10.135.83.129 TCP_HIT/200 553852 GET
http://127.0.0.1:3128/media/stream/video/NDo3ZDY1OTRjOS02NjM4LTQyNDMtOGMyNi0zYTc3YmI1MzI3ZjAubXA0?
- HIER_NONE/- video/mp2t

1450352028.731      0 10.135.83.128 UDP_MISS/000 0 HTCP_TST
http://10.135.83.128:3128/media/stream/video/NDo3ZDY1OTRjOS02NjM4LTQyNDMtOGMyNi0zYTc3YmI1MzI3ZjAubXA0?size=xs&start=0.000&end=5.930&;
- HIER_NONE/- -
--------

The first line indicates a hit when queried by a peer. Note that the
IP address is 127.0.0.1.
It was a UDP HIT and it was followed by the actual request for the
cached object, which succeeded.

The third line shows a UDP query for the same object, except that the
URL has a different IP address, and the log records it as a MISS.
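
A minimal Python sketch of how the three entries above can be compared,
assuming Squid's native access.log field order (time, elapsed, client,
result/status, bytes, method, URL, ...). The helper name parse_entry is
illustrative only, and each wrapped entry has been rejoined onto one
line:

```python
from urllib.parse import urlsplit

# Path and query copied from the log excerpt above.
PATH = "/media/stream/video/NDo3ZDY1OTRjOS02NjM4LTQyNDMtOGMyNi0zYTc3YmI1MzI3ZjAubXA0"
QUERY = "?size=xs&start=0.000&end=5.930&;"

LOG_LINES = [
    f"1450351827.534      0 10.135.83.129 UDP_HIT/000 0 HTCP_TST "
    f"http://127.0.0.1:3128{PATH}{QUERY} - HIER_NONE/- -",
    f"1450351827.562     20 10.135.83.129 TCP_HIT/200 553852 GET "
    f"http://127.0.0.1:3128{PATH}? - HIER_NONE/- video/mp2t",
    f"1450352028.731      0 10.135.83.128 UDP_MISS/000 0 HTCP_TST "
    f"http://10.135.83.128:3128{PATH}{QUERY} - HIER_NONE/- -",
]


def parse_entry(line):
    """Split one access.log line on whitespace and pick out the URL parts."""
    fields = line.split()
    url = urlsplit(fields[6])
    return {
        "client": fields[2],
        "result": fields[3].split("/")[0],  # e.g. UDP_HIT, TCP_HIT, UDP_MISS
        "method": fields[5],
        "host": url.hostname,
        "path": url.path,
    }


entries = [parse_entry(line) for line in LOG_LINES]
for entry in entries:
    print(entry["result"], entry["method"], entry["host"])

# The HIT and the MISS queries share a path; only the URL host differs.
print(entries[0]["path"] == entries[2]["path"])
print(entries[0]["host"] == entries[2]["host"])
```

The path is identical in the first and third entries; the only
difference in the queried URL is the host.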

I don't know what I am doing wrong, but Squid consistently seems to
treat the IP address as part of the URL for the purpose of the HIT/MISS
decision.

If all requests are made from a local client (say, curl running
locally on the machine) using 127.0.0.1 as the IP address, HTCP works
correctly.

Even without HTCP, when I issue the same request from localhost and
from another machine using the externally visible IP address, squid
does not appear to use the cached object. I am new to HTTP and think I
must be doing something wrong, but can't say what.
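
If the cache key is derived from the absolute URL, as the quoted reply
below states, then the host and port in the URL are part of the key. A
rough Python sketch of that idea follows. Hashing the method and URL
with MD5 is an assumption made for illustration, not Squid's actual
code, and store_key is a hypothetical helper:

```python
import hashlib

# Path copied from the log excerpt above.
PATH = "/media/stream/video/NDo3ZDY1OTRjOS02NjM4LTQyNDMtOGMyNi0zYTc3YmI1MzI3ZjAubXA0"


def store_key(method, absolute_url):
    """Hypothetical key: a digest over the method and the full absolute URL."""
    return hashlib.md5(f"{method} {absolute_url}".encode()).hexdigest()


local_key = store_key("GET", "http://127.0.0.1:3128" + PATH)
external_key = store_key("GET", "http://10.135.83.128:3128" + PATH)

# Same path, different host in the URL, so the keys differ.
print(local_key == external_key)  # False

# The same absolute URL always maps to the same key.
print(local_key == store_key("GET", "http://127.0.0.1:3128" + PATH))  # True
```

Under that assumption, a request addressed to 127.0.0.1 and one
addressed to 10.135.83.128 are stored under different keys even though
the path is the same, which would match the HIT/MISS pattern in the log.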

I wonder if ICP would have fared better since it uses just the URL.
Might that be a reason?

thanks,
Sreenath

On 12/17/15, Amos Jeffries <squid3@xxxxxxxxxxxxx> wrote:
> On 17/12/2015 3:10 a.m., Sreenath BH wrote:
>> Hi,
>>
>> Thanks for the tips. After disabling digest I believe performance
>> improved.
>> However, I found that randomly requests were being routed to parent
>> even when siblings had the data cached.
>>
>> From access.log I found TIMEOUT_CARP. I assumed this meant HTCP timed
>> out and squid was forced to go to fetch the data. So I increased
>> icp_query_timeout to 4000 milliseconds, and the hit rate increased
>> further.
>>
>> But I still find that sometimes, even after getting a HIT response
>> from a sibling, squid, for some reason still decides to go to the
>> parent for requested object.
>>
>> Are there any other reasons why squid will decide to go to parent
>> servers?
>
> Just quirks of timing I think. Squid tracks response latency and prefers
> the fastest source. If the parent is responding faster than the sibling
> for many requests over a short period then Squid might switch to using
> the parent as first choice for a
>
>
> Some traffic is also classified as "non-hierarchical". Meaning that it
> makes no sense sending it to a sibling unless all parents are down.
> Things such as CONNECT, OPTIONS, POST etc where the response is not
> possible to be cached at the sibling.
>
>
>>
>> And another question: When the hash key is computed for storing cache
>> objects, does Squid use the hostname (or IP address) also as part of
>> URL, or just the part that appears after the hostname/IP:port numbers?
>
> No. The primary Store ID/key is the absolute URL alone. Unless you are
> using the Store-ID feature of Squid to change it to some other explicit
> string value.
>
> If the URL produces a reply object with Vary header, then the expansion
> of the Vary header format is appended to the primary Store ID/key.
>
>>
>> For example: if the IP addresses of the squid servers are 10.135.85.2 and
>> 10.135.85.3, and a request made to 1st server would have had the IP
>> address as part of the URL. However, next time same request is made to
>> server2, a different IP address would be used. Does this affect cache
>> hit at the sibling server?
>>
>> I think it should not, but is this the case?
>
> Correct, the Squid IP has nothing to do with the cache storage.
>
>>
>> We will have a load balancer that sends requests to each squid server,
>> and we want cache peering to work correctly in this case.
>
> FYI; the digest and HTCP algorithms you are dealing with are already
> load balancing algorithms. They are just designed for use in a flat
> 1-layer hierarchy.
>
> If you intend to have a 2-layer hierarchy (frontend LB and backend
> caches) I suggest you might want to look into Squid as the frontend LB
> using CARP algorithm. The CARP algorithm ensures deterministic storage
> locations for what URLs get sent to which caches. So there is no need
> for siblings communication as they all get unique URLs.
>
>  * <http://wiki.squid-cache.org/ConfigExamples/SmpCarpCluster> has
> details of how to split the frontend and backend config. The specific
> example is for doing it using SMP workers within a single proxy
> instance. But the split can even more easily be done across different
> machines.
>
>  * <http://wiki.squid-cache.org/ConfigExamples/ExtremeCarpFrontend> has
> some details on how to add iptables port splitting on top of CARP to get
> ridiculously high performance out of a proxy hierarchy. The last numbers
> I heard from these setups were pushing just under the Gbps mark.
>
> Amos
>
>
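
The "deterministic storage locations" property of CARP mentioned in
the quoted reply can be illustrated with highest-random-weight hashing.
This is a sketch of the general idea only: it does not reproduce the
hash function or load factors of CARP itself, pick_backend is a
hypothetical helper, and the backend addresses are the example IPs from
the quoted question:

```python
import hashlib

# Example backend addresses taken from the quoted question.
BACKENDS = ["10.135.85.2", "10.135.85.3"]


def pick_backend(url, backends):
    """Score every backend against the URL and return the highest scorer."""
    def score(backend):
        digest = hashlib.sha256(f"{backend}|{url}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    return max(backends, key=score)


url = "http://example.invalid/media/stream/video/clip?size=xs"  # illustrative URL
first = pick_backend(url, BACKENDS)
second = pick_backend(url, list(reversed(BACKENDS)))

# The choice depends only on the URL and the set of backends,
# not on the order in which the backends are listed.
print(first == second)
```

Because every frontend computes the same choice for a given URL, each
object ends up on one backend and the backends do not need to query
each other.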