Re: Data not being cached

Amos Jeffries <squid3@xxxxxxxxxxxxx> · Tue, 22 Jun 2021 00:29:56 +1200

On 21/06/21 6:41 am, Darwin O'Connor wrote:
I run a transit prediction web app <https://www.transsee.ca/>. It 
connects to a variety of web APIs to collect the real time transit data 
it needs. The app's activities are split among many processes. They 
currently use libcurl to connect to squid for caching (often for as 
little as 10-30 seconds) and for the benefits of connection sharing.

There are still cases where data isn't being cached no matter what I do. 
It is https data, but I am able to cache other https pages like 
https://cdn.mbta.com/realtime/Alerts.pb

The refresh_pattern:

refresh_pattern .               60      99999%  7200 override-expire 
override-lastmod reload-into-ims ignore-reload ignore-no-cache 
ignore-no-store ignore-private ignore-auth store-stale

Please be aware that several of those options may be causing you more 
problems than they solve:

* ignore-no-cache - no longer exists.

* ignore-reload - contradicts reload-into-ims and that can cause 
inconsistent behaviour between initial MISS and followup HIT responses.

* override-lastmod - replaces all Last-Modified (L-M) headers with 
values indicating the object is almost brand new. That prevents caches 
detecting that objects have stuck around long enough not to need replacing.
Luckily the example response has no L-M header, so that is not the 
problem here. But it may be affecting other traffic.
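
As a rough illustration of the option problems listed above, here is a
small Python sketch that scans a refresh_pattern line and reports them.
The helper name check_refresh_pattern and the two lookup tables are
hypothetical, built only from the points in this reply; this is not a
Squid tool and it does not know about any other options.

```python
# Hypothetical lint helper; the rules below come only from this reply.
REMOVED = {"ignore-no-cache"}                     # option no longer exists
CONFLICTS = [("ignore-reload", "reload-into-ims")]  # contradictory pair


def check_refresh_pattern(line):
    """Return warnings for one 'refresh_pattern [-i] regex min percent max [options]' line."""
    tokens = line.split()
    if not tokens or tokens[0] != "refresh_pattern":
        raise ValueError("not a refresh_pattern line")
    idx = 1
    if len(tokens) > idx and tokens[idx] == "-i":
        idx += 1                                  # skip the case-insensitive flag
    options = tokens[idx + 4:]                    # after regex, min, percent, max
    warnings = [f"{opt}: no longer exists" for opt in options if opt in REMOVED]
    for a, b in CONFLICTS:
        if a in options and b in options:
            warnings.append(f"{a} contradicts {b}")
    return warnings


line = ("refresh_pattern . 60 99999% 7200 override-expire override-lastmod "
        "reload-into-ims ignore-reload ignore-no-cache ignore-no-store "
        "ignore-private ignore-auth store-stale")
for warning in check_refresh_pattern(line):
    print(warning)
```

Run against the line quoted above, it reports the removed option and the
contradictory pair. override-lastmod is left out of the tables on purpose,
since whether it hurts depends on the traffic.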

The http headers from curl of an example where it is not being cached:

*   Trying 127.0.0.1:3128...
* Connected to 127.0.0.1 (127.0.0.1) port 3128 (#0)
 > GET https://api.transport.nsw.gov.au/v1/gtfs/alerts/buses HTTP/1.1
Host: 127.0.0.1:3128
User-Agent: curl/7.77.0 (+https://www.transsee.ca/)
Accept: */*
Accept-Encoding: gzip

Note this header value is used for variant selection of the responses.

Is this test value the same as what the client apps send?
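
To show why that matters, here is a simplified Python model of variant
selection for a response carrying "Vary: Accept-Encoding". It is only a
sketch of the idea, not how Squid builds its store keys, and the
"gzip, deflate" value is an assumed example of what a client app might
send instead of the test value.

```python
def variant_key(url, request_headers, vary):
    """Build a lookup key from the URL plus the request headers named in Vary."""
    headers = {k.lower(): v.strip() for k, v in request_headers.items()}
    names = sorted(n.strip().lower() for n in vary.split(",") if n.strip())
    selected = tuple((name, headers.get(name, "")) for name in names)
    return (url, selected)


url = "https://api.transport.nsw.gov.au/v1/gtfs/alerts/buses"
# Value sent by the curl test in this thread.
test_key = variant_key(url, {"Accept-Encoding": "gzip"}, "Accept-Encoding")
# Assumed value for illustration; check what the apps really send.
app_key = variant_key(url, {"Accept-Encoding": "gzip, deflate"}, "Accept-Encoding")
print(test_key == app_key)  # False: the two requests select different variants
```

In this model a response stored for one Accept-Encoding value is not
found by a request carrying a different value, so a test with curl only
tells you about the apps if both send the same header.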

Authorization: apikey 2eYEqXXxOPEDChnpeF7sZL2aR8moD2DtdNmn
Cache-Control: max-age=60
Content-Encoding: aes128gcm

Apparently the GET message contains some content. Yet there is no 
Content-Length or Transfer-Encoding to determine the length of that 
content.

* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Date: Sun, 20 Jun 2021 17:52:14 GMT
< Content-Type: application/protobuf
< Content-Length: 7455
...
< Access-Control-Allow-Credentials: true
< ETag: W/"ab70-8SI2GdBV4SJG4edSc4E5W8LBJWk"
< Vary: Accept-Encoding
< X-Cache: Hit from cloudfront
< X-Amz-Cf-Pop: SYD1-C1
< X-Amz-Cf-Id: hCoQckLsNONQMpgPr2kXJVdTDHu98jxl-rPXqV_PHB2vTCEomAd-Nw==
< Age: 35
< Access-Control-Allow-Origin: *
< Content-Encoding: gzip
< X-Cache: MISS from transsee
< X-Cache-Lookup: MISS from transsee:3128
< Via: 1.1 359a113ca166631b42f31a0f2e6a1aab.cloudfront.net (CloudFront), 
1.1 transsee (squid/4.15)
< Connection: keep-alive

Here is a sample from the Squid access log:

1624212034.891    246 127.0.0.1 59216 TCP_MISS/200 8517 GET 
https://api.transport.nsw.gov.au/v1/gtfs/alerts/buses - 
HIER_DIRECT/52.65.222.24 application/protobuf

FYI, a single request is not sufficient to demonstrate caching issues. 
Caches always require one MISS to fetch the data which then is expected 
to show up as HIT on the second or later requests.
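
The MISS-then-HIT sequence can be pictured with a toy in-memory cache in
Python. ToyCache and the fetch function are made up for illustration and
ignore freshness entirely; they say nothing about how Squid stores
objects.

```python
class ToyCache:
    """Minimal stand-in for a cache: stores whatever it fetches, forever."""

    def __init__(self):
        self.store = {}

    def get(self, url, fetch):
        if url in self.store:
            return "HIT", self.store[url]
        body = fetch(url)          # first request has to go to the origin
        self.store[url] = body
        return "MISS", body


def fetch(url):
    """Hypothetical origin fetch; returns a fixed payload."""
    return b"payload"


cache = ToyCache()
url = "https://api.transport.nsw.gov.au/v1/gtfs/alerts/buses"
results = [cache.get(url, fetch)[0] for _ in range(2)]
print(results)  # ['MISS', 'HIT']
```

So a test needs at least two requests for the same URL, close together,
before a MISS on the second one means anything.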

A couple of timing details make this a MISS:

* The log timestamp (1624212034.891) says the logged request happened at 
Sun, 20 Jun 2021 18:00:34 GMT

* The response Date says it was created Sun, 20 Jun 2021 17:52:14 GMT

* The request Cache-Control forbids receiving anything from cache older 
than 60 seconds.

So the response, already more than eight minutes old by then, cannot be 
delivered from a cache to this test request.
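
The timing comparison can be checked with a short Python sketch. It
converts the epoch timestamp from the access log line to UTC, subtracts
the response Date header, and compares the result with the request's
max-age. The function names are hypothetical, and the age estimate is
deliberately simplified: it uses only the Date header, not the full HTTP
age calculation.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def age_at_request(log_epoch, date_header):
    """Return (logged time in UTC, seconds elapsed since the response Date)."""
    logged = datetime.fromtimestamp(log_epoch, tz=timezone.utc)
    created = parsedate_to_datetime(date_header)
    return logged, (logged - created).total_seconds()


def allowed_by_max_age(age_seconds, max_age):
    """A request max-age accepts only responses no older than max_age seconds."""
    return age_seconds <= max_age


# Values copied from the access log line and the response headers above.
logged, age = age_at_request(1624212034.891, "Sun, 20 Jun 2021 17:52:14 GMT")
print(logged.strftime("%a, %d %b %Y %H:%M:%S GMT"))
print(round(age), "seconds old")
print("deliverable with max-age=60:", allowed_by_max_age(age, 60))
```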

NP: CloudFront is a reverse-proxy service, and thus allowed to ignore 
Cache-Control from clients, which is why it responds with a HIT.

Amos