On 21/06/21 6:41 am, Darwin O'Connor wrote:
I run a transit prediction web app <https://www.transsee.ca/>. It
connects to a variety of web APIs to collect the real time transit data
it needs. The app's activities are split among many processes. They
currently use libcurl to connect to Squid for caching (often for as
little as 10-30 seconds) and for the benefits of connection sharing.
There are still cases where data isn't being cached no matter what I do.
It is https data, but I am able to cache other https pages like
https://cdn.mbta.com/realtime/Alerts.pb
The refresh_pattern:
refresh_pattern . 60 99999% 7200 override-expire
override-lastmod reload-into-ims ignore-reload ignore-no-cache
ignore-no-store ignore-private ignore-auth store-stale
Please be aware that several of those options may be causing you more
problems than they solve:
* ignore-no-cache - no longer exists.
* ignore-reload - contradicts reload-into-ims and that can cause
inconsistent behaviour between initial MISS and followup HIT responses.
* override-lastmod - replaces all Last-Modified (L-M) headers with
values indicating the object is almost brand new. That prevents caches
from detecting that objects have stuck around long enough not to need
replacing.
Luckily the example response has no L-M header, so that is not the
problem here. But it may be affecting other traffic.
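
As a rough illustration of the notes above, a small Python sketch can flag those options on a refresh_pattern line. It is not a Squid tool: the function name and the note texts are made up for this example, and the line layout it assumes is "refresh_pattern [-i] regex min percent max [options]".

```python
# Notes paraphrased from the reply above; not an official Squid check.
OPTION_NOTES = {
    "ignore-no-cache": "no longer exists",
    "ignore-reload": "contradicts reload-into-ims",
    "override-lastmod": "makes objects look almost brand new",
}


def review_refresh_pattern(line):
    """Return (option, note) pairs for flagged options on one line."""
    tokens = line.split()
    if not tokens or tokens[0] != "refresh_pattern":
        raise ValueError("not a refresh_pattern line")
    rest = tokens[1:]
    if rest and rest[0] == "-i":
        rest = rest[1:]
    # Skip regex, min, percent and max; everything after is an option.
    options = rest[4:]
    findings = []
    for opt in options:
        # ignore-reload is only flagged when reload-into-ims is also set.
        if opt == "ignore-reload" and "reload-into-ims" not in options:
            continue
        if opt in OPTION_NOTES:
            findings.append((opt, OPTION_NOTES[opt]))
    return findings


LINE = ("refresh_pattern . 60 99999% 7200 override-expire "
        "override-lastmod reload-into-ims ignore-reload ignore-no-cache "
        "ignore-no-store ignore-private ignore-auth store-stale")

for option, note in review_refresh_pattern(LINE):
    print(f"{option}: {note}")
```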
The http headers from curl of an example where it is not being cached:
* Trying 127.0.0.1:3128...
* Connected to 127.0.0.1 (127.0.0.1) port 3128 (#0)
> GET https://api.transport.nsw.gov.au/v1/gtfs/alerts/buses HTTP/1.1
Host: 127.0.0.1:3128
User-Agent: curl/7.77.0 (+https://www.transsee.ca/)
Accept: */*
Accept-Encoding: gzip
Note this header value is used for variant selection of the responses.
Is this test value the same as what the client apps send?
Authorization: apikey 2eYEqXXxOPEDChnpeF7sZL2aR8moD2DtdNmn
Cache-Control: max-age=60
Content-Encoding: aes128gcm
Apparently the GET message contains some content. Yet there is no
Content-Length or Transfer-Encoding to determine the length of that
content.
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Date: Sun, 20 Jun 2021 17:52:14 GMT
< Content-Type: application/protobuf
< Content-Length: 7455
...
< Access-Control-Allow-Credentials: true
< ETag: W/"ab70-8SI2GdBV4SJG4edSc4E5W8LBJWk"
< Vary: Accept-Encoding
< X-Cache: Hit from cloudfront
< X-Amz-Cf-Pop: SYD1-C1
< X-Amz-Cf-Id: hCoQckLsNONQMpgPr2kXJVdTDHu98jxl-rPXqV_PHB2vTCEomAd-Nw==
< Age: 35
< Access-Control-Allow-Origin: *
< Content-Encoding: gzip
< X-Cache: MISS from transsee
< X-Cache-Lookup: MISS from transsee:3128
< Via: 1.1 359a113ca166631b42f31a0f2e6a1aab.cloudfront.net (CloudFront),
1.1 transsee (squid/4.15)
< Connection: keep-alive
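
To make the Accept-Encoding question concrete: with "Vary: Accept-Encoding" in the response, a cache keys each stored variant on the request's Accept-Encoding value as well as the URL. The Python sketch below only illustrates that idea. The key format is invented for this example and is not how Squid builds its keys, and the "gzip, deflate" value is a hypothetical stand-in for whatever the client apps really send.

```python
def variant_key(url, vary_header, request_headers):
    """Build a lookup key from the URL plus the request headers named in Vary."""
    lowered = {name.lower(): value.strip()
               for name, value in request_headers.items()}
    parts = [url]
    for name in vary_header.split(","):
        name = name.strip().lower()
        if name:
            # A header missing from the request contributes an empty value.
            parts.append(f"{name}={lowered.get(name, '')}")
    return "|".join(parts)


URL = "https://api.transport.nsw.gov.au/v1/gtfs/alerts/buses"

# Value sent by the curl test above.
test_key = variant_key(URL, "Accept-Encoding", {"Accept-Encoding": "gzip"})
# Hypothetical value sent by a client app.
app_key = variant_key(URL, "Accept-Encoding",
                      {"Accept-Encoding": "gzip, deflate"})

print(test_key == app_key)  # False: the two requests select different variants
```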
Here is a sample from the Squid access log:
1624212034.891 246 127.0.0.1 59216 TCP_MISS/200 8517 GET
https://api.transport.nsw.gov.au/v1/gtfs/alerts/buses -
HIER_DIRECT/52.65.222.24 application/protobuf
FYI, a single request is not sufficient to demonstrate caching issues.
Caches always require one MISS to fetch the data which then is expected
to show up as HIT on the second or later requests.
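
One way to check this from the access log is to group result codes per URL and only draw a conclusion once a URL has been requested at least twice. The Python sketch below assumes the log layout shown in the sample (result code, then bytes, method, URL), which looks like a custom logformat. The function names and verdict strings are made up for the example, and any code containing "HIT" is counted as served from cache.

```python
import re
from collections import defaultdict

RESULT = re.compile(r"^(TCP_[A-Z_]+)/\d{3}$")


def result_codes_by_url(log_lines):
    """Map each URL to the list of Squid result codes seen for it, in order."""
    by_url = defaultdict(list)
    for line in log_lines:
        tokens = line.split()
        for i, tok in enumerate(tokens):
            match = RESULT.match(tok)
            if match and i + 3 < len(tokens):
                # Assumed layout after the result code: bytes, method, URL.
                by_url[tokens[i + 3]].append(match.group(1))
                break
    return dict(by_url)


def cache_verdict(codes):
    """One request proves nothing; look for a HIT on any later request."""
    if len(codes) < 2:
        return "inconclusive"
    return "cached" if any("HIT" in c for c in codes[1:]) else "not cached"


# The single log entry quoted above, joined onto one line.
LOG = ["1624212034.891 246 127.0.0.1 59216 TCP_MISS/200 8517 GET "
       "https://api.transport.nsw.gov.au/v1/gtfs/alerts/buses - "
       "HIER_DIRECT/52.65.222.24 application/protobuf"]

for url, codes in result_codes_by_url(LOG).items():
    print(url, codes, cache_verdict(codes))
```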
A couple of things going on here with timing make this a MISS:
* The log timestamp (1624212034.891) says your test finished at Sun, 20
Jun 2021 18:00:34 GMT
* The response Date says it was created Sun, 20 Jun 2021 17:52:14 GMT
* The request Cache-Control forbids receiving anything from cache older
than 60 seconds.
So the response, being more than eight minutes old, cannot be delivered
from a cache to this test request.
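
The timing argument can be sketched in Python. This is a simplified take on the HTTP age calculation (the larger of "now minus Date" and the Age header), not Squid's actual logic, and the function names are made up for this example. The inputs are the Date, Age and Cache-Control values from the headers above and the access log timestamp.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def current_age(date_header, age_header, now):
    """Simplified age in seconds: max of (now - Date) and the Age header."""
    date_value = parsedate_to_datetime(date_header)
    apparent_age = max(0.0, (now - date_value).total_seconds())
    return max(apparent_age, float(age_header))


def allowed_by_max_age(cache_control, age_seconds):
    """True if a request Cache-Control max-age (if any) accepts this age."""
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return age_seconds <= int(value)
    return True  # no max-age directive, so no age limit from the client


# Access log timestamp (seconds since the epoch, UTC).
now = datetime.fromtimestamp(1624212034.891, tz=timezone.utc)
age = current_age("Sun, 20 Jun 2021 17:52:14 GMT", "35", now)

print(allowed_by_max_age("max-age=60", age))  # False
```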
NP: CloudFront is a reverse-proxy service and thus allowed to ignore
Cache-Control from clients, which is why it responds with a HIT.
Amos