Jason Spegal wrote:
I am currently using the following for the items in question.
refresh_pattern pandora.com 0 300% 31536000
refresh_pattern . 0 80% 3156000
The dot (.) pattern matches every URL in existence.
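A rough illustration of why the catch-all matters: Squid applies the first refresh_pattern whose regex matches the URL, so anything listed after the dot rule is never consulted. The sketch below simulates that lookup in Python. The rule order is an assumption (the thread does not show where each line sits in the real squid.conf), the `first_match` helper and the example.com URL are hypothetical, and Python's `re` stands in for Squid's regex engine.

```python
import re

# (regex, min, percent, max) in an ASSUMED order: the two rules quoted
# above first, then one of the more specific rules quoted later.
RULES = [
    (r"pandora.com", 0, 300, 31536000),
    (r".", 0, 80, 3156000),
    (r"cgi-bin", 0, 0, 0),  # never reached: "." already matched
]


def first_match(url, rules=RULES):
    """Return the first rule whose regex is found anywhere in the URL."""
    for rule in rules:
        if re.search(rule[0], url, re.IGNORECASE):
            return rule
    return None


if __name__ == "__main__":
    for url in (
        "http://example.com/cgi-bin/report.cgi",  # hypothetical URL
        "http://images-sjl-1.pandora.com/images/public/a.jpg",
    ):
        print(url, "->", first_match(url)[0])
```

With this ordering the cgi-bin URL is handled by the dot rule, which suggests putting the specific patterns first and the dot rule last.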
For the pandora files you don't need to go 300%, but do need to add all
the available override-* and ignore-* violations available to the
"pandora.com" pattern.
I'd also try making the pandora pattern:
-i http://[^a-z\.]*pandora\.com/?
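It may be worth testing a host pattern against real URLs before putting it in squid.conf. The sketch below does that in Python with two URLs copied from the access.log excerpt in this thread. The `ALTERNATIVE` regex, the `matches` helper and the example.com URL are hypothetical additions, not something proposed in the thread, and Python's `re` is only an approximation of the regex library Squid is built with.

```python
import re

URLS = [
    # Copied from the access.log excerpt in this thread.
    "http://images-sjl-1.pandora.com/images/public/amz/1/2/0/4/727361124021_500W_495H.jpg",
    "http://audio-sjl-t3-2.pandora.com/access/7008639604707703825.mp4?",
    # Hypothetical non-Pandora URL that merely contains the name.
    "http://example.com/about-pandora.com.html",
]

# The pattern exactly as written in the reply above.
AS_WRITTEN = re.compile(r"http://[^a-z\.]*pandora\.com/?", re.IGNORECASE)

# Hypothetical alternative: any chain of subdomain labels, then pandora.com.
ALTERNATIVE = re.compile(r"^http://([a-z0-9-]+\.)*pandora\.com/", re.IGNORECASE)


def matches(pattern, urls=URLS):
    """Return one boolean per URL: does the pattern match it anywhere?"""
    return [pattern.search(url) is not None for url in urls]


if __name__ == "__main__":
    print("as written: ", matches(AS_WRITTEN))
    print("alternative:", matches(ALTERNATIVE))
```

In Python at least, the class `[^a-z\.]` excludes letters, so the pattern as written does not get past hostnames such as images-sjl-1; a class that allows letters, digits, dots and hyphens seems closer to the intent.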
With violations off these work well. However they fail to cache all the
items I would like. When I had violations on I had tried refresh_pattern
. 0 0% 0 as well as setting all refresh_pattern to 0 0% 0 which still
failed to refresh the pages properly. I had also tried rebuilding the
cache from scratch several times.
Other relevant patterns I am using:
#Dynamic Content
refresh_pattern -i cgi-bin 0 0% 0 refresh-ims
The following is a violation even if it works with violations not enabled.
refresh_pattern -i \? 0 0% 3156000 refresh-ims
refresh_pattern -i .(asp|aspx|php|pl|xml|rss|kml|cgi|py|pyc) 0 0% 0 refresh-ims
#HTML
refresh_pattern text/html 0 80% 2592000 refresh-ims
refresh_pattern text/css 0 80% 2592000 refresh-ims
#Java & Javascript
refresh_pattern -i .(js|jar|java) 0 100% 31536000
#By MIME-Type
refresh_pattern application/* 0 300% 31536000
refresh_pattern audio/* 0 300% 31536000
refresh_pattern images/* 0 300% 31536000
refresh_pattern text/* 0 300% 31536000
refresh_pattern video/* 0 300% 31536000
? mime patterns in the URL? with Squid?
Do you have a patch that does this? If so please consider contributing
back to the project.
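To make the point about MIME patterns concrete: refresh_pattern regexes are matched against the request URL, not the Content-Type, so a pattern like `application/*` is read as the word "application" followed by zero or more slashes. The Python sketch below shows what the quoted patterns do when applied to URLs. The example.com URLs and the `url_matches` helper are hypothetical; the Pandora URL is copied from the access.log excerpt in this thread.

```python
import re

URLS = [
    "http://www.example.com/index.html",        # hypothetical HTML page
    "http://www.example.com/application/form",  # hypothetical URL
    # From the log excerpt; served as application/octet-stream.
    "http://audio-sjl-t3-2.pandora.com/access/7008639604707703825.mp4?",
]

PATTERNS = ["text/html", "application/*", "audio/*"]


def url_matches(pattern, url):
    """True if the regex is found anywhere in the URL string."""
    return re.search(pattern, url) is not None


if __name__ == "__main__":
    for pattern in PATTERNS:
        for url in URLS:
            print(f"{pattern:15} {url_matches(pattern, url)!s:5} {url}")
```

The HTML page is not matched by `text/html`, and the Pandora audio URL is matched by `audio/*` only because its hostname happens to contain "audio".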
When I had violations on the Pandora entry was similar to this...
refresh_pattern pandora.com 0 300% 31536000 override-expire reload-into-ims ignore-reload ignore-no-cache ignore-private ignore-no-store ignore-auth
A single pattern like that should be all you need to add.
Some of the non-caching parameters are only able to be overridden in the
2.HEAD code though. You may need to grab a copy of the HEAD code and use
that.
PS. All of your file extension patterns above use the very unsafe
.XX syntax. The pattern is a regex and matches anywhere in the URL. It's
likely catching a whole lot of URLs which it should not.
Please use: \.XX(\?.*)?$ instead, i.e. \.(js|jar|java)(\?.*)?$
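The difference between the two syntaxes can be checked quickly outside Squid. The sketch below compares the loose `.XX` form with the anchored form recommended above, using Python's `re` as a stand-in for Squid's regex engine. All four URLs are hypothetical examples chosen for illustration.

```python
import re

# Loose form from the original config: "." is any character, no anchor.
LOOSE = re.compile(r".(js|jar|java)", re.IGNORECASE)

# Anchored form recommended above: literal dot, optional query, end of URL.
ANCHORED = re.compile(r"\.(js|jar|java)(\?.*)?$", re.IGNORECASE)

# Hypothetical URLs.
URLS = [
    "http://example.com/lib/app.js",
    "http://example.com/lib/app.js?v=3",
    "http://example.com/jars/index.html",
    "http://example.com/javadoc/overview.html",
]


def hits(pattern, urls=URLS):
    """Return one boolean per URL: does the pattern match it anywhere?"""
    return [pattern.search(url) is not None for url in urls]


if __name__ == "__main__":
    print("loose:   ", hits(LOOSE))
    print("anchored:", hits(ANCHORED))
```

The loose form also catches the two HTML pages because "/jar" and "/java" appear in their paths; the anchored form matches only the two .js URLs.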
Amos
Amos Jeffries wrote:
Jason Spegal wrote:
I would wager it's content control given what they are. However with
violations on they can be cached. Without they cannot. I just haven't
been able to figure out how to get squid to behave with violations
turned on. The only other option I can see is to set up a second squid
with violations and filter all the traffic to/from Pandora through it.
Use refresh_pattern with a regex that only matches Pandora URLs.
I'll wager you have either added all the overrides to the . pattern,
or have an overly greedy regex in use.
Amos
Adrian Chadd wrote:
This doesn't surprise me. They may be trying to maximise outbound
bits, or trying to retain control over content, or may not understand
caching, or some combination of the above.
I'd suggest contacting them and asking.
adrian
2009/7/26 Jason Spegal <jspegal@xxxxxxxxxxx>:
A little bit messy but here are some snippets.
###Access.log
1248572380.275 178 10.10.122.248 TCP_REFRESH_UNMODIFIED/304 232 GET
http://images-sjl-1.pandora.com/images/public/amz/1/2/0/4/727361124021_500W_495H.jpg
- DIRECT/208.85.40.13 -
1248572409.144 8472 10.10.122.241 TCP_MISS/200 1581181 GET
http://audio-sjl-t3-2.pandora.com/access/7008639604707703825.mp4? -
DIRECT/208.85.41.38 application/octet-stream
1248572439.512 94 10.10.122.241 TCP_MEM_HIT/200 55396 GET
http://images-sjl-2.pandora.com/images/public/amz/3/0/2/3/602498413203_500W_499H.jpg
- NONE/- image/jpeg
1248572570.898 300 10.10.122.248 TCP_MISS/200 6521 GET
http://images-sjl-3.pandora.com/images/public/amz/2/2/4/4/039841434422_130W_130H.jpg
- DIRECT/208.85.41.23 image/jpeg
1248572600.538 29937 10.10.122.248 TCP_MISS/200 7704188 GET
http://audio-sjl-t3-2.pandora.com/access/3642267922875646389.mp3? -
DIRECT/208.85.41.38 application/octet-stream
1248572615.735 11507 10.10.122.241 TCP_MISS/200 2109481 GET
http://audio-sjl-t2-2.pandora.com/access/5722981497105294607.mp4? -
DIRECT/208.85.41.36 application/octet-stream
1248572635.903 179 10.10.122.248 TCP_REFRESH_UNMODIFIED/304 232 GET
http://images-sjl-3.pandora.com/images/public/amz/2/2/4/4/039841434422_130W_130H.jpg
- DIRECT/208.85.41.23 -
1248572641.444 40 10.10.122.241 TCP_HIT/200 21616 GET
http://images-sjl-2.pandora.com/images/public/amz/8/7/6/1/602498611678_300W_273H.jpg
- NONE/- image/jpeg
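One way to read an excerpt like this is to tally result codes per kind of host. The Python sketch below does that for the eight entries above, re-joined onto single lines. It assumes Squid's native access.log layout (timestamp, elapsed, client, code/status, bytes, method, URL, ident, hierarchy, type); the `tally` helper and the grouping by hostname prefix are hypothetical conveniences.

```python
from collections import Counter
from urllib.parse import urlsplit

# The entries from the excerpt above, one per line.
LOG = """\
1248572380.275 178 10.10.122.248 TCP_REFRESH_UNMODIFIED/304 232 GET http://images-sjl-1.pandora.com/images/public/amz/1/2/0/4/727361124021_500W_495H.jpg - DIRECT/208.85.40.13 -
1248572409.144 8472 10.10.122.241 TCP_MISS/200 1581181 GET http://audio-sjl-t3-2.pandora.com/access/7008639604707703825.mp4? - DIRECT/208.85.41.38 application/octet-stream
1248572439.512 94 10.10.122.241 TCP_MEM_HIT/200 55396 GET http://images-sjl-2.pandora.com/images/public/amz/3/0/2/3/602498413203_500W_499H.jpg - NONE/- image/jpeg
1248572570.898 300 10.10.122.248 TCP_MISS/200 6521 GET http://images-sjl-3.pandora.com/images/public/amz/2/2/4/4/039841434422_130W_130H.jpg - DIRECT/208.85.41.23 image/jpeg
1248572600.538 29937 10.10.122.248 TCP_MISS/200 7704188 GET http://audio-sjl-t3-2.pandora.com/access/3642267922875646389.mp3? - DIRECT/208.85.41.38 application/octet-stream
1248572615.735 11507 10.10.122.241 TCP_MISS/200 2109481 GET http://audio-sjl-t2-2.pandora.com/access/5722981497105294607.mp4? - DIRECT/208.85.41.36 application/octet-stream
1248572635.903 179 10.10.122.248 TCP_REFRESH_UNMODIFIED/304 232 GET http://images-sjl-3.pandora.com/images/public/amz/2/2/4/4/039841434422_130W_130H.jpg - DIRECT/208.85.41.23 -
1248572641.444 40 10.10.122.241 TCP_HIT/200 21616 GET http://images-sjl-2.pandora.com/images/public/amz/8/7/6/1/602498611678_300W_273H.jpg - NONE/- image/jpeg
"""


def tally(log_text):
    """Count entries and bytes per (hostname prefix, Squid result code)."""
    counts, sizes = Counter(), Counter()
    for line in log_text.splitlines():
        fields = line.split()
        if len(fields) < 10:
            continue  # skip blank or truncated lines
        result = fields[3].split("/")[0]        # e.g. TCP_MISS
        host = urlsplit(fields[6]).hostname or ""
        kind = host.split("-")[0]               # "images" or "audio"
        counts[(kind, result)] += 1
        sizes[(kind, result)] += int(fields[4])
    return counts, sizes


if __name__ == "__main__":
    counts, sizes = tally(LOG)
    for key in sorted(counts):
        print(key, counts[key], "requests,", sizes[key], "bytes")
```

On this sample every audio request is a TCP_MISS, while the image hosts show a mix of hits, revalidations and one miss, which is consistent with the images being cacheable and the audio not.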
###Store.log
1248572380.275 RELEASE -1 FFFFFFFF
097EAE1108DCEF192ED1C3BFF1F6C1B5 304
1248572380 -1 -1 unknown -1/0 GET
http://images-sjl-1.pandora.com/images/public/amz/1/2/0/4/727361124021_500W_495H.jpg
1248572409.144 RELEASE -1 FFFFFFFF
6B93B1BF958703B3FC3CD1ADDD515695 200
1248572400 -1 1248572400 application/octet-stream
1580815/1580815 GET
http://audio-sjl-t3-2.pandora.com/access/7008639604707703825.mp4?
1248572570.897 SWAPOUT 00 0004CF23
BEEE111A39B596B14903743011AF2C36 200
1248572570 1248490006 -1 image/jpeg 6181/6181 GET
http://images-sjl-3.pandora.com/images/public/amz/2/2/4/4/039841434422_130W_130H.jpg
1248572600.538 RELEASE -1 FFFFFFFF
070416ED935AD18DCA793569D2C6A652 200
1248572570 -1 1248572570 application/octet-stream
7703822/7703822 GET
http://audio-sjl-t3-2.pandora.com/access/3642267922875646389.mp3?
1248572615.735 RELEASE -1 FFFFFFFF
B0EB42B39131DF028BA3BE9A39CC24E4 200
1248572604 -1 1248572604 application/octet-stream
2109115/2109115 GET
http://audio-sjl-t2-2.pandora.com/access/5722981497105294607.mp4?
1248572635.903 RELEASE -1 FFFFFFFF
CDCA0D3510080D121E5578310976676E 304
1248572635 -1 -1 unknown -1/0 GET
http://images-sjl-3.pandora.com/images/public/amz/2/2/4/4/039841434422_130W_130H.jpg
1248572886.822 RELEASE -1 FFFFFFFF
A95C86074129546301911C2FC251071D 200
1248572872 -1 1248572872 application/octet-stream
2086824/2086824 GET
http://audio-sjl-t1-1.pandora.com/access/5188159311574708305.mp4?
###Wireshark
Hypertext Transfer Protocol
HTTP/1.0 200 OK\r\n
Date: Sun, 26 Jul 2009 05:12:58 GMT\r\n
Server: Apache\r\n
Content-Length: 6137729\r\n
Cache-Control: no-cache, no-store, must-revalidate, max-age=-1\r\n
Pragma: no-cache, no-store\r\n
Expires: -1\r\n
Content-Type: application/octet-stream\r\n
X-Cache: MISS from ichiban\r\n
X-Cache-Lookup: MISS from ichiban:3128\r\n
Via: 1.0 ichiban (squid)\r\n
Proxy-Connection: keep-alive\r\n
\r\n
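The response headers in this capture explain the misses. As a rough aid, the Python sketch below pulls the directive names out of Cache-Control and lists the refresh_pattern options mentioned in this thread that appear, by name, to target them. The pairing in `OPTION_FOR` is an assumption based on the option names only, both helper functions are hypothetical, and whether a given Squid build honours each option depends on its version.

```python
# Headers copied from the Wireshark capture above.
HEADERS = {
    "Cache-Control": "no-cache, no-store, must-revalidate, max-age=-1",
    "Pragma": "no-cache, no-store",
    "Expires": "-1",
}

# ASSUMED pairing of response directives with refresh_pattern options
# named elsewhere in this thread.
OPTION_FOR = {
    "no-cache": "ignore-no-cache",
    "no-store": "ignore-no-store",
    "private": "ignore-private",
}


def cache_control_directives(headers):
    """Return lowercased Cache-Control directive names, values dropped."""
    names = []
    for part in headers.get("Cache-Control", "").split(","):
        name = part.strip().split("=", 1)[0].lower()
        if name:
            names.append(name)
    return names


def candidate_options(headers):
    """List the options that, by name, correspond to the headers seen."""
    options = [OPTION_FOR[d] for d in cache_control_directives(headers)
               if d in OPTION_FOR]
    if "Expires" in headers:
        options.append("override-expire")
    return options


if __name__ == "__main__":
    print(cache_control_directives(HEADERS))
    print(candidate_options(HEADERS))
```

For this response the candidates come out as ignore-no-cache, ignore-no-store and override-expire, which are among the options in the single pandora.com pattern discussed earlier in the thread.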
Amos Jeffries wrote:
Jason Spegal wrote:
I was able to cache Pandora by compiling with --enable-http-violations
and using a refresh_pattern to cache everything regardless. This,
however, broke everything by preventing proper refreshing of any site.
If it could be made to work so that violations only happened as
directly specified in the configuration, it would be a workable
solution. I did some testing and could not confirm that it was
anything in the configuration file itself that was causing the issue.
I wouldn't recommend using this as such.
Which indicates that there is fine tuning possible to cache just
Pandora.
Find yourself one of the Pandora URLs in your access.log and take a
visit to www.redbot.org or the ircache.org cacheability engine.
Amos
Henrik Nordstrom wrote:
Sat 2009-07-25 at 12:05 -0600, Brett Glass wrote:
One of the largest consumers of our HTTP bandwidth is Pandora, the
free music service. Unfortunately, Pandora marks its streams as
non-cacheable and also puts question marks in the URLs, which is a
huge waste of bandwidth. How can this be overridden?
The question mark can be ignored. See the "cache" directive. But if
there are other parameters behind it (normally not logged), that just
may not help.
Regarding non-cacheable: most crap can be overridden by
refresh_pattern. But if it's a streaming service (I know nothing
about Pandora) then you are quite likely out of luck.
Regards
Henrik
--
Please be using
Current Stable Squid 2.7.STABLE6 or 3.0.STABLE16
Current Beta Squid 3.1.0.10 or 3.1.0.11