Amos Jeffries wrote:
Jason Spegal wrote:
I am currently using the following for the items in question.
refresh_pattern pandora.com 0 300% 31536000
refresh_pattern . 0 80% 3156000
The dot (.) pattern matches every URL in existence.
For the pandora files you don't need to go 300%, but you do need to add
all the available override-* and ignore-* violation options to the
"pandora.com" pattern.
I'd also try making the pandora pattern:
-i http://[^a-z\.]*pandora\.com/?
Ok, the changes were made so the new line is
refresh_pattern -i http://[^a-z\.]*pandora\.com/? 0 300% 31536000
override-expire reload-into-ims ignore-reload ignore-no-cache
ignore-private ignore-no-store ignore-auth
The following are the results from store.log after the change. It
appears that they are still failing to cache.
1248619647.717 RELEASE -1 FFFFFFFF 2DDD8D498CF4C28F60520AA26761A1F6 200
1248619640 -1 1248619640 application/octet-stream 1627255/1627255
GET http://audio-sjl-t2-1.pandora.com/access/7886817187448819808.mp4?
1248619657.439 RELEASE -1 FFFFFFFF 21387EECAF5FFCF61AEE68B2494F7A01 200
1248619621 -1 1248619621 application/octet-stream 6327065/6327065
GET http://audio-sjl-t2-2.pandora.com/access/8544252120326380207.mp3?
1248619860.906 RELEASE -1 FFFFFFFF B838385F620C52ECE3B4F4E3BBC21270 200
1248619847 -1 1248619847 application/octet-stream 2462059/2462059
GET http://audio-sjl-t1-2.pandora.com/access/3264482519687036142.mp4?
1248619895.636 RELEASE -1 FFFFFFFF 86A59F24244895283DC5BE8124F7C248 200
1248619878 -1 1248619878 application/octet-stream 4585429/4585429
GET http://audio-sjl-t3-2.pandora.com/access/7586905698959626071.mp3?
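One thing worth checking against the store.log URLs above is whether the new pattern matches them at all. The character class [^a-z\.] excludes letters and dots, so a hostname prefix such as "audio-sjl-t2-1." cannot be consumed by it. The sketch below is only an approximation: it uses Python's re module in place of Squid's POSIX regex engine and models -i with re.IGNORECASE. The "alternative" pattern is a hypothetical rewrite, not something proposed in this thread.

```python
import re

# URLs copied from the store.log entries above.
urls = [
    "http://audio-sjl-t2-1.pandora.com/access/7886817187448819808.mp4?",
    "http://audio-sjl-t2-2.pandora.com/access/8544252120326380207.mp3?",
]

# The regex from the refresh_pattern line; -i is modelled with IGNORECASE.
suggested = re.compile(r"http://[^a-z\.]*pandora\.com/?", re.IGNORECASE)

# Hypothetical alternative (an assumption, not from the thread): an optional
# run of subdomain labels ending in a dot, then pandora.com and a slash.
alternative = re.compile(r"^http://([^/]+\.)?pandora\.com/", re.IGNORECASE)

suggested_hits = [bool(suggested.search(u)) for u in urls]
alternative_hits = [bool(alternative.search(u)) for u in urls]

print("suggested:  ", suggested_hits)
print("alternative:", alternative_hits)
```

If the first list comes back all False, the pandora refresh_pattern never applies to these requests and they fall through to the later patterns, which would be consistent with the RELEASE entries.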
With violations off these work well. However they fail to cache all
the items I would like. When I had violations on I had tried
refresh_pattern . 0 0% 0 as well as setting all refresh_pattern to 0
0% 0 which still failed to refresh the pages properly. I had also
tried rebuilding the cache from scratch several times.
Other relevant patterns I am using:
#Dynamic Content
refresh_pattern -i cgi-bin 0 0% 0 refresh-ims
The following is a violation even if it works with violations not
enabled.
refresh_pattern -i \? 0 0% 3156000 refresh-ims
refresh_pattern -i .(asp|aspx|php|pl|xml|rss|kml|cgi|py|pyc) 0 0% 0
refresh-ims
#HTML
refresh_pattern text/html 0 80% 2592000 refresh-ims
refresh_pattern text/css 0 80% 2592000 refresh-ims
#Java & Javascript
refresh_pattern -i .(js|jar|java) 0 100% 31536000
#By MIME-Type
refresh_pattern application/* 0 300% 31536000
refresh_pattern audio/* 0 300% 31536000
refresh_pattern images/* 0 300% 31536000
refresh_pattern text/* 0 300% 31536000
refresh_pattern video/* 0 300% 31536000
? mime patterns in the URL? with Squid?
Do you have a patch that does this? If so please consider contributing
back to the project.
I take it you're referring to refresh_pattern -i \? 0 0% 3156000
refresh-ims. I was under the impression that squid supports this. I am
using Squeezzer2 to check how well the patterns work. It does seem to work.
Also the version of squid I am using is 3.0.16 with the following patches
squid-3.0.16-adapted-zph.patch
squid-3.0.16-cross-compile.patch
squid-3.0.16-gentoo.patch
It is compiled through Gentoo's Emerge.
When I had violations on the Pandora entry was similar to this...
refresh_pattern pandora.com 0 300% 31536000 override-expire
reload-into-ims ignore-reload ignore-no-cache ignore-private
ignore-no-store ignore-auth
A single pattern like that should be all you need to add.
Some of the non-caching parameters are only able to be overridden in
the 2.HEAD code though. You may need to grab a copy of the HEAD code
and use that.
PS. all of your file extension patterns above are using the very
unsafe .XX syntax. The pattern is a regex and matches anywhere in the
URL. It's likely catching a whole lot of URLs which it should not.
Please use: \.XX(\?.*)?$ instead. ie \.(js|jar|java)(\?.*)?$
I'm not sure I understand this example. Can you give a literal example
please? From what I'm understanding, you're saying refresh_pattern -i .jpg
0 300% 31536000 would be bad because http://www.jpgas.com would be
cached with that pattern which may, for sake of example, have settings
that would break that site. You're recommending refresh_pattern -i \.jpg 0
300% 31536000 and not doing something like refresh_pattern -i
.(jpg|gif|png|ico|tga) 0 300% 31536000 ?
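As a literal illustration of the difference, here is a minimal sketch comparing the unanchored .(jpg|gif|png|ico|tga) form with the anchored \.XX(\?.*)?$ form recommended above. It uses Python's re module as a stand-in for Squid's regex engine, which is an assumption; the example.com URLs are made up for the test, and only www.jpgas.com comes from the question.

```python
import re

# Unanchored form: "." is a regex wildcard, so this matches any character
# followed by one of the extensions, anywhere in the URL.
loose = re.compile(r".(jpg|gif|png|ico|tga)", re.IGNORECASE)

# Anchored form: a literal dot, the extension, an optional query string,
# then the end of the URL.
strict = re.compile(r"\.(jpg|gif|png|ico|tga)(\?.*)?$", re.IGNORECASE)

urls = [
    "http://www.jpgas.com/",                    # host from the question
    "http://example.com/photo.jpg",             # hypothetical
    "http://example.com/photo.jpg?size=large",  # hypothetical
    "http://example.com/pngtools/index.html",   # hypothetical
]

# Map each URL to (loose matched?, strict matched?).
results = {u: (bool(loose.search(u)), bool(strict.search(u))) for u in urls}

for url, (is_loose, is_strict) in results.items():
    print(f"{url:45} loose={is_loose!s:5} strict={is_strict}")
```

The loose form matches all four URLs, including the site home page and an HTML page under /pngtools/, while the strict form matches only the two actual .jpg objects.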
As far as messing with the code goes I haven't been into doing that as
of yet. For my purposes my goal is to build a server/router/etc that
will turn crappy internet into good internet while being able to service
a number of people. This runs my home network normally these days and
was originally conceived and built to support 1000 users for a dormitory
through a single cable modem. Tweaking it for maximum efficiency is a
hobby for me now. My coding skills are fairly weak and I wouldn't know
where to start for a lot of this. I am willing to help test things out
and such however.
Amos
Amos Jeffries wrote:
Jason Spegal wrote:
I would wager it's content control given what they are. However
with violations on they can be cached. Without they cannot. I just
haven't been able to figure out how to get squid to behave with
violations turned on. My only other option I can see is to set up a
second squid with violations and filter all the traffic to/from
Pandora through it.
Use refresh_pattern with a regex that only matches Pandora URLs.
I'll wager you have either added all the overrides to the . pattern,
or have an overly greedy regex in use.
Amos
Adrian Chadd wrote:
This doesn't surprise me. They may be trying to maximise outbound
bits, or trying to retain control over content, or they may not
understand caching, or some combination of the above.
I'd suggest contacting them and asking.
adrian
2009/7/26 Jason Spegal <jspegal@xxxxxxxxxxx>:
A little bit messy but here are some snippets.
###Access.log
1248572380.275 178 10.10.122.248 TCP_REFRESH_UNMODIFIED/304
232 GET
http://images-sjl-1.pandora.com/images/public/amz/1/2/0/4/727361124021_500W_495H.jpg
- DIRECT/208.85.40.13 -
1248572409.144 8472 10.10.122.241 TCP_MISS/200 1581181 GET
http://audio-sjl-t3-2.pandora.com/access/7008639604707703825.mp4? -
DIRECT/208.85.41.38 application/octet-stream
1248572439.512 94 10.10.122.241 TCP_MEM_HIT/200 55396 GET
http://images-sjl-2.pandora.com/images/public/amz/3/0/2/3/602498413203_500W_499H.jpg
- NONE/- image/jpeg
1248572570.898 300 10.10.122.248 TCP_MISS/200 6521 GET
http://images-sjl-3.pandora.com/images/public/amz/2/2/4/4/039841434422_130W_130H.jpg
- DIRECT/208.85.41.23 image/jpeg
1248572600.538 29937 10.10.122.248 TCP_MISS/200 7704188 GET
http://audio-sjl-t3-2.pandora.com/access/3642267922875646389.mp3? -
DIRECT/208.85.41.38 application/octet-stream
1248572615.735 11507 10.10.122.241 TCP_MISS/200 2109481 GET
http://audio-sjl-t2-2.pandora.com/access/5722981497105294607.mp4? -
DIRECT/208.85.41.36 application/octet-stream
1248572635.903 179 10.10.122.248 TCP_REFRESH_UNMODIFIED/304
232 GET
http://images-sjl-3.pandora.com/images/public/amz/2/2/4/4/039841434422_130W_130H.jpg
- DIRECT/208.85.41.23 -
1248572641.444 40 10.10.122.241 TCP_HIT/200 21616 GET
http://images-sjl-2.pandora.com/images/public/amz/8/7/6/1/602498611678_300W_273H.jpg
- NONE/- image/jpeg
###Store.log
1248572380.275 RELEASE -1 FFFFFFFF
097EAE1108DCEF192ED1C3BFF1F6C1B5 304
1248572380 -1 -1 unknown -1/0 GET
http://images-sjl-1.pandora.com/images/public/amz/1/2/0/4/727361124021_500W_495H.jpg
1248572409.144 RELEASE -1 FFFFFFFF
6B93B1BF958703B3FC3CD1ADDD515695 200
1248572400 -1 1248572400 application/octet-stream
1580815/1580815 GET
http://audio-sjl-t3-2.pandora.com/access/7008639604707703825.mp4?
1248572570.897 SWAPOUT 00 0004CF23
BEEE111A39B596B14903743011AF2C36 200
1248572570 1248490006 -1 image/jpeg 6181/6181 GET
http://images-sjl-3.pandora.com/images/public/amz/2/2/4/4/039841434422_130W_130H.jpg
1248572600.538 RELEASE -1 FFFFFFFF
070416ED935AD18DCA793569D2C6A652 200
1248572570 -1 1248572570 application/octet-stream
7703822/7703822 GET
http://audio-sjl-t3-2.pandora.com/access/3642267922875646389.mp3?
1248572615.735 RELEASE -1 FFFFFFFF
B0EB42B39131DF028BA3BE9A39CC24E4 200
1248572604 -1 1248572604 application/octet-stream
2109115/2109115 GET
http://audio-sjl-t2-2.pandora.com/access/5722981497105294607.mp4?
1248572635.903 RELEASE -1 FFFFFFFF
CDCA0D3510080D121E5578310976676E 304
1248572635 -1 -1 unknown -1/0 GET
http://images-sjl-3.pandora.com/images/public/amz/2/2/4/4/039841434422_130W_130H.jpg
1248572886.822 RELEASE -1 FFFFFFFF
A95C86074129546301911C2FC251071D 200
1248572872 -1 1248572872 application/octet-stream
2086824/2086824 GET
http://audio-sjl-t1-1.pandora.com/access/5188159311574708305.mp4?
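To make the store.log entries above easier to read, here is a rough sketch that splits one entry into named fields. The field names are my labels, based on my understanding of the store.log layout (timestamp, action, swap directory, swap file number, key hash, HTTP status, Date, Last-Modified, Expires, type, sizes, method, URL), so treat them as assumptions. The sample line is one of the entries above with its wrapped lines joined.

```python
from collections import namedtuple

# Field labels are assumptions; the order follows the log lines above.
StoreEntry = namedtuple(
    "StoreEntry",
    "time action dir_no file_no key_hash status date lastmod expires "
    "mime sizes method url",
)

def parse_store_line(line):
    """Split one whitespace-separated store.log entry into named fields."""
    fields = line.split()
    if len(fields) != 13:
        raise ValueError(f"expected 13 fields, got {len(fields)}")
    return StoreEntry(*fields)

sample = (
    "1248572409.144 RELEASE -1 FFFFFFFF 6B93B1BF958703B3FC3CD1ADDD515695 200 "
    "1248572400 -1 1248572400 application/octet-stream 1580815/1580815 GET "
    "http://audio-sjl-t3-2.pandora.com/access/7008639604707703825.mp4?"
)

entry = parse_store_line(sample)

# SWAPOUT means the object was written to disk; RELEASE means it was dropped.
cached = entry.action == "SWAPOUT"

# An Expires value no later than Date means the reply arrived already stale.
expired_on_arrival = (
    entry.expires != "-1" and int(entry.expires) <= int(entry.date)
)

print(entry.action, entry.mime, "cached:", cached,
      "expired on arrival:", expired_on_arrival)
```

For the audio entries the Expires column equals the Date column, which fits the "Expires: -1" header in the capture below being treated as already expired.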
###Wireshark
Hypertext Transfer Protocol
HTTP/1.0 200 OK\r\n
Date: Sun, 26 Jul 2009 05:12:58 GMT\r\n
Server: Apache\r\n
Content-Length: 6137729\r\n
Cache-Control: no-cache, no-store, must-revalidate, max-age=-1\r\n
Pragma: no-cache, no-store\r\n
Expires: -1\r\n
Content-Type: application/octet-stream\r\n
X-Cache: MISS from ichiban\r\n
X-Cache-Lookup: MISS from ichiban:3128\r\n
Via: 1.0 ichiban (squid)\r\n
Proxy-Connection: keep-alive\r\n
\r\n
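The Cache-Control header in the capture above carries several directives at once. The sketch below splits it up and pairs each directive with the refresh_pattern option of the same name used earlier in this thread. The pairing is by name only and is an assumption; whether a given Squid build honours each option is a separate question.

```python
def parse_cache_control(value):
    """Parse a Cache-Control header value into a {directive: argument} dict."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip() or None
    return directives

# Header value copied from the Wireshark capture above.
header = "no-cache, no-store, must-revalidate, max-age=-1"
directives = parse_cache_control(header)

# Name-based pairing with options from the refresh_pattern line in this
# thread (assumption: each option targets the directive it is named after).
OVERRIDES = {
    "no-cache": "ignore-no-cache",
    "no-store": "ignore-no-store",
    "private": "ignore-private",
}

needed = [OVERRIDES[d] for d in directives if d in OVERRIDES]
uncovered = [d for d in directives if d not in OVERRIDES]

print("options that apply:", needed)
print("directives without a paired option here:", uncovered)
```

The response also sends "Expires: -1", which is what override-expire in the same refresh_pattern line is aimed at.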
Amos Jeffries wrote:
Jason Spegal wrote:
I was able to cache Pandora by compiling with --enable-http-violations
and using a refresh_pattern to cache everything regardless. This however
broke everything by preventing proper refreshing of any site. If it could
be worked where violations only happened as directly specified in the
configuration it would be a workable solution. I did some testing and I
could not confirm that it was anything in the configuration file itself
that was causing the issue. I wouldn't recommend using this as such.
Which indicates that there is fine tuning possible to cache just Pandora.
Find yourself one of the Pandora URLs in your access.log and take a visit
to www.redbot.org or the ircache.org cacheability engine.
Amos
Henrik Nordstrom wrote:
Sat 2009-07-25 at 12:05 -0600, Brett Glass wrote:
One of the largest consumers of our HTTP bandwidth is Pandora, the free
music service. Unfortunately, Pandora marks its streams as non-cacheable
and also puts question marks in the URLs, which is a huge waste of
bandwidth. How can this be overridden?
The question mark can be ignored. See the "cache" directive. But if there
are other parameters behind it (normally not logged), that just may not
help.
Regarding non-cacheable: most crap can be overridden by refresh_pattern.
But if it's a streaming service (I know nothing about Pandora) then you
are quite likely out of luck.
Regards
Henrik