Re: Caching Pandora

Jason Spegal wrote:
I am currently using the following for the items in question.

refresh_pattern pandora.com 0 300% 31536000
refresh_pattern .               0       80%    3156000

The dot (.) pattern matches every URL in existence.

For the pandora files you don't need to go to 300%, but you do need to add all of the available override-* and ignore-* violation options to the "pandora.com" pattern.

I'd also try making the pandora pattern:
  -i ^http://[^/]*pandora\.com/
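A side note on the numbers in the quoted patterns: the min and max fields of refresh_pattern are given in minutes, not seconds, so values that look like "one year in seconds" are far larger than they appear. A small Python sketch of the arithmetic follows; the helper name max_age_in_days is made up for illustration.

```python
MINUTES_PER_DAY = 60 * 24


def max_age_in_days(minutes: int) -> float:
    """Convert a refresh_pattern min/max value (given in minutes) to days."""
    return minutes / MINUTES_PER_DAY


# Max values that appear in the refresh_pattern lines quoted in this thread.
for value in (31536000, 3156000, 2592000):
    days = max_age_in_days(value)
    print(f"{value} minutes = {days:.0f} days (about {days / 365:.1f} years)")
```

If a one-year ceiling was the intent, the value in minutes would be 525600.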



With violations off these work well. However, they fail to cache all the items I would like. When I had violations on, I tried refresh_pattern . 0 0% 0 as well as setting every refresh_pattern to 0 0% 0, and the pages still failed to refresh properly. I also tried rebuilding the cache from scratch several times.

Other relevant patterns I am using:

#Dynamic Content
refresh_pattern -i cgi-bin 0 0% 0 refresh-ims

The following is a violation even if it works with violations not enabled.
refresh_pattern -i \? 0 0% 3156000 refresh-ims
refresh_pattern -i .(asp|aspx|php|pl|xml|rss|kml|cgi|py|pyc) 0 0% 0 refresh-ims

#HTML
refresh_pattern text/html 0 80% 2592000 refresh-ims
refresh_pattern text/css 0 80% 2592000 refresh-ims

#Java & Javascript
refresh_pattern -i .(js|jar|java) 0 100% 31536000

#By MIME-Type
refresh_pattern application/* 0 300% 31536000
refresh_pattern audio/* 0 300% 31536000
refresh_pattern images/* 0 300% 31536000
refresh_pattern text/* 0 300% 31536000
refresh_pattern video/* 0 300% 31536000


MIME type patterns in the URL? With Squid?

Do you have a patch that does this? If so, please consider contributing it back to the project.


When I had violations on the Pandora entry was similar to this...

refresh_pattern pandora.com 0 300% 31536000 override-expire reload-into-ims ignore-reload ignore-no-cache ignore-private ignore-no-store ignore-auth

A single pattern like that should be all you need to add.

Some of the non-caching parameters can only be overridden in the 2.HEAD code, though. You may need to grab a copy of the HEAD code and use that.


PS. All of your file extension patterns above use the very unsafe .XX syntax. The pattern is a regex and matches anywhere in the URL, so it is likely catching a whole lot of URLs which it should not.

 Please use:   \.XX(\?.*)?$   instead, i.e. \.(js|jar|java)(\?.*)?$

Amos
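To see the difference between the two extension patterns, here is a minimal Python sketch. It uses Python's re module as a stand-in for Squid's regex library, which is an assumption, and the sample URLs are made up for illustration.

```python
import re

# Patterns as written in the thread. Python's re module stands in for
# Squid's regex library here, which is an assumption of this sketch.
UNSAFE = re.compile(r".(js|jar|java)", re.IGNORECASE)
ANCHORED = re.compile(r"\.(js|jar|java)(\?.*)?$", re.IGNORECASE)

# Hypothetical URLs, made up for illustration.
SAMPLES = [
    "http://example.com/lib/app.js",
    "http://example.com/lib/app.js?v=3",
    "http://example.com/java-tutorial/index.html",
    "http://jsmith.example.com/photo.png",
]


def matches(pattern, url):
    """Return True if the pattern matches anywhere in the URL."""
    return pattern.search(url) is not None


for url in SAMPLES:
    print(f"{url}: unsafe={matches(UNSAFE, url)} anchored={matches(ANCHORED, url)}")
```

The unanchored form matches all four URLs, because "any character followed by js, jar or java" occurs somewhere in each. The anchored form matches only the two that actually end in .js, with or without a query string.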


Amos Jeffries wrote:
Jason Spegal wrote:
I would wager it's content control, given what they are. However, with violations on they can be cached; without, they cannot. I just haven't been able to figure out how to get Squid to behave with violations turned on. The only other option I can see is to set up a second Squid with violations enabled and route all the traffic to and from Pandora through it.

Use refresh_pattern with a regex that only matches Pandora URLs.

I'll wager you have either added all the overrides to the . pattern, or have an overly greedy regex in use.

Amos
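Squid applies the first refresh_pattern whose regex matches, so a narrow Pandora rule placed ahead of the catch-all keeps the overrides away from everything else. The Python sketch below imitates that first-match behaviour. The host-anchored pattern is a candidate written for this example, not one quoted from the thread, and Python's re module is assumed to behave like Squid's regex library for these simple expressions.

```python
import re

# Candidate host-anchored pattern; an assumption of this sketch, not a
# pattern quoted from the thread.
PANDORA = re.compile(r"^http://([^/]+\.)?pandora\.com/", re.IGNORECASE)
CATCH_ALL = re.compile(r".")

# Rules in configuration order; the labels are illustrative only.
RULES = [
    (PANDORA, "pandora rule (overrides attached here)"),
    (CATCH_ALL, "default rule (no overrides)"),
]


def first_match(url):
    """Return the label of the first rule whose regex matches the URL."""
    for pattern, label in RULES:
        if pattern.search(url):
            return label
    return None


URLS = [
    # Taken from the access.log excerpt in this thread.
    "http://audio-sjl-t3-2.pandora.com/access/7008639604707703825.mp4?",
    "http://images-sjl-1.pandora.com/images/public/amz/1/2/0/4/727361124021_500W_495H.jpg",
    # Hypothetical URLs that must not pick up the overrides.
    "http://www.example.org/?ref=pandora.com/",
    "http://notpandora.com/index.html",
]

for url in URLS:
    print(f"{first_match(url)}  <-  {url}")
```

Anchoring on the scheme and host keeps the rule from firing on URLs that merely mention pandora.com in a path or query string.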


Adrian Chadd wrote:
This doesn't surprise me. They may be trying to maximise outbound
bits, or trying to retain control over content, or they may not understand
caching, or some combination of the above.

I'd suggest contacting them and asking.




adrian

2009/7/26 Jason Spegal <jspegal@xxxxxxxxxxx>:
A little bit messy but here are some snippets.

###Access.log

1248572380.275    178 10.10.122.248 TCP_REFRESH_UNMODIFIED/304 232 GET http://images-sjl-1.pandora.com/images/public/amz/1/2/0/4/727361124021_500W_495H.jpg - DIRECT/208.85.40.13 -
1248572409.144   8472 10.10.122.241 TCP_MISS/200 1581181 GET http://audio-sjl-t3-2.pandora.com/access/7008639604707703825.mp4? - DIRECT/208.85.41.38 application/octet-stream
1248572439.512     94 10.10.122.241 TCP_MEM_HIT/200 55396 GET http://images-sjl-2.pandora.com/images/public/amz/3/0/2/3/602498413203_500W_499H.jpg - NONE/- image/jpeg
1248572570.898    300 10.10.122.248 TCP_MISS/200 6521 GET http://images-sjl-3.pandora.com/images/public/amz/2/2/4/4/039841434422_130W_130H.jpg - DIRECT/208.85.41.23 image/jpeg
1248572600.538  29937 10.10.122.248 TCP_MISS/200 7704188 GET http://audio-sjl-t3-2.pandora.com/access/3642267922875646389.mp3? - DIRECT/208.85.41.38 application/octet-stream
1248572615.735  11507 10.10.122.241 TCP_MISS/200 2109481 GET http://audio-sjl-t2-2.pandora.com/access/5722981497105294607.mp4? - DIRECT/208.85.41.36 application/octet-stream
1248572635.903    179 10.10.122.248 TCP_REFRESH_UNMODIFIED/304 232 GET http://images-sjl-3.pandora.com/images/public/amz/2/2/4/4/039841434422_130W_130H.jpg - DIRECT/208.85.41.23 -
1248572641.444     40 10.10.122.241 TCP_HIT/200 21616 GET http://images-sjl-2.pandora.com/images/public/amz/8/7/6/1/602498611678_300W_273H.jpg - NONE/- image/jpeg
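To get a quick picture of what is and is not being served from cache, the access.log entries can be tallied by result code. The following Python sketch assumes Squid's native log layout of ten whitespace-separated fields with each entry on a single line; the function names are made up for illustration, and the sample lines are taken from the excerpt above.

```python
from collections import Counter
from urllib.parse import urlsplit

# Four entries from the excerpt above, each joined onto one line.
SAMPLE = """\
1248572380.275    178 10.10.122.248 TCP_REFRESH_UNMODIFIED/304 232 GET http://images-sjl-1.pandora.com/images/public/amz/1/2/0/4/727361124021_500W_495H.jpg - DIRECT/208.85.40.13 -
1248572409.144   8472 10.10.122.241 TCP_MISS/200 1581181 GET http://audio-sjl-t3-2.pandora.com/access/7008639604707703825.mp4? - DIRECT/208.85.41.38 application/octet-stream
1248572439.512     94 10.10.122.241 TCP_MEM_HIT/200 55396 GET http://images-sjl-2.pandora.com/images/public/amz/3/0/2/3/602498413203_500W_499H.jpg - NONE/- image/jpeg
1248572600.538  29937 10.10.122.248 TCP_MISS/200 7704188 GET http://audio-sjl-t3-2.pandora.com/access/3642267922875646389.mp3? - DIRECT/208.85.41.38 application/octet-stream
"""


def parse_line(line):
    """Parse one native-format access.log line; return None if it does not fit."""
    fields = line.split()
    if len(fields) != 10:
        return None
    _ts, _elapsed, _client, result, size, _method, url, _ident, _hier, _ctype = fields
    code, _, status = result.partition("/")
    return {
        "host": urlsplit(url).hostname,
        "result": code,
        "status": int(status),
        "bytes": int(size),
    }


def bytes_by_result(text):
    """Sum bytes per (host kind, result code), e.g. ("audio", "TCP_MISS")."""
    totals = Counter()
    for line in text.splitlines():
        entry = parse_line(line)
        if entry is None:
            continue
        kind = entry["host"].split("-", 1)[0]  # "audio" or "images"
        totals[(kind, entry["result"])] += entry["bytes"]
    return totals


for key, total in sorted(bytes_by_result(SAMPLE).items()):
    print(key, total)
```

On this excerpt the audio hosts account for nearly all of the bytes, and every audio request is a TCP_MISS.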

###Store.log

1248572380.275 RELEASE -1 FFFFFFFF 097EAE1108DCEF192ED1C3BFF1F6C1B5 304 1248572380 -1 -1 unknown -1/0 GET http://images-sjl-1.pandora.com/images/public/amz/1/2/0/4/727361124021_500W_495H.jpg
1248572409.144 RELEASE -1 FFFFFFFF 6B93B1BF958703B3FC3CD1ADDD515695 200 1248572400 -1 1248572400 application/octet-stream 1580815/1580815 GET http://audio-sjl-t3-2.pandora.com/access/7008639604707703825.mp4?
1248572570.897 SWAPOUT 00 0004CF23 BEEE111A39B596B14903743011AF2C36 200 1248572570 1248490006 -1 image/jpeg 6181/6181 GET http://images-sjl-3.pandora.com/images/public/amz/2/2/4/4/039841434422_130W_130H.jpg
1248572600.538 RELEASE -1 FFFFFFFF 070416ED935AD18DCA793569D2C6A652 200 1248572570 -1 1248572570 application/octet-stream 7703822/7703822 GET http://audio-sjl-t3-2.pandora.com/access/3642267922875646389.mp3?
1248572615.735 RELEASE -1 FFFFFFFF B0EB42B39131DF028BA3BE9A39CC24E4 200 1248572604 -1 1248572604 application/octet-stream 2109115/2109115 GET http://audio-sjl-t2-2.pandora.com/access/5722981497105294607.mp4?
1248572635.903 RELEASE -1 FFFFFFFF CDCA0D3510080D121E5578310976676E 304 1248572635 -1 -1 unknown -1/0 GET http://images-sjl-3.pandora.com/images/public/amz/2/2/4/4/039841434422_130W_130H.jpg
1248572886.822 RELEASE -1 FFFFFFFF A95C86074129546301911C2FC251071D 200 1248572872 -1 1248572872 application/octet-stream 2086824/2086824 GET http://audio-sjl-t1-1.pandora.com/access/5188159311574708305.mp4?

###Wireshark

Hypertext Transfer Protocol
HTTP/1.0 200 OK\r\n
Date: Sun, 26 Jul 2009 05:12:58 GMT\r\n
Server: Apache\r\n
Content-Length: 6137729\r\n
Cache-Control: no-cache, no-store, must-revalidate, max-age=-1\r\n
Pragma: no-cache, no-store\r\n
Expires: -1\r\n
Content-Type: application/octet-stream\r\n
X-Cache: MISS from ichiban\r\n
X-Cache-Lookup: MISS from ichiban:3128\r\n
Via: 1.0 ichiban (squid)\r\n
Proxy-Connection: keep-alive\r\n
\r\n
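The response headers above show which directives stand in the way of caching. The Python sketch below picks them out and lists the refresh_pattern options from this thread that appear to correspond. The pairing of directive to option is this sketch's reading of the option names, so check it against the documentation for the Squid version in use; the function names are made up for illustration.

```python
# Mapping from response directives to refresh_pattern options named in the
# thread. The pairing is an assumption based on the option names.
OVERRIDES = {
    "no-cache": "ignore-no-cache",
    "no-store": "ignore-no-store",
    "private": "ignore-private",
}


def parse_cache_control(value):
    """Split a Cache-Control value into {directive: argument or None}."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip() or None
    return directives


def options_needed(headers):
    """List the override options suggested by the given response headers."""
    cc = parse_cache_control(headers.get("Cache-Control", ""))
    needed = [option for directive, option in OVERRIDES.items() if directive in cc]
    if "Expires" in headers:
        needed.append("override-expire")
    return needed


# Header values copied from the capture above.
HEADERS = {
    "Cache-Control": "no-cache, no-store, must-revalidate, max-age=-1",
    "Pragma": "no-cache, no-store",
    "Expires": "-1",
}

print(options_needed(HEADERS))
```

The must-revalidate and max-age=-1 directives are parsed but not mapped to any option here, since the thread does not name an override for them.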

Amos Jeffries wrote:
Jason Spegal wrote:
I was able to cache Pandora by compiling with --enable-http-violations and using a refresh_pattern to cache everything regardless. This, however, broke everything by preventing proper refreshing of any site. If violations could be made to happen only where directly specified in the configuration, it would be a workable solution. I did some testing and could not confirm that anything in the configuration file itself was causing the issue. I wouldn't recommend using this as it stands.

Which indicates that some fine tuning is possible to cache just Pandora. Find yourself one of the Pandora URLs in your access.log and pay a visit to
www.redbot.org or the ircache.org cacheability engine.


Amos



Henrik Nordstrom wrote:
Sat 2009-07-25 at 12:05 -0600, Brett Glass wrote:

One of the largest consumers of our HTTP bandwidth is Pandora, the free music service. Unfortunately, Pandora marks its streams as non-cacheable and also puts question marks in the URLs, which is a huge waste of bandwidth.
How can this be overridden?

The question mark can be ignored. See the "cache" directive. But if there are other parameters after it (normally not logged), that just may not
help.

Regarding non-cacheable.. most crap can be overridden by
refresh_pattern.

But, if it's a streaming service (I know nothing about Pandora) then you
are quite likely out of luck.

Regards
Henrik








--
Please be using
  Current Stable Squid 2.7.STABLE6 or 3.0.STABLE16
  Current Beta Squid 3.1.0.10 or 3.1.0.11
