Re: Caching Pandora


 



Amos Jeffries wrote:
Jason Spegal wrote:
I am currently using the following for the items in question.

refresh_pattern pandora.com 0 300% 31536000
refresh_pattern .               0       80%    3156000

The dot (.) pattern matches every URL in existence.

For the Pandora files you don't need to go to 300%, but you do need to add all the available override-* and ignore-* violation options to the "pandora.com" pattern.

I'd also try making the pandora pattern:
  -i http://[^a-z\.]*pandora\.com/?

OK, I made the changes, so the new line is:
refresh_pattern -i http://[^a-z\.]*pandora\.com/? 0 300% 31536000 override-expire reload-into-ims ignore-reload ignore-no-cache ignore-private ignore-no-store ignore-auth

The following are the results from store.log after the change. It appears that they are still failing to cache.

1248619647.717 RELEASE -1 FFFFFFFF 2DDD8D498CF4C28F60520AA26761A1F6 200 1248619640 -1 1248619640 application/octet-stream 1627255/1627255 GET http://audio-sjl-t2-1.pandora.com/access/7886817187448819808.mp4?
1248619657.439 RELEASE -1 FFFFFFFF 21387EECAF5FFCF61AEE68B2494F7A01 200 1248619621 -1 1248619621 application/octet-stream 6327065/6327065 GET http://audio-sjl-t2-2.pandora.com/access/8544252120326380207.mp3?
1248619860.906 RELEASE -1 FFFFFFFF B838385F620C52ECE3B4F4E3BBC21270 200 1248619847 -1 1248619847 application/octet-stream 2462059/2462059 GET http://audio-sjl-t1-2.pandora.com/access/3264482519687036142.mp4?
1248619895.636 RELEASE -1 FFFFFFFF 86A59F24244895283DC5BE8124F7C248 200 1248619878 -1 1248619878 application/octet-stream 4585429/4585429 GET http://audio-sjl-t3-2.pandora.com/access/7586905698959626071.mp3?
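One thing worth checking is whether the new regex matches the audio URLs at all. The sketch below tests the pattern from the refresh_pattern line against URLs taken from the store.log excerpt. It uses Python's `re` module as a stand-in for the regex library Squid uses, which is an assumption, although the constructs involved are basic ones. The `ALTERNATIVE` pattern is hypothetical and not something proposed in the thread. Even with a matching pattern, the objects may still not be cached on this Squid version, since some of the overrides are said elsewhere in the thread to work only in 2.HEAD.

```python
import re

# Pattern exactly as written in the refresh_pattern line; -i maps to IGNORECASE.
SUGGESTED = re.compile(r"http://[^a-z\.]*pandora\.com/?", re.IGNORECASE)

# Hypothetical alternative (an assumption, not from the thread): allow any
# run of non-slash characters between the scheme and "pandora.com".
ALTERNATIVE = re.compile(r"^http://[^/]*pandora\.com/", re.IGNORECASE)

URLS = [
    "http://audio-sjl-t2-1.pandora.com/access/7886817187448819808.mp4?",
    "http://audio-sjl-t2-2.pandora.com/access/8544252120326380207.mp3?",
    "http://pandora.com/",
]


def matches(pattern, url):
    """Return True if the pattern matches anywhere in the URL."""
    return pattern.search(url) is not None


if __name__ == "__main__":
    for url in URLS:
        print(matches(SUGGESTED, url), matches(ALTERNATIVE, url), url)
```

In this sketch the suggested pattern matches only the bare `http://pandora.com/` URL, because `[^a-z\.]*` cannot consume the letters in a host prefix such as `audio-sjl-t2-1.`. The alternative is deliberately loose and would also match a host like `notpandora.com`, so it would need tightening before real use.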



With violations off, these work well. However, they fail to cache all the items I would like. When I had violations on, I tried refresh_pattern . 0 0% 0, and also tried setting every refresh_pattern to 0 0% 0, but pages still failed to refresh properly. I also tried rebuilding the cache from scratch several times.

Other relevant patterns I am using:

#Dynamic Content
refresh_pattern -i cgi-bin 0 0% 0 refresh-ims

The following is a violation even if it works with violations not enabled.
refresh_pattern -i \? 0 0% 3156000 refresh-ims
refresh_pattern -i .(asp|aspx|php|pl|xml|rss|kml|cgi|py|pyc) 0 0% 0 refresh-ims

#HTML
refresh_pattern text/html 0 80% 2592000 refresh-ims
refresh_pattern text/css 0 80% 2592000 refresh-ims

#Java & Javascript
refresh_pattern -i .(js|jar|java) 0 100% 31536000

#By MIME-Type
refresh_pattern application/* 0 300% 31536000
refresh_pattern audio/* 0 300% 31536000
refresh_pattern images/* 0 300% 31536000
refresh_pattern text/* 0 300% 31536000
refresh_pattern video/* 0 300% 31536000


MIME patterns in the URL? With Squid?

Do you have a patch that does this? If so, please consider contributing it back to the project.
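To illustrate the point: refresh_pattern regexes are applied to the request URL, not to the Content-Type of the reply. The sketch below runs the "By MIME-Type" patterns from the configuration above against URLs from this thread, using Python's `re` module as an approximation of Squid's regex matching (an assumption). The `example.com` URL is hypothetical.

```python
import re

# The "By MIME-Type" patterns exactly as they appear in the configuration.
MIME_STYLE_PATTERNS = ["application/*", "audio/*", "images/*", "text/*", "video/*"]


def matching_patterns(url):
    """Return the patterns that match somewhere in the URL string."""
    return [p for p in MIME_STYLE_PATTERNS if re.search(p, url)]


if __name__ == "__main__":
    urls = [
        # Served as application/octet-stream, but the URL contains "audio".
        "http://audio-sjl-t3-2.pandora.com/access/7008639604707703825.mp4?",
        # Served as image/jpeg; the URL happens to contain "images".
        "http://images-sjl-1.pandora.com/images/public/amz/1/2/0/4/727361124021_500W_495H.jpg",
        # Hypothetical URL with none of the words in it.
        "http://www.example.com/song.mp3",
    ]
    for url in urls:
        print(matching_patterns(url), url)
```

As a regex, `audio/*` means the literal text "audio" followed by zero or more slashes, so it matches the Pandora audio host name by coincidence. Nothing here looks at the `application/octet-stream` type the server actually sends.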

I take it you're referring to refresh_pattern -i \? 0 0% 3156000 refresh-ims. I was under the impression that Squid supports this. I am using Squeezzer2 to check how well the patterns work, and it does seem to work.

Also the version of squid I am using is 3.0.16 with the following patches
squid-3.0.16-adapted-zph.patch
squid-3.0.16-cross-compile.patch
squid-3.0.16-gentoo.patch

It is compiled through Gentoo's emerge.

When I had violations on the Pandora entry was similar to this...

refresh_pattern pandora.com 0 300% 31536000 override-expire reload-into-ims ignore-reload ignore-no-cache ignore-private ignore-no-store ignore-auth

A single pattern like that should be all you need to add.

Some of the non-caching parameters are only able to be overridden in the 2.HEAD code though. You may need to grab a copy of the HEAD code and use that.


PS. All of your file extension patterns above use the very unsafe .XX syntax. The pattern is a regex and matches anywhere in the URL, so it's likely catching a whole lot of URLs which it should not.

 Please use:   \.XX(\?.*)?$   instead.  ie \.(js|jar|java)(\?.*)?$

I'm not sure I understand this example. Can you give a literal example, please? From what I understand, you're saying refresh_pattern -i .jpg 0 300% 31536000 would be bad because http://www.jpgas.com would be cached with that pattern, which may, for the sake of example, have settings that would break that site. You're recommending refresh_pattern -i \.jpg 0 300% 31536000 and not doing something like refresh_pattern -i .(jpg|gif|png|ico|tga) 0 300% 31536000?
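A literal comparison may help here. The sketch below checks three forms of the .jpg pattern against a handful of URLs: the loose `.jpg`, the escaped `\.jpg`, and the anchored `\.jpg(\?.*)?$` form recommended above. Python's `re` module is used as an approximation of Squid's regex matching (an assumption), and every `example.com` URL is hypothetical.

```python
import re

LOOSE = re.compile(r".jpg", re.IGNORECASE)              # refresh_pattern -i .jpg
ESCAPED = re.compile(r"\.jpg", re.IGNORECASE)           # refresh_pattern -i \.jpg
ANCHORED = re.compile(r"\.jpg(\?.*)?$", re.IGNORECASE)  # recommended form

URLS = [
    "http://www.jpgas.com/",                        # host name from the question
    "http://www.example.com/ajpg/index.html",       # "jpg" inside a path segment
    "http://www.example.com/photo.jpg",             # a real .jpg file
    "http://www.example.com/photo.jpg?size=large",  # .jpg with a query string
    "http://www.example.com/photo.jpg.html",        # .jpg in the middle of a name
]


def classify(url):
    """Return (loose, escaped, anchored) match results for one URL."""
    return tuple(p.search(url) is not None for p in (LOOSE, ESCAPED, ANCHORED))


if __name__ == "__main__":
    for url in URLS:
        print(classify(url), url)
```

Note that escaping the dot alone is not enough for the `www.jpgas.com` case, because the host name contains a literal ".jpg". Only the form anchored at the end of the URL, with an optional query string, rejects it.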

As far as messing with the code goes, I haven't gotten into that yet. My goal is to build a server/router/etc. that will turn crappy internet into good internet while being able to serve a number of people. This runs my home network these days and was originally conceived and built to support 1000 users in a dormitory through a single cable modem. Tweaking it for maximum efficiency is a hobby for me now. My coding skills are fairly weak and I wouldn't know where to start for a lot of this. I am willing to help test things out, however.

Amos


Amos Jeffries wrote:
Jason Spegal wrote:
I would wager it's content control, given what they are. However, with violations on they can be cached; without, they cannot. I just haven't been able to figure out how to get Squid to behave with violations turned on. The only other option I can see is to set up a second Squid with violations enabled and route all the traffic to/from Pandora through it.

Use refresh_pattern with a regex that matches only Pandora URLs.

I'll wager you have either added all the overrides to the . pattern, or have an overly greedy regex in use.

Amos


Adrian Chadd wrote:
This doesn't surprise me. They may be trying to maximise outbound
bits, or trying to retain control over content, or they may not understand
caching, or some combination of the above.

I'd suggest contacting them and asking.




adrian

2009/7/26 Jason Spegal <jspegal@xxxxxxxxxxx>:
A little bit messy but here are some snippets.

###Access.log

1248572380.275 178 10.10.122.248 TCP_REFRESH_UNMODIFIED/304 232 GET http://images-sjl-1.pandora.com/images/public/amz/1/2/0/4/727361124021_500W_495H.jpg - DIRECT/208.85.40.13 -
1248572409.144 8472 10.10.122.241 TCP_MISS/200 1581181 GET http://audio-sjl-t3-2.pandora.com/access/7008639604707703825.mp4? - DIRECT/208.85.41.38 application/octet-stream
1248572439.512 94 10.10.122.241 TCP_MEM_HIT/200 55396 GET http://images-sjl-2.pandora.com/images/public/amz/3/0/2/3/602498413203_500W_499H.jpg - NONE/- image/jpeg
1248572570.898 300 10.10.122.248 TCP_MISS/200 6521 GET http://images-sjl-3.pandora.com/images/public/amz/2/2/4/4/039841434422_130W_130H.jpg - DIRECT/208.85.41.23 image/jpeg
1248572600.538 29937 10.10.122.248 TCP_MISS/200 7704188 GET http://audio-sjl-t3-2.pandora.com/access/3642267922875646389.mp3? - DIRECT/208.85.41.38 application/octet-stream
1248572615.735 11507 10.10.122.241 TCP_MISS/200 2109481 GET http://audio-sjl-t2-2.pandora.com/access/5722981497105294607.mp4? - DIRECT/208.85.41.36 application/octet-stream
1248572635.903 179 10.10.122.248 TCP_REFRESH_UNMODIFIED/304 232 GET http://images-sjl-3.pandora.com/images/public/amz/2/2/4/4/039841434422_130W_130H.jpg - DIRECT/208.85.41.23 -
1248572641.444 40 10.10.122.241 TCP_HIT/200 21616 GET http://images-sjl-2.pandora.com/images/public/amz/8/7/6/1/602498611678_300W_273H.jpg - NONE/- image/jpeg

###Store.log

1248572380.275 RELEASE -1 FFFFFFFF 097EAE1108DCEF192ED1C3BFF1F6C1B5 304 1248572380 -1 -1 unknown -1/0 GET http://images-sjl-1.pandora.com/images/public/amz/1/2/0/4/727361124021_500W_495H.jpg
1248572409.144 RELEASE -1 FFFFFFFF 6B93B1BF958703B3FC3CD1ADDD515695 200 1248572400 -1 1248572400 application/octet-stream 1580815/1580815 GET http://audio-sjl-t3-2.pandora.com/access/7008639604707703825.mp4?
1248572570.897 SWAPOUT 00 0004CF23 BEEE111A39B596B14903743011AF2C36 200 1248572570 1248490006 -1 image/jpeg 6181/6181 GET http://images-sjl-3.pandora.com/images/public/amz/2/2/4/4/039841434422_130W_130H.jpg
1248572600.538 RELEASE -1 FFFFFFFF 070416ED935AD18DCA793569D2C6A652 200 1248572570 -1 1248572570 application/octet-stream 7703822/7703822 GET http://audio-sjl-t3-2.pandora.com/access/3642267922875646389.mp3?
1248572615.735 RELEASE -1 FFFFFFFF B0EB42B39131DF028BA3BE9A39CC24E4 200 1248572604 -1 1248572604 application/octet-stream 2109115/2109115 GET http://audio-sjl-t2-2.pandora.com/access/5722981497105294607.mp4?
1248572635.903 RELEASE -1 FFFFFFFF CDCA0D3510080D121E5578310976676E 304 1248572635 -1 -1 unknown -1/0 GET http://images-sjl-3.pandora.com/images/public/amz/2/2/4/4/039841434422_130W_130H.jpg
1248572886.822 RELEASE -1 FFFFFFFF A95C86074129546301911C2FC251071D 200 1248572872 -1 1248572872 application/octet-stream 2086824/2086824 GET http://audio-sjl-t1-1.pandora.com/access/5188159311574708305.mp4?
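For anyone wanting to summarise a longer store.log, here is a minimal sketch that tallies entries by action and content type. It assumes the usual 13-field store.log layout (time, action, dir number, file number, hash, status, date, last-modified, expires, type, sizes, method, URL), which is consistent with the entries above but is not spelled out in this thread. The sample data is copied from the excerpt, and the function names are made up for illustration.

```python
from collections import Counter

# Four entries copied from the store.log excerpt, one per line.
STORE_LOG = """\
1248572380.275 RELEASE -1 FFFFFFFF 097EAE1108DCEF192ED1C3BFF1F6C1B5 304 1248572380 -1 -1 unknown -1/0 GET http://images-sjl-1.pandora.com/images/public/amz/1/2/0/4/727361124021_500W_495H.jpg
1248572409.144 RELEASE -1 FFFFFFFF 6B93B1BF958703B3FC3CD1ADDD515695 200 1248572400 -1 1248572400 application/octet-stream 1580815/1580815 GET http://audio-sjl-t3-2.pandora.com/access/7008639604707703825.mp4?
1248572570.897 SWAPOUT 00 0004CF23 BEEE111A39B596B14903743011AF2C36 200 1248572570 1248490006 -1 image/jpeg 6181/6181 GET http://images-sjl-3.pandora.com/images/public/amz/2/2/4/4/039841434422_130W_130H.jpg
1248572600.538 RELEASE -1 FFFFFFFF 070416ED935AD18DCA793569D2C6A652 200 1248572570 -1 1248572570 application/octet-stream 7703822/7703822 GET http://audio-sjl-t3-2.pandora.com/access/3642267922875646389.mp3?
"""


def parse_store_line(line):
    """Split one store.log line into the fields used below (assumed layout)."""
    fields = line.split()
    if len(fields) != 13:
        raise ValueError(f"expected 13 fields, got {len(fields)}")
    return {
        "action": fields[1],   # RELEASE, SWAPOUT, ...
        "status": int(fields[5]),
        "mime": fields[9],
        "url": fields[12],
    }


def tally(log_text):
    """Count entries per (action, content type) pair."""
    counts = Counter()
    for line in log_text.splitlines():
        if line.strip():
            entry = parse_store_line(line)
            counts[(entry["action"], entry["mime"])] += 1
    return counts


if __name__ == "__main__":
    for (action, mime), n in sorted(tally(STORE_LOG).items()):
        print(f"{action:8} {mime:26} {n}")
```

On this sample, every application/octet-stream object is RELEASEd and only the image is SWAPOUT to disk, which matches what the excerpt shows by eye.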

###Wireshark

Hypertext Transfer Protocol
HTTP/1.0 200 OK\r\n
Date: Sun, 26 Jul 2009 05:12:58 GMT\r\n
Server: Apache\r\n
Content-Length: 6137729\r\n
Cache-Control: no-cache, no-store, must-revalidate, max-age=-1\r\n
Pragma: no-cache, no-store\r\n
Expires: -1\r\n
Content-Type: application/octet-stream\r\n
X-Cache: MISS from ichiban\r\n
X-Cache-Lookup: MISS from ichiban:3128\r\n
Via: 1.0 ichiban (squid)\r\n
Proxy-Connection: keep-alive\r\n
\r\n
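The reply headers above explain why the audio objects are released. As a rough sketch, the code below parses the Cache-Control value from the capture and lists which of the refresh_pattern options mentioned in this thread would be candidates for overriding each obstacle. The mapping table is a simplification made for illustration, and whether a given Squid build honours each option depends on the version.

```python
# Header values copied from the Wireshark capture above.
HEADERS = {
    "Cache-Control": "no-cache, no-store, must-revalidate, max-age=-1",
    "Pragma": "no-cache, no-store",
    "Expires": "-1",
}

# Simplified mapping (an assumption for illustration) from Cache-Control
# directives to refresh_pattern options named in this thread.
OPTION_FOR = {
    "no-cache": "ignore-no-cache",
    "no-store": "ignore-no-store",
    "private": "ignore-private",
}


def parse_cache_control(value):
    """Parse a Cache-Control value into {directive: argument or None}."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip() or None
    return directives


def candidate_overrides(headers):
    """List refresh_pattern options that correspond to the reply headers."""
    cc = parse_cache_control(headers.get("Cache-Control", ""))
    options = [OPTION_FOR[d] for d in cc if d in OPTION_FOR]
    if "Expires" in headers:
        options.append("override-expire")
    return options


if __name__ == "__main__":
    print(parse_cache_control(HEADERS["Cache-Control"]))
    print(candidate_overrides(HEADERS))
```

The must-revalidate and max-age=-1 directives are parsed but deliberately left unmapped here, since no option for them is discussed in this thread.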

Amos Jeffries wrote:
Jason Spegal wrote:
I was able to cache Pandora by compiling with --enable-http-violations and using a refresh_pattern to cache everything regardless. This, however, broke everything by preventing proper refreshing of any site. If violations could be made to happen only where directly specified in the configuration, it would be a workable solution. I did some testing and could not confirm that anything in the configuration file itself was causing the issue. I wouldn't recommend using this as such.

Which indicates that there is fine tuning possible to cache just Pandora. Find yourself one of the Pandora URLs in your access.log and pay a visit to
www.redbot.org or the ircache.org cacheability engine.


Amos



Henrik Nordstrom wrote:
Sat 2009-07-25 at 12:05 -0600, Brett Glass wrote:

One of the largest consumers of our HTTP bandwidth is Pandora, the free music service. Unfortunately, Pandora marks its streams as non-cacheable and also puts question marks in the URLs, which is a huge waste of bandwidth.
How can this be overridden?

The question mark can be ignored. See the "cache" directive. But if there are other parameters after it (normally not logged), that may not
help.

Regarding non-cacheable.. most crap can be overridden by
refresh_pattern.

But, if it's a streaming service (I know nothing about Pandora) then you
are quite likely out of luck.

Regards
Henrik









