On Sun, 26 Jul 2009 11:03:00 -0400, Jason Spegal <jspegal@xxxxxxxxxxx> wrote:
> Amos Jeffries wrote:
>> Jason Spegal wrote:
>>> I am currently using the following for the items in question.
>>>
>>> refresh_pattern pandora.com 0 300% 31536000
>>> refresh_pattern . 0 80% 3156000
>>
>> The dot (.) pattern matches every URL in existence.
>>
>> For the pandora files you don't need to go 300%, but do need to add
>> all the available override-* and ignore-* violations available to the
>> "pandora.com" pattern.
>>
>> I'd also try making the pandora pattern:
>> -i http://[^a-z\.]*pandora\.com/?
>>
> Ok, the changes were made so the new line is
> refresh_pattern -i http://[^a-z\.]*pandora\.com/? 0 300% 31536000
> override-expire reload-into-ims ignore-reload ignore-no-cache
> ignore-private ignore-no-store ignore-auth
>
> The following are the results from store.log after the change. It
> appears that they are still failing to cache.
>
> 1248619647.717 RELEASE -1 FFFFFFFF 2DDD8D498CF4C28F60520AA26761A1F6 200
> 1248619640 -1 1248619640 application/octet-stream 1627255/1627255
> GET http://audio-sjl-t2-1.pandora.com/access/7886817187448819808.mp4?
> 1248619657.439 RELEASE -1 FFFFFFFF 21387EECAF5FFCF61AEE68B2494F7A01 200
> 1248619621 -1 1248619621 application/octet-stream 6327065/6327065
> GET http://audio-sjl-t2-2.pandora.com/access/8544252120326380207.mp3?
> 1248619860.906 RELEASE -1 FFFFFFFF B838385F620C52ECE3B4F4E3BBC21270 200
> 1248619847 -1 1248619847 application/octet-stream 2462059/2462059
> GET http://audio-sjl-t1-2.pandora.com/access/3264482519687036142.mp4?
> 1248619895.636 RELEASE -1 FFFFFFFF 86A59F24244895283DC5BE8124F7C248 200
> 1248619878 -1 1248619878 application/octet-stream 4585429/4585429
> GET http://audio-sjl-t3-2.pandora.com/access/7586905698959626071.mp3?
>
>
>>>
>>> With violations off these work well. However they fail to cache all
>>> the items I would like. When I had violations on I had tried
>>> refresh_pattern . 0 0% 0 as well as setting all refresh_pattern to 0
>>> 0% 0 which still failed to refresh the pages properly. I had also
>>> tried rebuilding the cache from scratch several times.
>>>
>>> Other relevant patterns I am using:
>>>
>>> #Dynamic Content
>>> refresh_pattern -i cgi-bin 0 0% 0 refresh-ims
>>
>> The following is a violation even if it works with violations not
>> enabled.
>>> refresh_pattern -i \? 0 0% 3156000 refresh-ims
>>> refresh_pattern -i .(asp|aspx|php|pl|xml|rss|kml|cgi|py|pyc) 0 0% 0
>>> refresh-ims
>>
>>> #HTML
>>> refresh_pattern text/html 0 80% 2592000 refresh-ims
>>> refresh_pattern text/css 0 80% 2592000 refresh-ims
>>>
>>> #Java & Javascript
>>> refresh_pattern -i .(js|jar|java) 0 100% 31536000
>>>
>>> #By MIME-Type
>>> refresh_pattern application/* 0 300% 31536000
>>> refresh_pattern audio/* 0 300% 31536000
>>> refresh_pattern images/* 0 300% 31536000
>>> refresh_pattern text/* 0 300% 31536000
>>> refresh_pattern video/* 0 300% 31536000
>>>
>>
>> ? mime patterns in the URL? with Squid?
>>
>> Do you have a patch that does this? If so please consider contributing
>> back to the project.
>>
> I take it you're referring to refresh_pattern -i \? 0 0% 3156000
> refresh-ims. I was under the impression that squid supports this. I am
> using Squeezzer2 to check how well the patterns work. It does seem to work.
>
> Also the version of squid I am using is 3.0.16 with the following patches
> squid-3.0.16-adapted-zph.patch
> squid-3.0.16-cross-compile.patch
> squid-3.0.16-gentoo.patch
>
> It is compiled through Gentoo's Emerge.
>>>
>>> When I had violations on the Pandora entry was similar to this...
>>>
>>> refresh_pattern pandora.com 0 300% 31536000 override-expire
>>> reload-into-ims ignore-reload ignore-no-cache ignore-private
>>> ignore-no-store ignore-auth
>>
>> A single pattern like that should be all you need to add.
>>
>> Some of the non-caching parameters are only able to be overridden in
>> the 2.HEAD code though. You may need to grab a copy of the HEAD code
>> and use that.
>>
>>
>> PS. all of your file extension patterns above are using the very
>> unsafe .XX syntax. The pattern is a regex and matches anywhere in the
>> URL. It's likely catching a whole lot of URLs which it should not.
>>
>> Please use: \.XX(\?.*)?$ instead. ie \.(js|jar|java)(\?.*)?$
>>
> I'm not sure I understand this example. Can you give a literal example

Sorry, it wasn't very clear:

  refresh_pattern -i \.(js|jar|java)(\?.*)?$ 0 100% 31536000

> please? From what I'm understanding you're saying refresh_pattern -i .jpg
> 0 300% 31536000 would be bad because http://www.jpgas.com would be
> cached with that pattern which may, for sake of example, have settings
> that would break that site. You're recommending refresh_pattern -i \.jpg 0
> 300% 31536000 and not doing something like refresh_pattern -i
> .(jpg|gif|png|ico|tga) 0 300% 31536000 ?

Exactly. The $ and (\?.*) bits make sure that the match is at the end of
the filename regardless of query string.

>
> As far as messing with the code goes I haven't been into doing that as
> of yet. For my purposes my goal is to build a server/router/etc that
> will turn crappy internet into good internet while being able to service
> a number of people. This runs my home network normally these days and
> was originally conceived and built to support 1000 users for a dormitory
> through a single cable modem. Tweaking it for maximum efficiency is a
> hobby for me now. My coding skills are fairly weak and I wouldn't know
> where to start for a lot of this. I am willing to help test things out
> and such however.

You should not have to do more than a source build at worst to get this
going. Mark Nottingham from Yahoo! and others have done a lot of work
towards making Squid configurable for this kind of thing. The downside is
that the results of all that work are only available in 2.HEAD, waiting
for a port to Squid-3 (most of the refresh pattern stuff is) or a 2.8
release to happen.
For now the future 2.8 code is at:
http://www.squid-cache.org/Versions/v2/HEAD/

Amos

>
>> Amos
>>
>>
>>> Amos Jeffries wrote:
>>>> Jason Spegal wrote:
>>>>> I would wager it's content control given what they are. However
>>>>> with violations on they can be cached. Without they cannot. I just
>>>>> haven't been able to figure out how to get squid to behave with
>>>>> violations turned on. My only other option I can see is to set up a
>>>>> second squid with violations and filter all the traffic to/from
>>>>> Pandora through it.
>>>>
>>>> Use refresh_pattern with a regex that only matches Pandora URLs.
>>>>
>>>> I'll wager you have either added all the overrides to the . pattern,
>>>> or have an overly-greedy regex in use.
>>>>
>>>> Amos
>>>>
>>>>>
>>>>> Adrian Chadd wrote:
>>>>>> This doesn't surprise me. They may be trying to maximise outbound
>>>>>> bits, or trying to retain control over content, or not understanding
>>>>>> caching, or all/combination of the above.
>>>>>>
>>>>>> I'd suggest contacting them and asking.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> adrian
>>>>>>
>>>>>> 2009/7/26 Jason Spegal <jspegal@xxxxxxxxxxx>:
>>>>>>
>>>>>>> A little bit messy but here are some snippets.
>>>>>>>
>>>>>>> ###Access.log
>>>>>>>
>>>>>>> 1248572380.275 178 10.10.122.248 TCP_REFRESH_UNMODIFIED/304
>>>>>>> 232 GET
>>>>>>> http://images-sjl-1.pandora.com/images/public/amz/1/2/0/4/727361124021_500W_495H.jpg
>>>>>>>
>>>>>>>
>>>>>>> - DIRECT/208.85.40.13 -
>>>>>>> 1248572409.144 8472 10.10.122.241 TCP_MISS/200 1581181 GET
>>>>>>> http://audio-sjl-t3-2.pandora.com/access/7008639604707703825.mp4? -
>>>>>>> DIRECT/208.85.41.38 application/octet-stream
>>>>>>> 1248572439.512 94 10.10.122.241 TCP_MEM_HIT/200 55396 GET
>>>>>>> http://images-sjl-2.pandora.com/images/public/amz/3/0/2/3/602498413203_500W_499H.jpg
>>>>>>>
>>>>>>>
>>>>>>> - NONE/- image/jpeg
>>>>>>> 1248572570.898 300 10.10.122.248 TCP_MISS/200 6521 GET
>>>>>>> http://images-sjl-3.pandora.com/images/public/amz/2/2/4/4/039841434422_130W_130H.jpg
>>>>>>>
>>>>>>>
>>>>>>> - DIRECT/208.85.41.23 image/jpeg
>>>>>>> 1248572600.538 29937 10.10.122.248 TCP_MISS/200 7704188 GET
>>>>>>> http://audio-sjl-t3-2.pandora.com/access/3642267922875646389.mp3? -
>>>>>>> DIRECT/208.85.41.38 application/octet-stream
>>>>>>> 1248572615.735 11507 10.10.122.241 TCP_MISS/200 2109481 GET
>>>>>>> http://audio-sjl-t2-2.pandora.com/access/5722981497105294607.mp4? -
>>>>>>> DIRECT/208.85.41.36 application/octet-stream
>>>>>>> 1248572635.903 179 10.10.122.248 TCP_REFRESH_UNMODIFIED/304
>>>>>>> 232 GET
>>>>>>> http://images-sjl-3.pandora.com/images/public/amz/2/2/4/4/039841434422_130W_130H.jpg
>>>>>>>
>>>>>>>
>>>>>>> - DIRECT/208.85.41.23 -
>>>>>>> 1248572641.444 40 10.10.122.241 TCP_HIT/200 21616 GET
>>>>>>> http://images-sjl-2.pandora.com/images/public/amz/8/7/6/1/602498611678_300W_273H.jpg
>>>>>>>
>>>>>>>
>>>>>>> - NONE/- image/jpeg
>>>>>>>
>>>>>>> ###Store.log
>>>>>>>
>>>>>>> 1248572380.275 RELEASE -1 FFFFFFFF
>>>>>>> 097EAE1108DCEF192ED1C3BFF1F6C1B5 304
>>>>>>> 1248572380 -1 -1 unknown -1/0 GET
>>>>>>> http://images-sjl-1.pandora.com/images/public/amz/1/2/0/4/727361124021_500W_495H.jpg
>>>>>>>
>>>>>>>
>>>>>>> 1248572409.144 RELEASE -1 FFFFFFFF
>>>>>>> 6B93B1BF958703B3FC3CD1ADDD515695 200
>>>>>>> 1248572400 -1 1248572400 application/octet-stream
>>>>>>> 1580815/1580815 GET
>>>>>>> http://audio-sjl-t3-2.pandora.com/access/7008639604707703825.mp4?
>>>>>>> 1248572570.897 SWAPOUT 00 0004CF23
>>>>>>> BEEE111A39B596B14903743011AF2C36 200
>>>>>>> 1248572570 1248490006 -1 image/jpeg 6181/6181 GET
>>>>>>> http://images-sjl-3.pandora.com/images/public/amz/2/2/4/4/039841434422_130W_130H.jpg
>>>>>>>
>>>>>>>
>>>>>>> 1248572600.538 RELEASE -1 FFFFFFFF
>>>>>>> 070416ED935AD18DCA793569D2C6A652 200
>>>>>>> 1248572570 -1 1248572570 application/octet-stream
>>>>>>> 7703822/7703822 GET
>>>>>>> http://audio-sjl-t3-2.pandora.com/access/3642267922875646389.mp3?
>>>>>>> 1248572615.735 RELEASE -1 FFFFFFFF
>>>>>>> B0EB42B39131DF028BA3BE9A39CC24E4 200
>>>>>>> 1248572604 -1 1248572604 application/octet-stream
>>>>>>> 2109115/2109115 GET
>>>>>>> http://audio-sjl-t2-2.pandora.com/access/5722981497105294607.mp4?
>>>>>>> 1248572635.903 RELEASE -1 FFFFFFFF
>>>>>>> CDCA0D3510080D121E5578310976676E 304
>>>>>>> 1248572635 -1 -1 unknown -1/0 GET
>>>>>>> http://images-sjl-3.pandora.com/images/public/amz/2/2/4/4/039841434422_130W_130H.jpg
>>>>>>>
>>>>>>>
>>>>>>> 1248572886.822 RELEASE -1 FFFFFFFF
>>>>>>> A95C86074129546301911C2FC251071D 200
>>>>>>> 1248572872 -1 1248572872 application/octet-stream
>>>>>>> 2086824/2086824 GET
>>>>>>> http://audio-sjl-t1-1.pandora.com/access/5188159311574708305.mp4?
>>>>>>>
>>>>>>> ###Wireshark
>>>>>>>
>>>>>>> Hypertext Transfer Protocol
>>>>>>> HTTP/1.0 200 OK\r\n
>>>>>>> Date: Sun, 26 Jul 2009 05:12:58 GMT\r\n
>>>>>>> Server: Apache\r\n
>>>>>>> Content-Length: 6137729\r\n
>>>>>>> Cache-Control: no-cache, no-store, must-revalidate, max-age=-1\r\n
>>>>>>> Pragma: no-cache, no-store\r\n
>>>>>>> Expires: -1\r\n
>>>>>>> Content-Type: application/octet-stream\r\n
>>>>>>> X-Cache: MISS from ichiban\r\n
>>>>>>> X-Cache-Lookup: MISS from ichiban:3128\r\n
>>>>>>> Via: 1.0 ichiban (squid)\r\n
>>>>>>> Proxy-Connection: keep-alive\r\n
>>>>>>> \r\n
>>>>>>>
>>>>>>> Amos Jeffries wrote:
>>>>>>>
>>>>>>>> Jason Spegal wrote:
>>>>>>>>
>>>>>>>>> I was able to cache Pandora by compiling with
>>>>>>>>> --enable-http-violations and using a refresh_pattern to cache
>>>>>>>>> everything regardless. This however broke everything by
>>>>>>>>> preventing proper refreshing of any site. If it could be worked
>>>>>>>>> where violations only happened as directly specified in the
>>>>>>>>> configuration it would be a workable solution. I did some
>>>>>>>>> testing and I could not confirm that it was anything in the
>>>>>>>>> configuration file itself that was causing the issue. I wouldn't
>>>>>>>>> recommend using this as such.
>>>>>>>>>
>>>>>>>>>
>>>>>>>> Which indicates that there is fine tuning possible to cache
>>>>>>>> just Pandora. Find yourself one of the Pandora URLs in your
>>>>>>>> access.log and take a visit to www.redbot.org or the ircache.org
>>>>>>>> cacheability engine.
>>>>>>>>
>>>>>>>>
>>>>>>>> Amos
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> Henrik Nordstrom wrote:
>>>>>>>>>
>>>>>>>>>> Sat 2009-07-25 at 12:05 -0600, Brett Glass wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> One of the largest consumers of our HTTP bandwidth is
>>>>>>>>>>> Pandora, the free music service. Unfortunately, Pandora marks
>>>>>>>>>>> its streams as non-cacheable and also puts question marks in
>>>>>>>>>>> the URLs, which is a huge waste of bandwidth. How can this be
>>>>>>>>>>> overridden?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>> The question mark can be ignored. See the "cache" directive.
>>>>>>>>>> But if there are other parameters behind there (normally not
>>>>>>>>>> logged) that just may not help..
>>>>>>>>>>
>>>>>>>>>> Regarding non-cacheable.. most crap can be overridden by
>>>>>>>>>> refresh_pattern.
>>>>>>>>>>
>>>>>>>>>> But, if it's a streaming service (I know nothing about
>>>>>>>>>> Pandora) then you are quite likely out of luck.
>>>>>>>>>>
>>>>>>>>>> Regards
>>>>>>>>>> Henrik
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>>
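
For anyone following the thread, the difference between the unanchored `.XX` patterns and the `\.XX(\?.*)?$` form can be checked offline before touching squid.conf. The sketch below is only an illustration: it uses Python's `re` module as a stand-in for Squid's regex matching (an assumption; for patterns this simple the two are expected to agree), approximates the `-i` flag with `re.IGNORECASE`, and uses made-up example.com URLs alongside one URL copied from the store.log above. `PANDORA_ALT` is a hypothetical alternative that does not appear in the thread.

```python
import re

# Patterns quoted in the thread; -i is approximated with re.IGNORECASE.
UNANCHORED = re.compile(r".(js|jar|java)", re.IGNORECASE)
ANCHORED = re.compile(r"\.(js|jar|java)(\?.*)?$", re.IGNORECASE)
PANDORA = re.compile(r"http://[^a-z\.]*pandora\.com/?", re.IGNORECASE)

# Hypothetical alternative (NOT from the thread): allow letters, digits,
# dots and hyphens in front of "pandora.com".
PANDORA_ALT = re.compile(r"http://[a-z0-9.-]*pandora\.com/", re.IGNORECASE)


def matches(pattern, url):
    """Return True if the pattern matches anywhere in the URL,
    which is how refresh_pattern regexes are described in the thread."""
    return pattern.search(url) is not None


if __name__ == "__main__":
    urls = [
        "http://example.com/app.js?v=2",           # made-up URL
        "http://example.com/app.json",             # made-up URL
        "http://www.javadocs.example/index.html",  # made-up URL
        # Copied from the store.log earlier in the thread:
        "http://audio-sjl-t2-1.pandora.com/access/7886817187448819808.mp4?",
    ]
    for url in urls:
        print(url)
        for name, pat in [("unanchored", UNANCHORED), ("anchored", ANCHORED),
                          ("pandora", PANDORA), ("pandora_alt", PANDORA_ALT)]:
            print("  %-12s %s" % (name, matches(pat, url)))
```

In this sketch the suggested `[^a-z\.]*` class excludes letters and dots, so the pattern does not match hostnames such as audio-sjl-t2-1.pandora.com. Whether that is what keeps the objects from being cached is not established here; as noted above, some of the overrides are also only available in the 2.HEAD code.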