
Re: Caching Pandora


 



On Sun, 26 Jul 2009 11:03:00 -0400, Jason Spegal <jspegal@xxxxxxxxxxx>
wrote:
> Amos Jeffries wrote:
>> Jason Spegal wrote:
>>> I am currently using the following for the items in question.
>>>
>>> refresh_pattern pandora.com 0 300% 31536000
>>> refresh_pattern .               0       80%    3156000
>>
>> The dot (.) pattern matches every URL in existence.
>>
>> For the pandora files you don't need to go to 300%, but you do need to 
>> add all the available override-* and ignore-* violation options to the 
>> "pandora.com" pattern.
>>
>> I'd also try making the pandora pattern:
>>   -i http://[^a-z\.]*pandora\.com/?
>>
> Ok, the changes were made so the new line is
> refresh_pattern -i http://[^a-z\.]*pandora\.com/? 0 300% 31536000 
> override-expire reload-into-ims ignore-reload ignore-no-cache 
> ignore-private ignore-no-store ignore-auth
> 
> The following are the results from store.log after the change. It 
> appears that they are still failing to cache.
> 
> 1248619647.717 RELEASE -1 FFFFFFFF 2DDD8D498CF4C28F60520AA26761A1F6  200 
> 1248619640        -1 1248619640 application/octet-stream 1627255/1627255 
> GET http://audio-sjl-t2-1.pandora.com/access/7886817187448819808.mp4?
> 1248619657.439 RELEASE -1 FFFFFFFF 21387EECAF5FFCF61AEE68B2494F7A01  200 
> 1248619621        -1 1248619621 application/octet-stream 6327065/6327065 
> GET http://audio-sjl-t2-2.pandora.com/access/8544252120326380207.mp3?
> 1248619860.906 RELEASE -1 FFFFFFFF B838385F620C52ECE3B4F4E3BBC21270  200 
> 1248619847        -1 1248619847 application/octet-stream 2462059/2462059 
> GET http://audio-sjl-t1-2.pandora.com/access/3264482519687036142.mp4?
> 1248619895.636 RELEASE -1 FFFFFFFF 86A59F24244895283DC5BE8124F7C248  200 
> 1248619878        -1 1248619878 application/octet-stream 4585429/4585429 
> GET http://audio-sjl-t3-2.pandora.com/access/7586905698959626071.mp3?
> 
> 
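
One thing worth checking before blaming the override options is whether the new pattern matches the audio URLs at all. The sketch below tests the suggested regex against URLs taken from the store.log lines above. It is only a sanity check: Python's re module is not the regex library Squid links against, and the "alternative" pattern is a hypothetical rewrite, not something proposed in this thread. If the result carries over to Squid, a non-matching pattern may be one contributing factor in the RELEASE entries.

```python
import re

# Pattern exactly as suggested earlier in this thread.
suggested = re.compile(r"http://[^a-z\.]*pandora\.com/?", re.IGNORECASE)

# Hypothetical alternative (an assumption, not from the thread): allow any
# number of dot-separated host labels in front of pandora.com.
alternative = re.compile(r"^http://([a-z0-9-]+\.)*pandora\.com/", re.IGNORECASE)

urls = [
    # First two URLs are copied from the store.log excerpt.
    "http://audio-sjl-t2-1.pandora.com/access/7886817187448819808.mp4?",
    "http://audio-sjl-t2-2.pandora.com/access/8544252120326380207.mp3?",
    # Last two are made up for comparison.
    "http://pandora.com/",
    "http://notpandora.com/",
]

# Map each URL to (matched by suggested, matched by alternative).
matches = {
    u: (bool(suggested.search(u)), bool(alternative.search(u))) for u in urls
}

for u, (s, a) in matches.items():
    print(f"suggested={s!s:5} alternative={a!s:5} {u}")
```

The character class [^a-z\.] excludes letters and dots, so it cannot consume a host prefix such as "audio-sjl-t2-1." and the suggested pattern only matches when "pandora.com" follows "http://" directly.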
>>>
>>> With violations off these work well. However they fail to cache all 
>>> the items I would like. When I had violations on I had tried 
>>> refresh_pattern . 0 0% 0 as well as setting all refresh_pattern to 0 
>>> 0% 0 which still failed to refresh the pages properly. I had also 
>>> tried rebuilding the cache from scratch several times.
>>>
>>> Other relevant patterns I am using:
>>>
>>> #Dynamic Content
>>> refresh_pattern -i cgi-bin 0 0% 0 refresh-ims
>>
>> The following is a violation even if it works with violations not 
>> enabled.
>>> refresh_pattern -i \? 0 0% 3156000 refresh-ims
>>> refresh_pattern -i .(asp|aspx|php|pl|xml|rss|kml|cgi|py|pyc) 0 0% 0 
>>> refresh-ims
>>
>>> #HTML
>>> refresh_pattern text/html 0 80% 2592000 refresh-ims
>>> refresh_pattern text/css 0 80% 2592000 refresh-ims
>>>
>>> #Java & Javascript
>>> refresh_pattern -i .(js|jar|java) 0 100% 31536000
>>>
>>> #By MIME-Type
>>> refresh_pattern application/* 0 300% 31536000
>>> refresh_pattern audio/* 0 300% 31536000
>>> refresh_pattern images/* 0 300% 31536000
>>> refresh_pattern text/* 0 300% 31536000
>>> refresh_pattern video/* 0 300% 31536000
>>>
>>
>> MIME-type patterns in the URL? With Squid?
>>
>> Do you have a patch that does this? If so, please consider contributing 
>> back to the project.
>>
> I take it you're referring to refresh_pattern -i \? 0 0% 3156000 
> refresh-ims. I was under the impression that squid supports this. I am 
> using Squeezzer2 to check how well the patterns work. It does seem to
> work.
> 
> Also the version of squid I am using is 3.0.16 with the following patches
> squid-3.0.16-adapted-zph.patch
> squid-3.0.16-cross-compile.patch
> squid-3.0.16-gentoo.patch
> 
> It is compiled through Gentoo's Emerge.
>>>
>>> When I had violations on the Pandora entry was similar to this...
>>>
>>> refresh_pattern pandora.com 0 300% 31536000 override-expire 
>>> reload-into-ims ignore-reload ignore-no-cache ignore-private 
>>> ignore-no-store ignore-auth
>>
>> A single pattern like that should be all you need to add.
>>
>> Some of the non-caching parameters are only able to be overridden in 
>> the 2.HEAD code though. You may need to grab a copy of the HEAD code 
>> and use that.
>>
>>
>> PS. All of your file extension patterns above use the very 
>> unsafe .XX syntax. The pattern is a regex and matches anywhere in the 
>> URL. It's likely catching a whole lot of URLs which it should not.
>>
>>  Please use:   \.XX(\?.*)?$   instead, i.e. \.(js|jar|java)(\?.*)?$
>>
> I'm not sure I understand this example. Can you give a literal example 

Sorry, it wasn't very clear:
  refresh_pattern -i \.(js|jar|java)(\?.*)?$   0 100% 31536000

> please? From what I'm understanding you're saying refresh_pattern -i .jpg 
> 0 300% 31536000 would be bad because http://www.jpgas.com would be 
> cached with that pattern which may, for sake of example, have settings 
> that would break that site. You're recommending refresh_pattern -i \.jpg 0 
> 300% 31536000 and not doing something like refresh_pattern -i 
> .(jpg|gif|png|ico|tga) 0 300% 31536000 ?

Exactly. The $ and (\?.*)? bits make sure that the match is at the end of
the filename, whether or not a query string follows.
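
To make the difference concrete, here is a minimal sketch comparing the unanchored .XX form with the anchored one. It uses Python's re module as a stand-in, which is not the regex engine Squid uses, and the URLs are hypothetical, so treat it as an illustration rather than a test of Squid itself.

```python
import re

# Unanchored form from the earlier config: "." matches any character and
# the match may occur anywhere in the URL.
loose = re.compile(r".(js|jar|java)", re.IGNORECASE)

# Anchored form: literal dot, extension, optional query string, end of URL.
strict = re.compile(r"\.(js|jar|java)(\?.*)?$", re.IGNORECASE)

# Hypothetical URLs, for illustration only.
urls = [
    "http://example.com/lib/app.js",
    "http://example.com/lib/app.js?v=3",
    "http://www.jsmith.example/index.html",
    "http://example.com/java/docs/index.html",
]

# Map each URL to (matched by loose, matched by strict).
results = {u: (bool(loose.search(u)), bool(strict.search(u))) for u in urls}

for u, (l, s) in results.items():
    print(f"loose={l!s:5} strict={s!s:5} {u}")
```

The loose pattern hits all four URLs, including the two HTML pages, while the anchored pattern only hits the two that really end in .js.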

> 
> As far as messing with the code goes I haven't been into doing that as 
> of yet. For my purposes my goal is to build a server/router/etc that 
> will turn crappy internet into good internet while being able to service 
> a number of people. This runs my home network normally these days and 
> was originally conceived and built to support 1000 users for a dormitory 
> through a single cable modem. Tweaking it for maximum efficiency is a 
> hobby for me now. My coding skills are fairly weak and I wouldn't know 
> where to start for a lot of this. I am willing to help test things out 
> and such however.

You should not have to do more than a source build at worst to get this
going. Mark Nottingham from Yahoo! and others have done a lot of work
towards making Squid configurable for this kind of thing. The downside is
that the results of all that work are only available in 2.HEAD, waiting
either for a port to Squid-3 (most of the refresh pattern stuff is) or for
a 2.8 release to happen.

For now the future 2.8 code is at:
  http://www.squid-cache.org/Versions/v2/HEAD/

Amos

> 
>> Amos
>>
>>
>>> Amos Jeffries wrote:
>>>> Jason Spegal wrote:
>>>>> I would wager it's content control given what they are. However 
>>>>> with violations on they can be cached. Without they cannot. I just 
>>>>> haven't been able to figure out how to get squid to behave with 
>>>>> violations turned on. The only other option I can see is to set up a 
>>>>> second squid with violations and filter all the traffic to/from 
>>>>> Pandora through it.
>>>>
>>>> Use refresh_pattern with a regex that only matches Pandora URLs.
>>>>
>>>> I'll wager you have either added all the overrides to the . pattern, 
>>>> or have an overly greedy regex in use.
>>>>
>>>> Amos
>>>>
>>>>>
>>>>> Adrian Chadd wrote:
>>>>>> This doesn't surprise me. They may be trying to maximise outbound
>>>>>> bits, or trying to retain control over content, or not understand
>>>>>> caching, or some combination of the above.
>>>>>>
>>>>>> I'd suggest contacting them and asking.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> adrian
>>>>>>
>>>>>> 2009/7/26 Jason Spegal <jspegal@xxxxxxxxxxx>:
>>>>>>  
>>>>>>> A little bit messy but here are some snippets.
>>>>>>>
>>>>>>> ###Access.log
>>>>>>>
>>>>>>> 1248572380.275    178 10.10.122.248 TCP_REFRESH_UNMODIFIED/304 232 GET
>>>>>>> http://images-sjl-1.pandora.com/images/public/amz/1/2/0/4/727361124021_500W_495H.jpg
>>>>>>> - DIRECT/208.85.40.13 -
>>>>>>> 1248572409.144   8472 10.10.122.241 TCP_MISS/200 1581181 GET
>>>>>>> http://audio-sjl-t3-2.pandora.com/access/7008639604707703825.mp4? -
>>>>>>> DIRECT/208.85.41.38 application/octet-stream
>>>>>>> 1248572439.512     94 10.10.122.241 TCP_MEM_HIT/200 55396 GET
>>>>>>> http://images-sjl-2.pandora.com/images/public/amz/3/0/2/3/602498413203_500W_499H.jpg
>>>>>>> - NONE/- image/jpeg
>>>>>>> 1248572570.898    300 10.10.122.248 TCP_MISS/200 6521 GET
>>>>>>> http://images-sjl-3.pandora.com/images/public/amz/2/2/4/4/039841434422_130W_130H.jpg
>>>>>>> - DIRECT/208.85.41.23 image/jpeg
>>>>>>> 1248572600.538  29937 10.10.122.248 TCP_MISS/200 7704188 GET
>>>>>>> http://audio-sjl-t3-2.pandora.com/access/3642267922875646389.mp3? -
>>>>>>> DIRECT/208.85.41.38 application/octet-stream
>>>>>>> 1248572615.735  11507 10.10.122.241 TCP_MISS/200 2109481 GET
>>>>>>> http://audio-sjl-t2-2.pandora.com/access/5722981497105294607.mp4? -
>>>>>>> DIRECT/208.85.41.36 application/octet-stream
>>>>>>> 1248572635.903    179 10.10.122.248 TCP_REFRESH_UNMODIFIED/304 232 GET
>>>>>>> http://images-sjl-3.pandora.com/images/public/amz/2/2/4/4/039841434422_130W_130H.jpg
>>>>>>> - DIRECT/208.85.41.23 -
>>>>>>> 1248572641.444     40 10.10.122.241 TCP_HIT/200 21616 GET
>>>>>>> http://images-sjl-2.pandora.com/images/public/amz/8/7/6/1/602498611678_300W_273H.jpg
>>>>>>> - NONE/- image/jpeg
>>>>>>>
>>>>>>> ###Store.log
>>>>>>>
>>>>>>> 1248572380.275 RELEASE -1 FFFFFFFF 
>>>>>>> 097EAE1108DCEF192ED1C3BFF1F6C1B5  304
>>>>>>> 1248572380        -1        -1 unknown -1/0 GET
>>>>>>> http://images-sjl-1.pandora.com/images/public/amz/1/2/0/4/727361124021_500W_495H.jpg
>>>>>>> 1248572409.144 RELEASE -1 FFFFFFFF 
>>>>>>> 6B93B1BF958703B3FC3CD1ADDD515695  200
>>>>>>> 1248572400        -1 1248572400 application/octet-stream 
>>>>>>> 1580815/1580815 GET
>>>>>>> http://audio-sjl-t3-2.pandora.com/access/7008639604707703825.mp4?
>>>>>>> 1248572570.897 SWAPOUT 00 0004CF23 
>>>>>>> BEEE111A39B596B14903743011AF2C36  200
>>>>>>> 1248572570 1248490006        -1 image/jpeg 6181/6181 GET
>>>>>>> http://images-sjl-3.pandora.com/images/public/amz/2/2/4/4/039841434422_130W_130H.jpg
>>>>>>> 1248572600.538 RELEASE -1 FFFFFFFF 
>>>>>>> 070416ED935AD18DCA793569D2C6A652  200
>>>>>>> 1248572570        -1 1248572570 application/octet-stream 
>>>>>>> 7703822/7703822 GET
>>>>>>> http://audio-sjl-t3-2.pandora.com/access/3642267922875646389.mp3?
>>>>>>> 1248572615.735 RELEASE -1 FFFFFFFF 
>>>>>>> B0EB42B39131DF028BA3BE9A39CC24E4  200
>>>>>>> 1248572604        -1 1248572604 application/octet-stream 
>>>>>>> 2109115/2109115 GET
>>>>>>> http://audio-sjl-t2-2.pandora.com/access/5722981497105294607.mp4?
>>>>>>> 1248572635.903 RELEASE -1 FFFFFFFF 
>>>>>>> CDCA0D3510080D121E5578310976676E  304
>>>>>>> 1248572635        -1        -1 unknown -1/0 GET
>>>>>>> http://images-sjl-3.pandora.com/images/public/amz/2/2/4/4/039841434422_130W_130H.jpg
>>>>>>> 1248572886.822 RELEASE -1 FFFFFFFF 
>>>>>>> A95C86074129546301911C2FC251071D  200
>>>>>>> 1248572872        -1 1248572872 application/octet-stream 
>>>>>>> 2086824/2086824 GET
>>>>>>> http://audio-sjl-t1-1.pandora.com/access/5188159311574708305.mp4?
>>>>>>>
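
For anyone wanting to tally these entries rather than read them by eye, here is a rough sketch of a store.log parser. It assumes each entry has been re-joined onto a single line (the mail client wrapped them above) and that the 13 whitespace-separated fields follow the usual store.log layout. The field names are my own labels, not names Squid prints.

```python
from collections import Counter

# Two entries from the store.log excerpt above, re-joined onto single lines.
SAMPLE = """\
1248572409.144 RELEASE -1 FFFFFFFF 6B93B1BF958703B3FC3CD1ADDD515695  200 1248572400        -1 1248572400 application/octet-stream 1580815/1580815 GET http://audio-sjl-t3-2.pandora.com/access/7008639604707703825.mp4?
1248572570.897 SWAPOUT 00 0004CF23 BEEE111A39B596B14903743011AF2C36  200 1248572570 1248490006        -1 image/jpeg 6181/6181 GET http://images-sjl-3.pandora.com/images/public/amz/2/2/4/4/039841434422_130W_130H.jpg
"""

# Assumed field order; the labels are hypothetical names for illustration.
FIELDS = ("time", "action", "dir", "file", "hash", "status", "date",
          "lastmod", "expires", "type", "sizes", "method", "url")


def parse(line):
    """Split one store.log line into a dict, or return None if malformed."""
    parts = line.split()
    return dict(zip(FIELDS, parts)) if len(parts) == len(FIELDS) else None


entries = [e for e in map(parse, SAMPLE.splitlines()) if e]

# Count entries per (action, content type) pair.
by_action = Counter((e["action"], e["type"]) for e in entries)

for (action, ctype), count in by_action.items():
    print(f"{action:8} {ctype:26} {count}")
```

In this excerpt the audio objects are all RELEASEd with the expires field equal to the date field, while the one image that reaches disk (SWAPOUT) has no expiry at all.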
>>>>>>> ###Wireshark
>>>>>>>
>>>>>>> Hypertext Transfer Protocol
>>>>>>> HTTP/1.0 200 OK\r\n
>>>>>>> Date: Sun, 26 Jul 2009 05:12:58 GMT\r\n
>>>>>>> Server: Apache\r\n
>>>>>>> Content-Length: 6137729\r\n
>>>>>>> Cache-Control: no-cache, no-store, must-revalidate, max-age=-1\r\n
>>>>>>> Pragma: no-cache, no-store\r\n
>>>>>>> Expires: -1\r\n
>>>>>>> Content-Type: application/octet-stream\r\n
>>>>>>> X-Cache: MISS from ichiban\r\n
>>>>>>> X-Cache-Lookup: MISS from ichiban:3128\r\n
>>>>>>> Via: 1.0 ichiban (squid)\r\n
>>>>>>> Proxy-Connection: keep-alive\r\n
>>>>>>> \r\n
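
The headers in this capture explain why so many override options come up later in the thread. As a rough sketch, the snippet below splits the Cache-Control value and looks up which refresh_pattern option is meant to counter each directive. The mapping table is my own summary of the option names used in this thread and is an assumption, not something read out of Squid.

```python
# Header values copied from the Wireshark capture above.
headers = {
    "Cache-Control": "no-cache, no-store, must-revalidate, max-age=-1",
    "Pragma": "no-cache, no-store",
    "Expires": "-1",
}

# Assumed mapping from a response directive to the refresh_pattern option
# intended to override it.
OVERRIDES = {
    "no-cache": "ignore-no-cache",
    "no-store": "ignore-no-store",
    "private": "ignore-private",
}

# Directive names only, lower-cased, with any "=value" part dropped.
directives = [
    d.strip().split("=")[0].lower()
    for d in headers["Cache-Control"].split(",")
]

needed = [OVERRIDES[d] for d in directives if d in OVERRIDES]
if "Expires" in headers:
    # An explicit Expires header is what override-expire is aimed at.
    needed.append("override-expire")

# Directives this simple table has no answer for.
unhandled = [d for d in directives if d not in OVERRIDES]

print("options to try:", " ".join(needed))
print("not covered   :", " ".join(unhandled))
```

For this response the sketch suggests ignore-no-cache, ignore-no-store and override-expire, and leaves must-revalidate and max-age uncovered, which fits the point that not every non-caching parameter can be overridden in every Squid version.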
>>>>>>>
>>>>>>> Amos Jeffries wrote:
>>>>>>>  
>>>>>>>> Jason Spegal wrote:
>>>>>>>>   
>>>>>>>>> I was able to cache Pandora by compiling with 
>>>>>>>>> --enable-http-violations
>>>>>>>>> and using a refresh_pattern to cache everything regardless. 
>>>>>>>>> This however
>>>>>>>>> broke everything by preventing proper refreshing of any site. 
>>>>>>>>> If it could be
>>>>>>>>> worked where violations only happened as directly specified in the
>>>>>>>>> configuration it would be a workable solution. I did some 
>>>>>>>>> testing and I
>>>>>>>>> could not confirm that it was anything in the configuration 
>>>>>>>>> file itself that
>>>>>>>>> was causing the issue. I wouldn't recommend using this as such.
>>>>>>>>>
>>>>>>>>>         
>>>>>>>> Which indicates that some fine tuning is possible to cache just
>>>>>>>> Pandora. Find yourself one of the Pandora URLs in your access.log
>>>>>>>> and pay a visit to www.redbot.org or the ircache.org cacheability
>>>>>>>> engine.
>>>>>>>>
>>>>>>>>
>>>>>>>> Amos
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>   
>>>>>>>>> Henrik Nordstrom wrote:
>>>>>>>>>     
>>>>>>>>>> Sat 2009-07-25 at 12:05 -0600, Brett Glass wrote:
>>>>>>>>>>
>>>>>>>>>>       
>>>>>>>>>>> One of the largest consumers of our HTTP bandwidth is 
>>>>>>>>>>> Pandora, the free
>>>>>>>>>>> music service. Unfortunately, Pandora marks its streams as 
>>>>>>>>>>> non-cacheable and
>>>>>>>>>>> also puts question marks in the URLs, which is a huge waste 
>>>>>>>>>>> of bandwidth.
>>>>>>>>>>> How can this be overridden?
>>>>>>>>>>>
>>>>>>>>>>>             
>>>>>>>>>> The question mark can be ignored. See the "cache" directive. But
>>>>>>>>>> if there are other parameters behind it (normally not logged),
>>>>>>>>>> that just may not help.
>>>>>>>>>>
>>>>>>>>>> Regarding non-cacheable.. most crap can be overridden by
>>>>>>>>>> refresh_pattern.
>>>>>>>>>>
>>>>>>>>>> But, if it's a streaming service (I know nothing about 
>>>>>>>>>> Pandora) then you
>>>>>>>>>> are quite likely out of luck.
>>>>>>>>>>
>>>>>>>>>> Regards
>>>>>>>>>> Henrik
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>           
>>>>>>>>       
>>>>>>>     
>>>>>
>>>>
>>>>
>>>
>>
>>
