Re: anyone knows some info about youtube "range" parameter?

Hasanen AL-Bana <hasanen@xxxxxxxxx> · Fri, 27 Apr 2012 09:52:42 +0300



On Fri, Apr 27, 2012 at 7:43 AM, Eliezer Croitoru <eliezer@xxxxxxxxxxxx> wrote:
> On 25/04/2012 20:48, Hasanen AL-Bana wrote:
>>
>> Wouldn't it be better if we saved the video chunks? YouTube streams
>> files as 1.7MB FLV chunks, and the YouTube Flash player knows how to
>> merge and play them, so the range start and end will always be the
>> same for the same video as long as the user doesn't fast-forward or do
>> something nasty. Even in that case, Squid will just cache that
>> chunk. That is possible by rewriting the STORE_URL to include the
>> range start and end.
>>
>>
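A minimal Ruby sketch of why the ranges repeat, assuming the player always requests fixed-size pieces. The 1,700,000-byte CHUNK_BYTES value is an assumption based on the "1.7MB" figure above (the real player's chunk size may differ), and store_key is a hypothetical helper that reuses the .SQUIDINTERNAL naming from Eliezer's script.

```ruby
# ASSUMPTION: fixed chunk size taken from the "1.7MB" figure; not verified.
CHUNK_BYTES = 1_700_000

# Inclusive "start-end" byte ranges a player would request for a whole
# video if it always asks for chunk_bytes-sized pieces.
def chunk_ranges(total_bytes, chunk_bytes = CHUNK_BYTES)
  (0...total_bytes).step(chunk_bytes).map do |start|
    last = [start + chunk_bytes, total_bytes].min - 1
    "#{start}-#{last}"
  end
end

# Hypothetical store-key layout: same id/itag/range => same cached object,
# so every viewer who plays the video from the start hits the same keys.
def store_key(id, itag, range)
  "http://video-srv.youtube.com.SQUIDINTERNAL/id_#{id}_itag_#{itag}_range_#{range}"
end

# Made-up id "abc123" and itag "34", for illustration only.
chunk_ranges(4_000_000).each { |r| puts store_key("abc123", "34", r) }
```

A seek produces a range outside this grid, so it simply becomes one more separately cached object, which is the behavior described above.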
>> On Wed, Apr 25, 2012 at 8:39 PM, Ghassan Gharabli
>> <sounarose@xxxxxxxxxxxxxx>  wrote:
>
> <SNIP>
>
> I have written a small Ruby store_url_rewrite helper that works with the
> range argument in the URL (at the bottom of this mail).
>
> It's written in Ruby, and I took some of Andre's work from
> http://youtube-cache.googlecode.com
>
> It's not a fancy script and is meant only for this specific YouTube
> problem.
>
> I know that YouTube hasn't changed this range behavior across the whole
> globe, because right now I'm working from a remote location that still has
> no "range" at all in the URL. So within the same country you can get two
> different URL patterns.
>
> This script is not CPU friendly (it always runs the same number of regex
> lookups, one per argument), but it's not what will bring your server down!

That is why I am going to write it in Perl: on my server I might need
to run more than 40 instances of the script, and Perl has been the
fastest of everything I have tested.

>
> This is only a prototype, and if anyone wants to add more domains and
> patterns I will be more than glad to make this script better than it is now.
>
> This is one hell of a nasty regex script. I could have used the uri and
> cgi libs to make the script more user friendly, but I chose to just
> build the script skeleton and move on from there using Ruby's basic
> methods and classes.
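For comparison, a minimal sketch of the library route using Ruby's standard uri library (URI.parse and URI.decode_www_form) instead of hand-written regexes. The store_url_for name and the sample URLs are made up for illustration; only the .SQUIDINTERNAL key layout comes from the script.

```ruby
require "uri"

# Let the uri library split the query string into key/value pairs.
# Returns nil unless id, itag and range are all present.
def store_url_for(url)
  query = URI.parse(url).query or return nil
  params = URI.decode_www_form(query).to_h
  id, itag, range = params.values_at("id", "itag", "range")
  return nil unless id && itag && range
  "http://video-srv.youtube.com.SQUIDINTERNAL/id_#{id}_itag_#{itag}_range_#{range}"
rescue URI::InvalidURIError, ArgumentError
  # Unparseable URL or query string: leave it to the caller.
  nil
end

# Argument order in the query does not matter with this approach.
puts store_url_for("http://v1.example.com/videoplayback?itag=34&range=0-1699999&id=abc")
```

The trade-off is a full URL parse per request versus three cheap regex scans, so for a helper running in many instances the regex version is not unreasonable.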
>
> The idea of this script is to extract each of the arguments (id, itag
> and range) one by one instead of using one regex to extract them all,
> because YouTube uses a couple of different URL structures.
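A small sketch of why one combined regex breaks when the URL structure changes. The two URLs are made-up layouts carrying the same arguments in a different order; COMBINED_RX and ARG_RX are illustrative names, not part of the script.

```ruby
# Two hypothetical URL layouts with the same arguments in a different order.
URLS = [
  "http://v1.example.com/videoplayback?id=abc&itag=34&range=0-1699999",
  "http://v1.example.com/videoplayback?range=0-1699999&itag=34&id=abc"
]

# One combined regex silently assumes the order id, itag, range...
COMBINED_RX = /[?&]id=(\w+).*[?&]itag=(\d+).*[?&]range=([\d-]+)/

# ...while one regex per argument does not care where each argument sits.
ARG_RX = {
  "id"    => /[?&]id=([\w-]+)/,
  "itag"  => /[?&]itag=(\d+)/,
  "range" => /[?&]range=([\d-]+)/
}

def per_argument(url)
  ARG_RX.transform_values { |rx| url[rx, 1] }
end

URLS.each do |u|
  puts "combined:     #{u.match?(COMBINED_RX) ? 'match' : 'no match'}"
  puts "per-argument: #{per_argument(u).inspect}"
end
```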
>
> If someone can help me reorganize this script to make it more flexible
> for other sites, with numbered cases per site/domain/URL structure, I
> will be happy to get any help I can.
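One possible shape for the per-site cases, sketched as an ordered rule table: each rule pairs a host pattern with a lambda that builds the store URL or returns nil. The Rule struct, the rewrite method and the /\.youtube\.com\z/ host pattern are assumptions for illustration, not something from the original script.

```ruby
# Each rule: a name, a host pattern, and a builder returning a store URL or nil.
Rule = Struct.new(:name, :host_rx, :builder)

RULES = [
  # ASSUMPTION: YouTube video hosts end in ".youtube.com".
  Rule.new("youtube-range", /\.youtube\.com\z/, lambda { |url|
    id, itag, range = %w[id itag range].map { |k| url[/[?&]#{k}=([\w-]+)/, 1] }
    return nil unless id && itag && range
    "http://video-srv.youtube.com.SQUIDINTERNAL/id_#{id}_itag_#{itag}_range_#{range}"
  })
  # Further rules (one per site/domain/URL structure) would be appended here.
]

# Try the rules in order; fall back to the unchanged URL.
def rewrite(url)
  host = url[%r{\Ahttps?://([^/:?]+)}, 1] or return url
  RULES.each do |rule|
    next unless host =~ rule.host_rx
    new_url = rule.builder.call(url)
    return new_url if new_url
  end
  url
end

puts rewrite("http://v1.lscache.c.youtube.com/videoplayback?id=abc&itag=34&range=0-1699999")
```

Adding a site then means appending one Rule, and the numbered cases become the rule order.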
>
> Currently planned additions to this script:
> SourceForge: catch all download mirrors into one object
> IMDb HQ (480p and up) videos
> Vimeo videos
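A rough sketch of the SourceForge item: map every mirror's copy of a file to one store URL. It assumes mirror hosts look like "<mirror>.dl.sourceforge.net" with paths under /project/, and that the query string can be dropped; both are assumptions to check against real traffic. The dl.sourceforge.net.SQUIDINTERNAL host just copies the naming style of the YouTube key.

```ruby
# ASSUMPTION: mirror URLs look like http://<mirror>.dl.sourceforge.net/project/...
SF_MIRROR_RX = %r{\Ahttps?://[^/]+\.dl\.sourceforge\.net(/project/[^?]+)}

# Same file on any mirror => same store URL. The query string is dropped
# (an assumption that it never changes the file content).
def sourceforge_store_url(url)
  path = url[SF_MIRROR_RX, 1] or return url
  "http://dl.sourceforge.net.SQUIDINTERNAL#{path}"
end

puts sourceforge_store_url("http://heanet.dl.sourceforge.net/project/foo/foo-1.0.tar.gz")
```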
>
> If more than one person wants them:
> blip.tv
> some Facebook videos
> some other image storage sites
>
> If you want me to add anything to my "try to cache" list, I will be happy
> to hear from you by e-mail.
>
> Regards,
> Eliezer
>
>
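> ##code start##
> #!/usr/bin/ruby
> # Squid store_url_rewrite helper for YouTube "range" requests.
> require "syslog"
>
> class SquidRequest
>   attr_accessor :url, :user
>   attr_reader :client_ip, :method
>
>   def method=(s)
>     @method = s.to_s.downcase
>   end
>
>   # Squid sends "ip/fqdn"; keep only the IP part.
>   def client_ip=(s)
>     @client_ip = s.to_s.split('/').first
>   end
> end
>
> def read_requests
>   # URL <SP> client_ip "/" fqdn <SP> user <SP> method [<SP> kvpairs]<NL>
>   STDIN.each_line do |ln|
>     r = SquidRequest.new
>     r.url, r.client_ip, r.user, r.method, *_kvpairs = ln.rstrip.split(' ')
>     # One reply line per request; flush so Squid is not left waiting.
>     (STDOUT << "#{yield r}\n").flush
>   end
> end
>
> def log(msg)
>   Syslog.log(Syslog::LOG_ERR, "%s", msg)
> end
>
> # One regex per argument, anchored on "?" or "&" so that e.g. "vid="
> # is not mistaken for "id=". Argument order in the URL does not matter.
> ID_RX    = /[?&]id=([A-Za-z0-9_-]+)/
> ITAG_RX  = /[?&]itag=([0-9]+)/
> RANGE_RX = /[?&]range=([0-9-]+)/
>
> def main
>   Syslog.open('nginx.rb', Syslog::LOG_PID)
>   log("Started")
>
>   read_requests do |r|
>     id    = r.url.to_s[ID_RX, 1]
>     itag  = r.url.to_s[ITAG_RX, 1]
>     range = r.url.to_s[RANGE_RX, 1]
>
>     # URLs without id/itag/range (e.g. locations with no "range" yet) are
>     # echoed back unchanged instead of crashing the helper on a nil match.
>     if id && itag && range
>       newurl = "http://video-srv.youtube.com.SQUIDINTERNAL/id_#{id}_itag_#{itag}_range_#{range}"
>       log("YouTube Video [#{newurl}].")
>       newurl
>     else
>       r.url
>     end
>   end
> end
>
> main
> ##code end##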
>
>
>
> --
> Eliezer Croitoru
> https://www1.ngtech.co.il
> IT consulting for Nonprofit organizations
> eliezer <at> ngtech.co.il