On 27/11/2011 5:32 a.m., Ghassan Gharabli wrote:
Hello Amos,
Finally, I have managed to capture most YouTube videos, except for
something I would like some assistance with.
As I have tested before and tried so many times, Chudy's script is outdated.
After testing and logging YouTube videos, I finally found
something that is not being fully cached. If you still remember, I said
in my old messages that the ID isn't being captured in all places,
but it's okay, I have done this. I will post my details after I
finish them completely.
Could you please explain to me what's happening here?
If &range=13-2375679 is found in a URL then Squid doesn't understand
how to cache the full video, as it only caches the first 13 seconds, I
guess, and then it stops. If I try to download this finished cached
movie then you notice its size is about 2.2 MB. If you try to remove it
from the cache then Squid can't even find it, as it claims it is not cached,
but shows TCP_HIT in access.log. STRANGE!
(NP: by remove you mean a PURGE request? HIT just means cached data was
found to service the request, which is right, since purging the data
involves locating it (HITing) before erasing the cached entry. Follow-up
requests after the purge should not be HITs.)
I took a look at these "range" replies being generated by YT a while back.
What I found was that a request for the video URL would send back an FLV
object with bytes, e.g. "[SWF...]ABCDEFGH". All fine and good, this is the
cacheable video.
If the user skips around in the video, the player generates a range=
request stating what timestamp or byte offset they want to start at. It's
not clear which, because the reply that comes back has a *different* byte
sequence than the video at the same URL. For example, on the
"[SWF...]ABCDEFGH" video it would produce "[SWF...]EFGH" or something
similar.
Under the HTTP rules a range object that is to be combined must be a plain
snippet of the base object (range 4-999 should have been just "DEFGH").
By adding the SWF headers to each reply, YT are making them unique and
different objects. Combining them in the middle (i.e. by a caching app)
will cause errors in the binary object and crash the Flash player or
cause it to display an error message instead of the video.
This range request only seems to happen if the user skips into a portion
of video the player has not yet downloaded. So sending them the whole
video, which is what we try to do with Squid, will cause a display lag
for the user but not cause problems in their player.
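To make the splice problem concrete, here is a toy sketch in Python. The header and payload bytes are invented stand-ins for the "[SWF...]ABCDEFGH" example, and the offsets are my own choice for illustration, not something taken from real YT traffic.

```python
# Toy model only: HEADER and PAYLOAD are made-up stand-ins, not real FLV data.
HEADER = b"[SWF...]"
PAYLOAD = b"ABCDEFGH"
BASE = HEADER + PAYLOAD  # the full object a cache would store for the URL


def http_range(obj: bytes, first: int, last: int) -> bytes:
    """Return bytes first..last inclusive, the way an HTTP byte range works."""
    return obj[first:last + 1]


def yt_style_reply(payload_offset: int) -> bytes:
    """Model of the observed reply: a fresh header plus the payload tail."""
    return HEADER + PAYLOAD[payload_offset:]


# A true range is a pure slice of the stored object.
true_slice = http_range(BASE, 12, len(BASE) - 1)

# The observed style of reply carries its own header in front of the tail.
observed = yt_style_reply(4)

# Splicing the observed reply onto the cached prefix repeats the header
# in the middle of the object, so the result no longer equals BASE.
spliced = BASE[:12] + observed

print(true_slice, observed, spliced == BASE)
```

Only the pure slice can be joined back onto the cached prefix to rebuild the original object; the header-prefixed reply cannot.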
Now look into this URL:
-------------------------------
"http://o-o.preferred.orange-par1.v4.lscache7.c.youtube.com/videoplayback?sparams=id%2Cexpire%2Cip%2Cipbits%2Citag%2Csource%2Calgorithm%2Cburst%2Cfactor%2Ccp&fexp=907605%2C912600%2C915002&algorithm=throttle-factor&itag=34&ip=84.0.0.0&burst=40&sver=3&signature=8223490C23E48CB708E04666E4A550422757CEC6.9D8D78E66DD14FEFC4B5F960F493ED4CDFD7C51C&source=youtube&expire=1322348400&key=yt1&ipbits=8&factor=1.25&cp=U0hRR1NPVl9FSkNOMV9LSVpFOkpsV3BkS1B1ZXNF&id=e120643085f56831&range=13-2375679"
HTTP/1.0 200 OK
Last-Modified: Fri, 27 Nov 2009 12:44:54 GMT
Content-Type: video/x-flv
Date: Sat, 26 Nov 2011 16:06:29 GMT
Expires: Sat, 26 Nov 2011 16:06:29 GMT
Cache-Control: private, max-age=24511
Accept-Ranges: bytes
Content-Length: 2375667
X-Content-Type-Options: nosniff
Server: gvs 1.0
X-Cache: MISS from Peer6
X-Cache-Lookup: MISS from Peer6:3128
Connection: close
What's the job of "Accept-Ranges: bytes" here?
Accept-* means the software producing that reply or request supports a
certain HTTP feature. In this case it is Squid, and maybe the server as
well, supporting HTTP range requests. Not related to YT particularly.
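One small check on the numbers in that reply: the span of range=13-2375679, counted inclusively, is the same as the Content-Length that came back. That is consistent with the parameter counting bytes rather than seconds, though it does not prove it. A minimal Python sketch of the arithmetic follows; the URL in it is a shortened, hypothetical form of the real one.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def range_span(url: str) -> Optional[int]:
    """Return the inclusive length of a range=first-last query parameter."""
    values = parse_qs(urlsplit(url).query).get("range")
    if not values:
        return None
    first, last = (int(n) for n in values[0].split("-", 1))
    return last - first + 1


# Shortened, hypothetical form of the URL quoted above (host is a placeholder).
url = "http://cache.example.invalid/videoplayback?id=e120643085f56831&range=13-2375679"
content_length = 2375667  # Content-Length header from the quoted reply

print(range_span(url), range_span(url) == content_length)
```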
And here is the confusing part again: you can see another similar URL with
the same "/videoplayback?.*(id)", where the ID comes at the end of the
URL and the reply is just "Moved Temporarily". I must mention that this URL
sends the FLV URL as Squid already read it in access.log and then adds
&ir=1&playretry=1 or pr=1&playretry, which means Squid would be
confused into caching it twice (FLV).
EXAMPLE:
---------------
"http://o-o.preferred.orange-par1.v3.lscache3.c.youtube.com/videoplayback?sparams=id%2Cexpire%2Cip%2Cipbits%2Citag%2Csource%2Calgorithm%2Cburst%2Cfactor%2Ccp&fexp=908525%2C910207%2C916201&algorithm=throttle-factor&itag=34&ip=84.0.0.0&burst=40&sver=3&signature=0489805DCC95F6EADBA9D43C3FD8C107FC768662.73AA6897FE78CF78BE7819E089F1A4FC47534C7D&source=youtube&expire=1322344800&key=yt1&ipbits=8&factor=1.25&cp=U0hRR1NPUl9FSkNOMV9LSVZJOmdmQWdwWC01dlpn&id=283246f338ece5ad"
HTTP/1.0 302 Moved Temporarily
Last-Modified: Wed, 02 May 2007 10:26:10 GMT
Date: Sat, 26 Nov 2011 15:50:47 GMT
Expires: Sat, 26 Nov 2011 15:50:47 GMT
Cache-Control: private, max-age=900
Location: http://r9.orange-par2.c.youtube.com/videoplayback?sparams=id%2Cexpire%2Cip%2Cipbits%2Citag%2Csource%2Calgorithm%2Cburst%2Cfactor%2Ccp&fexp=908525%2C910207%2C916201&algorithm=throttle-factor&itag=34&ip=84.0.0.0&burst=40&sver=3&signature=0489805DCC95F6EADBA9D43C3FD8C107FC768662.73AA6897FE78CF78BE7819E089F1A4FC47534C7D&source=youtube&expire=1322344800&key=yt1&ipbits=8&factor=1.25&cp=U0hRR1NPUl9FSkNOMV9LSVZJOmdmQWdwWC01dlpn&id=283246f338ece5ad&ir=1
X-Content-Type-Options: nosniff
Content-Type: text/html
Server: gvs 1.0
Age: 2068
Content-Length: 0
X-Cache: HIT from Peer6
X-Cache-Lookup: HIT from Peer6:3128
Connection: close
This is the 302 redirect Adrian and Chudy were discussing at the end of
the wiki page. If you cache it with storeurl_access reductions it will
loop infinitely back at itself.
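A rough Python sketch of why the loop happens, assuming a store-URL reduction that keys only on id and itag. The store_key function, the key prefix and the dictionary cache are all hypothetical stand-ins for illustration, not the wiki script or Squid's actual storeurl code, and the URLs are shortened forms of the ones quoted above.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical internal key prefix; any fixed string would do.
KEY_PREFIX = "http://video.store.invalid/videoplayback"


def store_key(url: str) -> str:
    """Reduce a videoplayback URL to a key built from id and itag only."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    if not parts.path.endswith("/videoplayback") or "id" not in query:
        return url  # leave unrelated URLs untouched
    itag = query.get("itag", [""])[0]
    return f"{KEY_PREFIX}?id={query['id'][0]}&itag={itag}"


def follow(url: str, cache: dict, max_hops: int = 5) -> int:
    """Follow cached 302 replies; return the number of redirects taken."""
    hops = 0
    while hops < max_hops:
        status, location = cache.get(store_key(url), (200, None))
        if status != 302:
            return hops
        url = location
        hops += 1
    return hops  # reached the limit: the redirect points back at itself


# Shortened forms of the request URL and its Location target.
FIRST = ("http://o-o.preferred.orange-par1.v3.lscache3.c.youtube.com"
         "/videoplayback?itag=34&id=283246f338ece5ad")
SECOND = ("http://r9.orange-par2.c.youtube.com"
          "/videoplayback?itag=34&id=283246f338ece5ad&ir=1")

# Caching the 302 under the reduced key makes the redirect target hit the
# same entry, so every hop is answered with the same 302 again.
cache = {store_key(FIRST): (302, SECOND)}

print(store_key(FIRST) == store_key(SECOND), follow(FIRST, cache))
```

Under these assumptions the request and its redirect target collapse to one key, so the client never gets past the cached 302.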
Amos