On Tue, Nov 26, 2013 at 5:30 AM, Amos Jeffries <squid3@xxxxxxxxxxxxx> wrote:
> On 26/11/2013 10:13 a.m., Ghassan Gharabli wrote:
>> Hi,
>>
>> I have built a PHP script to cache HTTP 1.X 206 Partial Content like
>> "WindowsUpdates" & allow seeking through Youtube & many websites.
>>
>
> Ah. So you have written your own HTTP caching proxy in PHP. Well done.
> Did you read RFC 2616 several times? Your script is expected to obey
> all the MUST conditions and clauses in there discussing "proxy" or "cache".
>

Yes, I have read it and I will read it again, but the reason I am
building such a script is that internet here in Lebanon is really
expensive and scarce. As you know, Youtube sends dynamic chunks for
each video. For example, if you watch a video on Youtube more than 10
times, Squid fills up the cache with more than 90 chunks per video;
that is why allowing a seek to any position of the video through my
script would save me the headache.

>
> NOTE: the easy way to do this is to upgrade your Squid to the current
> series and use ACLs on the range_offset_limit directive. That way Squid
> will convert Range requests to normal fetch requests and cache the
> object before sending the requested pieces of it back to the client.
> http://www.squid-cache.org/Doc/config/range_offset_limit/
>

I have successfully supported HTTP/206 if the object is cached, and my
target is to honour Range headers, as I can see that iPhones or Google
Chrome check whether the server sends an "Accept-Ranges: bytes" header
and then request a single range like bytes=x-y, or multiple ranges like
bytes=x-y,x-y.

>> I am willing to move from PHP to C++ hopefully after a while.
>>
>> The script is almost finished, but I have several questions. I have no
>> idea if I should always grab the HTTP response headers and send them
>> back to the browsers.
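Amos's suggestion could be sketched as a squid.conf fragment roughly like
the following (the directive names are real; the domain list and the
pairing with quick_abort_min are my assumptions for illustration, not a
tested configuration):

```
# Sites whose partial-content objects should be fetched and cached whole
# (illustrative domains -- adjust to the traffic you actually see).
acl fetch_whole dstdomain .windowsupdate.com .youtube.com

# For matching requests, fetch the entire object instead of relaying the
# Range request upstream; -1 means "no offset limit".
range_offset_limit -1 fetch_whole

# Keep downloading after a client abort so the object completes in cache.
quick_abort_min -1 KB
```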
>
> The response headers you get when receiving the object are metadata
> describing that object AND the transaction used to fetch it AND the
> network conditions/pathway used to fetch it. The cache's job is to store
> those along with the object itself and deliver only the relevant headers
> when delivering a HIT.
>
>>
>> 1) Does Squid still grab the "HTTP Response Headers" even if the
>> object is already in cache, or does Squid already have a cached copy of
>> the HTTP response headers? If Squid caches HTTP response headers, then
>> how do you deal with HTTP code 302 if the object is already cached? I
>> am asking this question because I have already seen most websites use
>> the same extensions such as .FLV, including a Location header.
>
> Yes. All proxies on the path are expected to relay the end-to-end
> headers, drop the hop-by-hop headers, and MUST update/generate the
> feature negotiation and state information headers to match their
> capabilities in each direction.
>

Do you mean by "Yes" that the HTTP response headers are grabbed even if
the object is already in cache, so network latency is always added
whether it is a MISS or a HIT? I have tested Squid and noticed that
reading HIT objects from Squid takes about 0.x ms, which makes me
believe objects are served offline until expiry occurs. Right?

Until now I have been using $http_response_header, as it is the fastest
method by far, but I still have an issue with latency: for each request
the function takes about 0.30 s, which is really high, even though my
network latency is 100~150 ms. That is why I thought I could grab the
HTTP response headers the first time and store them, so that if the URI
is requested a second time I would send the cached headers instead of
grabbing them again, to eliminate the network latency. But I still have
an issue...
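The relay rule Amos describes (keep end-to-end headers, drop hop-by-hop
ones) comes from RFC 2616 section 13.5.1. The script itself is PHP, but
the rule can be sketched in Python like this (the function name is mine;
the header list is the RFC's):

```python
# Hop-by-hop headers per RFC 2616 section 13.5.1; these describe the
# current connection only and must not be stored or relayed by a cache.
HOP_BY_HOP = {
    "connection", "keep-alive", "proxy-authenticate",
    "proxy-authorization", "te", "trailers",
    "transfer-encoding", "upgrade",
}

def relayable_headers(headers):
    """Return only the end-to-end headers from a list of (name, value)
    pairs, also dropping any header named inside the Connection header
    (RFC 2616 section 14.10 makes those hop-by-hop too)."""
    named = set()
    for name, value in headers:
        if name.lower() == "connection":
            named.update(t.strip().lower() for t in value.split(","))
    return [(n, v) for n, v in headers
            if n.lower() not in HOP_BY_HOP and n.lower() not in named]
```

So a cached entry should only ever contain what `relayable_headers()`
keeps; Connection, Keep-Alive and friends are regenerated per hop.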
How am I going to know if the website sends HTTP/302 (because some
websites send HTTP/302 for the same requested file name) if I am not
grabbing the header again in a HIT situation just to improve the
latency? A second issue is saving the headers of a CDN.

>>
>> 2) Do you also use mime.conf to send the Content-Type to the browser
>> in the case of FTP/HTTP, or only FTP?
>
> Only FTP and Gopher, *if* Squid is translating from the native
> FTP/Gopher connection to HTTP. HTTP, and protocols relayed using the
> HTTP message format, are expected to supply the correct header.
>
>>
>> 3) Does Squid compare the length of the local cached copy with the
>> remote file if you already have the object file, or do you use
>> refresh_pattern?
>
> Content-Length is a declaration of how many payload bytes are following
> the response headers. It has no relation to the server's object except
> in the special case where the entire object is being delivered as
> payload without any encoding.
>

I am only caching objects that have a "Content-Length" header, if the
size is greater than 0, and I have noticed that there are some files
like XML, CSS and JS which I believe I should save. But do you think I
must follow the If-Modified-Since header to see if there is a fresh
copy?

>>
>> 4) What happens if the user modifies a refresh_pattern to cache an
>> object, for example .xml, which does not have a [Content-Length]
>> header? Do you still save it, or would you look for the ignore-headers
>> used to force caching the object? And what happens if the cached copy
>> expires; do you still refresh the copy even if there is no
>> Content-Length header?
>
> refresh_pattern does not cause caching of any objects. What it does is
> tell Squid how long an object is valid for before it needs to be
> revalidated or replaced. In some situations this can affect the caching
> decision; in most it only affects expiry.
>
>
> Objects without Content-Length are handled differently by HTTP/1.0 and
> HTTP/1.1 software.
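One way to answer both the 302 question and the If-Modified-Since
question at once is to revalidate a HIT with a conditional request
(If-Modified-Since / If-None-Match) instead of skipping upstream
entirely: a 304 costs only headers, not the body. The decision logic
could be sketched as follows (the function and action names are mine,
not Squid's):

```python
def revalidation_action(status):
    """Decide what to do with a cached entry after sending a conditional
    request (If-Modified-Since / If-None-Match) upstream. `status` is
    the HTTP status code the upstream server returned."""
    if status == 304:
        return "serve-cached"     # still fresh: answer from the cache
    if status in (301, 302, 303, 307):
        return "follow-redirect"  # URL now redirects: relay the 302
    if status == 200:
        return "replace-cached"   # new full body: overwrite the entry
    return "relay-error"          # pass errors through, keep the cache
```

This is how a 302 appearing later for a cached .FLV name still gets
noticed: the conditional request is cheap, but it does still touch the
network once per revalidation.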
>
> When either end of the connection is advertising HTTP/1.0, the sending
> software is expected to terminate the TCP connection on completion of
> the payload block.
>
> When both ends advertise HTTP/1.1, the sending software is expected to
> use Transfer-Encoding: chunked in order to keep the connection alive,
> unless the client sent Connection: close.
> Doing the HTTP/1.0 behaviour is also acceptable if both ends are
> HTTP/1.1, but it causes a performance loss due to the churn and setup
> costs of TCP.
>
>>
>> I am really confused about this issue, because I am always getting a
>> headers list from the internet and I send them back to the browser
>> (using PHP and Apache) even if the object is in cache.
>
> I am really confused about what you are describing here. You should
> only get a headers list from the upstream server if you have contacted
> one.
>
> You say the script is sending to the browser. This is not true at the
> HTTP transaction level. The script sends to Apache, Apache sends to
> whichever software requested from it.
>
> What is the order you chained the Browser, Apache and Squid?
>
> Browser -> Squid -> Apache -> Script -> Origin server
> or,
> Browser -> Apache -> Script -> Squid -> Origin server
>
> Amos

Squid is configured as:
Browser -> Squid -> Apache -> Script -> Origin server

url_rewrite_program c:/PHP/php.exe c:/squid/etc/redir.php
acl dont_pass url_regex ^http:\/\/192\.168\.10\.[0-9]\:312(6|7|8)\/.*?
acl denymethod method POST
acl denymethod method PUT
url_rewrite_access deny dont_pass
url_rewrite_access deny denymethod
url_rewrite_access allow all
url_rewrite_children 10
#url_rewrite_concurrency 99

I hope I can enable url_rewrite_concurrency, but if I enable
concurrency then I must always echo back the channel ID, even when I am
hitting the cache; or maybe I do not understand the behaviour described
in the url_rewrite manual while reading with fgets(STDIN).
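On the concurrency question: yes, with url_rewrite_concurrency enabled,
every reply line must start with the channel ID that Squid prefixed to
the request line, including the "no rewrite" case. A concurrency-aware
helper loop could be sketched like this (Python rather than PHP, and
assuming the classic pre-3.4 rewrite protocol where the reply is simply
"<channel-ID> <url>"; the rewrite condition and target URL are invented
for illustration):

```python
import sys

def handle(line):
    """Parse one helper request line of the form
    "<channel-ID> <URL> <client-ip/fqdn> <ident> <method> ..." and
    return the reply line. The channel ID MUST be echoed back even when
    the URL is returned unchanged."""
    parts = line.split()
    channel, url = parts[0], parts[1]
    if "youtube.com" in url:                      # illustrative condition
        url = "http://127.0.0.1:8080/fetch?u=" + url  # assumed rewriter
    return "%s %s" % (channel, url)

if __name__ == "__main__":
    # Squid feeds one request per line on stdin; flush after every reply
    # so Squid is never left waiting on a buffered line.
    for line in sys.stdin:
        sys.stdout.write(handle(line.strip()) + "\n")
        sys.stdout.flush()
```

The key points are the echoed channel ID and the per-line flush; a
helper that buffers its output stalls Squid just as badly as one that
drops the ID.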
If I open any website through my script, I see that the execution time
varies between 0.24 ms and 3 s from one object to another (if
$http_response_header is called).

Your help is greatly appreciated, and thank you for your time.

Regards,
Ghassan