On Tue, Nov 26, 2013 at 5:30 AM, Amos Jeffries <squid3@xxxxxxxxxxxxx> wrote:
> On 26/11/2013 10:13 a.m., Ghassan Gharabli wrote:
>> Hi,
>>
>> I have built a PHP script to cache HTTP 1.X 206 Partial Content like
>> "WindowsUpdates" & allow seeking through Youtube & many websites.
>>
>
> Ah. So you have written your own HTTP caching proxy in PHP. Well done.
> Did you read RFC 2616 several times? Your script is expected to obey
> all the MUST conditions and clauses in there discussing "proxy" or "cache".
>

Yes, I have read it and I will read it again, but the reason I am
building such a script is that internet here in Lebanon is really
expensive and scarce. As you know, Youtube sends dynamic chunks for
each video. For example, if you watch a video on Youtube more than 10
times, Squid fills up the cache with more than 90 chunks per video;
that is why allowing a seek to any position of the video through my
script would save me the headache.

>
> NOTE: the easy way to do this is to upgrade your Squid to the current
> series and use ACLs on the range_offset_limit directive. That way Squid
> will convert Range requests to normal fetch requests and cache the
> object before sending the requested pieces of it back to the client.
> http://www.squid-cache.org/Doc/config/range_offset_limit/
>

I have successfully supported HTTP/206 if the object is cached, and my
target is to honour Range headers, as I can see that iPhones or Google
Chrome check whether the server sends an "Accept-Ranges: bytes" header
and then request a single range like bytes=x-y, or multiple ranges like
bytes=x-y,x-y.

>> I am willing to move from PHP to C++ hopefully after a while.
>>
>> The script is almost finished, but I have several questions. I have no
>> idea if I should always grab the HTTP response headers and send them
>> back to the browsers.
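Amos's suggestion could be sketched as a squid.conf fragment roughly like
the following (the directive names are real; the domain list and the
pairing with quick_abort_min are my assumptions for illustration, not a
tested configuration):

```
# Sites whose partial-content objects should be fetched and cached whole
# (illustrative domains -- adjust to the traffic you actually see).
acl fetch_whole dstdomain .windowsupdate.com .youtube.com

# For matching requests, fetch the entire object instead of relaying the
# Range request upstream; -1 means "no offset limit".
range_offset_limit -1 fetch_whole

# Keep downloading after a client abort so the object completes in cache.
quick_abort_min -1 KB
```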
>
> The response headers you get when receiving the object are metadata
> describing that object AND the transaction used to fetch it AND the
> network conditions/pathway used to fetch it. The cache's job is to store
> those along with the object itself and deliver only the relevant headers
> when delivering a HIT.
>
>>
>> 1) Does Squid still grab the "HTTP Response Headers" even if the
>> object is already in cache, or does Squid already have a cached copy of
>> the HTTP response headers? If Squid caches HTTP response headers, then
>> how do you deal with HTTP code 302 if the object is already cached? I
>> am asking this question because I have already seen most websites use
>> the same extensions such as .FLV, including a Location header.
>
> Yes. All proxies on the path are expected to relay the end-to-end
> headers, drop the hop-by-hop headers, and MUST update/generate the
> feature negotiation and state information headers to match their
> capabilities in each direction.
>

Do you mean by "Yes" that the HTTP response headers are grabbed even if
the object is already in cache, so network latency is always added
whether it is a MISS or a HIT? I have tested Squid and noticed that
reading HIT objects from Squid takes about 0.x ms, which makes me
believe objects are served offline until expiry occurs. Right?

Until now I have been using $http_response_header, as it is the fastest
method by far, but I still have an issue with latency: for each request
the function takes about 0.30 s, which is really high, even though my
network latency is 100~150 ms. That is why I thought I could grab the
HTTP response headers the first time and store them, so that if the URI
is requested a second time I would send the cached headers instead of
grabbing them again, to eliminate the network latency. But I still have
an issue...
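The relay rule Amos describes (keep end-to-end headers, drop hop-by-hop
ones) comes from RFC 2616 section 13.5.1. The script itself is PHP, but
the rule can be sketched in Python like this (the function name is mine;
the header list is the RFC's):

```python
# Hop-by-hop headers per RFC 2616 section 13.5.1; these describe the
# current connection only and must not be stored or relayed by a cache.
HOP_BY_HOP = {
    "connection", "keep-alive", "proxy-authenticate",
    "proxy-authorization", "te", "trailers",
    "transfer-encoding", "upgrade",
}

def relayable_headers(headers):
    """Return only the end-to-end headers from a list of (name, value)
    pairs, also dropping any header named inside the Connection header
    (RFC 2616 section 14.10 makes those hop-by-hop too)."""
    named = set()
    for name, value in headers:
        if name.lower() == "connection":
            named.update(t.strip().lower() for t in value.split(","))
    return [(n, v) for n, v in headers
            if n.lower() not in HOP_BY_HOP and n.lower() not in named]
```

So a cached entry should only ever contain what `relayable_headers()`
keeps; Connection, Keep-Alive and friends are regenerated per hop.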
How am I going to know if the website sends HTTP/302 (because some
websites send HTTP/302 for the same requested file name) if I am not
grabbing the header again in a HIT situation just to improve the
latency? A second issue is saving the headers of a CDN.

>>
>> 2) Do you also use mime.conf to send the Content-Type to the browser
>> in the case of FTP/HTTP, or only FTP?
>
> Only FTP and Gopher, *if* Squid is translating from the native
> FTP/Gopher connection to HTTP. HTTP, and protocols relayed using the
> HTTP message format, are expected to supply the correct header.
>
>>
>> 3) Does Squid compare the length of the local cached copy with the
>> remote file if you already have the object file, or do you use
>> refresh_pattern?
>
> Content-Length is a declaration of how many payload bytes are following
> the response headers. It has no relation to the server's object except
> in the special case where the entire object is being delivered as
> payload without any encoding.
>

I am only caching objects that have a "Content-Length" header, if the
size is greater than 0, and I have noticed that there are some files
like XML, CSS and JS which I believe I should save. But do you think I
must follow the If-Modified-Since header to see if there is a fresh
copy?

>>
>> 4) What happens if the user modifies a refresh_pattern to cache an
>> object, for example .xml, which does not have a [Content-Length]
>> header? Do you still save it, or would you look for the ignore-headers
>> used to force caching the object? And what happens if the cached copy
>> expires; do you still refresh the copy even if there is no
>> Content-Length header?
>
> refresh_pattern does not cause caching of any objects. What it does is
> tell Squid how long an object is valid for before it needs to be
> revalidated or replaced. In some situations this can affect the caching
> decision; in most it only affects expiry.
>
>
> Objects without Content-Length are handled differently by HTTP/1.0 and
> HTTP/1.1 software.
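One way to answer both the 302 question and the If-Modified-Since
question at once is to revalidate a HIT with a conditional request
(If-Modified-Since / If-None-Match) instead of skipping upstream
entirely: a 304 costs only headers, not the body. The decision logic
could be sketched as follows (the function and action names are mine,
not Squid's):

```python
def revalidation_action(status):
    """Decide what to do with a cached entry after sending a conditional
    request (If-Modified-Since / If-None-Match) upstream. `status` is
    the HTTP status code the upstream server returned."""
    if status == 304:
        return "serve-cached"     # still fresh: answer from the cache
    if status in (301, 302, 303, 307):
        return "follow-redirect"  # URL now redirects: relay the 302
    if status == 200:
        return "replace-cached"   # new full body: overwrite the entry
    return "relay-error"          # pass errors through, keep the cache
```

This is how a 302 appearing later for a cached .FLV name still gets
noticed: the conditional request is cheap, but it does still touch the
network once per revalidation.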
>
> When either end of the connection is advertising HTTP/1.0, the sending
> software is expected to terminate the TCP connection on completion of
> the payload block.
>
> When both ends advertise HTTP/1.1, the sending software is expected to
> use Transfer-Encoding: chunked in order to keep the connection alive,
> unless the client sent Connection: close.
> Doing the HTTP/1.0 behaviour is also acceptable if both ends are
> HTTP/1.1, but it causes a performance loss due to the churn and setup
> costs of TCP.
>
>>
>> I am really confused about this issue, because I am always getting a
>> headers list from the internet and I send them back to the browser
>> (using PHP and Apache) even if the object is in cache.
>
> I am really confused about what you are describing here. You should
> only get a headers list from the upstream server if you have contacted
> one.
>
> You say the script is sending to the browser. This is not true at the
> HTTP transaction level. The script sends to Apache, Apache sends to
> whichever software requested from it.
>
> What is the order you chained the Browser, Apache and Squid?
>
> Browser -> Squid -> Apache -> Script -> Origin server
> or,
> Browser -> Apache -> Script -> Squid -> Origin server
>
> Amos

Squid is configured as:
Browser -> Squid -> Apache -> Script -> Origin server

url_rewrite_program c:/PHP/php.exe c:/squid/etc/redir.php
acl dont_pass url_regex ^http:\/\/192\.168\.10\.[0-9]\:312(6|7|8)\/.*?
acl denymethod method POST
acl denymethod method PUT
url_rewrite_access deny dont_pass
url_rewrite_access deny denymethod
url_rewrite_access allow all
url_rewrite_children 10
#url_rewrite_concurrency 99

I hope I can enable url_rewrite_concurrency, but if I enable
concurrency then I must always echo back the channel ID, even when I am
hitting the cache; or maybe I do not understand the behaviour described
in the url_rewrite manual while reading with fgets(STDIN).
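On the concurrency question: yes, with url_rewrite_concurrency enabled,
every reply line must start with the channel ID that Squid prefixed to
the request line, including the "no rewrite" case. A concurrency-aware
helper loop could be sketched like this (Python rather than PHP, and
assuming the classic pre-3.4 rewrite protocol where the reply is simply
"<channel-ID> <url>"; the rewrite condition and target URL are invented
for illustration):

```python
import sys

def handle(line):
    """Parse one helper request line of the form
    "<channel-ID> <URL> <client-ip/fqdn> <ident> <method> ..." and
    return the reply line. The channel ID MUST be echoed back even when
    the URL is returned unchanged."""
    parts = line.split()
    channel, url = parts[0], parts[1]
    if "youtube.com" in url:                      # illustrative condition
        url = "http://127.0.0.1:8080/fetch?u=" + url  # assumed rewriter
    return "%s %s" % (channel, url)

if __name__ == "__main__":
    # Squid feeds one request per line on stdin; flush after every reply
    # so Squid is never left waiting on a buffered line.
    for line in sys.stdin:
        sys.stdout.write(handle(line.strip()) + "\n")
        sys.stdout.flush()
```

The key points are the echoed channel ID and the per-line flush; a
helper that buffers its output stalls Squid just as badly as one that
drops the ID.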
If I open any website through my script, I see that the execution time
varies between 0.24 ms and 3 s from one object to another (if
$http_response_header is called).

Your help is greatly appreciated, and thank you for your time.

Regards,
Ghassan