Re: kernel.org mirroring (Re: [GIT PULL] MMC update)

Jeff Garzik wrote:
> Jakub Narebski wrote:
>>
>> And in CGI standard there is a way to access additional HTTP headers
>> info from CGI script: the environment variables are named HTTP_<HEADER>;
>> for example if browser sent If-Modified-Since: header, its value
>> can be found in HTTP_IF_MODIFIED_SINCE environment variable.
> 
> The CGI spec does not at all guarantee that the CGI environment will 
> contain all the HTTP headers sent by the client.  That was the point of 
> the environment dump script -- you can see exactly which headers are, 
> and are not, passed through to CGI.
> 
> CGI only /guarantees/ a bare minimum (things like QUERY_STRING, 
> PATH_INFO, etc.)
> 
> Even basic server info environment variables are optional.

I have checked that at least Apache 2.0.54 passes HTTP_IF_MODIFIED_SINCE
when the request contains an If-Modified-Since: header (tested with my
own script and netcat/nc).
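
To illustrate the naming convention, here is a minimal Python sketch (gitweb
itself is Perl). The helper names and the sample environment are hypothetical.
The lookup returns None when the server did not pass the header on, which is
the case Jeff is warning about.

```python
def cgi_env_name(header_name):
    """Return the CGI environment variable name for an HTTP request header."""
    return "HTTP_" + header_name.upper().replace("-", "_")


def request_header(environ, header_name):
    """Look up a request header in a CGI environment.

    Returns None when the server did not pass the header to the script.
    """
    return environ.get(cgi_env_name(header_name))


# Hypothetical environment, as a server might hand it to a CGI script.
sample_environ = {
    "QUERY_STRING": "a=project_index",
    "HTTP_IF_MODIFIED_SINCE": "Sat, 09 Dec 2006 12:00:00 GMT",
}

print(cgi_env_name("If-Modified-Since"))
print(request_header(sample_environ, "If-Modified-Since"))
print(request_header(sample_environ, "If-None-Match"))
```

In a real CGI script one would pass os.environ instead of the sample dict.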
 
>> It is ETag, not E-tag. Besides, I don't see what the attached script is
>> meant to do: it does not output the sample file anyway.
> 
> It's not meant to output the sample file.  It outputs the server 
> metadata sent to the CGI script (the environment variables).  The sample 
> file was simply a way to play around with etag and last-modified metadata.

Ah. 
 
>> The idea is of course to stop processing in CGI script / mod_perl script
>> as soon as possible if cache validates.
> 
> Certainly.  That should help cut down on I/O.  FWIW though the projects 
> list is particularly painful, with its File::Find call, which you'll 
> need to do in order to return 304-not-modified.

First, it is better to use $projects_list, which is a projects index file
in the format:
  <project path> SPC <project owner>
where <project path> is relative to $projectroot and is URI encoded; at
least SPC has to be URI (percent) encoded. <project owner> is the owner
of the given project, and is also URI encoded (one would usually use '+'
in place of SPC here).
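
A reader for this format could be sketched as follows, in Python rather than
gitweb's Perl. The function name and the sample entries are hypothetical. I
assume here that '+' is decoded to a space only in the owner field; if both
fields should be decoded the same way, use unquote_plus for both.

```python
from urllib.parse import unquote, unquote_plus


def parse_projects_index(text):
    """Parse '<project path> SPC <project owner>' lines into (path, owner) pairs."""
    projects = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # The first literal space separates the two URI-encoded fields.
        path, _, owner = line.partition(" ")
        # Assumption: '+' stands for a space only in the owner field.
        projects.append((unquote(path), unquote_plus(owner)))
    return projects


# Hypothetical index contents, not taken from a real server.
sample = "foo/bar.git Example+Owner\nmy%20project.git Another+Owner\n"

for path, owner in parse_projects_index(sample):
    print(path, "|", owner)
```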

Gitweb can now generate the projects list in the above format, by using
the "project_index" action ("a=project_index" query string), or by clicking
the 'TXT' link at the bottom of the projects list page in new gitweb: see
http://repo.or.cz by Petr Baudis. The problem is that it generates the
projects list from the list of projects it sees, so to generate it from
scratch from the filesystem, $projects_list has to be a directory while
"project_index" is being generated (setting it to something that
evaluates to false, e.g. undef or "", makes gitweb use $projectroot for
$projects_list). I have posted how to do this.

The project list changes rarely, only on addition or removal of a project
and on a change of project owner, so it can be generated on demand.


Second, even with $projects_list set to a projects index file, gitweb
currently runs git-for-each-ref (which scans refs and accesses the
pack file for the commit date), checks for the description file and
reads it; when $projects_list is a directory it also checks the project
directory owner. I plan to make it configurable to read last activity
from all heads (all branches) as it does now, from HEAD (current branch)
as it did before, or from a given branch (for example 'master').

Assuming that gitweb is configured to read last activity from a single
defined branch, generating ETag = checksum(sha1 of heads of projects)
requires reading at least one file from each project.
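
As a rough illustration of that checksum, here is a Python sketch. It is not
gitweb code: the function name, the use of SHA-1 for the checksum itself, and
the sample data are my assumptions. It takes the head sha1 of each project as
already read and does not show how it is read (loose ref file or packed refs).

```python
import hashlib


def projects_etag(heads):
    """Compute an ETag from a mapping of project path -> head commit sha1 (hex)."""
    digest = hashlib.sha1()
    # Sort so the result does not depend on the order projects were scanned in.
    for project in sorted(heads):
        digest.update(project.encode("utf-8"))
        digest.update(b"\0")
        digest.update(heads[project].encode("ascii"))
        digest.update(b"\n")
    # An ETag value is a quoted string.
    return '"' + digest.hexdigest() + '"'


# Hypothetical projects and head ids.
heads = {
    "foo/bar.git": "1" * 40,
    "baz.git": "2" * 40,
}
print(projects_etag(heads))
```

The ETag changes whenever any project's head changes or a project is added or
removed, which is what cache validation of the projects list needs.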
 
>> I don't know if Apache intercepts and remembers ETag and Last-Modified
>> headers, adds 304 Not Modified HTTP response on finding that cache validates
>> and cuts out CGI script output. I.e. if browser provided If-Modified-Since:,
>> script wrote Last-Modified: header, If-Modified-Since: is no earlier than
>> Last-Modified: (usually is equal in the case of cache validation), then
>> Apache provides 304 Not Modified response instead of CGI script output.
> 
> This wanders into the realm of mod_cache configuration, I think.  (which 
> I have tried to get working as reverse proxy, and failed several times) 
>   If you are not using mod_*_cache, then Apache must execute the CGI 
> script every time AFAICS, regardless of etag/[if-]last-mod headers.

No, it wanders into the realm of header parsing by Apache, and the NPH
(Non-Parsed Headers) option.

Even if Apache does execute the CGI script to completion every time, it
might not send the output of the script, but an HTTP 304 Not Modified
reply instead. Might. I don't know if it does.
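
Whatever Apache does, the script can do the validation itself and skip the
expensive part. Below is a minimal Python sketch of that idea, not gitweb
code. The function names are hypothetical, ETags are compared as exact
strings (weak validators are ignored), and the ETag and last-modified time
are assumed to be cheap to compute.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def cache_validates(environ, etag, last_modified):
    """Return True if the client's cached copy is still valid."""
    if_none_match = environ.get("HTTP_IF_NONE_MATCH")
    if if_none_match is not None:
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        return "*" in candidates or etag in candidates
    since = environ.get("HTTP_IF_MODIFIED_SINCE")
    if since is not None:
        try:
            since_dt = parsedate_to_datetime(since)
        except (TypeError, ValueError):
            return False  # unparsable date: treat as not validated
        if since_dt.tzinfo is None:
            since_dt = since_dt.replace(tzinfo=timezone.utc)
        # HTTP dates have one-second resolution.
        return last_modified.replace(microsecond=0) <= since_dt
    return False


def respond(environ, etag, last_modified, render_body):
    """Return (status, headers, body); render_body runs only when needed."""
    headers = [
        ("ETag", etag),
        ("Last-Modified", format_datetime(last_modified, usegmt=True)),
    ]
    if cache_validates(environ, etag, last_modified):
        return "304 Not Modified", headers, ""
    return "200 OK", headers, render_body()


# Hypothetical values for illustration.
modified = datetime(2006, 12, 9, 12, 0, 0, tzinfo=timezone.utc)
environ = {"HTTP_IF_MODIFIED_SINCE": "Sat, 09 Dec 2006 12:00:00 GMT"}
status, headers, body = respond(environ, '"abc"', modified, lambda: "project list")
print(status)
```

Here last_modified must be a timezone-aware UTC datetime, and render_body
stands in for the costly page generation.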

-- 
Jakub Narebski
Poland