Re: [PATCH 10/18] gitweb: Adding isBinaryAction() and isFeedAction() to determine the action type

Jakub Narebski <jnareb@xxxxxxxxx> · Fri, 10 Dec 2010 13:10:53 +0100

On Fri, 10 Dec 2010, J.H. wrote:

>>> This is fairly self explanatory, these are here just to centralize the checking
>>> for these types of actions, as special things need to be done with regards to
>>> them inside the caching engine.
>>>
>>> isBinaryAction() returns true if the action deals with creating binary files
>>> (this needing :raw output)
>> 
>> Why do you need special case binary / :raw output?  It is not really
>> necessary if it is done in right way, as shown in my rewrite.
> 
> Because that's not how my caching engine does it, and the reason for
> that is I am mimicking how the rest of gitweb does it.

To shorten the explanation why treating binary (needing :raw) output in
a special way is not necessary: with the way gitweb code is structured
(with "binmode STDOUT, ':raw'" inside action subroutine), with the way
capturing output is done (by redirecting STDOUT), and even with the way
kernel.org caching code is structured the only thing that needs to be
done to support both text (:utf8, as set at beginning of gitweb) and
binary (:raw) output is to *dump cache to STDOUT in binary mode*:

	binmode $cache_fh, ':raw';
	binmode STDOUT, ':raw';
	File::Copy::copy($fh, \*STDOUT);

Nothing more.

Just dump cache file to STDOUT in binary mode.

> I attempted at one point to do as you were suggesting, and it became too
> cumbersome.  I eventually broke out the 'binary' packages into a special
> case (thus mimicking how gitweb is already doing things), which also
> gives me the advantage of being able to checksum the resulting binary
> out of band, as well as being able to more trivially calculate the file
> size being sent.

I don't see how it needs to be special-cased: the ordinary output would
also take advantage of this.  Note that plain 'blob' action can also
be quite large.

If there is to be done smarter, i.e. HTTP-aware, parsing and dumping of
cache entry file, e.g. by reading the HTTP header part to memory and
fiddling with HTTP headers (e.g. adding Content-Length header), it can be
done in a contents-agnostic way.

Note that with the way I do it in my rewrite, namely saving cached output
to temporary file to rename it to final destination later (atomic update),
we can do mungling of HTTP headers before/during this final copying to
final file, e.g. calculating Content-Length and perhaps Content-MD5 
headers.

> 
>>> isFeedAction() returns true if the action deals with a news feed of some sort,
>>> basically used to bypass the 'Generating...' message should it be a news reader
>>> as those will explode badly on that page.
>> 
>> Why blacklisting 'feed', instead of whitelisting HTML-output?
> 
> There are a limited number of feed types and their ilk (standard xml
> formatted feed and atom), there are lots of html-output like things.
> Easier to default and have things work, generally, than to have things
> not work the way you would expect.

Ah, I see from what you written in other subthreads of this thread that
you prefer to have "Generating..." page where it is not wanted that not
have it where it could be useful (i.e. blacklist approach), while I took
the opposite side (i.e. whitelist approach).

-- 
Jakub Narebski
Poland
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html