On Thu, 5 March 2009, Dave wrote:
> Jakub Narebski wrote:
>> Dave <kilroyd@xxxxxxxxxxxxxx> writes:

>>>> My strong impression is that the recoding takes place on the server. I
>>>> think the bug should be reported to the gitweb maintainers unless it a
>>>> local breakage on the kernel.org site.

It is on the server, but kernel.org runs a modified version of gitweb,
and the bug is in the modifications. See below.

CC-ed John 'Warthog9' Hawley, maintainer of gitweb on kernel.org.

>>>>
>>> Thanks Pavel.
>>>
>>> I just did a quick scan of the gitweb README - is this an issue with the
>>> $mimetypes_file or $fallback_encoding configurations variables?
>>
>> First, what version of gitweb do you use? It should be in 'Generator'
>> meta header, or (in older gitweb) in comments in HTML source at the
>> top of the page.
>
> Not sure where I'd find the meta header,

  <meta name="generator" content="gitweb/1.4.5-rc0.GIT-dirty git/1.6.1.1"/>

> but at the top of the HTML:
>
> <!-- git web interface version 1.4.5-rc0.GIT-dirty, (C) 2005-2006, Kay
> Sievers <kay.sievers@xxxxxxxx>, Christian Gierke -->
> <!-- git core binaries version 1.6.1.1 -->

The question was whether it is an extremely old version of gitweb,
without the fix for raw blob ('blob_plain') output of non-utf8, non-text
files. But the answer is that it is a _modified_ version of gitweb; see
below.

>
>> Second, the file is actually sent to browser 'as is', using binmode :raw
>> (or at least should be according to my understanding of Perl). And *.bin
>> binary file gets application/octet-stream mimetype, and doesn't send any
>> charset info. git.kernel.org should have modern enough gitweb to use this.
>> Strange...
>
> Dug around gitweb.perl in the main git repo. Then looked at the
> git/warthog9/gitweb.git repo (after noting the Git Wiki says kernel.org
> is running John Hawley's branch).
>
> One notable change to git_blob_plain:
>
>       undef $/;
>       binmode STDOUT, ':raw';
> -     print <$fd>;
> +     #print <$fd>;
> +     $output .= <$fd>;
>       binmode STDOUT, ':utf8'; # as set at the beginning of gitweb.cgi
>       $/ = "\n";
>
>       close $fd;
> +
> +     return $output;
>
> If that's the code that's running, doesn't that mean the output mode
> change doesn't impact the concatenation to $output? So the blob gets utf
> encoding when actually printed.

That is the culprit. kernel.org runs a modified version of gitweb, with
added caching. I guess that the above change was made to have
'blob_plain' output cached... but it loses "rawness": nothing is printed
while ':raw' is in force, so $output goes out later through the ':utf8'
layer. I guess it also loses mimetype info (unless
"print $cgi->header(...)" is also changed to append to $output).

One possible solution would be to redirect STDOUT to a scalar and return
that scalar; do that whenever caching _output_, and print all cached
_output_ data in :raw mode.

  close STDOUT;
  open STDOUT, '>', \$output or die "Can't open STDOUT: $!";

BTW. f5aa79d (gitweb: safely output binary files for 'blob_plain' action)
was my third patch for git...

-- 
Jakub Narebski
Poland
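
P.S. A rough sketch of the redirect idea above, to show how the
captured data stays raw. This is only an illustration, not code from
gitweb or from the kernel.org caching changes: capture_output() is a
made-up helper name, and the $blob payload is invented test data
standing in for what git_blob_plain reads from git.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# capture_output() is a hypothetical helper for illustration; it is not
# a function in gitweb or in the kernel.org caching patches.
sub capture_output {
    my ($code) = @_;
    my $output = '';

    # Keep a duplicate of the real STDOUT so it can be restored later.
    open(my $saved, '>&', \*STDOUT) or die "Can't dup STDOUT: $!";

    # Same two steps as in the mail: STDOUT now writes into $output.
    close STDOUT;
    open(STDOUT, '>', \$output) or die "Can't open STDOUT: $!";

    # Run the action; restore the real STDOUT even if it dies.
    my $ok  = eval { $code->(); 1 };
    my $err = $@;
    close STDOUT;
    open(STDOUT, '>&', $saved) or die "Can't restore STDOUT: $!";
    die $err unless $ok;

    return $output;
}

# Invented binary payload standing in for a blob read from git.
my $blob = "\x89PNG\r\n\x1a\n\xff\xfe";

# Raw capture: the action prints under ':raw', as unmodified gitweb does.
my $cached = capture_output(sub {
    binmode STDOUT, ':raw';
    print $blob;
});

# For contrast: the same bytes pushed through a ':utf8' layer.
my $mangled = capture_output(sub {
    binmode STDOUT, ':utf8';
    print $blob;
});

printf "raw capture intact: %s\n", ($cached eq $blob ? 'yes' : 'no');
printf "utf8 capture: %d bytes in, %d bytes out\n",
    length($blob), length($mangled);
```

When serving from the cache one would then do "binmode STDOUT, ':raw';"
before "print $cached;". Since anything printed inside the action goes
through STDOUT, the output of "print $cgi->header(...)" should end up in
the captured data as well, which would keep the mimetype info.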