On Sat, 31 May 2008, Junio C Hamano wrote: > Jakub Narebski <jnareb@xxxxxxxxx> writes: > >> Always add charset info from $default_text_plain_charset (if it is >> defined) to "raw" (a=blob_plain) output for 'text/plain' blobs. >> Adding charset info in a special case was removed from blob_mimetype(). >> >> Signed-off-by: Jakub Narebski <jnareb@xxxxxxxxx> >> --- > > Looks Ok but it took a bit of digging on the list for me to figure out > that something like this was missing from the beginning of your commit log > message: > > Earlier "blob_plain" view sent "charset=utf-8" only when gitweb > guessed the content type to be text by reading from it, and not > when the MIME type was obtained from /etc/mime.types. > > This fixes the bug by always adding.... I'm sorry that I have forgot to put the "why" in commit message. I'd add this when resending v2 of this patch. Thanks for a comment. > But I wonder if moving of this to the calling site is the right thing to > do. Wouldn't it become much more contained and robust if you did it this > way? [...] > sub blob_mimetype { This _might_ be better. I didn't do this for the following two reasons: First, from purely theoretical point of view the name of subroutine is blob_mimetype(), and I think that charset info has place in Content-Type, but is not part of MIME type info. Second, blob_mimetype() is used in two places: in git_blob_plain (in "raw" blob view) to generate correct Content-Type HTTP header, and in git_blob to decide whether a.) blame makes sense, b.) whether to redirect to "raw" (a=blob_plain) view. I'd rather not muck with charset info in second case, although I don't think that it matters at all, at least for now. So perhaps best of those ways would be to create thin wrapper subroutine, blob_contenttype($fd, $file_name, $mimetype), where both $file_name and (especially) $mimetype are optional parameters, and ise it in git_blob_plain() subroutine... > + # Type specific postprocessing can be added as needed... > + if ($mime =~ /^text\//i && > + $mime !~ /charset=/i && $default_text_plain_charset) { > + $mime .= '; charset='.$default_text_plain_charset; > + } > + > + return $mime; I'm not sure about it. I worry a bit about text/html, which can, and usually do, contain charset info inside the document. I'm not sure what happens when charset information from HTTP headers contradict charset information from presented file. That's why I have limited adding charset info purely to 'text/plain', not 'text/*' without charset info already present. -- Jakub Narebski Poland -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html