Re: [PATCH] gitweb.cgi: Use File::MMagic; "a=blob" action knows the blob/file type

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



--- Junio C Hamano <junkio@xxxxxxx> wrote:
> Luben Tuikov <ltuikov@xxxxxxxxx> writes:
> 
> > Use File::MMagic to determine the MIME type of a blob/file.
> > The variable magic_mime_file holds the location of the
> > "magic.mime" file, usually "/usr/share/file/magic.mime".
> > If not defined, the magic numbers internally stored in the
> > File::MMagic module are used.
> 
> I am sorry to ask you this, but would you mind redoing this
> patch without File::MMagic bits?

Yeah, no problem, will do.

> I think giving "a=blob" an
> ability to automatically switch to git_blob_plain is a good
> addition (as is your earlier patch to give a direct link to
> reach blob_plain from the list), so let's have that part in
> first.

Ok.

> I haven't applied your earlier one but it will appear in
> "next" shortly.

Ok.

> Existing filename based mimetypes_guess should be a lot cheaper
> than exploding a blob and feeding it to File::MMagic.  I was
> hoping File::MMagic to be used when we cannot guess the content
> type that way (i.e. when mimetypes_guess returns undef or
> application/octet-stream).

The MIME guessing is used in git_blob_plain(), where the blob
is already exploded, since a file descriptor is passed to the MIME
guessing routines.

Initially I tried using the already opened file descriptor to
the blob, but File::MMagic v1.27 always returns "text/plain"
on that.  I was astounded by this and a 5 line perl script
using that same module confirmed my suspicion that the module
or "seek" is broken in perl or in FileHandle.  I.e.
checktype_filename() succeeds, but checktype_filehandle()
always returns "text/plain".

Had simply reading the file descriptor succeeded, then we'd not
need to explode blobs in $git_temp.

The other reason I decided to completely go with File::MMagic is
that if I'm going to use it to decide the mime type after
the default method didn't succeed, why not forget the default
method and just use File::MMagic in general -- seemed like a logical
simplification.

> Since the repository owner can correct misidentification by the
> standard /etc/mime.types by supplying a custom per-repository
> $mimetypes_file (modulo that the current implementation of
> mimetype_guess_file does not allow it if the file does not have
> an extension that is specific enough), File::MMagic might be an
> overkill, especially if used in the way this patch does.  To
> allow finer grained differentiation that cannot be done with
> file extensions alone (e.g. some files may have .dat extension
> but one can be VCD mpeg wrapped in RIFF, and another can be a
> Z-machine story file), it might be simpler to allow the
> repository owner to specify full $file_name for such an ambiguous
> file in their custom $mimetypes_file, and try to match it in
> mimetype_guess_file sub.  That way we may not even need to use
> File::MMagic.

This is true, but I wonder how many people are going to go and
create their own mime.type files.  Most of the time people would
do minimal gitweb.cgi changes not even specifying $mimetypes_file
nor even $magic_mime_file.

> Are there cases where only $hash is given without $file_name?
> If so we may need to fall back on File::MMagic in such a case
> after all, but get_blob_mimetype sub copies the whole blob to a
> temporary file to work around a problem with version 1.27 you
> state in the comment -- this is way too much (and nobody seems
> to clean up the tempfile).  Looking at magic.mime, I suspect we
> might be able to get away with the first 4k bytes or so at most
> (the largest offset except iso9660 image is "Biff5" appearing at
> 2114 to signal an Excel spreadsheet).

I was hoping that tmpwatch(8) would clean up the blobs in Linux/UNIX.
But we can also delete them after the lookup.  Anothing thing is that
getting a blob isn't that often and when it happens, after the MIME
lookup it is already in the pagecache.

    Luben

-
: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]