Re: Mime-type handling

2005/12/9, Curt Zirzow <czirzow@xxxxxxxxx>:
>
> On Thu, Dec 08, 2005 at 12:31:52PM +0100, Manuel Vacelet wrote:
> > Hi all,
> >
> > I'm seeing bad behaviour from the 'file' command used by the fileinfo
> > PECL module (recommended for mime-type checking):
> > * Some Microsoft Excel documents are detected as Microsoft Word documents
> > * Some HTML files are detected as just text/plain
> > * ...
> >
> > I tested on multiple machines (with different versions of file) and I
> > sometimes get different behaviour, but never the expected one :/ I also
> > looked for the latest version of file, but it seems that the file used to
> > detect the mime-type is out of date...
>
> I'm not familiar with how fileinfo detects the contents. Is the file
> it is using something like:
>
>   /usr/share/misc/magic


Yes, that one and some other paths.

> ...
> > * Where can I find an up-to-date version of the magic number list usable
> > with file for mime-type checking?
>
> If the above is true, an updated version should be available for the
> OS you are using.
>

Unfortunately, my OS does not provide an up-to-date magic file.
But I have found an efficient solution via the freedesktop shared MIME project:
http://www.freedesktop.org/wiki/Standards_2fshared_2dmime_2dinfo_2dspec

They provide an up-to-date database of magic numbers:
http://freedesktop.org/Software/shared-mime-info

And there is a PHP implementation of a querying tool:
http://www.freedesktop.org/wiki/Standards_2fshared_2dmime_2dinfo_2dspec#head-978bef7f41fbdc4b40c2deacb294a386c82aae4d

I tested it and it works very well. All my test cases passed.
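
For readers of the archive, here is a minimal sketch of what magic-number
detection does, written in Python rather than PHP. It is not the fileinfo or
shared-mime-info implementation: the signature table, the function name
`sniff_mime` and the fallback rules are assumptions made for illustration. It
also shows why the problems above can occur: legacy Excel and Word files share
the same OLE2 container signature, and an HTML file without a leading
`<html` or doctype looks like plain text.

```python
# Illustrative signature table: (offset, magic bytes, MIME type).
# A real database such as shared-mime-info has many more entries,
# priorities and nested rules; this list is only an assumption.
SIGNATURES = [
    (0, b"\xff\xd8\xff", "image/jpeg"),
    (0, b"\x89PNG\r\n\x1a\n", "image/png"),
    (0, b"%PDF-", "application/pdf"),
    # OLE2 container shared by legacy .doc and .xls files, so the
    # leading bytes alone cannot tell Word from Excel.
    (0, b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "application/x-ole-storage"),
]


def sniff_mime(data: bytes) -> str:
    """Guess a MIME type from the leading bytes of a file (hypothetical helper)."""
    for offset, magic, mime in SIGNATURES:
        if data[offset:offset + len(magic)] == magic:
            return mime

    # Text heuristics: only HTML that starts with a recognisable tag is
    # reported as text/html; any other decodable text is text/plain.
    head = data[:512].lstrip().lower()
    if head.startswith((b"<!doctype html", b"<html")):
        return "text/html"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "application/octet-stream"
    return "text/plain"
```

With this sketch, an HTML fragment such as `<p>hi</p>` is reported as
text/plain, which matches the kind of misdetection described earlier in the
thread.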

> Even identifying contents of the file is not as reliable as one
> would think; it can also be spoofed.  For example with jpeg, there
> are several tools out there that will take a file, wrap a jpeg
> image around the file and embed the real contents inside of the
> file, and if your app just detects the magic contents, it will pass
> the test.
>
> The only way to ensure a file is what it claims to be is to open and
> resave it with a trusted application. Using the jpeg example you
> would need to do something like:
>
>  djpeg $file | cjpeg > testfile.jpg
>
> Well, with jpeg, the files will always be different, but a fuzzy match
> based on file size closeness and/or similar bit distribution could be used.
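
To illustrate the spoofing point, here is a small Python sketch. The helper
names `looks_like_jpeg` and `trailing_bytes_after_eoi` are hypothetical, and
the second check is only a rough heuristic, not a substitute for re-encoding
with a trusted tool: a payload hidden inside a comment or metadata segment
would not be caught by it.

```python
JPEG_SOI = b"\xff\xd8\xff"  # start-of-image marker plus first marker byte
JPEG_EOI = b"\xff\xd9"      # end-of-image marker


def looks_like_jpeg(data: bytes) -> bool:
    """Naive magic-number test: only the first three bytes are checked."""
    return data.startswith(JPEG_SOI)


def trailing_bytes_after_eoi(data: bytes):
    """Count the bytes after the last end-of-image marker.

    Returns None when no marker is found. A non-zero count hints that
    something was appended to the image, but zero does not prove the
    file is clean.
    """
    pos = data.rfind(JPEG_EOI)
    if pos == -1:
        return None
    return len(data) - (pos + len(JPEG_EOI))


# A fake "image": a JPEG-looking header and end marker followed by a payload.
payload = b"<?php evil(); ?>"
fake = JPEG_SOI + b"\xe0" + b"\x00" * 8 + JPEG_EOI + payload

print(looks_like_jpeg(fake))           # the magic-number test accepts the file
print(trailing_bytes_after_eoi(fake))  # but data follows the end marker
```

The magic-number test accepts the crafted file even though it carries extra
content, which is why re-saving the image with a trusted application is the
more robust check.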

Well, your comment is very interesting; I'll keep it in mind. But for my
current usage, I think the "on server" mime-type detection described above
will be secure enough.

Thanks for all,
Manuel
