2005/12/9, Curt Zirzow <czirzow@xxxxxxxxx>:
>
> On Thu, Dec 08, 2005 at 12:31:52PM +0100, Manuel Vacelet wrote:
> > Hi all,
> >
> > I'm seeing bad behaviour from the 'file' command used by the fileinfo
> > PECL module (recommended for mime-type checking):
> > * Some Microsoft Excel documents are detected as Microsoft Word documents
> > * Some HTML files are just text/plain
> > * ...
> >
> > I tested on multiple machines (with different versions of file) and I
> > sometimes get different behaviour, but never the expected one :/ I also
> > looked for the latest version of file, but it seems that the file used
> > to detect the mime-type is out of date...
>
> I'm not familiar with how fileinfo detects the contents. Is the file
> it is using something like:
>
>   /usr/share/misc/magic

Yes, and some other paths.

> ...
> > * Where can I find an up-to-date version of the magic number list
> > usable with file for mime-type checking?
>
> If the above is true, an updated version should be available for the
> OS you are using.
>

Unfortunately, my OS does not provide an up-to-date magic file. But I
have found an efficient solution via the freedesktop shared mime project:
http://www.freedesktop.org/wiki/Standards_2fshared_2dmime_2dinfo_2dspec

They provide an up-to-date database of magic numbers:
http://freedesktop.org/Software/shared-mime-info

And there is a PHP implementation of the querying tool:
http://www.freedesktop.org/wiki/Standards_2fshared_2dmime_2dinfo_2dspec#head-978bef7f41fbdc4b40c2deacb294a386c82aae4d

I tested it and it works very well. All my test cases passed.

> Even identifying the contents of the file is not as reliable as one
> would think; it can also be spoofed. For example, with jpeg, there
> are several tools out there that will take a file, wrap a jpeg
> image around the file and embed the real contents inside of the
> file, and if your app just detects the magic contents, it will pass
> the test.
>
> The only way to ensure a file is what it claims to be is to open and
> resave it with a trusted application. Using the jpeg example, you
> would need to do something like:
>
>   djpeg $file | cjpeg > testfile.jpg
>
> Well, with jpeg, the files will always be different, but a fuzzy match
> based on file-size closeness and/or similar bit distribution can be
> used to compare them.

Your comment is very interesting; I'll keep it in mind. But for my
current usage, I think the "on server" mime-type detection described
above will be secure enough.

Thanks for everything,
Manuel
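
P.S. As a rough illustration of the prefix matching that a magic-number
database drives, and of the spoofing concern quoted above, here is a
minimal Python sketch. It is not how fileinfo or the freedesktop tool is
implemented: the table is a tiny hand-picked sample, and the names
MAGIC_TABLE, sniff_mime and sniff_file are made up for this example.
Legacy Word and Excel files share the same OLE2 container signature,
which may be why a short magic list confuses the two.

```python
# Illustrative sample only (assumption): a real database such as
# shared-mime-info holds far more entries and richer matching rules.
MAGIC_TABLE = [
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"%PDF-", "application/pdf"),
    # OLE2 container used by both legacy .doc and .xls files, so the
    # leading bytes alone cannot tell Word from Excel.
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "application/x-ole-storage"),
]


def sniff_mime(data: bytes, default: str = "application/octet-stream") -> str:
    """Return the MIME type whose magic prefix matches the start of data."""
    for magic, mime in MAGIC_TABLE:
        if data.startswith(magic):
            return mime
    return default


def sniff_file(path: str) -> str:
    """Read only the first bytes of a file and sniff its MIME type."""
    with open(path, "rb") as fh:
        return sniff_mime(fh.read(16))


if __name__ == "__main__":
    # A payload hidden behind a JPEG header still passes a prefix check.
    spoofed = b"\xff\xd8\xff\xe0" + b"<?php echo 'not an image'; ?>"
    print(sniff_mime(spoofed))  # image/jpeg
```

The last lines show why a header check alone cannot prove a file is a
real image: only the leading bytes are inspected, so anything can follow
them.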