Hello,

I spent some time today thinking about the costs and benefits of the two approaches GNOME currently uses to determine the MIME type of a file. I also ran some experiments and made a few tweaks to check the impact that file sniffing has on Nautilus performance. It's impressive. See below.

Some comparison:

DETECTION BY SUFFIX

1. Allows wrong results on invalid input (files with wrong suffixes)
2. Fails to determine the type of files without suffixes, e.g. README, COPYING, mbox
3. Very fast
4. Easily customizable by users (e.g. adding new file types)
5. Generates little disk I/O
6. Code is simple and lightweight. No extension by code is necessary

DETECTION BY SNIFFING

1. Allows wrong results on _valid_ input (files with correct suffix and funny contents)
2. Allows detection of file type regardless of suffix
3. Too slow to be used on large numbers of files
4. Unlikely to be customized by users
5. Too much disk I/O, since it needs to open the file
6. Code is complicated. As an example, GNOME-VFS includes MP3 detection code. Not lightweight.

Currently, these two approaches are combined in the directory listing of Nautilus in a somewhat unclear manner. There seems to be some priority mechanism that decides whether the type of a file is determined by content or by suffix. However, the content is always read and tested.

Additionally, there are proposals to implement some kind of fallback that tests the contents of the file only when the type cannot be determined by suffix.

IMO, we could think a bit more and combine these two approaches in a way very different from simply doing both things when reading the directory.

Today I ran some tests to check the impact of sniffing on GNOME/Nautilus performance, and I must confess I am very impressed. I installed GARNOME and modified GNOME VFS 2.5.3 to disable sniffing. I have some directories with thousands of 1-2 MB files, so I was able to measure the time that Nautilus takes to show these folders with and without sniffing. It's a simple test.
You can do it yourself in minutes.

After researching a bit and understanding how the VFS system works, I modified the "modules/file-method.c" file at line 562. I changed:

    mime_type = gnome_vfs_get_file_mime_type (full_name, stat_buffer,
            (options & GNOME_VFS_FILE_INFO_FORCE_FAST_MIME_TYPE) != 0);

To:

    mime_type = gnome_vfs_get_file_mime_type (full_name, stat_buffer, TRUE);

According to "libgnomevfs/gnome-vfs-mime.c", the signature of this function is:

    gnome_vfs_get_file_mime_type (const char *path,
                                  const struct stat *optional_stat_info,
                                  gboolean suffix_only)

This is *obviously not* the solution, but I changed it this way to make sure that Nautilus would never do sniffing while I was testing.

My simple testbed was this folder:

    /home/fabiofb/emu/smd/roms

This directory has 252 files ranging from 512K to 2MB. The average is 1MB. There are ZIP, binary (unknown to the Nautilus magic) and text files.

Using a simple stopwatch, I tested Nautilus 2.5.3 with and without sniffing. I rebooted the machine between tests to ensure that the disk cache did not distort the results. I also ran each test multiple times. The precision is poor because I have to press the stopwatch button by hand, but with such a difference, no one cares about precision:

    with sniffing    : 21 seconds
    without sniffing : less than one second

I saw a similar difference with many folders on my machine, including /lib, /usr/lib, etc. My computer is a Duron-950 with 256 MB of RAM and a quite fast IDE hard disk.

Given the performance bottleneck imposed by sniffing, I suggest that it no longer be used in directory listing routines. It should be used only when the user tries to open an unknown file.

Let's imagine this case:

- When listing a directory, the system cannot detect the MIME type of 'my-spreadsheet' by its suffix, so the file gets "application/octet-stream". We could exploit the fact that unknown files have this MIME type by associating some file type detection utility with it. Let's call it gnomemagic.
- The user double-clicks the file. "application/octet-stream" is associated with 'gnomemagic', so gnomemagic is run and displays a dialog such as:

-------------------------------------------------------
Unknown File Type

The system was unable to determine the type of this file by its name. Analysing its contents, it looks like a file of type "Gnumeric Spreadsheet" (application/x-gnumeric).

What do you want to do?

[ ] Rename the file, appending the ".gnumeric" suffix to match its type
[x] Open the file with [Gnumeric_______][v] (dropdown/combobox)
[ ] Configure the system to always open unknown files that look like "Gnumeric Spreadsheet" using this application
[ ] Configure the system to always open unknown files with the most probable associated application, when one is available

[ CANCEL ] [ OK ]
-------------------------------------------------------
Note: [ ] = checkbox

'gnomemagic' could be a separate GNOME package. This could ease the maintainability of the magic database by allowing user contributions from around the world. We could provide a website where users post magic for new file types. Submitted magic should go through testing and certification according to some guidelines.

One cool thing about 'gnomemagic' is that applications could run it after unsuccessfully trying to open invalid, corrupted or unknown files.

This entire approach would allow GNOME-VFS to forget about sniffing, making the life of maintainers easier, improving performance and eliminating (most) unexpected results.

If this idea makes some sense, we can start a more elaborate study. I would be glad to participate.

Now I am going to my girlfriend's house, where her mother is preparing endless food. :-)

Thanks for your attention.
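
P.S. To make the proposal a bit more concrete, here is a minimal sketch in C of the split I have in mind: a suffix-only lookup for directory listings, and content sniffing reserved for the moment the user opens an unknown file. Every name in it (mime_from_suffix, mime_from_magic, mime_for_open, the tiny suffix table) is hypothetical and made up for illustration. None of it is GNOME-VFS API, and a real 'gnomemagic' would read its magic from a database rather than hard-code two signatures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* All names below are hypothetical; this is not GNOME-VFS code. */
#define MIME_UNKNOWN "application/octet-stream"

struct suffix_entry {
    const char *suffix;
    const char *mime;
};

/* Illustrative table only; a real one would be user-editable data. */
static const struct suffix_entry suffix_table[] = {
    { ".zip",      "application/zip" },
    { ".txt",      "text/plain" },
    { ".gnumeric", "application/x-gnumeric" },
};

/* Cheap path for directory listings: inspects the name, never opens the file.
 * The comparison is case-sensitive to keep the sketch short. */
const char *mime_from_suffix(const char *name)
{
    assert(name != NULL);
    const char *dot = strrchr(name, '.');
    if (dot == NULL || dot == name)
        return MIME_UNKNOWN;              /* no suffix, e.g. README */
    for (size_t i = 0; i < sizeof suffix_table / sizeof suffix_table[0]; i++) {
        if (strcmp(dot, suffix_table[i].suffix) == 0)
            return suffix_table[i].mime;
    }
    return MIME_UNKNOWN;
}

/* Expensive path: checks the first bytes of the file against known magic.
 * Only the ZIP local-header and gzip signatures are included here. */
const char *mime_from_magic(const unsigned char *head, size_t len)
{
    assert(head != NULL || len == 0);
    if (len >= 4 && memcmp(head, "PK\x03\x04", 4) == 0)
        return "application/zip";
    if (len >= 2 && head[0] == 0x1f && head[1] == 0x8b)
        return "application/x-gzip";
    return MIME_UNKNOWN;
}

/* What would run when the user opens a file: the suffix decides first,
 * and the contents are examined only if the suffix told us nothing. */
const char *mime_for_open(const char *name,
                          const unsigned char *head, size_t len)
{
    const char *mime = mime_from_suffix(name);
    if (strcmp(mime, MIME_UNKNOWN) != 0)
        return mime;
    return mime_from_magic(head, len);
}
```

The directory listing would call only mime_from_suffix, so it never touches file contents. The caller of mime_for_open is assumed to have already read the first few bytes of the file into 'head'.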
--
Fabio Gomes de Souza <fabio@xxxxxxxxxx> (+55 81 9127-0597)
.- GS2 TECNOLOGIA DA INFORMACAO LTDA :: www.gs2.com.br
|- IT Infrastructure :: Security :: Embedded systems :: Linux
`- Olinda, Brazil - +55 81 3492-7777 - negocios@xxxxxxxxxx
_______________________________________________
gnome-list mailing list
gnome-list@xxxxxxxxx
http://mail.gnome.org/mailman/listinfo/gnome-list