On Mon, Nov 13, 2006 at 09:46:01AM -0800, Bryan Henderson wrote:
> >Does anyone have any estimates of how much space is wasted by these
> >files without making them a special case?  It seems to me that most
> >people have huge disks and don't really care about losing a few KB
> >here and there (especially if it makes more common cases slower).
>
> Two thoughts:
>
> 1) It's not just disk capacity.  Using a 4K disk block for 16 bytes
> of data also wastes the time it takes to drag that 4K from disk to
> memory and cache space.
>
> 2) Making more efficient storage and access of _existing_ sets of
> files isn't usually the justification for this technology.  It's
> enabling new kinds of file sets.  Imagine all the 16 byte files that
> never got created because the designer didn't want to waste 4K on
> each.  A file with a million 16 byte pieces might work better with a
> million separate files, but was made a single file because 64 GB of
> storage for 16 MB of data was not practical.  Similarly, there are
> files that would work better with 1 MB blocks, but have 4K blocks
> anyway, because the designer couldn't afford 1 MB for every 16 byte
> file.

More thoughts:

1) It's not just about storage efficiency, but also about transfer
efficiency.  Disk drives generally like to transfer data in hunks of
16k to 64k at a time.  So if we can get related small hunks of data
read at the same time, we can win big on performance.  BUT, it's
extremely hard to do this at the filesystem level, since the
application is much more likely to know which 16-byte micro-file is
going to be needed at the same time as some other micro-file.

2) If you have millions of separate files, each 16 bytes long, and
you need to read a huge number of them, you can end up getting killed
on system call overhead.

I remember having this argument with Hans Reiser at one point.  His
argument was that parsing was evil and should never have to be done.
(And if anyone has ever seen the vast quantities of garbage generated
when you implement an XML parser in Java, and the resulting GC
overhead, I can't blame them for thinking this...)  So his argument
was that instead of parsing a file like /etc/inetd.conf, there should
be an /etc/inetd.conf.d directory, and in that directory there might
be a directory called telnet, and another one called ssh, and yet
another called smtp, and then you might have files such as:

	FILENAME                             CONTENTS
	===============================================================
	/etc/inetd.conf.d/telnet/port        23
	/etc/inetd.conf.d/telnet/protocol    tcp
	/etc/inetd.conf.d/telnet/flags       nowait
	/etc/inetd.conf.d/telnet/user        root
	/etc/inetd.conf.d/telnet/daemon      /sbin/telnetd
	/etc/inetd.conf.d/ssh/port           22
	/etc/inetd.conf.d/ssh/protocol       tcp
	/etc/inetd.conf.d/ssh/flags          nowait
	/etc/inetd.conf.d/ssh/user           root
	/etc/inetd.conf.d/ssh/daemon         /sbin/sshd

etc.

I pointed out the system call overhead that would result: instead of
a single open, read, and close to read /etc/inetd.conf, you would now
need perhaps a hundred or more system calls to do the opendir/readdir
loop, and then individually open, read, and close each file.
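To make that cost concrete, here is a rough sketch of what the
per-service read loop would look like (assuming the hypothetical
/etc/inetd.conf.d/telnet layout above):

	#include <dirent.h>
	#include <fcntl.h>
	#include <limits.h>
	#include <stdio.h>
	#include <unistd.h>

	int main(void)
	{
		/* hypothetical path from the example above */
		const char *dirpath = "/etc/inetd.conf.d/telnet";
		DIR *dir = opendir(dirpath);
		struct dirent *ent;
		char path[PATH_MAX], buf[64];

		if (!dir)
			return 1;
		/* one getdents pass over the directory... */
		while ((ent = readdir(dir)) != NULL) {
			if (ent->d_name[0] == '.')
				continue;
			snprintf(path, sizeof(path), "%s/%s",
				 dirpath, ent->d_name);
			int fd = open(path, O_RDONLY);	/* one syscall... */
			if (fd < 0)
				continue;
			ssize_t n = read(fd, buf, sizeof(buf) - 1); /* two */
			close(fd);				 /* three */
			if (n > 0) {
				buf[n] = '\0';
				printf("%s: %s\n", ent->d_name, buf);
			}
		}
		closedir(dir);
		return 0;
	}

That's three system calls per micro-file, plus the opendir/readdir
traffic; with five micro-files per service and a few dozen services,
you blow well past a hundred system calls to get what a single open,
read, close used to deliver.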
Hans had a solution for that, too ---- a new system call where you
could download a byte-coded command program into the kernel, so the
kernel could execute a sequence of commands and return to userspace a
single buffer containing the contents of all of the files, which
could then be parsed by the userspace program.....

But wait a second: I thought the whole point of this complicated
scheme, including implementing a byte code interpreter in the kernel
with all of the attendant potential security issues, was to avoid
needing to do parsing.  Oops, oh well, so much for that idea.

So color me skeptical that 16 byte files are really such a great
design...

						- Ted