On Thu, 20 Jan 2011 14:12:33 +0800 Shaohua Li <shaohua.li@xxxxxxxxx> wrote:

> On Thu, 2011-01-20 at 13:55 +0800, Andrew Morton wrote:
> > On Thu, 20 Jan 2011 13:38:18 +0800 Shaohua Li <shaohua.li@xxxxxxxxx> wrote:
> >
> > > > ext2, minix and probably others create an address_space for each
> > > > directory.  Heaven knows what xfs does (for example).
> > >
> > > Yes, this is for one directory, but all the files' metadata are in
> > > the block_dev address_space.  I thought you meant there are several
> > > block_dev-like address_spaces in some filesystems, which wouldn't
> > > fit well in my implementation.  For ext-like filesystems there is
> > > only one address_space; for filesystems with several address_spaces,
> > > my proposal is to map them to one big virtual address_space in the
> > > new ioctls.
> >
> > ext2 and minixfs (and I think sysv and ufs) have a separate
> > address_space for each directory.  I don't see how those can be
> > represented with a single "virtual big address_space" - we also need
> > identifiers in there so each directory's address_space can be created
> > and appropriately populated.
>
> Oh, I misunderstood your comment.  You are right: the ioctl methods
> don't work for ext2, and the directories' address_spaces can't be read
> ahead either.  It looks like we can only do the metadata readahead in
> a filesystem-specific way.

Another way of doing all this would be to implement some sort of
lookaside cache at the vfs->block boundary.  At boot time, load that
cache up with all the disk blocks which we know the boot will need (a
single ascending pass across the disk), and then when the vfs/fs goes
to read a disk block, take a peek in that cache first; if it's a hit,
either steal the page or memcpy it.

This has an obvious coherence problem, which would be pretty simple to
solve by hooking into the block write path as well.  The list of needed
blocks can be generated very simply with the existing blktrace
infrastructure.

It does add permanent runtime overhead: once the cache has been
invalidated and disabled, every IO operation still incurs a
test-and-not-taken branch.  Maybe not too bad.

We'd also need to handle small-memory systems somehow, where the cache
simply ooms the machine or becomes ineffective because it causes
eviction elsewhere.

It could perhaps all be implemented as an md or dm driver, or even as
an IO scheduler.  I say that because IO schedulers can be replaced
on-the-fly, so the caching layer could be unloaded from the stack once
it is finished with.
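
For the sake of argument, here's a toy userspace model of that
lookaside cache.  Every name in it is invented for this sketch; a real
implementation would hang the read and write hooks off the block layer
(somewhere around submit_bio(), say) rather than use malloc'ed buckets:

/*
 * Userspace model of the boot-time lookaside cache sketched above.
 * All names here are made up for illustration; this is not kernel code.
 * Cached blocks are keyed by block number in a fixed-size hash table.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BLOCK_SIZE	4096
#define HASH_BUCKETS	1024

struct cached_block {
	unsigned long long blocknr;
	char data[BLOCK_SIZE];
	struct cached_block *next;
};

static struct cached_block *buckets[HASH_BUCKETS];
static int cache_enabled;

static unsigned hash(unsigned long long blocknr)
{
	return blocknr % HASH_BUCKETS;
}

/* Boot-time load phase: populate the cache in one ascending disk pass. */
static void cache_insert(unsigned long long blocknr, const char *data)
{
	struct cached_block *b = malloc(sizeof(*b));

	b->blocknr = blocknr;
	memcpy(b->data, data, BLOCK_SIZE);
	b->next = buckets[hash(blocknr)];
	buckets[hash(blocknr)] = b;
	cache_enabled = 1;
}

/*
 * Read-path hook: peek in the cache first.  Returns 1 and copies the
 * block on a hit, 0 on a miss (caller falls back to real disk IO).
 */
static int cache_read(unsigned long long blocknr, char *out)
{
	struct cached_block *b;

	if (!cache_enabled)	/* the permanent test-and-not-taken branch */
		return 0;
	for (b = buckets[hash(blocknr)]; b; b = b->next) {
		if (b->blocknr == blocknr) {
			memcpy(out, b->data, BLOCK_SIZE);
			return 1;
		}
	}
	return 0;
}

/* Write-path hook: drop any cached copy so stale data is never served. */
static void cache_invalidate(unsigned long long blocknr)
{
	struct cached_block **p = &buckets[hash(blocknr)];

	while (*p) {
		if ((*p)->blocknr == blocknr) {
			struct cached_block *dead = *p;
			*p = dead->next;
			free(dead);
			return;
		}
		p = &(*p)->next;
	}
}

int main(void)
{
	char block[BLOCK_SIZE] = "superblock contents";
	char out[BLOCK_SIZE];

	cache_insert(0, block);				/* boot-time preload */
	printf("hit: %d\n", cache_read(0, out));	/* 1: served from cache */
	cache_invalidate(0);				/* block was rewritten */
	printf("hit: %d\n", cache_read(0, out));	/* 0: fall back to disk */
	return 0;
}

The cache_enabled test up front is the permanent per-IO cost mentioned
above: once boot has finished and the cache is torn down, clearing that
flag reduces the whole scheme to one predictable branch per request.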