On Tue, 8 May 2007 22:56:09 -0700, Valerie Henson wrote:
>
> I like it too, especially the rmap stuff, but I don't think it solves
> some of the problems chunkfs solves. The really nice thing about
> chunkfs is that it tries hard to isolate each chunk from all the
> other chunks. You can think of regular file systems as an OS with one
> big shared address space - any process can potentially modify any
> other process's address space, including the kernel's - and chunkfs
> as the modern UNIX private address space model. Except in rare worst
> cases (the equivalent of a kernel bug or writing to /dev/mem), the
> only way one chunk can affect another chunk is through the narrow
> little interface of the continuation inode. This severely limits the
> ability of one chunk to corrupt another - the worst you can do is end
> up with the wrong link count on an inode pointed to from another
> chunk.

This leaves me a bit confused. Imo the filesystem equivalent of
private process address spaces would be permissions and quotas. And
there is no guarantee where any address space's pages may physically
reside: they can be in any zone, on any node, or even in swap or
regular files. Otoh, each physical page does have an rmap of sorts -
enough to figure out who currently owns the page. Does your own
analogy work against you?

Back to chunkfs. The really smart idea behind it, imo, is to take
just a small part of the filesystem, assume that everything else is
flawless, and check that small part under this assumption. The
assumption may be wrong. If the wrongness affects the minimal fsck,
it will get detected as well. Otherwise it doesn't matter right now.

What I never liked about chunkfs were two things. First, it splits
the filesystem into a flat array of chunks. With sufficiently large
devices, either the number or the size of the chunks will come close
to being problematic again. Some sort of tree arrangement intuitively
makes more sense.

Secondly, the cnodes are... weird, complicated, not well understood,
a hack - pick a term. Avoiding cnodes is harder than avoiding regular
fragmentation, and the recent defragmentation patches seem to imply
we're doing a bad job at that already. Linked lists of cnodes - yuck.

Not directly a chunkfs problem, but still unfortunate, is that it
cannot detect medium errors. There are no checksums. Checksums cost
performance, so they obviously have to be optional at the user's
choice. But not even having the option is quite 80's.

Matt's proposal is an alternative that can address all of my
concerns. Instead of cnodes it has the rmap, a structure so simple I
could explain it to my nephews. It allows for checksums, which is
nice as well. And it does allow for a tree structure of tiles.

A tree structure means that each tile can have free space counters,
and a supertile (or whatever one may call it) can have a free space
counter that is the sum of all its members' free space counters, and
so forth upwards. Same for dirty bits and anything else I've
forgotten. (A rough sketch of the data structures follows below.)

So individual tiles can be significantly smaller than chunks in
chunkfs, and checking them is significantly faster than checking a
chunk. There will be more dirty tiles at any given time, but a better
way to look at it is that for any dirty chunk in chunkfs, tilefs has
some dirty and some clean tiles. So the overall ratio of dirty space
is never higher and almost always lower.
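To make the tile tree concrete, here is a rough sketch in C. To be
clear, this is just my own illustration of the idea, not anything
from Matt's actual patches; rmap_entry, tile_node and the constants
are all made up.

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCKS_PER_TILE	128	/* made up, sized so a tile checks fast */
#define FANOUT		16	/* made-up fan-out of the tile tree */

/* One reverse-map entry per block: enough to answer "who owns this
 * block, and where does it sit within that file?" */
struct rmap_entry {
	uint64_t inode;		/* owning inode number, 0 if free */
	uint64_t offset;	/* logical offset within the file */
	uint32_t csum;		/* optional per-block checksum */
};

/* A node is either a leaf tile or a supertile. Counters in an
 * interior node are sums over its children. */
struct tile_node {
	uint64_t free_blocks;	/* leaf: own count; interior: sum */
	bool dirty;		/* leaf: written since last check;
				   interior: some child is dirty */
	struct tile_node *parent;
	struct tile_node *child[FANOUT];	  /* all NULL in a leaf */
	struct rmap_entry rmap[BLOCKS_PER_TILE];  /* used only in a leaf */
};

/* Dirtying a leaf bubbles up until it hits an already-dirty ancestor,
 * so fsck can later prune any subtree whose root is clean. */
static void mark_dirty(struct tile_node *tile)
{
	for (; tile && !tile->dirty; tile = tile->parent)
		tile->dirty = true;
}

/* Check an interior node under the chunkfs-style assumption that
 * everything outside this subtree is flawless: its counter must
 * equal the sum of its children's counters. */
static bool counters_ok(const struct tile_node *tile)
{
	uint64_t sum = 0;
	size_t i;

	if (!tile->child[0])	/* leaf, nothing to sum up */
		return true;
	for (i = 0; i < FANOUT && tile->child[i]; i++)
		sum += tile->child[i]->free_blocks;
	return sum == tile->free_blocks;
}

The point of the sketch is the pruning: a clean bit on a supertile
vouches for everything below it, and a counter mismatch points fsck
straight at the damaged subtree instead of the whole device.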
Overall I almost envy Matt for having this idea. In hindsight it
should have been obvious to me. But then again, in hindsight the fsck
problem and using divide and conquer should have been obvious to
everyone, and iirc you were the only one who seriously pursued the
idea and got all this frenzy started. :)

Jörn

-- 
Rules of Optimization:
Rule 1: Don't do it.
Rule 2 (for experts only): Don't do it yet.
-- M.A. Jackson