On Thu, Nov 04, 2010 at 06:29:47PM +0000, Alex Bligh wrote:
>
> >Well, I would personally not be against an extension to fallocate()
> >where if the caller of the syscall specifies a new flag, that might be
> >named FALLOC_FL_EXPOSE_OLD_DATA, and if the caller either has root
> >privs or (if capabilities are enabled) CAP_DAC_OVERRIDE &&
> >CAP_MAC_OVERRIDE, it would be able to allocate blocks whose extents
> >would be marked as initialized without actually initializing the
> >blocks first.
>
> That sounds a lot like "send patches" which I just might do, if only
> to gain better understanding as to what is going on.

Patches to do this wouldn't be that hard.  The harder part would
probably be the politics on fs-devel regarding the semantics of
FALLOC_FL_EXPOSE_OLD_DATA.

> I seem to remember (from lwn's summary of lkml) that the proposed
> options for fallocate() got a bit baroque to start with, and people
> then simplified down to zero options. Perhaps that was a simplification
> too far.

It was simplified down to one flag.  But that means we have a flags
field we can use to extend fallocate.

> In the mean time, particularly as I'd ideally like to avoid a kernel
> modification, is there a safe way I could use or modify the ext2
> library to run through the extents of a fallocated() file and clear
> the "unwritten" bit? If I clear that (which from memory is the top
> bit of the extent length), is that alone safe? (on an unmounted
> file system, obviously).

Yes, there most certainly is.  The functions you'd probably want to
use are ext2fs_extent_open(), and then either ext2fs_extent_goto() to
go to a specific extent, or ext2fs_extent_get() with the
EXT2_EXTENT_NEXT operation to iterate over the extents, and then
ext2fs_extent_replace() to mutate the extent.  Oh, and then use
ext2fs_extent_close() when you're done looking at and/or changing the
extents of a file.
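
A note on the "top bit of the extent length" description in the
question above.  As far as I recall from the kernel's ext4 extent
header, the on-disk ee_len is a 16-bit field in which any value above
32768 means "unwritten", and the real length is then ee_len minus
32768.  So a raw 0x8000 is a written extent of 32768 blocks, not an
unwritten extent of length zero.  Below is a minimal sketch of that
decoding, assuming those constants are right.  The helper names are
made up for illustration and are not libext2fs or kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: matches EXT_INIT_MAX_LEN (1 << 15) in the kernel's
 * ext4 extent header.  Verify against your kernel tree. */
#define EXT_INIT_MAX_LEN 32768u

/* Nonzero if a raw, host-endian ee_len marks the extent as unwritten.
 * Note the strict '>': exactly 32768 is a full-length written extent. */
static inline int ext_is_unwritten(uint16_t raw_len)
{
    return raw_len > EXT_INIT_MAX_LEN;
}

/* Number of blocks the extent actually covers. */
static inline uint16_t ext_actual_len(uint16_t raw_len)
{
    return ext_is_unwritten(raw_len)
        ? (uint16_t)(raw_len - EXT_INIT_MAX_LEN)
        : raw_len;
}

/* Raw ee_len for the same extent with the unwritten marking removed.
 * Written extents pass through unchanged. */
static inline uint16_t ext_mark_written(uint16_t raw_len)
{
    uint16_t out = ext_actual_len(raw_len);
    assert(!ext_is_unwritten(out));   /* result must decode as written */
    return out;
}
```

When going through libext2fs rather than raw bytes, this encoding
should not need handling by hand.  I believe struct ext2fs_extent
carries the decoded length in e_len and the unwritten state as
EXT2_EXTENT_FLAGS_UNINIT in e_flags, so clearing that flag before
calling ext2fs_extent_replace() would be the equivalent step.  Check
that against ext2fs.h in your e2fsprogs version.  That header is also
the place to confirm the name of the handle-release call, which I
recall as ext2fs_extent_free().
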
If you build tst_extents in lib/ext2fs, you can use commands like
"inode" (to open the extents for a particular inode), and "root",
"current", "next", "prev", "next_leaf", "prev_leaf", "next_sibling",
"prev_sibling", "delete_node", "insert_node", "replace_node",
"split_node", "print_all", "goto", etc.  Please don't use this in
production, but it's not a bad way to play with an extent tree, either
for learning purposes or to create test cases.  tst_extents.c is also
a good way of seeing how the various libext2fs extent APIs work.

> I would tend to agree that replicating across commodity disks is
> in almost all cases a better technological solution, but the
> technology is still further away from readiness there. Sadly
> technological arguments don't always win the day, and we need
> something in the mean time...

Well, things like Hadoopfs exist today, and Ceph (if you need
POSIX-level access) is admittedly less stable.  But if you're starting
from scratch, wouldn't that be pretty far away from readiness as well?

						- Ted

_______________________________________________
Ext3-users mailing list
Ext3-users@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/ext3-users