Ted,

--On 4 November 2010 12:16:13 -0400 Ted Ts'o <tytso@xxxxxxx> wrote:
> Well, I would personally not be against an extension to fallocate() where if the caller of the syscall specifies a new flag, that might be named FALLOC_FL_EXPOSE_OLD_DATA, and if the caller either has root privs or (if capabilities are enabled) CAP_DAC_OVERRIDE && CAP_MAC_OVERRIDE, it would be able to allocate blocks whose extents would be marked as initialized without actually initializing the blocks first.
That sounds a lot like "send patches", which I just might do, if only to gain a better understanding of what is going on. I seem to remember (from LWN's summary of LKML) that the proposed options for fallocate() got a bit baroque to start with, and people then simplified them down to zero options. Perhaps that was a simplification too far.

In the meantime, particularly as I'd ideally like to avoid a kernel modification, is there a safe way I could use or modify the ext2 library to run through the extents of a fallocate()d file and clear the "unwritten" bit? If I clear that (which from memory is the top bit of the extent length), is that alone safe? (On an unmounted file system, obviously.)
> You do realize, though, that it sounds like with your design you are replicating the servers, but not the disk devices --- so if your disk device explodes, you're Sadly Out of Luck. Sure you can use super-expensive storage arrays, but if you're writing your own cluster file system, why not create a design which uses commodity disks and worry about replicating data across servers at the cluster file system level?
The particular use case here is customers who have sunk huge amounts of money into expensive storage arrays, or who for whatever reason have an aversion to storing anything on anything other than expensive storage arrays. I would tend to agree that replicating across commodity disks is in almost all cases the better technical solution, but the technology there is still further from being ready. Sadly, technological arguments don't always win the day, and we need something in the meantime...

--
Alex Bligh

_______________________________________________
Ext3-users mailing list
Ext3-users@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/ext3-users