On Fri, Jul 24, 2009 at 11:18:38AM -0700, Curt Wohlgemuth wrote:
> >>
> >> But again, the extent conversion (and mark_inode_dirty()) happens at
> >> get_block time, before the data goes to disk.
> >>
> >> For KEEP_SIZE, this isn't an exposure because i_size prevents the data
> >> from being read. But without KEEP_SIZE, this would seem to be a
> >> problem, right?
> >>
> >> (From a practical perspective, there's also a problem getting real DIO
> >> to work without KEEP_SIZE in the fallocate(): the decision to send
> >> "create=0" to ext4_get_block() happens in VFS code, and there's no way
> >> to tell in the get_block path that "this is a 'no create' for a write,
> >> vs. a read".)
> >
> > What we need is to track I/Os until they hit the disk. This will
> > help us to do data=guarded and also help in the above case. So
> > for directIO we should use blockdev_direct_IO_own_locking and the get_block
> > used should split the uninit extent the needed way but still mark it
> > uninit. That would make sure a read will see the uninit extent and return
> > zero as expected. Now on IO completion we should mark the split uninit
> > extent as init.
>
> I can see how using DIO_OWN_LOCKING would allow a write to send
> "create=1" to ext4_get_block(). That would be cool.
>
> Are you then saying that we would need to postpone the
> ext4_ext_convert_to_initialized() call in ext4_ext_get_blocks(), and
> then have ext4_direct_IO() do this conversion on return from
> blockdev_direct_IO_own_locking()? That would seem to be required...
>

We still need to split the uninit extent. Only marking the new extent
as init should be postponed. The split is needed to actually copy the
user-space data to the blocks.

-aneesh
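
To make the ordering concrete (split at get_block time, keep the piece
uninit, flip it to init only on I/O completion), here is a toy userspace
model in C. It is only a sketch: struct toy_extent, struct toy_file,
toy_split(), toy_io_complete() and toy_read_sees_data() are hypothetical
names made up for illustration and are not ext4 code. The only assumption
is that an extent can be described by a start block, a length, and an
uninit flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these types or functions are ext4 kernel APIs. */
struct toy_extent {
    unsigned start;  /* first logical block */
    unsigned len;    /* number of blocks */
    bool uninit;     /* true: a read must see zeros, not on-disk data */
};

#define TOY_MAX_EXTENTS 8

struct toy_file {
    struct toy_extent ext[TOY_MAX_EXTENTS];
    size_t count;
};

/* Return the index of the extent containing 'block', or -1. */
int toy_find(const struct toy_file *f, unsigned block)
{
    for (size_t i = 0; i < f->count; i++)
        if (block >= f->ext[i].start &&
            block < f->ext[i].start + f->ext[i].len)
            return (int)i;
    return -1;
}

/*
 * "get_block time": carve [start, start + len) out of the extent that
 * contains it. Every resulting piece inherits the uninit flag, so a
 * concurrent read still sees zeros. Returns the index of the carved
 * piece, or -1 on error.
 */
int toy_split(struct toy_file *f, unsigned start, unsigned len)
{
    int i = toy_find(f, start);
    if (i < 0 || len == 0)
        return -1;

    struct toy_extent e = f->ext[i];
    unsigned end = start + len;
    if (end > e.start + e.len)
        return -1;  /* range must lie inside a single extent */

    /* One extra slot for a left remainder, one for a right remainder. */
    size_t extra = (start > e.start) + (end < e.start + e.len);
    if (f->count + extra > TOY_MAX_EXTENTS)
        return -1;

    /* Shift the extents after index i to make room, highest first. */
    for (size_t j = f->count; j > (size_t)i + 1; j--)
        f->ext[j - 1 + extra] = f->ext[j - 1];
    f->count += extra;

    size_t k = (size_t)i;
    if (start > e.start)
        f->ext[k++] = (struct toy_extent){ e.start, start - e.start, e.uninit };
    int target = (int)k;
    f->ext[k++] = (struct toy_extent){ start, len, e.uninit };
    if (end < e.start + e.len)
        f->ext[k++] = (struct toy_extent){ end, e.start + e.len - end, e.uninit };
    return target;
}

/* "I/O completion": only now is the carved piece marked initialized. */
void toy_io_complete(struct toy_file *f, int idx)
{
    assert(idx >= 0 && (size_t)idx < f->count);
    f->ext[idx].uninit = false;
}

/* A read returns real data only from an initialized extent. */
bool toy_read_sees_data(const struct toy_file *f, unsigned block)
{
    int i = toy_find(f, block);
    return i >= 0 && !f->ext[i].uninit;
}
```

In this model, a write to blocks 10-14 of a 100-block uninit extent would
call toy_split(&f, 10, 5) first and toy_io_complete() only once the data is
on disk. Between the two calls toy_read_sees_data() stays false for those
blocks, which is the window the thread is worried about.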