David Chinner wrote:
Firstly, XFS attaches a different I/O completion to delalloc writes to allow us to update the file size when the write is beyond the current on-disk EOF. This code cannot do that, as all it does is allocate and present "normal looking" buffers to the generic code path.
how do you implement fsync(2)? you'd have to wait for such IO to complete, then update the inode and write it through the log?
Also, looking at the way mpage_da_map_blocks() is done - if we have a 128MB delalloc extent, ext4 will allocate it in one go, right? What happens if we then crash after only writing a few megabytes of that extent? Stale data exposure? XFS can allocate multiple gigabytes in a single get_blocks call, so even if ext4 can't do this, it's a problem for XFS.....
I just realized that you're talking about data=ordered mode in ext4, where care is taken to prevent on-disk references to not-yet-written blocks. The solution is to wait for such IO to complete before the metadata commit. And the key thing here is to allocate and attach to the inode the blocks we're writing immediately. IOW, there are no unwritten blocks attached to the inode (except in the fallocate(2) case), but there may be blocks preallocated for this inode in-core. same gigabytes, but a different way ;)

I have no objection to a custom IO completion callback per mpage_writepages().

thanks, Alex
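
For readers following the thread, the ordering described above can be sketched as a toy user-space model. This is only an illustration under stated assumptions, not ext4 or jbd code: every name here (toy_block, toy_inode, toy_wait_for_io, toy_ordered_commit) is hypothetical, and "waiting" for IO is simulated by flipping a state flag.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model of data=ordered; none of this is real ext4/jbd code. */
enum toy_state {
    TOY_PREALLOC_INCORE, /* preallocated in-core, not attached on disk */
    TOY_IO_PENDING,      /* attached to the inode, data write in flight */
    TOY_ON_DISK          /* attached to the inode, data already written */
};

struct toy_block {
    enum toy_state state;
};

struct toy_inode {
    struct toy_block *blocks;
    size_t nr_blocks;
    size_t committed_blocks; /* blocks the committed metadata references */
};

/* Stand-in for waiting on a data write: the write simply "completes". */
void toy_wait_for_io(struct toy_block *b)
{
    if (b->state == TOY_IO_PENDING)
        b->state = TOY_ON_DISK;
}

/*
 * Ordered commit: first wait for all in-flight data IO, and only then
 * let the metadata commit reference the blocks. In-core preallocations
 * are never referenced, so a crash cannot expose their stale contents.
 */
size_t toy_ordered_commit(struct toy_inode *inode)
{
    size_t i, referenced = 0;

    /* Step 1: wait for data IO on every block being written. */
    for (i = 0; i < inode->nr_blocks; i++)
        toy_wait_for_io(&inode->blocks[i]);

    /* Step 2: commit metadata that points only at written blocks. */
    for (i = 0; i < inode->nr_blocks; i++) {
        if (inode->blocks[i].state == TOY_PREALLOC_INCORE)
            continue;
        assert(inode->blocks[i].state == TOY_ON_DISK);
        referenced++;
    }
    inode->committed_blocks = referenced;
    return referenced;
}
```

The point of the sketch is the invariant checked in step 2: at commit time, every block the metadata references already holds written data, while preallocated blocks stay invisible on disk.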