On Fri, Apr 15, 2011 at 04:28:37PM -0600, Andreas Dilger wrote: > On 2011-04-15, at 11:26 AM, Christoph Hellwig wrote: > > On Fri, Apr 15, 2011 at 11:24:19AM -0600, Eric Blake wrote: > >> Would it be worth borrowing from Solaris' semantics and adding SEEK_HOLE > >> and SEEK_DATA to lseek(2), as a higher level (less-detailed, but easier > >> to define and easier to use) interface for discovering the regions of a > >> file that only contain NUL bytes? > > > > Yes, I've already suggested that both in this thread and on IRC. > > > > For efficient copies it's the only usable interface. > > I suspect that these bugs would have still existed whether the > interface is SEEK_HOLE/SEEK_DATA, or FIEMAP. The main problem is > that the delalloc pages were not accounted for correctly during > layout traversal.. It's not delalloc that is the problem - XFS accounts for them just fine in the extent map when asked. However, XFS does speculative delayed allocation over regions that contain no data, so if the core-utils folk are assuming that delalloc extents contain data and need to be copied, they're in for a nasty surprise. However, every example I've seen in this thread has had to do with unwritten extents not changing state when data is written into the page cache. i.e. people are struggling with the expected behaviour of unwritten extents. That is, unwritten extent remain unwritten extents until data has been _physically_ written to them. If there is data in the page cache over the unwritten extent, it is still an unwritten extent. If the system crashes while in this state, then the extent _must_ remain an unwritten extent after recovery, otherwise it exposes stale data. Further, using FIEMAP to determine where the data is that needs copying is extremely fragile. What happens when FIEMAP grows a different type of extent that contains data? cp will break, because it doesn't think it needs to copy data in extents of an unknown type. Or it will break because it thinks it needs to copy it and there's something in it that should not be copied. Also, cp shoul dnot be trying to replicate the physical layout of the file when copying it - that's for the filesystem to decide and having userspace try to do this is a sure recipe for causing severe filesystem fragmentation. The filesystems already do an excellent job of optimising allocation - userspace should not be trying to second guess what is optimal layout for the filesystem. Fundamentally, what the core-utils guys want is FIEMAP to tell them where data is in the file, regardless of whether it is in memory or on disk. That is not what FIEMAP is intended for and matches SEEK_HOLE/SEEK_DATA precisely. SEEK_HOLE/SEEK_DATA have very well understood semantics and is designed specifically for optimising acceess to sparse files. This interface abstracts all the details of how different filesystems store their data so the application doesn't need to care about it. The API is so, so much simpler to use and understand, to. And if the filesystem has data in cache over an unwritten extent, then by definition it's still data to be returned by SEEK_DATA. If it fails to return the range as such then the implementation is broken. Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html