On Tue, Jun 12, 2012 at 05:36:03PM +0200, Paolo Bonzini wrote: > Add a new operation mode to fallocate, called FALLOC_FL_ZERO_RANGE. > It resembles the similarly named XFS ioctl. Filesystems should > preallocate blocks for regions that span holes in the file, and convert > the entire range to unwritten extents. You've described filesystem implementation details, not a description of the functionality. The functionality is simply that after this call is made the range of the file specified will return zeros when read. You need to write the addition to the fallocate() man page, and that will help you describe what the API is supposed to do, not how a filesystem implements it.... FWIW, It is up to the filesytem to optimise this as best they can. XFS, as you've described, implements with preallocation and real->unwritten conversion. Other filesystems might simply implement it as "hole punch + preallocation", for whatever those filesystems use for that functionality (for some, preallocation means "write zeros to disk"). > This operation is a fast method > of overwriting any from the range specified with zeros without removing > any blocks or having to write zeros to disk. Well, that is the method XFS uses, but the idea is to avoid that if they can. i.e. be fast. Think about that for a moment - if the range is fragmented or sparse, it may be faster makes sense to punch out the existing extents and preallocate a single new extent in this case that to have to allocate multiple (potentially hundreds or thousands) small extents to fill holes. Just because I implemented it the easy way in XFS for the person that requested it (i.e. zeroing preallocated VM images that were already perfectly laid out) doesn't mean that is the only way it can be implemented. > Any subsequent read in the given range will return zeros until new > data is written. That should be the second sentence of the commit message. > This functionality requires filesystems to support unwritten extents. > If xfs_info(8) reports unwritten=1, then the filesystem was made to > flag unwritten extents. It is okay to report EOPNOTSUPP and let the > application deal with the outcome, but it is not okay to succeed or > report EOPNOTSUPP for the same inode depending on the other arguments. I don't think that is true, nor necessary, for the commit message - filesystems without unwritten extents can implement this quite easily just by writing zeros to the range just like some do for preallocation. > FALLOC_FL_PUNCH_HOLE|FALLOC_FL_ZERO_RANGE is ruled out here, at the > vfs level, rather than leaving it to the filesystems. This way, in the > future 0x6 could be used as a third mode. We have more than enough feature bits that we don't need to contemplate mixing various combinations to provide different features in future. Besides, a filesystem could interpret that pair as "punch the range, then preallocate it" rather than "convert to unwritten and hole fill with preallocation", so I do not see them as mutually exclusive. If the filesystem wants to treat them that way, then they are welcome to, but I can definitely see a use case for allowing them both to be set.... Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx _______________________________________________ xfs mailing list xfs@xxxxxxxxxxx http://oss.sgi.com/mailman/listinfo/xfs