On Sat, Nov 26, 2011 at 10:14:55PM -0500, Ted Ts'o wrote:
> On Fri, Nov 25, 2011 at 05:40:50AM -0500, Christoph Hellwig wrote:
> > On Fri, Nov 25, 2011 at 10:26:09AM +0000, Pádraig Brady wrote:
> > > I was wondering about adding fallocate() to cp,
> > > where one of the benefits would be immediate indication of ENOSPC.
> > > I'm now wondering though might fallocate() fail to allocate an
> > > extent with ENOSPC, but there could be fragmented space available
> > > to write()?
> >
> > fallocate isn't guaranteed to allocate a single or even contiguous
> > extents, it just allocates the given amount of space, and if the fs
> > isn't too fragmented and the allocator not braindead it will be
> > sufficiently contiguous.  Also all Linux implementations may actually
> > still fail a write later in extreme corner cases when btree splits or
> > other metadata operations during unwritten extent conversions go over
> > the space limit.
>
> While this is true, *usually* fallocate will allocate enough space,
> but as Christoph has said, you still have to check the error returns
> for the write(2) and close(2) system call, and deal appropriately with
> any errors.
>
> The other reason to use fallocate is if you are copying a huge number
> of files, it's possible you'll get better block allocation layout,
> depending on the file system, and how insane the writeback code for a
> particular kernel version might be.  (Some versions of the kernel had
> writeback algorithms that would write 4MB of one file, then 4MB for
> another file, then 4MB for yet another file, then 4MB of the first
> file, etc. --- and some file systems can deal with this kind of write
> pattern better than others.)

Right, but....

> Using fallocate if you know the size of
> the file up front won't hurt, and on some systems it might help.

... this is - as a generalisation - wrong. Up front fallocate() can
and does hurt performance, even when you know the size of the file
ahead of time. Why?
Because it defeats the primary, seek-reducing writeback optimisation
that filesystems have these days: delayed allocation. This has been
mentioned before in previous threads where you've been considering
adding fallocate to cp. e.g.:

http://www.mail-archive.com/linux-btrfs@xxxxxxxxxxxxxxx/msg10819.html

fallocate() style (or non-delalloc, write syscall time) allocation
leads to non-optimal file layouts and slower writeback because the
location that blocks are allocated in no way matches the writeback
pattern, hence causing an increase in seeks during writeback of large
numbers of files.

Further, filesystems that are alignment aware (e.g. XFS) will align
every fallocate() based allocation, greatly fragmenting free space
when used on small files and the filesystem is on a RAID array.
However, in XFS, delayed allocation will actually pack the allocation
across files tightly on disk, resulting in full stripe writes (even
for sub-stripe unit/width files) during writeback.

Delayed allocation allows workloads such as cp to run as a bandwidth
bound operation because allocation is optimised to cause sequential
write IO, whereas up-front fallocate() causes it to run as an IOPS
bound operation because file layout does not match the writeback
pattern. And on large, high performance RAID arrays, bandwidth
capacity is much, much higher than IOPS capacity, so delayed
allocation is going to be far faster and have less long term impact
on the filesystem than using fallocate.

IOWs, use of fallocate() -by default- will speed filesystem aging
because it removes the benefits delayed allocation has on both short
and long term filesystem performance. The three major Linux
filesystems (XFS, BTRFS and ext4) use delayed allocation, and hence
do not need fallocate() to be used by userspace utilities like cp,
tar, etc. to avoid fragmentation.
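
To be concrete about what "not by default" means for a copy loop, here
is a minimal sketch. It is not what coreutils does: copy_fd(),
copy_file() and the want_prealloc flag are names made up for
illustration, and treating only ENOSPC from fallocate(2) as fatal is
just one possible policy. With the flag off, the destination is simply
written and allocation is left to the filesystem. Either way the
write(2) and close(2) returns are still checked.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for fallocate(2) on Linux/glibc */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy all data from in_fd to out_fd.
 * Preallocation is attempted only when the caller asks for it.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int copy_fd(int in_fd, int out_fd, off_t size, bool want_prealloc)
{
    char buf[64 * 1024];

    assert(in_fd >= 0 && out_fd >= 0);

    if (want_prealloc && size > 0) {
        /* Early ENOSPC is the one failure treated as fatal here
         * (a policy choice for this sketch). Anything else, such as
         * EOPNOTSUPP, falls through to a plain copy. */
        if (fallocate(out_fd, 0, 0, size) < 0 && errno == ENOSPC)
            return -1;
    }

    for (;;) {
        ssize_t n = read(in_fd, buf, sizeof buf);
        if (n == 0)
            break;                      /* end of input */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        /* write(2) can be short and can still fail after a
         * successful fallocate(), so loop and check every call. */
        for (ssize_t off = 0; off < n; ) {
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
    }
    return 0;
}

/* Hypothetical wrapper: open both paths, copy, and check close(2). */
static int copy_file(const char *src, const char *dst, bool want_prealloc)
{
    struct stat st;
    int in_fd = open(src, O_RDONLY);
    if (in_fd < 0)
        return -1;

    int out_fd = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out_fd < 0) {
        int e = errno;
        close(in_fd);
        errno = e;
        return -1;
    }

    int ret = fstat(in_fd, &st);
    if (ret == 0)
        ret = copy_fd(in_fd, out_fd, st.st_size, want_prealloc);
    int saved = errno;

    close(in_fd);
    /* The close(2) return on the destination is checked as well. */
    if (close(out_fd) < 0 && ret == 0) {
        ret = -1;
        saved = errno;
    }
    if (ret < 0)
        errno = saved;
    return ret;
}
```

A caller would pass want_prealloc = false unless the user explicitly
asked for preallocation, e.g. copy_file("a", "b", false).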

If a given filesystem is still prone to fragmentation of data extents
when copying data via cp or tar, then that is a problem with the
filesystem that needs to be fixed, not worked around in the userspace
utilities in a manner that is detrimental to other filesystems that
don't suffer from those problems...

Yes, fallocate can help reduce fragmentation and increase performance
in some situations, so making it an -option- for people who know what
they are doing is a good idea. However, it should not be made the
default for all of the reasons above.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx