Re: fallocate vs ENOSPC

On Sat, Nov 26, 2011 at 10:14:55PM -0500, Ted Ts'o wrote:
> On Fri, Nov 25, 2011 at 05:40:50AM -0500, Christoph Hellwig wrote:
> > On Fri, Nov 25, 2011 at 10:26:09AM +0000, Pádraig Brady wrote:
> > > I was wondering about adding fallocate() to cp,
> > > where one of the benefits would be immediate indication of ENOSPC.
> > > I'm now wondering though might fallocate() fail to allocate an
> > > extent with ENOSPC, but there could be fragmented space available to write()?
> > 
> > fallocate isn't guaranteed to allocate a single extent, or even
> > contiguous extents; it just allocates the given amount of space, and if
> > the fs isn't too fragmented and the allocator not braindead it will be
> > sufficiently contiguous.  Also all Linux implementations may actually
> > still fail a write later in extreme corner cases when btree splits or
> > other metadata operations during unwritten extent conversions go over
> > the space limit.
> 
> While this is true, *usually* fallocate will allocate enough space,
> but as Christoph has said, you still have to check the error returns
> for the write(2) and close(2) system calls, and deal appropriately with
> any errors.
> 
> The other reason to use fallocate is if you are copying a huge number
> of files, it's possible you'll get better block allocation layout,
> depending on the file system, and how insane the writeback code for a
> particular kernel version might be.  (Some versions of the kernel had
> writeback algorithms that would write 4MB of one file, then 4MB for
> another file, then 4MB for yet another file, then 4MB of the first
> file, etc. --- and some file systems can deal with this kind of write
> pattern better than others.)

Right, but....

> Using fallocate if you know the size of
> the file up front won't hurt, and on some systems it might help.

... this is - as a generalisation - wrong. Up front fallocate() can
and does hurt performance, even when you know the size of the file
ahead of time.

Why? Because it defeats the primary seek-reducing writeback
optimisation that filesystems have these days: delayed allocation.
This has been mentioned before in previous threads where you've been
considering adding fallocate to cp, e.g.:

http://www.mail-archive.com/linux-btrfs@xxxxxxxxxxxxxxx/msg10819.html

fallocate() style (or non-delalloc, write syscall time) allocation
leads to non-optimal file layouts and slower writeback because the
location in which blocks are allocated in no way matches the
writeback pattern, hence causing an increase in seeks during
writeback of large numbers of files.

Further, filesystems that are alignment aware (e.g. XFS) will align
every fallocate() based allocation, greatly fragmenting free space
when it is used on small files and the filesystem is on a RAID array.
However, in XFS, delayed allocation will actually pack the
allocation across files tightly on disk, resulting in full stripe
writes (even for sub-stripe unit/width files) during writeback.

Delayed allocation allows workloads such as cp to run as a bandwidth
bound operation because allocation is optimised to cause sequential
write IO, whereas up-front fallocate() causes it to run as an IOPS
bound operation because file layout does not match the writeback
pattern. And on large, high performance RAID arrays, bandwidth
capacity is much, much higher than IOPS capacity, so delayed
allocation is going to be far faster and have less long term impact
on the filesystem than using fallocate.

IOWs, use of fallocate() -by default- will speed filesystem aging
because it removes the benefits delayed allocation has on both short
and long term filesystem performance.

The three major Linux filesystems (XFS, BTRFS and ext4) use delayed
allocation, and hence do not need fallocate() to be used by
userspace utilities like cp, tar, etc. to avoid fragmentation. If a
given filesystem is still prone to fragmentation of data extents
when copying data via cp or tar, then that is a problem with the
filesystem that needs to be fixed, not worked around in the
userspace utilities in a manner that is detrimental to other
filesystems that don't suffer from those problems...

Yes, fallocate can help reduce fragmentation and increase
performance in some situations, so making it an -option- for people
who know what they are doing is a good idea. However, it should not
be made the default for all of the reasons above.
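
To make the "option, not default" point concrete, here is a minimal
sketch in C of what an opt-in preallocating copy loop might look like.
It is not taken from cp or any other utility: the names write_all()
and copy_fd() and the prealloc flag are hypothetical, and treating
EOPNOTSUPP/ENOSYS as "carry on without preallocation" is an assumption
about what a caller would want. Note that write(2) is still checked
after a successful fallocate(), for the reasons given earlier in the
thread.

```c
#define _GNU_SOURCE             /* for fallocate(2) on Linux/glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write the whole buffer, retrying on short
 * writes and EINTR. Returns 0 on success, -1 with errno set. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;  /* can be ENOSPC even after fallocate() succeeded */
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical copy loop. Preallocation is requested only when the
 * caller sets prealloc; the default path leaves block placement to
 * the filesystem (delayed allocation, where available). */
static int copy_fd(int in_fd, int out_fd, off_t size, int prealloc)
{
    char buf[65536];

    assert(in_fd >= 0 && out_fd >= 0);

    if (prealloc && size > 0) {
        /* Assumption: an unsupported fallocate() is not fatal, but
         * any other failure (e.g. ENOSPC) is reported to the caller. */
        if (fallocate(out_fd, 0, 0, size) != 0 &&
            errno != EOPNOTSUPP && errno != ENOSYS)
            return -1;
    }

    for (;;) {
        ssize_t n = read(in_fd, buf, sizeof buf);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            break;      /* end of input */
        if (write_all(out_fd, buf, (size_t)n) != 0)
            return -1;
    }
    return 0;
}
```

A caller would pass prealloc = 0 unless the user explicitly asked for
preallocation, and would still need to check the return value of
close(2) on the destination afterwards.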

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx