Re: fallocate vs ENOSPC

Pádraig Brady <P@xxxxxxxxxxxxxx> · Tue, 29 Nov 2011 14:11:48 +0000

On 11/29/2011 12:24 AM, Dave Chinner wrote:
> On Mon, Nov 28, 2011 at 08:55:02AM +0000, Pádraig Brady wrote:
>> On 11/28/2011 05:10 AM, Dave Chinner wrote:
>>> Quite frankly, if system utilities like cp and tar start to abuse
>>> fallocate() by default so they can get "upfront ENOSPC detection",
>>> then I will seriously consider making XFS use delayed allocation for
>>> fallocate rather than unwritten extents so we don't lose the past 15
>>> years worth of IO and aging optimisations that delayed allocation
>>> provides us with....
>>
>> For the record I was considering fallocate() for these reasons.
>>
>>   1. Improved file layout for subsequent access
>>   2. Immediate indication of ENOSPC
>>   3. Efficient writing of NUL portions
>>
>> You lucidly detailed issues with 1. which I suppose could be somewhat
>> mitigated by not fallocating < say 1MB, though I suppose file systems
>> could be smarter here and not preallocate small chunks (or when
>> otherwise not appropriate).
> 
> When you consider that some high end filesystem deployments have alignment
> characteristics over 50MB (e.g. so each uncompressed 4k resolution
> video frame is located on a different set of non-overlapping disks),
> arbitrary "don't fallocate below this amount" heuristics will always
> have unforeseen failure cases...

I don't fully understand the issues with this alignment policy, so I'm guessing here.
You say delalloc packs files together, while fallocate() on XFS aligns allocations
according to the stripe config. Does the packing assume that files written together
are more likely to be read together than independently? That's a big assumption
if true. The converse is also a big assumption: that fallocate()d files should be
aligned because they are more likely to be read independently.

> In short: leave optimising general allocation strategies to the
> filesystems and their developers - there is no One True Solution for
> optimal file layout in a given filesystem, let alone across
> different filesystems. In fact, I don't even want to think about the
> mess fallocate() on everything would make of btrfs because of its
> COW structure - it seems to me to guarantee worse fragmentation than
> using delayed allocation...
> 
>> We can already get ENOSPC from a write()
>> after an fallocate() in certain edge cases, so it would probably make
>> sense to expand those cases.
> 
> fallocate is for preallocation, not for ENOSPC detection. If you
> want efficient and effective ENOSPC detection before writing
> anything, then you really want a space -reservation- extension to
> fallocate. Filesystems that use delayed allocation already have a
> space reservation subsystem - it is how they account for space that is
> reserved by delayed allocation prior to the real allocation being
> done. IMO, allowing userspace some level of access to those
> reservations would be more appropriate for early detection of ENOSPC
> than using preallocation for everything...

Fair enough, so fallocate() would be a superset of such a reserve() operation,
though with reserve() available I'm having a hard time thinking of why one
would ever need to fallocate().
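
To make the distinction concrete, the upfront-ENOSPC pattern under discussion
looks roughly like the following. This is only a minimal sketch in Python,
assuming os.posix_fallocate() is available on the platform; preallocate() is a
hypothetical helper name, not something cp or tar contains. Note that it really
does allocate blocks rather than merely reserving space, and that glibc's
posix_fallocate() may fall back to writing zeros on filesystems without native
support, which is exactly the cost being objected to above.

```python
import errno
import os


def preallocate(fd, size):
    """Try to allocate `size` bytes for the open file `fd` up front.

    Returns True on success and False if the filesystem reports ENOSPC.
    Any other error is re-raised unchanged.
    (Hypothetical helper, for illustration only.)
    """
    try:
        # Allocate the byte range [0, size); the file is extended if shorter.
        os.posix_fallocate(fd, 0, size)
    except OSError as e:
        if e.errno == errno.ENOSPC:
            return False
        raise
    return True
```

A caller would open the destination, call preallocate(fd, expected_size), and
give up early on False instead of hitting ENOSPC halfway through the write().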

> As to efficient writing of NUL ranges - that's what sparse files
> are for - you do not need to write or even preallocate NUL ranges
> when copying files. Indeed, the most efficient way of dealing with
> NUL ranges is to punch a hole and let the filesystem deal with
> it.....

Well, not for `cp --sparse=never`, which might be used so that
later processing of the copy will not result in ENOSPC.
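
As an illustration of that use case, here is a rough sketch of a "dense" copy
in Python. It is not a description of how cp is implemented; copy_dense() and
CHUNK are names made up for this example, and it assumes os.posix_fallocate()
is available. The idea is that the whole destination is allocated before any
data is written, and holes in the source are written out as NUL bytes, so on
typical (non-COW, non-compressing) filesystems the copy ends up fully allocated.

```python
import os

CHUNK = 1 << 20  # copy buffer size; an arbitrary choice for this sketch


def copy_dense(src_path, dst_path):
    """Copy src to dst, allocating the whole destination up front.

    Holes in the source read back as NUL bytes and are written out as data,
    so no holes are deliberately left in the destination.
    (Hypothetical helper, for illustration only.)
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        size = os.fstat(src.fileno()).st_size
        if size > 0:
            # May raise OSError with errno ENOSPC before any data is written.
            os.posix_fallocate(dst.fileno(), 0, size)
        while True:
            buf = src.read(CHUNK)
            if not buf:
                break
            dst.write(buf)
```

With a reserve()-style interface the posix_fallocate() call would be replaced
by a reservation, and the NUL ranges would still need to be written explicitly.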

Here is a link to a related discussion:
http://oss.sgi.com/archives/xfs/2011-06/msg00064.html

Note also that the gold linker does fallocate() on output files by default.

cheers,
Pádraig.