Re: posix_fallocate behavior in glibc


 



On Tue, Jul 30, 2024 at 07:03:50PM +0200, Florian Weimer wrote:
> 
> At the very least, we should have a variant of ftruncate that never
> truncates, likely under the fallocate umbrella.  It seems that that's
> how posix_fallocate is used sometimes, for avoiding SIGBUS with mmap.
> To these use cases, whether extents are allocated or not does not
> matter.

Personally, what I advise any application authors I come across is
simply to avoid using posix_fallocate(3) altogether; the semantics
are totally broken, as is common with anything mandated by a
committee that was trying to satisfy multiple legacy Unix
implementations.  And so, relying on it is just going to be fraught.

What I tell them to do instead is to use the Linux fallocate(2) system
call directly, which is well-defined.  If the file system doesn't
support fallocate, and fallocate(2) returns EOPNOTSUPP, the userspace
application should either accept the fact that it won't be able to
allocate the space, or, if it really needs to avoid things like the
SIGBUS with mmap(2), do the zero-fill writes itself.
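
To make that advice concrete, here is a minimal sketch in C of "call
fallocate(2) directly, and do the zero-fill yourself if the file system
can't".  The helper names reserve_range() and zero_fill_past_eof() are
made up for illustration.  The fallback assumes the fallocate(2) man
page's EOPNOTSUPP for file systems without support, and it only writes
zeros past the current end of file so that it never clobbers existing
data; holes inside the existing file are left alone, and it does not
try to cope with concurrent writers.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical fallback: write zeros over the part of
 * [offset, offset + len) that lies beyond the current end of file. */
static int zero_fill_past_eof(int fd, off_t offset, off_t len)
{
    static const char zeros[64 * 1024];
    struct stat st;

    if (fstat(fd, &st) != 0)
        return -1;

    off_t end = offset + len;
    off_t pos = offset > st.st_size ? offset : st.st_size;

    while (pos < end) {
        size_t chunk = sizeof zeros;
        if ((off_t)chunk > end - pos)
            chunk = (size_t)(end - pos);

        ssize_t n = pwrite(fd, zeros, chunk, pos);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted, retry the same chunk */
            return -1;          /* e.g. ENOSPC: report it to the caller */
        }
        if (n == 0) {           /* should not happen on a regular file */
            errno = EIO;
            return -1;
        }
        pos += n;
    }
    return 0;
}

/* Hypothetical helper: try fallocate(2) in its default mode, and fall
 * back to zero-fill writes only when the file system reports that it
 * does not support the operation.  Returns 0 or -1 with errno set. */
int reserve_range(int fd, off_t offset, off_t len)
{
    assert(fd >= 0);            /* a negative fd is a caller bug */
    if (offset < 0 || len <= 0) {
        errno = EINVAL;
        return -1;
    }

    int r;
    do {
        r = fallocate(fd, 0, offset, len);
    } while (r != 0 && errno == EINTR);

    if (r == 0 || errno != EOPNOTSUPP)
        return r;               /* success, or an error such as ENOSPC */

    return zero_fill_past_eof(fd, offset, len);
}
```

An application that can live without the space guarantee would skip
the fallback and simply treat EOPNOTSUPP as "no reservation made".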

So honestly, is it worth it to try "fixing" posix_fallocate(3)?  Just
tell people to avoid it like the plague....  That way, we don't have
to worry about breaking existing legacy applications.

If we are going to stick with the existing Linux fallocate(2) system
call, then the problem is that the system has to mind-read what the
application writer was really trying to get when they called
fallocate(2) --- are they trying to avoid SIGBUS with mmap?  Or are
they trying to guarantee that any writes to that file range will never
fail with ENOSPC (even in the face of something like dm-thin being in
the storage stack)?  And so the solution is simple; we can define new
flag bits for the fallocate(2) system call to make explicit exactly
what the application is requesting of the system.
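
For readers who haven't used the mode argument: fallocate(2) already
takes flag bits there, so an intent flag would be passed the same way.
The sketch below uses only an existing flag, FALLOC_FL_KEEP_SIZE, to
preallocate space without changing the file size; no new flag name is
implied.  The wrapper name prealloc_keep_size() is made up for
illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical wrapper: ask the file system to reserve blocks for
 * [offset, offset + len) while leaving st_size untouched.  The mode
 * argument is a bit mask; FALLOC_FL_KEEP_SIZE is one existing bit.
 * Returns 0 on success, -1 with errno set on failure. */
int prealloc_keep_size(int fd, off_t offset, off_t len)
{
    assert(fd >= 0);            /* a negative fd is a caller bug */
    return fallocate(fd, FALLOC_FL_KEEP_SIZE, offset, len);
}
```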

Adding new fallocate(2) flag bits seems to be a more general solution
than adding a new ftruncate(2) variant.
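
For comparison, the "ftruncate that never truncates" from the quoted
text can only be approximated in userspace today.  A minimal sketch,
with the made-up name grow_only(), is below.  Note the assumption that
nothing else resizes the file between the fstat(2) and the
ftruncate(2); if another process extends the file in that window, this
code would cut it back, which is the kind of race that kernel support
would be needed to close.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: make sure the file is at least min_size bytes
 * long, never shrinking it.  Not atomic: see the race noted above.
 * Returns 0 on success, -1 with errno set on failure. */
int grow_only(int fd, off_t min_size)
{
    struct stat st;

    assert(fd >= 0);            /* a negative fd is a caller bug */
    if (fstat(fd, &st) != 0)
        return -1;
    if (st.st_size >= min_size)
        return 0;               /* already large enough, leave it alone */
    return ftruncate(fd, min_size);
}
```

This only changes the file size; it makes no promise about whether
extents are allocated.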

In addition, we can also add a new flag which requests that the file
system pass the allocation request down to the thin-provisioned
storage (assuming that this is something that is supported).  Although
I'm not sure how much this matters; after all, for decades there have
been thin-provisioned NetApp storage appliances where fallocate(2) or
posix_fallocate(3) wouldn't necessarily guarantee that a
thin-provisioned device won't run out of space on a write(2), and
application authors seem to have been willing to live with it.  Still,
if people really want this to work, even in the face of a file system
which supports copy-on-write cloned ranges, then presumably this new
fallocate(2) system call with the "never shall a write fail with
ENOSPC" bit set can also snap the COW region as well.  It's important,
though, that this be done using a new fallocate(2) flag, as opposed to
having this magically be added to the existing fallocate(2) system
call, since that will likely cause surprises for some applications.

     	  	       		     - Ted



