Re: [PATCH] tmpfs: don't interrupt fallocate with EINTR

Mikulas Patocka <mpatocka@xxxxxxxxxx> · Tue, 5 Mar 2024 15:03:53 +0100 (CET)

On Tue, 5 Mar 2024, Christian Brauner wrote:

> On Tue, Mar 05, 2024 at 10:34:26AM +0100, Mikulas Patocka wrote:
> > 
> > 
> > On Tue, 5 Mar 2024, Christian Brauner wrote:
> > 
> > > On Mon, Mar 04, 2024 at 07:43:39PM +0100, Mikulas Patocka wrote:
> > > > 
> > > > Index: linux-2.6/mm/shmem.c
> > > > ===================================================================
> > > > --- linux-2.6.orig/mm/shmem.c	2024-01-18 19:18:31.000000000 +0100
> > > > +++ linux-2.6/mm/shmem.c	2024-03-04 19:05:25.000000000 +0100
> > > > @@ -3143,7 +3143,7 @@ static long shmem_fallocate(struct file
> > > >  		 * Good, the fallocate(2) manpage permits EINTR: we may have
> > > >  		 * been interrupted because we are using up too much memory.
> > > >  		 */
> > > > -		if (signal_pending(current))
> > > > +		if (fatal_signal_pending(current))
> > > 
> > > I think that's likely wrong and probably would cause regressions as
> > > there may be users relying on this?
> > 
> > ext4 fallocate doesn't return -EINTR. So, userspace code can't rely on it.
> 
> I'm confused what does this have to do with ext4 since this is about
> tmpfs.

You said that applications may rely on -EINTR and I said they don't 
because ext4 doesn't return -EINTR.

> Also note, that fallocate(2) documents EINTR as a valid return
> value. And fwiw, the manpage also states that "EINTR  A signal was
> caught during execution; see signal(7)." not a "fatal signal".

Yes, but how should userspace use fallocate reliably? Block all signals
around the call? And what should I do if I use a library that calls
fallocate and retries on EINTR?
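
To make the "block all the signals" option concrete, here is a minimal
sketch of what every caller would have to do. The helper name
fallocate_signals_blocked is made up for illustration, and the sketch
assumes a single-threaded caller (a threaded program would use
pthread_sigmask instead). Blocked signals should not wake up the
signal_pending() check, but SIGKILL and SIGSTOP cannot be blocked.

```c
#define _GNU_SOURCE            /* for fallocate(2) */
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>

/*
 * Hypothetical helper: block every blockable signal, call fallocate(),
 * then restore the caller's previous signal mask.
 * Returns 0 on success, -1 with errno set on failure.
 */
int fallocate_signals_blocked(int fd, off_t offset, off_t len)
{
	sigset_t all, old;
	int ret, saved_errno;

	sigfillset(&all);
	if (sigprocmask(SIG_BLOCK, &all, &old) == -1)
		return -1;

	ret = fallocate(fd, 0, offset, len);
	saved_errno = errno;	/* keep fallocate's errno across the restore */

	sigprocmask(SIG_SETMASK, &old, NULL);
	errno = saved_errno;
	return ret;
}
```

This only helps code you control; it does nothing for a library that
calls fallocate internally.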

> Aside from that. If a user sends SIGUSR1 then with the code as it is now
> that fallocate call will be interrupted. With your change that SIGUSR1
> won't do anything anymore. Instead userspace would need to send SIGKILL.
> So userspace that uses SIGUSR1 will suddenly hang.

It will survive one SIGUSR1, but it hangs if the signal is sent at a
periodic interval.

A quick search shows that people are already adding loops when fallocate 
returns EINTR. All these loops will livelock when a signal is repeatedly 
being delivered: 
https://forge.chapril.org/hardcoresushi/libgocryptfs/commit/8518d6d7bde33fdc7ef5bcb7c3c7709404392ad8?style=unified&whitespace= 
https://postgrespro.com/media/maillist-attaches/pgsql-hackers/2022/07/1/20220701154105.jjfutmngoedgiad3@alvherre.pgsql/v2-0001-retry-ftruncate.patch 
https://lists.nongnu.org/archive/html/qemu-devel/2015-02/msg01116.html
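
The retry pattern in question looks roughly like the sketch below. It is
a generic illustration, not code copied from any of the linked projects.
The names alloc_fn, real_fallocate, always_eintr and alloc_retry_eintr
are made up, and the max_tries cap is my addition so that the example
terminates: the loops linked above have no such cap. The always_eintr
stub stands in for a kernel that sees a pending signal on every attempt,
which is what a short periodic timer can produce on a large allocation.

```c
#define _GNU_SOURCE            /* for fallocate(2) */
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>

/* Hypothetical hook so the allocation call can be swapped out. */
typedef int (*alloc_fn)(int fd, off_t offset, off_t len);

/* The real call: mode 0 allocates and extends the file if needed. */
int real_fallocate(int fd, off_t offset, off_t len)
{
	return fallocate(fd, 0, offset, len);
}

/* Stub simulating a signal that arrives during every single attempt. */
int always_eintr(int fd, off_t offset, off_t len)
{
	(void)fd; (void)offset; (void)len;
	errno = EINTR;
	return -1;
}

/*
 * Retry while the call fails with EINTR, giving up after max_tries
 * attempts. Returns 0 on success, -1 with errno set otherwise.
 * *tries reports how many attempts were made.
 */
int alloc_retry_eintr(alloc_fn fn, int fd, off_t offset, off_t len,
		      unsigned max_tries, unsigned *tries)
{
	int ret;

	*tries = 0;
	do {
		ret = fn(fd, offset, len);
		(*tries)++;
	} while (ret == -1 && errno == EINTR && *tries < max_tries);

	return ret;
}
```

With always_eintr and no cap, the loop never exits: each attempt is
interrupted before it completes and the next one starts over.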

Here, Postgres developers hit the same problem with retrying (they have
a 5 ms timer):
https://www.postgresql.org/message-id/CA%2BhUKGKS2Radu-1Ewhe1-LEj19C-3XAQ7wnkQMb4e9E9q9ZXSg%40mail.gmail.com

Mikulas