Re: Do we need to unrevert "fs: do not prefault sys_write() user buffer pages"?

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> · Tue, 22 Jun 2021 08:53:05 -0700

On Tue, Jun 22, 2021 at 8:32 AM Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> But yes, it could get unmapped again before the actual copy happens
> with the lock held. But that's why the copy is using that atomic
> version, so if that happens, we'll end up repeating.

Side note: search for "iov_iter_fault_in_writeable()" on lkml for a
gfs2 patch-series that is buggy, exactly because it does *not* use the
atomic user space accesses, and just tries to do the fault-in to hide
the real bug.

So you are correct that the fault-in is something people need to be
very wary of. Without the atomic side of the access, it's pure voodoo
programming.

You have two choices:

 - don't hold any filesystem locks (*) over a user space access

 - do the user space access with the atomic versions and repeat (with
pre-faulting to make the repeat work)

There's one special case of that "no filesystem locks" case that I put
that (*) for: you could do a read-recursive lock if the filesystem
page fault path can only ever take read locks. But none of our regular
locks are read-recursive apart from the very special case of the
spinning rwlock in interrupts (see comment in
queued_read_lock_slowpath()).

That special read-recursive model "works", but I would seriously
caution against it, simply because such locks can get very unfair very
quickly. So it's a DoS magnet. It's part of why none of the normal
locking models really have that (any more - rwlocks used to all be
that way).

                     Linus