On 2/25/22 17:40, Theodore Ts'o wrote:
...
...and then put them in a filesystem header file, because these are now
tightly coupled to filesystems, what with the need to call
.write_begin() and .write_end().
Well, the problem for process_vm_writev() is that it needs to know
when to call pin_user_file_pages(). I suspect that for many use cases
--- for example, if this is being used by a debugger to modify a
variable on a stack, or an anonymous page in the program's data
segment, process_vm_writev() *isn't* actually pinning a file. So they
want some kind of interface that automatically DTRT regardless of
whether the user pages being edited are file-backed or not
file-backed.
So some kind of [un]pin_user_pages_local() which will call
write_{begin,end}() if necessary would be the most convenient for
users such as process_vm_writev().
OK, yes.
And perhaps would it make sense for pin_user_pages to optionally (or
by default?) check for file-backed pages, and if it finds any, return
an error or stop pinning pages at that point, so the system call can
return EOPNOTSUPP to the user, instead of silently causing user data to
be lost or corrupted as is currently the case with xfs and btrfs (and
ext4 once I patch it so it doesn't BUG).
Yes, also a good move. It is definitely time for this.
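
To make the "check, then refuse" idea concrete, here is a rough userspace
sketch, not kernel code. struct mock_page and mock_pin_pages_reject_file()
are made-up names for illustration only, and this version picks the
"return an error" option (undoing any pins already taken) rather than
stopping partway and returning a short count:

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for a page; NOT the kernel's struct page. */
struct mock_page {
	bool file_backed;	/* true if the page belongs to a file mapping */
	int pin_count;		/* number of outstanding pins */
};

/*
 * Hypothetical model of a pin routine that refuses file-backed pages.
 * Returns the number of pages pinned, or -EOPNOTSUPP after dropping
 * every pin taken so far in this call.
 */
long mock_pin_pages_reject_file(struct mock_page *pages, size_t nr)
{
	for (size_t i = 0; i < nr; i++) {
		if (pages[i].file_backed) {
			/* Roll back pages[0..i-1] so no pin leaks. */
			while (i-- > 0)
				pages[i].pin_count--;
			return -EOPNOTSUPP;
		}
		pages[i].pin_count++;
	}
	return (long)nr;
}
```

A caller would pass the error straight back to userspace, so the syscall
fails loudly instead of leaving a pinned file-backed page behind.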
I'll note that at least one caller of pin_user_pages, in fs/io_uring.c
takes it upon itself to check for file-backed pages, and returns
Well, not *exactly*: fs/io_uring.c calls is_file_hugepages(), which is a
check for hugetlbfs, rather than a general check for file-backed pages. :)
But your point is still valid, and taken. The overall approach of,
"check for page type, then pin pages" is being done there.
EOPNOTSUPP if there are any found. Maybe that should be lifted to
pin_user_pages()?
For that matter, maybe pin_user_pages() and friends should take some
new FOLL_ flags to indicate whether file-backed pages should be
rejected, or perhaps they can promise they will only be holding the
pin for a very short amount of time (FOLL_SHORTERM?), and then
Naming: there is already a FOLL_LONGTERM, so anyone not using that is
already...non-FOLL_SHORTERM, so that would be too difficult to
understand.
Instead, maybe: FOLL_FILE, to indicate basically the inverse of your
FOLL_SHORTERM suggestion. And sweep through and augment the call sites
to pass in FOLL_FILE *at first*, so that the first patch leaves behavior
as-is. Then a patch per call site (bisection friendly), to start
actually changing behavior and dealing with the fallout.
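
Roughly, the per-page decision would look like the userspace sketch
below. MOCK_FOLL_FILE and mock_may_pin() are hypothetical names, and the
flag value is arbitrary, not an existing kernel FOLL_ bit:

```c
#include <stdbool.h>

/* Hypothetical flag; the bit value is arbitrary, not taken from the kernel. */
#define MOCK_FOLL_FILE (1u << 0)

/*
 * Returns true if a page may be pinned under the given flags.
 * Anonymous pages are always allowed; file-backed pages are allowed
 * only when the caller has explicitly opted in with MOCK_FOLL_FILE.
 */
bool mock_may_pin(bool file_backed, unsigned int gup_flags)
{
	if (!file_backed)
		return true;
	return (gup_flags & MOCK_FOLL_FILE) != 0;
}
```

With every call site passing the flag in the first patch, nothing
changes; dropping the flag at one call site then makes that site (and
only that site) start rejecting file-backed pages.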
pin_user_pages() and unpin_user_pages() can automagically call
write_begin() and write_end() if necessary? I dunno....
- Ted
This all sounds good to me. Thanks for thinking about this. I think this
is actually pretty easy to implement, too.
thanks,
--
John Hubbard
NVIDIA