[RFC PATCH 0/7] block, fs: convert Direct IO to FOLL_PIN


Hi,

Summary:

This puts some prerequisites in place, including a CONFIG parameter,
that make it possible to start converting and testing the Direct IO part
of each filesystem, moving it from get_user_pages_fast() to
pin_user_pages_fast().

It will take "a few" kernel releases to get the whole thing done.

Details:

As part of fixing the "get_user_pages() + file-backed memory" problem
[1], and to support various COW-related fixes as well [2], we need to
convert the Direct IO code from get_user_pages_fast() to
pin_user_pages_fast(). Because pages acquired by pin_user_pages*() must
be released by a corresponding call to unpin_user_page() rather than
put_page(), the conversion is more elaborate than a simple substitution:
every release site has to change as well.
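
To illustrate why the release side must change too, here is a minimal
userspace model (not kernel code). It assumes the kernel's scheme for
ordinary, non-compound pages, where a FOLL_PIN reference adds
GUP_PIN_COUNTING_BIAS (1024) to the page refcount; all model_* names
are hypothetical. Releasing a pinned page with a put_page()-style
decrement leaves the refcount inflated, so the page keeps looking pinned.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors GUP_PIN_COUNTING_BIAS for ordinary pages; userspace model only. */
#define MODEL_PIN_BIAS 1024

struct model_page {
	int refcount;
};

/* FOLL_GET style reference, as taken by get_user_pages_fast(). */
static void model_get_page(struct model_page *p)
{
	p->refcount += 1;
}

/* Release for a FOLL_GET reference (models put_page()). */
static void model_put_page(struct model_page *p)
{
	p->refcount -= 1;
	assert(p->refcount >= 0);
}

/* FOLL_PIN style reference, as taken by pin_user_pages_fast(). */
static void model_pin_page(struct model_page *p)
{
	p->refcount += MODEL_PIN_BIAS;
}

/* Release for a FOLL_PIN reference (models unpin_user_page()). */
static void model_unpin_page(struct model_page *p)
{
	p->refcount -= MODEL_PIN_BIAS;
	assert(p->refcount >= 0);
}

/* Loosely modeled on page_maybe_dma_pinned(): a refcount at or above the
 * bias suggests, but does not prove, that the page is pinned. */
static bool model_maybe_pinned(const struct model_page *p)
{
	return p->refcount >= MODEL_PIN_BIAS;
}
```

Pairing model_pin_page() with model_unpin_page() returns the refcount to
its starting value; pairing it with model_put_page() leaves 1023 extra
references behind, which is exactly the imbalance that converted call
sites must avoid.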

Further complicating the conversion, the block/bio layers get their
Direct IO pages via iov_iter_get_pages() and iov_iter_get_pages_alloc(),
each of which has a large number of callers. All of those callers need
to be audited and changed so that they call unpin_user_page() rather
than put_page().
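
One way to keep acquire and release in agreement across many callers is
to record, in the container that holds the pages, how those pages were
obtained, so the completion path can choose the matching release call.
The sketch below is a hypothetical userspace model of that bookkeeping,
not the approach taken by this series: struct model_bio and its
pages_pinned flag are illustrative names, and each page simply tracks
normal references and pins in separate counters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model: normal references and pins tracked separately. */
struct model_page {
	int refs;
	int pins;
};

/* Hypothetical bio-like container; the flag records how its pages were
 * obtained so that the completion path knows how to release them. */
struct model_bio {
	struct model_page *pages[4];
	size_t nr_pages;
	bool pages_pinned; /* true: pages came from a FOLL_PIN call */
};

/* Stand-in for filling the container via iov_iter_get_pages(). */
static void model_bio_add_pages(struct model_bio *bio,
				struct model_page **pages, size_t n, bool pin)
{
	assert(n <= sizeof(bio->pages) / sizeof(bio->pages[0]));
	bio->pages_pinned = pin;
	bio->nr_pages = n;
	for (size_t i = 0; i < n; i++) {
		if (pin)
			pages[i]->pins++;
		else
			pages[i]->refs++;
		bio->pages[i] = pages[i];
	}
}

/* Completion path: release each page the same way it was acquired,
 * i.e. the unpin_user_page() analogue for pinned pages and the
 * put_page() analogue otherwise. */
static void model_bio_release_pages(struct model_bio *bio)
{
	for (size_t i = 0; i < bio->nr_pages; i++) {
		struct model_page *p = bio->pages[i];

		if (bio->pages_pinned)
			p->pins--;
		else
			p->refs--;
		assert(p->refs >= 0 && p->pins >= 0);
	}
	bio->nr_pages = 0;
}
```

Because the release decision lives next to the acquire decision, an
audited caller only has to stop calling put_page() directly; it cannot
accidentally mix the two schemes.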

After spending quite some time exploring this and consulting with
others, it is clear that it cannot be done in a single patchset. Not
only is the work large and time-consuming (for example, Chaitanya
Kulkarni's first reaction, after looking into the details, was, "convert
the remaining filesystems to use iomap, *then* convert to FOLL_PIN..."),
but it is also spread across many filesystems.

With that in mind, let's apply most of this patchset soon-ish, and then
work on the filesystem conversions, likely over the course of a few
kernel releases. Once those are complete, apply the last patch, followed
by one final rename that removes the dio_w_ prefixes and restores the
original names.
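
The wrapper approach described above can be sketched as a small
userspace model. Everything here is hypothetical: MODEL_DIO_USE_PIN
stands in for the CONFIG parameter (whose real name does not appear in
this letter), and model_dio_w_pin_page()/model_dio_w_unpin_page() only
imitate the general shape of the dio_w_*() wrappers; the real wrappers'
names and signatures may differ. The point is that a converted call site
always uses the wrapper pair, so acquire and release flip together.

```c
#include <assert.h>

/* Hypothetical build switch standing in for the series' CONFIG parameter. */
#ifndef MODEL_DIO_USE_PIN
#define MODEL_DIO_USE_PIN 1
#endif

struct model_page {
	int refs; /* FOLL_GET style references */
	int pins; /* FOLL_PIN style references */
};

/* Acquire wrapper used by every converted call site. */
static inline void model_dio_w_pin_page(struct model_page *p)
{
#if MODEL_DIO_USE_PIN
	p->pins++; /* models pin_user_pages_fast() */
#else
	p->refs++; /* models get_user_pages_fast() */
#endif
}

/* Matching release wrapper: selected by the same switch as the acquire
 * side, so a converted call site stays balanced either way. */
static inline void model_dio_w_unpin_page(struct model_page *p)
{
#if MODEL_DIO_USE_PIN
	p->pins--; /* models unpin_user_page() */
#else
	p->refs--; /* models put_page() */
#endif
	assert(p->refs >= 0 && p->pins >= 0);
}
```

Building with -DMODEL_DIO_USE_PIN=0 gives the old get/put behavior.
Deleting the #else branches corresponds to removing the CONFIG
parameter, and renaming the wrappers back corresponds to the final
name change.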

In this patchset:

Patches 1, 2, 3: provide the prerequisites to start converting call
sites to call the new dio_w_*() wrapper functions.

Patch 4: convert the core allocation routines to
dio_w_pin_user_pages_fast().

Patches 5, 6: convert a couple of callers (NFS, fuse) to use FOLL_PIN.
These also serve as placeholders, marking the point in the series at
which the remaining filesystems need to be converted.

At this point, Ubuntu 20.04 boots and can run some fio Direct IO tests,
while keeping the FOLL_PIN counters in /proc/vmstat
(nr_foll_pin_acquired and nr_foll_pin_released) balanced. (Ubuntu uses
fuse during startup, interestingly enough.)

Patch 7: Get rid of the CONFIG parameter, thus effectively switching the
default Direct IO mechanism over to pin_user_pages_fast().

(Not shown): Patch 8: trivial but large: rename everything to get rid of
the dio_w_ prefix, and delete the wrappers.
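
The "balanced" check mentioned above compares the nr_foll_pin_acquired
and nr_foll_pin_released counters in /proc/vmstat. The helper below is a
minimal sketch of such a check, assuming the standard "name value" line
format of /proc/vmstat; the function names are hypothetical and this is
not the tooling used to test the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Finds "name value" in /proc/vmstat-formatted text; true if found. */
static bool vmstat_field(const char *text, const char *name,
			 unsigned long long *out)
{
	size_t len = strlen(name);
	const char *line = text;

	while (line && *line) {
		/* Require an exact field name, followed by a space. */
		if (strncmp(line, name, len) == 0 && line[len] == ' ')
			return sscanf(line + len, "%llu", out) == 1;
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return false;
}

/* Outstanding FOLL_PIN pins (acquired - released), or -1 if missing. */
static long long foll_pin_outstanding(const char *text)
{
	unsigned long long acq, rel;

	if (!vmstat_field(text, "nr_foll_pin_acquired", &acq) ||
	    !vmstat_field(text, "nr_foll_pin_released", &rel))
		return -1;
	return (long long)acq - (long long)rel;
}

/* Reads a vmstat-style file into buf (NUL-terminated); -1 on error. */
static long read_vmstat(const char *path, char *buf, size_t size)
{
	FILE *f;
	size_t n;

	assert(size > 0);
	f = fopen(path, "r");
	if (!f)
		return -1;
	n = fread(buf, 1, size - 1, f);
	fclose(f);
	buf[n] = '\0';
	return (long)n;
}
```

A test run might read /proc/vmstat into a 16 KiB buffer once Direct IO
has quiesced; a non-zero result from foll_pin_outstanding() would
suggest a pinned page that was released with put_page(), or not
released at all.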

This is based on mmotm as of about an hour ago. I've also stashed it
here:

    https://github.com/johnhubbard/linux bio_pup_mmotm_20220224

[1] "The trouble with get_user_pages()"
    https://lwn.net/Articles/753027/

[2] David Hildenbrand's mm/COW fixes
    https://lore.kernel.org/all/20211217113049.23850-1-david@xxxxxxxxxx/T/#u

John Hubbard (7):
  mm/gup: introduce pin_user_page()
  block: add dio_w_*() wrappers for pin, unpin user pages
  block, fs: assert that key paths use iovecs, and nothing else
  block, bio, fs: initial pin_user_pages_fast() changes
  NFS: direct-io: convert to FOLL_PIN pages
  fuse: convert direct IO paths to use FOLL_PIN
  block, direct-io: flip the switch: use pin_user_pages_fast()

 block/bio.c          | 22 +++++++++++++---------
 block/blk-map.c      |  4 ++--
 fs/direct-io.c       | 26 ++++++++++++++------------
 fs/fuse/dev.c        |  5 ++++-
 fs/fuse/file.c       | 23 ++++++++---------------
 fs/iomap/direct-io.c |  2 +-
 fs/nfs/direct.c      |  2 +-
 include/linux/bvec.h |  4 ++++
 include/linux/mm.h   |  1 +
 lib/iov_iter.c       |  4 ++--
 mm/gup.c             | 34 ++++++++++++++++++++++++++++++++++
 11 files changed, 84 insertions(+), 43 deletions(-)


base-commit: 218d3ca9c0ea1c35f1bc5099325b7df54b52bbdd
prerequisite-patch-id: 7d5a742e37171a15d83a9b3ac9ba0951b573eed8
--
2.35.1




