Re: [PATCH 1/2] mm: introduce put_user_page*(), placeholder versions

Dave Chinner <david@xxxxxxxxxxxxx> · Wed, 19 Dec 2018 21:28:25 +1100

On Tue, Dec 18, 2018 at 08:03:29PM -0700, Jason Gunthorpe wrote:
> On Wed, Dec 19, 2018 at 10:42:54AM +1100, Dave Chinner wrote:
> 
> > Essentially, what we are talking about is how to handle broken
> > hardware. I say we should just brun it with napalm and thermite
> > (i.e. taint the kernel with "unsupportable hardware") and force
> > wait_for_stable_page() to trigger when there are GUP mappings if
> > the underlying storage doesn't already require it.
> 
> If you want to ban O_DIRECT/etc from writing to file backed pages,
> then just do it.

O_DIRECT IO *isn't the problem*.

iO_DIRECT IO uses a short term pin that the existing prefaulting
during GUP works just fine for. The problem we have is the long term
pins where pages can be cleaned while the pages are pinned. i.e. the
use case we current have to disable for DAX because *we can't make
it work sanely* without either revokable file leases and/or hardware
that is able to trigger page faults when they need write access to a
clean page.

> Otherwise I'm not sure demanding some unrealistic HW design is
> reasonable. ie nvme drives are not likely to add page faulting to
> their IO path any time soon.

Direct IO on nvme drives are not the problem. It's RDMA pinning
pages for hours or days and expecting everyone else to jump through
hoops to support their broken page access access model.

> A SW architecture that relies on page faulting is just not going to
> support real world block IO devices.

The existing software architecture for file backed pages has been
based around page faulting for write notifications since ~2005. That
horse bolted many, many years ago.

> GPUs and one RDMA are about the only things that can do this today,
> and they are basically irrelevant to O_DIRECT.

It's RDMA that we need these changes for, not O_DIRECT.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx