Re: [RFC] mm: gup: add helper page_try_gup_pin(page)

Hillf Danton <hdanton@xxxxxxxx> · Fri, 8 Nov 2019 17:38:37 +0800

On Thu, 7 Nov 2019 09:57:48 -0500 Jerome Glisse wrote:
> 
> I am not sure I follow. Today we cannot differentiate between GUP
> and regular get_page(); if you use some combination of specific fs
> and hardware you might get some BUG_ON() thrown at you depending on
> how lucky/unlucky you are. We can not solve this without being able
> to differentiate between GUP and regular get_page(). Hence why John's
> patchset is the first step in the right direction.
> 
What is the second one? And when? By whom?
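
For reference, one way to tell a GUP pin apart from a plain get_page()
is to bias the page reference count by a large constant for each pin.
Below is a userspace sketch of that idea only. It is not code from
John's patchset, and every name in it (struct fake_page, PIN_BIAS,
fake_pin_page() and friends) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bias: large enough that plain references rarely reach it. */
#define PIN_BIAS 1024

/* Stand-in for struct page; only the reference count matters here. */
struct fake_page {
	int refcount;
};

/* A regular reference adds one to the count. */
static void fake_get_page(struct fake_page *p)
{
	p->refcount += 1;
}

static void fake_put_page(struct fake_page *p)
{
	p->refcount -= 1;
}

/* A GUP-style pin adds the whole bias, so pins stand out in the count. */
static void fake_pin_page(struct fake_page *p)
{
	p->refcount += PIN_BIAS;
}

static void fake_unpin_page(struct fake_page *p)
{
	p->refcount -= PIN_BIAS;
}

/*
 * Heuristic check: a count at or above the bias suggests at least one pin.
 * It can report a false positive if a page collects PIN_BIAS or more
 * plain references, so callers must tolerate "maybe".
 */
static bool fake_page_maybe_pinned(const struct fake_page *p)
{
	return p->refcount >= PIN_BIAS;
}
```

With a page that holds two plain references, fake_page_maybe_pinned()
returns false; after one fake_pin_page() it returns true until the
matching fake_unpin_page(). The check is deliberately one-sided: it can
err towards "pinned" but never misses a real pin in this model.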

> If there is no GUP on a page then regular writeback happens as it has
> for years now, so in the absence of GUP I do not see any issue.
> 
> 
> > > still something where there is no agreement, as far as I remember the
> > > outcome of the last discussion we had. I expect this will be a topic
> > > at next LSF/MM or maybe something we can flesh out before.
> >
> > These are the constraints we know
> >
> > A, multiple gup pins
> > B, mutual data corruptions
> > C, no break of existing use cases
> > D, zero copy
> 
> What do you mean by zero copy?
> 
Snippet that can be found at https://lwn.net/Articles/784574/

"get_user_pages() is a way to map user-space memory into the kernel's
address space; it will ensure that all of the requested pages have
been faulted into RAM (and locked there) and provide a kernel mapping
that, in turn, can be used for direct access by the kernel or (more
often) to set up zero-copy I/O operations."

> > E, feel free to add
> >
> > then what is preventing an agreement like bounce page?
> 
> There are 2 sides (AFAIR):
>     - do not write back GUPed pages and wait until GUP goes away to
>       write them. But GUP can last as long as the uptime and we can
>       lose data on power failure.
>     - use a bounce page so that there is a chance we have some data
>       on power failure
> 
> >
> > Because page migrate and reclaim have been working for a while with
> > gup pin taken into account, detecting it has no priority in any form
> > over the agreement on how to make a writeback page stable.
> 
> migrate just ignores GUPed pages and thus there is no issue with migrate.
> writeback is a special case here because some filesystems need stable
> page content and also we need to inhibit some fs specific things that
> trigger BUG_ON() in set_page_dirty*()
> 
Which drivers so far have been snared by the BUG_ON()? Is there any
chance to fix them one after another? Otherwise what is making them
special (long-lived pin)?

After setting the page dirty, is there any pending DMA transfer to the
dirty page? If yes, what is the point of doing writeback for corrupted
data? If no, what is preventing the gup pin from being released?

> > What seems more important, restriction B above makes C hard to meet
> > in any feasible approach trying to keep a writeback page stable, and
> > zero-copy makes it harder AFAICS.
> 
> writeback can use a bounce page. Zero copy, i.e. not having to use a
> bounce page, is not an issue; in fact in some cases we already use
> bounce pages (at the block device level).
> 
> >
> > > In any case my opinion is bounce page is the best thing we can do,
> > > from application and FS point of view it mimics the characteristics
> > > of regular write-back just as if the write protection window of the
> > > write-backed page was infinitely short.
> >
> > A 100-line patch tells more than a 200-line explanation can and helps
> > to shorten the discussion prior to reaching an agreement.
> 
> It is not that trivial, you need to make sure every layer from fs down
> to block device driver properly behaves in front of a bounce page. We
> have such a mechanism for bio but it is at the bio level, but maybe it
> can be dumped one level.
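
To make the bounce page idea concrete, here is a userspace sketch of
what "stable page content" means for writeback: the data is snapshotted
into a private buffer, and both the checksum and the I/O use the
snapshot only, so a device still writing into the pinned page cannot
make them disagree. This is not how any filesystem or the bio layer
does it. struct fake_io, fake_checksum() and fake_writeback_bounce()
are hypothetical names, and the checksum is a toy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FAKE_PAGE_SIZE 4096

/* Toy rolling checksum standing in for a filesystem data checksum. */
static uint32_t fake_checksum(const unsigned char *buf, size_t len)
{
	uint32_t sum = 0;

	for (size_t i = 0; i < len; i++)
		sum = sum * 31u + buf[i];
	return sum;
}

/* What would be handed to the storage layer: payload plus its checksum. */
struct fake_io {
	unsigned char data[FAKE_PAGE_SIZE];	/* the bounce page */
	uint32_t csum;
};

/*
 * Copy the (possibly still changing) pinned page into the bounce page
 * first, then checksum the copy. Later changes to the pinned page do
 * not affect what is submitted.
 */
static void fake_writeback_bounce(const unsigned char *pinned_page,
				  struct fake_io *io)
{
	memcpy(io->data, pinned_page, FAKE_PAGE_SIZE);
	io->csum = fake_checksum(io->data, FAKE_PAGE_SIZE);
}
```

If the pinned page is modified after fake_writeback_bounce() returns,
io->data and io->csum still match each other, which is the property
the stable-page filesystems care about. The cost is the extra copy,
i.e. writeback is no longer zero copy for that page.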