On Thu, Jun 28, 2018 at 11:17:43AM +0200, Jan Kara wrote: > On Wed 27-06-18 19:42:01, John Hubbard wrote: > > On 06/27/2018 10:02 AM, Jan Kara wrote: > > > On Wed 27-06-18 08:57:18, Jason Gunthorpe wrote: > > >> On Wed, Jun 27, 2018 at 02:42:55PM +0200, Jan Kara wrote: > > >>> On Wed 27-06-18 13:59:27, Michal Hocko wrote: > > >>>> On Wed 27-06-18 13:53:49, Jan Kara wrote: > > >>>>> On Wed 27-06-18 13:32:21, Michal Hocko wrote: > > >>>> [...] > > >>>>>> Appart from that, do we really care about 32b here? Big DIO, IB users > > >>>>>> seem to be 64b only AFAIU. > > >>>>> > > >>>>> IMO it is a bad habit to leave unpriviledged-user-triggerable oops in the > > >>>>> kernel even for uncommon platforms... > > >>>> > > >>>> Absolutely agreed! I didn't mean to keep the blow up for 32b. I just > > >>>> wanted to say that we can stay with a simple solution for 32b. I thought > > >>>> the g-u-p-longterm has plugged the most obvious breakage already. But > > >>>> maybe I just misunderstood. > > >>> > > >>> Most yes, but if you try hard enough, you can still trigger the oops e.g. > > >>> with appropriately set up direct IO when racing with writeback / reclaim. > > >> > > >> gup longterm is only different from normal gup if you have DAX and few > > >> people do, which really means it doesn't help at all.. AFAIK?? > > > > > > Right, what I wrote works only for DAX. For non-DAX situation g-u-p > > > longterm does not currently help at all. Sorry for confusion. > > > > > > > OK, I've got an early version of this up and running, reusing the page->lru > > fields. I'll clean it up and do some heavier testing, and post as a PATCH v2. > > Cool. > > > One question though: I'm still vague on the best actions to take in the > > following functions: > > > > page_mkclean_one > > try_to_unmap_one > > > > At the moment, they are both just doing an evil little early-out: > > > > if (PageDmaPinned(page)) > > return false; > > > > ...but we talked about maybe waiting for the condition to clear, instead? > > Thoughts? > > What needs to happen in page_mkclean() depends on the caller. Most of the > callers really need to be sure the page is write-protected once > page_mkclean() returns. Those are: > > pagecache_isize_extended() > fb_deferred_io_work() > clear_page_dirty_for_io() if called for data-integrity writeback - which > is currently known only in its caller (e.g. write_cache_pages()) where > it can be determined as wbc->sync_mode == WB_SYNC_ALL. Getting this > information into page_mkclean() will require some plumbing and > clear_page_dirty_for_io() has some 50 callers but it's doable. > > clear_page_dirty_for_io() for cleaning writeback (wbc->sync_mode != > WB_SYNC_ALL) can just skip pinned pages and we probably need to do that as > otherwise memory cleaning would get stuck on pinned pages until RDMA > drivers release its pins. Sorry for naive question, but won't it create too much dirty pages so writeback will be called "non-stop" to rebalance watermarks without ability to progress? Thanks
Attachment:
signature.asc
Description: PGP signature