Re: Folio mapcount

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Feb 09, 2023 at 11:10:12PM +0800, Chih-En Lin wrote:
> On Thu, Feb 9, 2023 at 4:59 AM Peter Xu <peterx@xxxxxxxxxx> wrote:
> >
> > On Wed, Feb 08, 2023 at 08:25:10PM +0000, Matthew Wilcox wrote:
> > > On Wed, Feb 08, 2023 at 02:40:11PM -0500, Peter Xu wrote:
> > > > On Tue, Feb 07, 2023 at 11:27:17PM +0000, Matthew Wilcox wrote:
> > > > > I've been thinking about this one, and I wonder if we can do it
> > > > > without taking any pgtable locks.  The locking environment we're in
> > > > > is the page fault handler, so we have the mmap_lock for read (for now
> > > > > anyway ...).  We also hold the folio lock, so _if_ the folio is mapped,
> > > > > those entries can't disappear under us.
> > > >
> > > > Could MADV_DONTNEED do that from another pgtable that we don't hold the
> > > > pgtable lock?
> > >
> > > Oh, ugh, yes.  And zap_pte_range() has the PTL first, so we can't sleep
> > > to get the folio lock.  And we can't decline to zap the pte on a failed
> > > folio_trylock() (well, we could for MADV_DONTNEED, but not in general).
> > >
> > > So ... how about this for a solution:
> > >
> > >  - If the folio overlaps into the next PMD table, spin_lock it.
> > >  - If the folio overlaps into the previous PMD table, unlock our
> > >    PTL, lock the previous PTL, re-lock our PTL.
> > >  - Do the pvmw, telling it we already have the PTLs held (new PVMW flag).
> > >
> > > [explanation simplified; if there is no prior PMD table or if the VMA
> > > limits how far to search, we can skip this]
> > >
> > > We have prior art for taking two PTLs in copy_page_range().  There,
> > > the hierarchy is clear; one VMA belongs to the process parent and one
> > > to the child.  I don't believe we have precedent for taking two PTLs
> > > in the same VMA, but I think my proposal (order by ascending address in
> > > the process) is the obvious order to choose.
> >
> > Maybe it'll work?  Not sure, but seems be something we'd be extremely
> > careful with.  Having a single mmap read lock covering both seems to
> > guarantee that the order of the lock is stable, which is a good start..
> > But I have no good idea on other implications across the whole kernel.
> >
> > IMHO copy_page_range() is not a great example for proving deadlocks,
> > because the dst_mm should not be exposed to the whole world yet at all when
> > copying.  Say, I don't see any case some thread can try to take the dst mm
> > pgtable lock at all until it's all set.  I'm even wondering whether it's
> > safe to not take the dst mm pgtable lock at all during a fork()..
> 
> I don't think it's safe without taking the dst mm pgtable lock during a fork().
> Since copy_present_page() will add the page to the anon_vma, the page
> can be searched by the rmap.
> So, even the fork doesn't finish the duplication of pgtable.
> We can still use the existing (and COW mapping) page to access the dst
> pgtable by rmap + page_vma_mapped_walk().
> But, I didn't consider the mmap_write_lock() here. So, I might be wrong here.
> Just provide some thoughts.

Yes I think you're right - I thought the rmap locks was held but after I
rechecked they're not.

It's safe because AFAICT any rmap can only take either of the pgtable locks
but not both.  To trigger any potential deadlock on the two spinlocks we
need another concurrent thread trying to take the same two locks, but it
will not happen in this case.

Thanks,

-- 
Peter Xu





[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux