Re: [PATCH hmm 3/8] mm/hmm: do not call hmm_vma_walk_hole() while holding a spinlock

Jason Gunthorpe <jgg@xxxxxxxx> · Mon, 16 Mar 2020 09:56:19 -0300

On Mon, Mar 16, 2020 at 10:05:03AM +0100, Christoph Hellwig wrote:
> On Wed, Mar 11, 2020 at 03:35:01PM -0300, Jason Gunthorpe wrote:
> > From: Jason Gunthorpe <jgg@xxxxxxxxxxxx>
> > 
> > This eventually calls into handle_mm_fault() which is a sleeping function.
> > Release the lock first.
> > 
> > hmm_vma_walk_hole() does not touch the contents of the PUD, so it does not
> > need the lock.
> 
> So how did this manage to not be noticed before?

I'm still struggling a bit with how this PUD stuff works.. However
AFAICT:

1) The first hunk around pud_none() is actualy covering a race. In the
   non-race case the page walker will directly call
   hmm_vma_walk_hole() for pud_none so it would be very hard to hit
   this

2) I'm not 100% sure, but I think pud_present() == pud_none(), as
   there is no swap entry for PUD I don't know what a non-present
   but non-none entry is or how to create one. This is possibly dead
   code due to #1

3) To hit hmm_range_need_fault() requesting fault you would need
   a read-only huge PUD and a fault requesting write. I suspect
   creating a read only huge PUD entry is very rare - not something
   someone would deliberately construct.

4) To even trigger the PUD flow at all you need the 1G THP or the
   special 1G DAX pages which I strongly suspect people are not
   testing with.

> The fix looks fine assuming we want something backportable before
> starting the cleanups:

I found it easier to make the other cleanup patches make sense if they
didn't introduce all kinds of new logic too..

Thanks,
Jason