On Wed, Sep 11, 2024 at 02:53:31PM +1000, Alistair Popple wrote:
>
> Matthew Brost <matthew.brost@xxxxxxxxx> writes:
>
> > Avoid multiple CPU page faults to the same device page racing by locking
> > the page in do_swap_page before taking an additional reference to the
> > page. This prevents scenarios where multiple CPU page faults each take
> > an extra reference to a device page, which could abort migration in
> > folio_migrate_mapping. With the device page locked in do_swap_page, the
> > migrate_vma_* functions need to be updated to avoid locking the
> > fault_page argument.
>
> I added the get_page() and therefore the fault_page argument to deal
> with another lifetime issue. Neither Felix nor I particularly liked the
> solution though (see
> https://lore.kernel.org/all/cover.60659b549d8509ddecafad4f498ee7f03bb23c69.1664366292.git-series.apopple@xxxxxxxxxx/T/#m715589d57716448386ff9af41da27a952d94615a)
> and this seems to make it even uglier, so I have suggested an
> alternative solution below.
>

Thanks for the ref, I will read it through.

> > Prior to this change, a livelock scenario could occur in Xe's (Intel GPU
> > DRM driver) SVM implementation if enough threads faulted the same device
> > page.
>
> I haven't seen the same in the NVIDIA UVM driver (out-of-tree, I know)

Still a driver.

> but theoretically it seems like it should be possible. However we
> serialize migrations of the same virtual address range to avoid these
> kinds of issues, as they can happen the other way too (i.e. multiple
> threads trying to migrate to the GPU).
>
> So I suspect what happens in UVM is that one thread wins and installs
> the migration entry while the others fail to get the driver migration
> lock and bail out sufficiently early in the fault path to avoid the
> livelock.
>

I had to try hard to show this; I doubt an actual user could trigger
it. I wrote a test which kicks off 8 threads that each try to read the
same page, joins them, and repeats this in a loop for 512 pages or so
(rough shape sketched at the end of this mail).

I needed an exclusive lock in the migrate_to_ram vfunc for it to
livelock. Without an exclusive lock I saw, on average, about 32k
retries (i.e. migrate_to_ram calls on the same page) before a thread
won the race.
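
To make the commit message above concrete, the shape of the
do_swap_page() change is roughly the following. This is a sketch of
the device-private path, not the literal diff, and the VM_FAULT_RETRY
bookkeeping is elided:

	} else if (is_device_private_entry(entry)) {
		vmf->page = pfn_swap_entry_to_page(entry);
		vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
					       vmf->address, &vmf->ptl);
		/* ... pte checks elided ... */
		if (trylock_page(vmf->page)) {
			/*
			 * Take the extra reference only with the page
			 * locked, so concurrent faults on the same page
			 * cannot each pin it and abort the migration in
			 * folio_migrate_mapping().
			 */
			get_page(vmf->page);
			pte_unmap_unlock(vmf->pte, vmf->ptl);
			ret = vmf->page->pgmap->ops->migrate_to_ram(vmf);
			unlock_page(vmf->page);
			put_page(vmf->page);
		} else {
			/*
			 * Another fault holds the lock and is driving
			 * the migration; back off and retry the fault.
			 */
			pte_unmap_unlock(vmf->pte, vmf->ptl);
			ret = VM_FAULT_RETRY;
		}
	}

With the lock held across migrate_to_ram, the migrate_vma_* functions
then have to skip locking fault_page, as the commit message says.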
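
The early bail-out Alistair describes would, I imagine, look something
like this on the driver side. Purely hypothetical sketch with made-up
names, not actual UVM code:

	static vm_fault_t range_migrate_to_ram(struct vm_fault *vmf)
	{
		struct va_range *range = va_range_lookup(vmf->vma,
							 vmf->address);

		/* Loser of the race bails out instead of pinning the page. */
		if (!mutex_trylock(&range->migrate_lock))
			return VM_FAULT_RETRY;

		/* ... set up and run migrate_vma_* on the range ... */

		mutex_unlock(&range->migrate_lock);
		return 0;
	}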
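
FWIW, the reproducer test was roughly this shape. Userspace sketch
only: in the real test the buffer is first migrated to device memory
through the driver, so each read below takes a device-page fault and
ends up in migrate_to_ram(); plain anonymous memory stands in here.

	#include <pthread.h>
	#include <sys/mman.h>
	#include <unistd.h>

	#define NTHREADS 8
	#define NPAGES   512

	static volatile char *page;	/* page under contention */
	static volatile char sink;

	static void *reader(void *arg)
	{
		sink = page[0];	/* concurrent CPU fault on the same page */
		return NULL;
	}

	int main(void)
	{
		long pagesz = sysconf(_SC_PAGESIZE);
		char *buf = mmap(NULL, NPAGES * pagesz,
				 PROT_READ | PROT_WRITE,
				 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		pthread_t threads[NTHREADS];

		if (buf == MAP_FAILED)
			return 1;

		/* Real test: migrate buf to device memory here. */

		for (int i = 0; i < NPAGES; i++) {
			page = buf + i * pagesz;
			for (int t = 0; t < NTHREADS; t++)
				pthread_create(&threads[t], NULL,
					       reader, NULL);
			for (int t = 0; t < NTHREADS; t++)
				pthread_join(threads[t], NULL);
		}
		return 0;
	}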