On Fri, Mar 05, 2021 at 02:11:43PM -0800, Luck, Tony wrote:
> This whole page table walking patch is trying to work around the
> races caused by multiple calls to memory_failure() for the same
> page.
>
> Maybe better to just avoid the races. The comment right above
> memory_failure says:
>
>  * Must run in process context (e.g. a work queue) with interrupts
>  * enabled and no spinlocks hold.
>
> So it should be safe to grab and hold a mutex. See patch below.

The mutex approach looks simpler and safer, so I'm fine with it.

>
> -Tony
>
> commit 8dd0dbe7d595e02647e9c2c76c03341a9f6bd7b9
> Author: Tony Luck <tony.luck@xxxxxxxxx>
> Date: Fri Mar 5 10:40:48 2021 -0800
>
>     Use a mutex to avoid memory_failure() races
>
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 24210c9bd843..c1509f4b565e 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1381,6 +1381,8 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
>  	return rc;
>  }
>
> +static DEFINE_MUTEX(mf_mutex);
> +
>  /**
>   * memory_failure - Handle memory failure of a page.
>   * @pfn: Page Number of the corrupted page
> @@ -1424,12 +1426,18 @@ int memory_failure(unsigned long pfn, int flags)
>  		return -ENXIO;
>  	}
>
> +	mutex_lock(&mf_mutex);

Would it be better to take the mutex before the
memory_failure_dev_pagemap() block? Or do we not need to protect
against this race for device memory?

Thanks,
Naoya Horiguchi
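
To make the question above concrete, here is a rough userspace sketch of the alternative ordering, where the lock is taken before the device-memory branch so that both paths are serialized. It is not kernel code and not part of Tony's patch: pthread_mutex_t stands in for DEFINE_MUTEX(), and memory_failure_sketch(), the poisoned[] and is_device[] arrays, handled_count, and the return value 1 for "already handled" are all hypothetical names and conventions chosen for illustration.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

#define NR_PAGES 16

/* Userspace stand-in for the kernel's static DEFINE_MUTEX(mf_mutex). */
static pthread_mutex_t mf_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical per-page state; the kernel tracks this very differently. */
static bool poisoned[NR_PAGES];
static bool is_device[NR_PAGES];
static int handled_count;

/*
 * Hypothetical analogue of memory_failure().
 * Returns 0 when this call handled the page, 1 when an earlier call
 * already did (a convention of this sketch only), and -ENXIO when the
 * pfn is out of range.
 */
int memory_failure_sketch(unsigned long pfn)
{
	int rc = 0;

	if (pfn >= NR_PAGES)
		return -ENXIO;

	/* Lock taken before the device-memory check, so that path is covered too. */
	pthread_mutex_lock(&mf_mutex);

	if (poisoned[pfn]) {
		/* A racing caller got here first; nothing more to do. */
		rc = 1;
		goto unlock;
	}

	if (is_device[pfn]) {
		/* Stand-in for the memory_failure_dev_pagemap() path. */
		poisoned[pfn] = true;
		handled_count++;
		goto unlock;
	}

	/* Stand-in for the regular page handling path. */
	poisoned[pfn] = true;
	handled_count++;

unlock:
	/* Every path that took the lock leaves through this single unlock. */
	pthread_mutex_unlock(&mf_mutex);
	return rc;
}
```

With this ordering, a second call for the same pfn sees the page already marked and returns early, whether the page is device memory or not. Whether device memory actually needs that protection is the open question.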