On Tue, Apr 19, 2022 at 12:50:43PM +0800, Shiyang Ruan wrote: > This new function is a variant of mf_generic_kill_procs that accepts a > file, offset pair instead of a struct to support multiple files sharing > a DAX mapping. It is intended to be called by the file systems as part > of the memory_failure handler after the file system performed a reverse > mapping from the storage address to the file and file offset. > > Signed-off-by: Shiyang Ruan <ruansy.fnst@xxxxxxxxxxx> > Reviewed-by: Dan Williams <dan.j.williams@xxxxxxxxx> > Reviewed-by: Christoph Hellwig <hch@xxxxxx> > --- ... > @@ -539,13 +548,41 @@ static void collect_procs_file(struct page *page, struct list_head *to_kill, > * to be informed of all such data corruptions. > */ > if (vma->vm_mm == t->mm) > - add_to_kill(t, page, vma, to_kill); > + add_to_kill(t, page, 0, vma, to_kill); > } > } > read_unlock(&tasklist_lock); > i_mmap_unlock_read(mapping); > } > > +#if IS_ENABLED(CONFIG_FS_DAX) This macro is equivalent with "#ifdef CONFIG_FS_DAX", and you also add "#ifdef" below, so aligning to either (I prefer "#ifdef") might be better. > +/* > + * Collect processes when the error hit a fsdax page. > + */ > +static void collect_procs_fsdax(struct page *page, > + struct address_space *mapping, pgoff_t pgoff, > + struct list_head *to_kill) Unlike collect_procs_file(), this new function does not have parameter force_early, and passes true unconditionally to task_early_kill(). Looking at the current code, I noticed the following code and comment: /* * Unlike System-RAM there is no possibility to swap in a * different physical page at a given virtual address, so all * userspace consumption of ZONE_DEVICE memory necessitates * SIGBUS (i.e. MF_MUST_KILL) */ flags |= MF_ACTION_REQUIRED | MF_MUST_KILL; , which forcibly sets MF_ACTION_REQUIRED and I guess that this is the reason for passing true above. But now I'm not sure that setting these flags for all error events on NVDIMM is really right thing. The above comment sounds to me that memory_failure_dev_pagemap() is called to handle the event when the data on ZONE_DEVICE memory is about to be accessed/consumed, but is that the only case? I thought that memory_failure() can be called by memory scrubbing *before* some process actually access to it, and MCE handler judges whether action is required or not based on MSRs. Does this not happen on NVDIMM error? Anyway this question might be a little off-topic, so not to be a blocker for this patchset. Thanks, Naoya Horiguchi