Re: [PATCH 04/10] mm, fsdax: Refactor memory-failure handler for dax mapping

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 2021/1/6 下午11:41, Jan Kara wrote:
On Thu 31-12-20 00:55:55, Shiyang Ruan wrote:
The current memory_failure_dev_pagemap() can only handle single-mapped
dax page for fsdax mode.  The dax page could be mapped by multiple files
and offsets if we let reflink feature & fsdax mode work together.  So,
we refactor current implementation to support handle memory failure on
each file and offset.

Signed-off-by: Shiyang Ruan <ruansy.fnst@xxxxxxxxxxxxxx>

Overall this looks OK to me, a few comments below.

---
  fs/dax.c            | 21 +++++++++++
  include/linux/dax.h |  1 +
  include/linux/mm.h  |  9 +++++
  mm/memory-failure.c | 91 ++++++++++++++++++++++++++++++++++-----------
  4 files changed, 100 insertions(+), 22 deletions(-)

...

@@ -345,9 +348,12 @@ static void add_to_kill(struct task_struct *tsk, struct page *p,
  	}
tk->addr = page_address_in_vma(p, vma);
-	if (is_zone_device_page(p))
-		tk->size_shift = dev_pagemap_mapping_shift(p, vma);
-	else
+	if (is_zone_device_page(p)) {
+		if (is_device_fsdax_page(p))
+			tk->addr = vma->vm_start +
+					((pgoff - vma->vm_pgoff) << PAGE_SHIFT);

It seems strange to use 'pgoff' for dax pages and not for any other page.
Why? I'd rather pass correct pgoff from all callers of add_to_kill() and
avoid this special casing...

Because one fsdax page can be shared by multiple pgoffs. I have to pass each pgoff in each iteration to calculate the address in vma (for tk->addr). Other kinds of pages don't need this. They can get their unique address by calling "page_address_in_vma()".

So, I added this fsdax case here. This patchset only implemented the fsdax case, other cases also need to be added here if to be implemented.


--
Thanks,
Ruan Shiyang.


+		tk->size_shift = dev_pagemap_mapping_shift(p, vma, tk->addr);
+	} else
  		tk->size_shift = page_shift(compound_head(p));
/*
@@ -495,7 +501,7 @@ static void collect_procs_anon(struct page *page, struct list_head *to_kill,
  			if (!page_mapped_in_vma(page, vma))
  				continue;
  			if (vma->vm_mm == t->mm)
-				add_to_kill(t, page, vma, to_kill);
+				add_to_kill(t, page, NULL, 0, vma, to_kill);
  		}
  	}
  	read_unlock(&tasklist_lock);
@@ -505,24 +511,19 @@ static void collect_procs_anon(struct page *page, struct list_head *to_kill,
  /*
   * Collect processes when the error hit a file mapped page.
   */
-static void collect_procs_file(struct page *page, struct list_head *to_kill,
-				int force_early)
+static void collect_procs_file(struct page *page, struct address_space *mapping,
+		pgoff_t pgoff, struct list_head *to_kill, int force_early)
  {
  	struct vm_area_struct *vma;
  	struct task_struct *tsk;
-	struct address_space *mapping = page->mapping;
-	pgoff_t pgoff;
i_mmap_lock_read(mapping);
  	read_lock(&tasklist_lock);
-	pgoff = page_to_pgoff(page);
  	for_each_process(tsk) {
  		struct task_struct *t = task_early_kill(tsk, force_early);
-
  		if (!t)
  			continue;
-		vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff,
-				      pgoff) {
+		vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, pgoff) {
  			/*
  			 * Send early kill signal to tasks where a vma covers
  			 * the page but the corrupted page is not necessarily
@@ -531,7 +532,7 @@ static void collect_procs_file(struct page *page, struct list_head *to_kill,
  			 * to be informed of all such data corruptions.
  			 */
  			if (vma->vm_mm == t->mm)
-				add_to_kill(t, page, vma, to_kill);
+				add_to_kill(t, page, mapping, pgoff, vma, to_kill);
  		}
  	}
  	read_unlock(&tasklist_lock);
@@ -550,7 +551,8 @@ static void collect_procs(struct page *page, struct list_head *tokill,
  	if (PageAnon(page))
  		collect_procs_anon(page, tokill, force_early);
  	else
-		collect_procs_file(page, tokill, force_early);
+		collect_procs_file(page, page->mapping, page_to_pgoff(page),

Why not use page_mapping() helper here? It would be safer for THPs if they
ever get here...

								Honza






[Index of Archives]     [Linux RAID Wiki]     [ATA RAID]     [Linux SCSI Target Infrastructure]     [Linux Block]     [Linux IDE]     [Linux SCSI]     [Linux Hams]     [Device Mapper]     [Device Mapper Cryptographics]     [Kernel]     [Linux Admin]     [Linux Net]     [GFS]     [RPM]     [git]     [Yosemite Forum]


  Powered by Linux