> -----Original Message-----
> From: Dan Williams <dan.j.williams@xxxxxxxxx>
> Subject: Re: [PATCH v4 03/10] fs: Introduce ->corrupted_range() for superblock
>
> On Wed, Jun 16, 2021 at 11:51 PM ruansy.fnst@xxxxxxxxxxx
> <ruansy.fnst@xxxxxxxxxxx> wrote:
> >
> > > -----Original Message-----
> > > From: Dan Williams <dan.j.williams@xxxxxxxxx>
> > > Subject: Re: [PATCH v4 03/10] fs: Introduce ->corrupted_range() for
> > > superblock
> > >
> > > [ drop old linux-nvdimm@xxxxxxxxxxxx, add nvdimm@xxxxxxxxxxxxxxx ]
> > >
> > > On Thu, Jun 3, 2021 at 6:19 PM Shiyang Ruan <ruansy.fnst@xxxxxxxxxxx> wrote:
> > > >
> > > > Memory failures that occur in fsdax mode are ultimately handled in the
> > > > filesystem. We introduce this interface to find the files or
> > > > metadata affected by the corrupted range, and to try to recover the
> > > > corrupted data if possible.
> > > >
> > > > Signed-off-by: Shiyang Ruan <ruansy.fnst@xxxxxxxxxxx>
> > > > ---
> > > >  include/linux/fs.h | 2 ++
> > > >  1 file changed, 2 insertions(+)
> > > >
> > > > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > > > index c3c88fdb9b2a..92af36c4225f 100644
> > > > --- a/include/linux/fs.h
> > > > +++ b/include/linux/fs.h
> > > > @@ -2176,6 +2176,8 @@ struct super_operations {
> > > >  				  struct shrink_control *);
> > > >  	long (*free_cached_objects)(struct super_block *,
> > > >  				    struct shrink_control *);
> > > > +	int (*corrupted_range)(struct super_block *sb, struct block_device *bdev,
> > > > +			       loff_t offset, size_t len, void *data);
> > >
> > > Why does the superblock need a new operation? Wouldn't whatever
> > > function is specified here just be specified to the dax_dev as the
> > > ->notify_failure() holder callback?
> >
> > Because we need to find out which file is affected by the given poisoned page
> > so that the memory-failure code can do its collect_procs() and kill_procs()
> > jobs. That requires the filesystem to use its rmap feature to search for the
> > file from a given offset.
> > So, we need this implemented by the specific filesystem and called by the
> > dax_device's holder.
> >
> > This is the call trace I described in the cover letter:
> > memory_failure()
> >  * fsdax case
> >  pgmap->ops->memory_failure() => pmem_pgmap_memory_failure()
> >   dax_device->holder_ops->corrupted_range() =>
> >                                       - fs_dax_corrupted_range()
> >                                       - md_dax_corrupted_range()
> >    sb->s_ops->corrupted_range() => xfs_fs_corrupted_range() <== **HERE**
> >     xfs_rmap_query_range()
> >      xfs_currupt_helper()
> >       * corrupted on metadata
> >           try to recover data, call xfs_force_shutdown()
> >       * corrupted on file data
> >           try to recover data, call mf_dax_kill_procs()
> >  * normal case
> >  mf_generic_kill_procs()
> >
> > As you can see, this newly added operation is an important part of the whole
> > process.
>
> I don't think you need either fs_dax_corrupted_range() or
> sb->s_ops->corrupted_range(). In fact that fs_dax_corrupted_range()
> looks broken because the filesystem may not even be mounted on the device
> associated with the error.

If the filesystem is not mounted, then no process is using the broken page and
nothing needs to be killed in memory-failure. So, I think we can just return
and handle the error at the driver level if needed.

> The holder_data and holder_op should be sufficient
> for communicating the stack of notifications:
>
> pgmap->notify_memory_failure() => pmem_pgmap_notify_failure()
> pmem_dax_dev->holder_ops->notify_failure(pmem_dax_dev) =>
>     md_dax_notify_failure()
> md_dax_dev->holder_ops->notify_failure() => xfs_notify_failure()
>
> I.e. the entire chain just walks dax_dev holder ops.

Oh, I see. I just need to implement holder_ops in the filesystem or
mapped_device directly. I made the routine too complicated.

--
Thanks,
Ruan.
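
The holder-ops walk discussed above can be sketched as a small userspace C
model. This is only an illustration under stated assumptions, not kernel code:
the struct layouts, the model_fs and model_md types, the dax_notify_failure()
helper, the offset translation through a start field, and the -EOPNOTSUPP
return for a device with no holder are all hypothetical. Only the names
holder_ops, holder_data, notify_failure, md_dax_notify_failure() and
xfs_notify_failure() come from the thread.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Userspace model only. The names echo the thread, but every definition
 * here is hypothetical and is not the kernel's.
 */
struct dax_device;

struct dax_holder_ops {
	int (*notify_failure)(struct dax_device *dax_dev,
			      unsigned long long offset, size_t len);
};

struct dax_device {
	const char *name;
	const struct dax_holder_ops *holder_ops;	/* NULL: no holder */
	void *holder_data;				/* owned by the holder */
};

/* Hypothetical stand-in for a mounted filesystem. */
struct model_fs {
	unsigned long long last_offset;
	size_t last_len;
	int notified;
};

/* Hypothetical stand-in for a stacked (md/dm-like) device. */
struct model_md {
	struct dax_device dax_dev;	/* device the md layer exposes upward */
	unsigned long long start;	/* where the lower device begins in it */
};

/* Each layer uses this to hand the failure to whoever holds dax_dev. */
static int dax_notify_failure(struct dax_device *dax_dev,
			      unsigned long long offset, size_t len)
{
	assert(dax_dev != NULL);
	if (!dax_dev->holder_ops || !dax_dev->holder_ops->notify_failure)
		return -EOPNOTSUPP;	/* nothing mounted or stacked on top */
	return dax_dev->holder_ops->notify_failure(dax_dev, offset, len);
}

/* Filesystem end of the chain: a real one would run its rmap query here. */
static int xfs_notify_failure(struct dax_device *dax_dev,
			      unsigned long long offset, size_t len)
{
	struct model_fs *fs = dax_dev->holder_data;

	fs->last_offset = offset;
	fs->last_len = len;
	fs->notified++;
	printf("fs: failure on %s at %llu, len %zu\n",
	       dax_dev->name, offset, len);
	return 0;
}

/* Middle of the chain: translate the offset, then notify md's own holder. */
static int md_dax_notify_failure(struct dax_device *dax_dev,
				 unsigned long long offset, size_t len)
{
	struct model_md *md = dax_dev->holder_data;

	return dax_notify_failure(&md->dax_dev, md->start + offset, len);
}

static const struct dax_holder_ops xfs_holder_ops = {
	.notify_failure = xfs_notify_failure,
};

static const struct dax_holder_ops md_holder_ops = {
	.notify_failure = md_dax_notify_failure,
};
```

To wire a pmem device under an md device under a filesystem in this model,
point the pmem device's holder_ops at md_holder_ops with holder_data set to the
model_md, and point the md device's holder_ops at xfs_holder_ops with
holder_data set to the model_fs. A call to dax_notify_failure() on the pmem
device then reaches the filesystem callback with the offset shifted by
md.start, while a device with no holder returns early, which mirrors the
unmounted case mentioned above.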