Re: [PATCHSETS] v14 fsdax-rmap + v11 fsdax-reflink

Shiyang Ruan <ruansy.fnst@xxxxxxxxxxx> · Thu, 2 Jun 2022 17:42:13 +0800

Hi,

Is there any other work I should do on these two patchsets?  I think
they are in good shape now.  So... since 5.19-rc1 is coming, could the
notify_failure() part be merged as you planned?

--
Thanks,
Ruan.

On 2022/5/12 20:27, Shiyang Ruan wrote:

On 2022/5/11 23:46, Dan Williams wrote:
On Wed, May 11, 2022 at 8:21 AM Darrick J. Wong <djwong@xxxxxxxxxx> wrote:

On Tue, May 10, 2022 at 10:24:28PM -0700, Andrew Morton wrote:
On Tue, 10 May 2022 19:43:01 -0700 "Darrick J. Wong" <djwong@xxxxxxxxxx> wrote:

On Tue, May 10, 2022 at 07:28:53PM -0700, Andrew Morton wrote:
On Tue, 10 May 2022 18:55:50 -0700 Dan Williams <dan.j.williams@xxxxxxxxx> wrote:

It'll need to be a stable branch somewhere, but I don't think it
really matters where as long as it's merged into the xfs for-next
tree so it gets filesystem test coverage...

So how about letting the notify_failure() bits go through -mm this
cycle, if Andrew will have it, and then the reflink work has a clean
v5.19-rc1 baseline to build from?

What are we referring to here?  I think a minimal thing would be the
memremap.h and memory-failure.c changes from
https://lkml.kernel.org/r/20220508143620.1775214-4-ruansy.fnst@xxxxxxxxxxx ?

Sure, I can scoot that into 5.19-rc1 if you think that's best.  It
would probably be straining things to slip it into 5.19.

The use of EOPNOTSUPP is a bit suspect, btw.  It *sounds* like the
right thing, but it's a networking errno.  I suppose it's livable with
if it never escapes the kernel, but if it can get back to userspace
then a user would be justified in wondering how the heck a filesystem
operation generated a networking errno.

<shrug> most filesystems return EOPNOTSUPP rather enthusiastically
when they don't know how to do something...

Can it propagate back to userspace?

AFAICT, the new code falls back to the current failure handling
(mf_generic_kill_procs) if the filesystem doesn't provide a
->memory_failure function or if it returns -EOPNOTSUPP.
mf_generic_kill_procs can also return -EOPNOTSUPP, but all the
memory_failure() callers (madvise, etc.) convert that to 0 before
returning it to userspace.

I suppose the weirder question is going to be what happens when madvise
starts returning filesystem errors like EIO or EFSCORRUPTED when pmem
loses half its brains and even the fs can't deal with it.

Even then, that notification is not in a system call context, so it
would still result in a SIGBUS notification, not an EOPNOTSUPP return
code.  The only potential gap I see is which error codes
MADV_SOFT_OFFLINE might return.  The man page is silent on
soft-offline failure codes.  Shiyang, that's something to check /
update if necessary.

According to the code around MADV_SOFT_OFFLINE, it returns -EIO when
the backend is NVDIMM, because such ZONE_DEVICE pages are not online.

Here is the logic:
  soft_offline_page(pfn, flags) {
      ...
      /* Only online pages can be soft-offlined (esp., not ZONE_DEVICE). */
      page = pfn_to_online_page(pfn);
      if (!page) {
          put_ref_page(ref_page);
          return -EIO;
      }
      ...
  }

  madvise_inject_error(behavior, start, end) {
      ...
      if (behavior == MADV_SOFT_OFFLINE)
          ret = soft_offline_page(pfn, MF_COUNT_INCREASED);
      else
          ret = memory_failure(pfn, MF_COUNT_INCREASED);
      ...
      return ret;
  }

--
Thanks,
Ruan.