Re: [PATCH v10 3/3] mm, pmem, xfs: Introduce MF_MEM_REMOVE for unbind

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





在 2023/3/21 18:59, Shiyang Ruan 写道:


在 2023/2/27 18:06, Shiyang Ruan 写道:


在 2023/2/27 8:07, Dave Chinner 写道:
On Fri, Feb 17, 2023 at 02:48:32PM +0000, Shiyang Ruan wrote:
This patch is inspired by Dan's "mm, dax, pmem: Introduce
dev_pagemap_failure()"[1].  With the help of dax_holder and
->notify_failure() mechanism, the pmem driver is able to ask filesystem
(or mapped device) on it to unmap all files in use and notify processes
who are using those files.

Call trace:
trigger unbind
  -> unbind_store()
   -> ... (skip)
    -> devres_release_all()   # was pmem driver ->remove() in v1
     -> kill_dax()
      -> dax_holder_notify_failure(dax_dev, 0, U64_MAX, MF_MEM_PRE_REMOVE)
       -> xfs_dax_notify_failure()

Introduce MF_MEM_PRE_REMOVE to let filesystem know this is a remove
event.  So do not shutdown filesystem directly if something not
supported, or if failure range includes metadata area.  Make sure all
files and processes are handled correctly.

[1]: https://lore.kernel.org/linux-mm/161604050314.1463742.14151665140035795571.stgit@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/

Signed-off-by: Shiyang Ruan <ruansy.fnst@xxxxxxxxxxx>

.....

---
@@ -225,6 +242,15 @@ xfs_dax_notify_failure(
      if (offset + len - 1 > ddev_end)
          len = ddev_end - offset + 1;
+    if (mf_flags & MF_MEM_PRE_REMOVE) {
+        xfs_info(mp, "device is about to be removed!");
+        error = freeze_super(mp->m_super);
+        if (error)
+            return error;
+        /* invalidate_inode_pages2() invalidates dax mapping */
+        super_drop_pagecache(mp->m_super, invalidate_inode_pages2);
+    }

Why do you still need to drop the pagecache here? My suggestion was
to replace it with freezing the filesystem at this point is to stop
it being dirtied further before the device remove actually occurs.
The userspace processes will be killed, their DAX mappings reclaimed
and the filesystem shut down before device removal occurs, so
super_drop_pagecache() is largely superfluous as it doesn't actually
provide any protection against racing with new mappings or dirtying
of existing/newly created mappings.

Freezing doesn't stop the creation of new mappings, either, it just
cleans all the dirty mappings and halts anything that is trying to

This is the point I wasn't aware of.

dirty existing clean mappings. It's not until we kill the userspace
processes that new mappings will be stopped, and it's not until we
shut the filesystem down that the filesystem itself will stop
accessing the storage.

Hence I don't see why you retained super_drop_pagecache() here at
all. Can you explain why it is still needed?


So I was just afraid that it's not enough for rmap & processes killer to invalidate the dax mappings.  If something error happened during the rmap walker, the fs will shutdown and there is no chance to invalidate the rest mappings whose user didn't be killed yet.

Now that freezing the fs is enough, I will remove the drop cache code.

I removed the drop cache code, then kernel always went into crash when running the test[1].  After the investigation, I found that the crash is cause by accessing (invalidate dax pages when umounting fs) the page of a pmem while the pmem has been removed.

According to the design, the dax page should have been invalidated by mf_dax_kill_procs() but it didn't.  I found two reasons:
  1. collect_procs_fsdax() only kills the current process
 2. unmap_mapping_range() doesn't invalidate the dax pages (disassociate dax entry in fs/dax.c), which causes the crash in my test

So, I think we should:
 1. pass the mf_flag to collect_procs_fsdax() to let it collect all processes associated with the file on the XFS.  2. drop cache is still needed, but just drop the associated files' cache after mf_dax_kill_procs(), instead of dropping cache of the whole filesystem.

Then the logic shuld be looked like this:
unbind
  `-> dax_holder_notify_failure(dax_dev, 0, U64_MAX, MF_MEM_PRE_REMOVE)
    `-> xfs_dax_notify_failure()
      `-> freeze_super()
      `-> do xfs rmap
        `-> mf_dax_kill_procs()
          `-> collect_procs_fsdax()   // all associated
          `-> unmap_and_kill()
        `-> invalidate_inode_pages2() // drop file's cache
      `-> thaw_super()


[1] The step of unbind test:
  1. create fsdax namespace on a pmem
  2. mkfs.xfs on it
  3. run fsx test in background
  4. wait 1s
  5. echo "pfn0.1" > unbind
  6. wait 1s
  7. umount xfs       --> crash happened


Hi,

Any comments?



--
Thanks,
Ruan.



--
Thanks,
Ruan.


-Dave.



[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux