Re: [PATCH] fs: improve dump_mapping() robustness

Baolin Wang <baolin.wang@xxxxxxxxxxxxxxxxx> · Mon, 22 Jan 2024 15:17:50 +0800

On 1/19/2024 11:48 PM, Charan Teja Kalla wrote:
Hi Matthew/Baolin,

On 1/18/2024 8:13 AM, Baolin Wang wrote:

On 1/18/2024 9:38 AM, Al Viro wrote:
On Tue, Jan 16, 2024 at 03:53:35PM +0800, Baolin Wang wrote:

With checking the 'dentry.parent' and 'dentry.d_name.name' used by
dentry_name(), I can see dump_mapping() will output the invalid dentry
instead of crashing the system when this issue is reproduced again.

       dentry_ptr = container_of(dentry_first, struct dentry,
d_u.d_alias);
-    if (get_kernel_nofault(dentry, dentry_ptr)) {
+    if (get_kernel_nofault(dentry, dentry_ptr) ||
+        !dentry.d_parent || !dentry.d_name.name) {
           pr_warn("aops:%ps ino:%lx invalid dentry:%px\n",
                   a_ops, ino, dentry_ptr);
           return;

That's nowhere near enough.  Your ->d_name.name can bloody well be
pointing
to an external name that gets freed right under you.  Legitimately so.

Think what happens if dentry has a long name (longer than would fit into
the embedded array) and gets renamed name just after you copy it into
a local variable.  Old name will get freed.  Yes, freeing is RCU-delayed,
but I don't see anything that would prevent your thread losing CPU
and not getting it back until after the sucker's been freed.

Yes, that's possible. And this appears to be a use-after-free issue in
the existing code, which is different from the issue that my patch
addressed.

So how about adding a rcu_read_lock() before copying the dentry to a
local variable in case the old name is freed?

We too seen the below crash while printing the dentry name.

aops:shmem_aops ino:5e029 dentry name:"dev/zero"
flags:
0x8000000000080006(referenced|uptodate|swapbacked|zone=2|kasantag=0x0)
raw: 8000000000080006 ffffffc033b1bb60 ffffffc033b1bb60 ffffff8862537600
raw: 0000000000000001 0000000000000000 00000003ffffffff ffffff807fe64000
page dumped because: migration failure
migrating pfn aef223 failed ret:1
page:000000009e72a120 refcount:3 mapcount:0 mapping:000000003325dda1
index:0x1 pfn:0xaef223
memcg:ffffff807fe64000
Unable to handle kernel NULL pointer dereference at virtual address
0000000000000000
Mem abort info:
   ESR = 0x0000000096000005
   EC = 0x25: DABT (current EL), IL = 32 bits
   SET = 0, FnV = 0
   EA = 0, S1PTW = 0
   FSC = 0x05: level 1 translation fault
Data abort info:
   ISV = 0, ISS = 0x00000005
   CM = 0, WnR = 0
user pgtable: 4k pages, 39-bit VAs, pgdp=000000090c12d000
[0000000000000000] pgd=0000000000000000, p4d=0000000000000000,
pud=0000000000000000
Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP

dentry_name+0x1f8/0x3a8
pointer+0x3b0/0x6b8
vsnprintf+0x4a4/0x65c
vprintk_store+0x168/0x4a8
vprintk_emit+0x98/0x218
vprintk_default+0x44/0x70
vprintk+0xf0/0x138
_printk+0x54/0x80
dump_mapping+0x17c/0x188
dump_page+0x1d0/0x2e8
offline_pages+0x67c/0x898

Not much comfortable with block layer internals, TMK, the below is what
happening in the my case:
memoffline	     		dput()
(offline_pages)		 (as part of closing of the shmem file)
------------		 --------------------------------------
					.......
			1) dentry_unlink_inode()
			      hlist_del_init(&dentry->d_u.d_alias);

			2) iput():
			    a) inode->i_state |= I_FREEING
				.....
			    b) evict_inode()->..->shmem_undo_range
			       1) get the folios with elevated refcount
3) do_migrate_range():
    a) Because of the elevated
    refcount in 2.b.1, the
    migration of this page will
    be failed.

			       2) truncate_inode_folio() ->
				     filemap_remove_folio():
  				(deletes from the page cache,
				 set page->mapping=NULL,
				 decrement the refcount on folio)
   b) Call dump_page():
      1) mapping = page_mapping(page);
      2) dump_mapping(mapping)
	  a) We unlinked the dentry in 1)
            thus dentry_ptr from host->i_dentry.first
            is not a proper one.

          b) dentry name print with %pd is resulting into
	   the mentioned crash.

At least in this case, I think __this patchset in its current form can
help us__.

This looks another case of NULL pointer access. Thanks for the detailed 
analysis. Could you provide a Tested-by or Reviewed-by tag if it can 
solve your problem?