Let me know which of the patches are ready for merging, and whether any additional patches are needed. I will add at least one to fix the R vs. RH dir lease issue Ralph pointed out.

On Wed, Nov 13, 2024 at 1:08 PM Paul Aurich <paul@xxxxxxxxxxxxxx> wrote:
>
> I'm working on a v2 set of patches that I hope will address this (I'm
> guessing it's somehow caused by the flush_workqueue() call, although to be
> honest, I have no idea).
>
> My testing has also turned up another issue that needs fixing (the laundromat
> thread can also race with cifs_kill_sb, resulting in yet another 'Dentry still
> in use' oops).
>
> ~Paul
>
> On 2024-11-09 18:49:54 -0600, Steve French wrote:
> >Running the buildbot on these against Windows with directory leases
> >enabled (i.e. not using "nohandlecache"), I see a crash in generic/246 in
> >cifs_mount (see the call trace at the end of this email for more details).
> >The series does improve things: it fixes the oops that would cause e.g.
> >generic/048 to crash, such as this one:
> >
> >[ 2039.224037] BUG: Dentry 00000000114e5018{i=6000000019c6c,n=/} still in use (2) [unmount of cifs cifs]
> >[ 2039.224086] WARNING: CPU: 1 PID: 123637 at fs/dcache.c:1536 umount_check+0xc3/0xf0
> >[ 2039.224112] Modules linked in: cifs ccm cmac nls_utf8 cifs_arc4 nls_ucs2_utils cifs_md4 rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace netfs nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_tables ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security ip_set ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables sunrpc kvm_intel kvm virtio_net net_failover failover virtio_balloon loop fuse dm_multipath nfnetlink zram xfs bochs drm_vram_helper drm_ttm_helper ttm drm_kms_helper crct10dif_pclmul crc32_pclmul crc32c_intel drm ghash_clmulni_intel sha512_ssse3 sha256_ssse3 floppy virtio_blk sha1_ssse3 qemu_fw_cfg virtio_console [last unloaded: cifs]
> >[ 2039.224382] CPU: 1 UID: 0 PID: 123637 Comm: rm Not tainted 6.12.0-rc6 #1
> >[ 2039.224390] Hardware name: Red Hat KVM, BIOS 1.16.3-2.el9 04/01/2014
> >[ 2039.224395] RIP: 0010:umount_check+0xc3/0xf0
> >[ 2039.224401] Code: db 74 0d 48 8d 7b 40 e8 9b e3 f5 ff 48 8b 53 40 41 55 4d 89 f1 45 89 e0 48 89 e9 48 89 ee 48 c7 c7 80 74 ba a2 e8 dd e8 a2 ff <0f> 0b 58 31 c0 5b 5d 41 5c 41 5d 41 5e c3 cc cc cc cc 41 83 fc 01
> >[ 2039.224408] RSP: 0018:ff11000128bcf968 EFLAGS: 00010282
> >[ 2039.224416] RAX: dffffc0000000000 RBX: ff11000110ba3d80 RCX: 0000000000000027
> >[ 2039.224422] RDX: 0000000000000027 RSI: 0000000000000004 RDI: ff110004cb0b1a08
> >[ 2039.224427] RBP: ff11000112ebb8f8 R08: ffffffffa13f759e R09: ffe21c0099616341
> >[ 2039.224432] R10: ff110004cb0b1a0b R11: 0000000000000001 R12: 0000000000000002
> >[ 2039.224437] R13: ff1100013fe0c668 R14: ffffffffc1dee340 R15: ffffffffc2094200
> >[ 2039.224442] FS: 00007fdc92c66740(0000) GS:ff110004cb080000(0000) knlGS:0000000000000000
> >[ 2039.224447] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >[ 2039.224452] CR2: 0000556c6a58d488 CR3: 00000001229ec006 CR4: 0000000000373ef0
> >[ 2039.224465] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >[ 2039.224470] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> >[ 2039.224475] Call Trace:
> >[ 2039.224479] <TASK>
> >[ 2039.224485] ? __warn+0xa9/0x220
> >[ 2039.224497] ? umount_check+0xc3/0xf0
> >[ 2039.224506] ? report_bug+0x1d4/0x1e0
> >[ 2039.224521] ? handle_bug+0x5b/0xa0
> >[ 2039.224529] ? exc_invalid_op+0x18/0x50
> >[ 2039.224537] ? asm_exc_invalid_op+0x1a/0x20
> >[ 2039.224556] ? irq_work_claim+0x1e/0x40
> >[ 2039.224568] ? umount_check+0xc3/0xf0
> >[ 2039.224577] ? umount_check+0xc3/0xf0
> >[ 2039.224586] ? __pfx_umount_check+0x10/0x10
> >[ 2039.224594] d_walk+0x71/0x4e0
> >[ 2039.224604] ? d_walk+0x4b/0x4e0
> >[ 2039.224616] shrink_dcache_for_umount+0x6d/0x220
> >[ 2039.224629] generic_shutdown_super+0x4a/0x1c0
> >[ 2039.224640] kill_anon_super+0x22/0x40
> >[ 2039.224649] cifs_kill_sb+0x78/0x90 [cifs]
> >[ 2039.224848] deactivate_locked_super+0x69/0xf0
> >[ 2039.224858] cifsFileInfo_put_final+0x25a/0x290 [cifs]
> >[ 2039.225019] _cifsFileInfo_put+0x5f9/0x770 [cifs]
> >[ 2039.225172] ? __pfx__cifsFileInfo_put+0x10/0x10 [cifs]
> >[ 2039.225319] ? mark_held_locks+0x65/0x90
> >[ 2039.225339] ? kfree+0x1e5/0x490
> >[ 2039.225349] ? cifs_close_deferred_file_under_dentry+0x2e5/0x350 [cifs]
> >[ 2039.225494] cifs_close_deferred_file_under_dentry+0x26d/0x350 [cifs]
> >[ 2039.225639] ? __pfx_cifs_close_deferred_file_under_dentry+0x10/0x10 [cifs]
> >[ 2039.225821] ? cifs_sb_master_tcon+0x23/0x30 [cifs]
> >[ 2039.225970] cifs_unlink+0x21d/0x7b0 [cifs]
> >[ 2039.226121] vfs_unlink+0x1bf/0x4a0
> >[ 2039.226131] ? lookup_one_qstr_excl+0x24/0xd0
> >[ 2039.226143] do_unlinkat+0x380/0x450
> >[ 2039.226154] ? __pfx_do_unlinkat+0x10/0x10
> >[ 2039.226161] ? kasan_save_track+0x14/0x30
> >[ 2039.226171] ? rcu_is_watching+0x20/0x50
> >[ 2039.226180] ? strncpy_from_user+0x126/0x180
> >[ 2039.226201] __x64_sys_unlinkat+0x5e/0xa0
> >[ 2039.226211] do_syscall_64+0x75/0x180
> >[ 2039.226222] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> >[ 2039.226230] RIP: 0033:0x7fdc92d7784b
> >[ 2039.226237] Code: 77 05 c3 0f 1f 40 00 48 8b 15 c9 85 0d 00 f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 07 01 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 9d 85 0d 00 f7 d8 64 89 01 48
> >[ 2039.226244] RSP: 002b:00007ffd51debe58 EFLAGS: 00000206 ORIG_RAX: 0000000000000107
> >[ 2039.226253] RAX: ffffffffffffffda RBX: 0000556c6a58d5a0 RCX: 00007fdc92d7784b
> >[ 2039.226258] RDX: 0000000000000000 RSI: 0000556c6a58c380 RDI: 00000000ffffff9c
> >[ 2039.226263] RBP: 00007ffd51debf20 R08: 0000556c6a58c380 R09: 00007ffd51debf7c
> >[ 2039.226268] R10: 0000556c6a58d7d0 R11: 0000000000000206 R12: 0000556c6a58c2f0
> >[ 2039.226273] R13: 0000000000000000 R14: 00007ffd51debf80 R15: 0000000000000000
> >[ 2039.226297] </TASK>
> >[ 2039.226301] irq event stamp: 5933
> >[ 2039.226305] hardirqs last enabled at (5939): [<ffffffffa122e2de>] __up_console_sem+0x5e/0x70
> >[ 2039.226314] hardirqs last disabled at (5944): [<ffffffffa122e2c3>] __up_console_sem+0x43/0x70
> >[ 2039.226320] softirqs last enabled at (3096): [<ffffffffa11340ce>] __irq_exit_rcu+0xfe/0x120
> >[ 2039.226327] softirqs last disabled at (2941): [<ffffffffa11340ce>] __irq_exit_rcu+0xfe/0x120
> >[ 2039.226333] ---[ end trace 0000000000000000 ]---
> >[ 2040.931627] VFS: Busy inodes after unmount of cifs (cifs)
> >[ 2040.931693] ------------[ cut here ]------------
> >[ 2040.931699] kernel BUG at fs/super.c:650!
> >[ 2040.932770] Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
> >[ 2040.932846] CPU: 1 UID: 0 PID: 123637 Comm: rm Tainted: G W 6.12.0-rc6 #1
> >[ 2040.932936] Tainted: [W]=WARN
> >[ 2040.932969] Hardware name: Red Hat KVM, BIOS 1.16.3-2.el9 04/01/2014
> >[ 2040.933028] RIP: 0010:generic_shutdown_super+0x1b7/0x1c0
> >[ 2040.933086] Code: 7b 28 e8 cc ca f8 ff 48 8b 6b 28 48 89 ef e8 c0 ca f8 ff 48 8b 55 00 48 8d b3 68 06 00 00 48 c7 c7 60 14 ba a2 e8 d9 6c b6 ff <0f> 0b 0f 1f 80 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90
> >[ 2040.933252] RSP: 0018:ff11000128bcfa38 EFLAGS: 00010282
> >[ 2040.933305] RAX: 000000000000002d RBX: ff1100013fe0c000 RCX: ffffffffa12c915e
> >[ 2040.933370] RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ff110004cb0b7080
> >[ 2040.933436] RBP: ffffffffc20529a0 R08: 0000000000000001 R09: ffe21c0025179f0f
> >[ 2040.933500] R10: ff11000128bcf87f R11: 0000000000000001 R12: ff1100013fe0c9c0
> >[ 2040.933565] R13: ff1100013fe0c780 R14: ff11000168113378 R15: ffffffffc2094200
> >[ 2040.933630] FS: 00007fdc92c66740(0000) GS:ff110004cb080000(0000) knlGS:0000000000000000
> >[ 2040.933703] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >[ 2040.933757] CR2: 0000556c6a58d488 CR3: 00000001229ec006 CR4: 0000000000373ef0
> >[ 2040.933828] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >[ 2040.933905] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> >[ 2040.933972] Call Trace:
> >[ 2040.934000] <TASK>
> >[ 2040.934024] ? die+0x37/0x90
> >[ 2040.934058] ? do_trap+0x133/0x230
> >[ 2040.934094] ? generic_shutdown_super+0x1b7/0x1c0
> >[ 2040.934144] ? do_error_trap+0x94/0x130
> >[ 2040.934182] ? generic_shutdown_super+0x1b7/0x1c0
> >[ 2040.934229] ? generic_shutdown_super+0x1b7/0x1c0
> >[ 2040.934277] ? handle_invalid_op+0x2c/0x40
> >[ 2040.934318] ? generic_shutdown_super+0x1b7/0x1c0
> >[ 2040.934364] ? exc_invalid_op+0x2f/0x50
> >[ 2040.934405] ? asm_exc_invalid_op+0x1a/0x20
> >[ 2040.934453] ? tick_nohz_tick_stopped+0x1e/0x40
> >[ 2040.934502] ? generic_shutdown_super+0x1b7/0x1c0
> >[ 2040.934551] ? generic_shutdown_super+0x1b7/0x1c0
> >[ 2040.934600] kill_anon_super+0x22/0x40
> >[ 2040.934641] cifs_kill_sb+0x78/0x90 [cifs]
> >[ 2040.934851] deactivate_locked_super+0x69/0xf0
> >[ 2040.934915] cifsFileInfo_put_final+0x25a/0x290 [cifs]
> >[ 2040.935125] _cifsFileInfo_put+0x5f9/0x770 [cifs]
> >[ 2040.935331] ? __pfx__cifsFileInfo_put+0x10/0x10 [cifs]
> >[ 2040.935537] ? mark_held_locks+0x65/0x90
> >[ 2040.935581] ? kfree+0x1e5/0x490
> >[ 2040.937096] ? cifs_close_deferred_file_under_dentry+0x2e5/0x350 [cifs]
> >[ 2040.938778] cifs_close_deferred_file_under_dentry+0x26d/0x350 [cifs]
> >[ 2040.940478] ? __pfx_cifs_close_deferred_file_under_dentry+0x10/0x10 [cifs]
> >[ 2040.942193] ? cifs_sb_master_tcon+0x23/0x30 [cifs]
> >[ 2040.943865] cifs_unlink+0x21d/0x7b0 [cifs]
> >[ 2040.945527] vfs_unlink+0x1bf/0x4a0
> >[ 2040.947008] ? lookup_one_qstr_excl+0x24/0xd0
> >[ 2040.948451] do_unlinkat+0x380/0x450
> >[ 2040.949889] ? __pfx_do_unlinkat+0x10/0x10
> >[ 2040.951306] ? kasan_save_track+0x14/0x30
> >[ 2040.952708] ? rcu_is_watching+0x20/0x50
> >[ 2040.954135] ? strncpy_from_user+0x126/0x180
> >[ 2040.955494] __x64_sys_unlinkat+0x5e/0xa0
> >[ 2040.956779] do_syscall_64+0x75/0x180
> >[ 2040.958053] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> >
> >
> >
> >but I did see a crash in generic/246 with current mainline + the five
> >directory lease patches (see below). Any ideas?
> >
> >[ 4315.917270] CIFS: Attempting to mount //win16.vm.test/Share
> >[ 4315.928088] ==================================================================
> >[ 4315.930402] BUG: KASAN: slab-use-after-free in rwsem_down_write_slowpath+0xa34/0xaf0
> >[ 4315.932655] Read of size 4 at addr ff1100012ad38034 by task mount.cifs/340968
> >
> >[ 4315.937119] CPU: 2 UID: 0 PID: 340968 Comm: mount.cifs Tainted: G D W 6.12.0-rc6 #1
> >[ 4315.939454] Tainted: [D]=DIE, [W]=WARN
> >[ 4315.941747] Hardware name: Red Hat KVM, BIOS 1.16.3-2.el9 04/01/2014
> >[ 4315.944094] Call Trace:
> >[ 4315.946425] <TASK>
> >[ 4315.948724] dump_stack_lvl+0x79/0xb0
> >[ 4315.951046] print_report+0xcb/0x620
> >[ 4315.953359] ? __virt_addr_valid+0x19a/0x300
> >[ 4315.955682] ? rwsem_down_write_slowpath+0xa34/0xaf0
> >[ 4315.957994] kasan_report+0xbd/0xf0
> >[ 4315.960317] ? rwsem_down_write_slowpath+0xa34/0xaf0
> >[ 4315.962610] rwsem_down_write_slowpath+0xa34/0xaf0
> >[ 4315.964835] ? kasan_save_stack+0x34/0x50
> >[ 4315.967030] ? __pfx_rwsem_down_write_slowpath+0x10/0x10
> >[ 4315.969221] ? cifs_mount+0xfb/0x3b0 [cifs]
> >[ 4315.971586] ? cifs_smb3_do_mount+0x1a5/0xc10 [cifs]
> >[ 4315.974108] ? smb3_get_tree+0x1f0/0x430 [cifs]
> >[ 4315.976511] ? rcu_is_watching+0x20/0x50
> >[ 4315.978712] ? trace_lock_acquire+0x116/0x150
> >[ 4315.980884] ? lock_acquire+0x40/0x90
> >[ 4315.982989] ? super_lock+0xea/0x1d0
> >[ 4315.985036] ? super_lock+0xea/0x1d0
> >[ 4315.986997] down_write+0x15b/0x160
> >[ 4315.988893] ? __pfx_down_write+0x10/0x10
> >[ 4315.990751] ? __mod_timer+0x407/0x590
> >[ 4315.992531] super_lock+0xea/0x1d0
> >[ 4315.994231] ? __pfx_super_lock+0x10/0x10
> >[ 4315.995891] ? __pfx_lock_release+0x10/0x10
> >[ 4315.997528] ? rcu_is_watching+0x20/0x50
> >[ 4315.999137] ? lock_release+0xa5/0x3d0
> >[ 4316.000705] ? cifs_match_super+0x177/0x650 [cifs]
> >[ 4316.002385] grab_super+0x80/0x1e0
> >[ 4316.003884] ? __pfx_grab_super+0x10/0x10
> >[ 4316.005351] ? cifs_put_tlink+0xa1/0xc0 [cifs]
> >[ 4316.006990] ? cifs_match_super+0x17f/0x650 [cifs]
> >[ 4316.008620] ? __pfx_cifs_match_super+0x10/0x10 [cifs]
> >[ 4316.010236] sget+0x121/0x350
> >[ 4316.011694] ? __pfx_cifs_set_super+0x10/0x10 [cifs]
> >[ 4316.013293] cifs_smb3_do_mount+0x293/0xc10 [cifs]
> >[ 4316.014933] ? __pfx___mutex_lock+0x10/0x10
> >[ 4316.016333] ? cred_has_capability.isra.0+0xd4/0x1a0
> >[ 4316.017752] ? __pfx_cifs_smb3_do_mount+0x10/0x10 [cifs]
> >[ 4316.019338] smb3_get_tree+0x1f0/0x430 [cifs]
> >[ 4316.020910] vfs_get_tree+0x50/0x180
> >[ 4316.022292] path_mount+0x5d6/0xf20
> >[ 4316.023668] ? __pfx_path_mount+0x10/0x10
> >[ 4316.024983] ? user_path_at+0x45/0x60
> >[ 4316.026275] __x64_sys_mount+0x174/0x1b0
> >[ 4316.027552] ? __pfx___x64_sys_mount+0x10/0x10
> >[ 4316.028794] ? rcu_is_watching+0x20/0x50
> >[ 4316.030021] do_syscall_64+0x75/0x180
> >[ 4316.031226] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> >[ 4316.032476] RIP: 0033:0x7f979fa7b8fe
> >[ 4316.033742] Code: 48 8b 0d 1d a5 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ea a4 0c 00 f7 d8 64 89 01 48
> >[ 4316.036462] RSP: 002b:00007fffbaa4f598 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
> >[ 4316.037874] RAX: ffffffffffffffda RBX: 0000557684547471 RCX: 00007f979fa7b8fe
> >[ 4316.039284] RDX: 0000557684547471 RSI: 00005576845474d7 RDI: 00007fffbaa50caf
> >[ 4316.040701] RBP: 000000000000000a R08: 00005576847deeb0 R09: 0000000000000000
> >[ 4316.042106] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fffbaa50caf
> >[ 4316.043522] R13: 00005576847dff20 R14: 00005576847deeb0 R15: 00007f979fb75000
> >[ 4316.044928] </TASK>
> >
> >[ 4316.047670] Allocated by task 340820:
> >[ 4316.049036] kasan_save_stack+0x24/0x50
> >[ 4316.050414] kasan_save_track+0x14/0x30
> >[ 4316.051771] __kasan_slab_alloc+0x59/0x70
> >[ 4316.053139] kmem_cache_alloc_node_noprof+0x116/0x330
> >[ 4316.054529] copy_process+0x299/0x45e0
> >[ 4316.055915] kernel_clone+0xf2/0x4b0
> >[ 4316.057299] __do_sys_clone+0x90/0xb0
> >[ 4316.058708] do_syscall_64+0x75/0x180
> >[ 4316.060095] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> >
> >[ 4316.062893] Freed by task 0:
> >[ 4316.064283] kasan_save_stack+0x24/0x50
> >[ 4316.065695] kasan_save_track+0x14/0x30
> >[ 4316.067076] kasan_save_free_info+0x3b/0x60
> >[ 4316.068462] __kasan_slab_free+0x38/0x50
> >[ 4316.069841] kmem_cache_free+0x239/0x5a0
> >[ 4316.071213] delayed_put_task_struct+0x149/0x1b0
> >[ 4316.072608] rcu_do_batch+0x2f4/0x880
> >[ 4316.073979] rcu_core+0x3d6/0x510
> >[ 4316.075340] handle_softirqs+0x10f/0x580
> >[ 4316.076708] __irq_exit_rcu+0xfe/0x120
> >[ 4316.078052] irq_exit_rcu+0xe/0x20
> >[ 4316.079400] sysvec_apic_timer_interrupt+0x76/0x90
> >[ 4316.080747] asm_sysvec_apic_timer_interrupt+0x1a/0x20
> >
> >[ 4316.083488] Last potentially related work creation:
> >[ 4316.084845] kasan_save_stack+0x24/0x50
> >[ 4316.086188] __kasan_record_aux_stack+0x8e/0xa0
> >[ 4316.087564] __call_rcu_common.constprop.0+0x87/0x920
> >[ 4316.088932] release_task+0x836/0xc60
> >[ 4316.090296] wait_consider_task+0x9db/0x19c0
> >[ 4316.091682] __do_wait+0xe9/0x390
> >[ 4316.093046] do_wait+0xcb/0x230
> >[ 4316.094412] kernel_wait4+0xe4/0x1a0
> >[ 4316.095767] __do_sys_wait4+0xce/0xe0
> >[ 4316.097110] do_syscall_64+0x75/0x180
> >[ 4316.098455] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> >
> >[ 4316.100994] The buggy address belongs to the object at ff1100012ad38000
> > which belongs to the cache task_struct of size 14656
> >[ 4316.103497] The buggy address is located 52 bytes inside of
> > freed 14656-byte region [ff1100012ad38000, ff1100012ad3b940)
> >
> >[ 4316.107248] The buggy address belongs to the physical page:
> >[ 4316.108537] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x12ad38
> >[ 4316.109846] head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
> >[ 4316.111193] memcg:ff11000100d2c6c1
> >[ 4316.112543] anon flags: 0x17ffffc0000040(head|node=0|zone=2|lastcpupid=0x1fffff)
> >[ 4316.113945] page_type: f5(slab)
> >[ 4316.115364] raw: 0017ffffc0000040 ff11000100280a00 0000000000000000 0000000000000001
> >[ 4316.116852] raw: 0000000000000000 0000000000020002 00000001f5000000 ff11000100d2c6c1
> >[ 4316.118329] head: 0017ffffc0000040 ff11000100280a00 0000000000000000 0000000000000001
> >[ 4316.119818] head: 0000000000000000 0000000000020002 00000001f5000000 ff11000100d2c6c1
> >[ 4316.121302] head: 0017ffffc0000003 ffd4000004ab4e01 ffffffffffffffff 0000000000000000
> >[ 4316.122818] head: 0000000000000008 0000000000000000 00000000ffffffff 0000000000000000
> >[ 4316.124323] page dumped because: kasan: bad access detected
> >
> >[ 4316.127318] Memory state around the buggy address:
> >[ 4316.128847]  ff1100012ad37f00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> >[ 4316.130427]  ff1100012ad37f80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> >[ 4316.132006] >ff1100012ad38000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> >[ 4316.133594]                    ^
> >[ 4316.135170]  ff1100012ad38080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> >[ 4316.136796]  ff1100012ad38100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> >[ 4316.138399] ==================================================================
> >
> >
> >On Fri, Nov 8, 2024 at 5:12 PM Enzo Matsumiya <ematsumiya@xxxxxxx> wrote:
> >>
> >> On 11/08, Paul Aurich wrote:
> >> >No, this series doesn't try to address that. I was just focused on the
> >> >issues around the lifecycle of the cfids and the dentry:s.
> >>
> >> I'll be reviewing the patches next week, but for now I can say +1.
> >>
> >> We've been debugging this issue for the past month or so, and it's been
> >> quite hard to understand/debug who was racing whom.
> >>
> >> The 'dentry still in use' bug seems to appear only for the root dentry,
> >> whereas cached_fids for subdirectories can sometimes get duplicated,
> >> leaving one of them dangling; in the end, a refcount underflow + double
> >> list_del() happens to that one, e.g.:
> >>
> >> (this is coming from cifs_readdir())
> >> crash> cached_fid.entry,cfids,path,has_lease,is_open,on_list,time,tcon,refcount 0xffff892f4b5b3e00
> >>   entry = {
> >>     next = 0xffff892e053ed400,
> >>     prev = 0xffff892d8e3ecb88
> >>   },
> >>   cfids = 0xffff892d8e3ecb80,
> >>   path = 0xffff892f48b7b3b0 "\\dir1\\dir2\\dir3",
> >>   has_lease = 0x0,
> >>   is_open = 0x0,
> >>   on_list = 0x1,
> >>   time = 0x0,
> >>   tcon = 0x0,
> >>   refcount = { ... counter = 0x2 ... }
> >>
> >> (this is at the crashing frame on smb2_close_cached_fid())
> >> crash> cached_fid.entry,cfids,path,has_lease,is_open,on_list,time,tcon,refcount ffff892e053ee000
> >>   entry = {
> >>     next = 0xdead000000000100,
> >>     prev = 0xdead000000000122
> >>   },
> >>   cfids = 0xffff892d8e3ecb80,
> >>   path = 0xffff892f825ced30 "\\dir1\\dir2\\dir3",
> >>   has_lease = 0x0,
> >>   is_open = 0x1,
> >>   on_list = 0x1,
> >>   time = 0x1040998fc,
> >>   tcon = 0xffff892fe0b4d000,
> >>   refcount = { ... counter = 0x0 ... }
> >>
> >> You can see that the second one had already been deleted (refcount = 0,
> >> ->entry holds the list_del() poison values; see the note below), while
> >> the first one hasn't been filled in by open_cached_dir() yet but
> >> already has 2 references to it.
> >>
> >> A reliable reproducer I found for this was running xfstests '-g
> >> generic/dir' in one terminal, and generic/072 some seconds later.
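> >>
> >> (Note: with the x86_64 default CONFIG_ILLEGAL_POINTER_VALUE of
> >> 0xdead000000000000, the poisoned ->entry above matches the list_del()
> >> poison pointers from include/linux/poison.h exactly, which confirms
> >> the node had already been unlinked once before the second list_del():
> >>
> >>     #define LIST_POISON1  ((void *) 0x100 + POISON_POINTER_DELTA)  /* ->next: 0xdead000000000100 */
> >>     #define LIST_POISON2  ((void *) 0x122 + POISON_POINTER_DELTA)  /* ->prev: 0xdead000000000122 */
> >> )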
> >>
> >> (In the beginning I thought a reconnect was required to trigger this
> >> bug, but any kind of interruption should trigger it; I could trigger
> >> it with ctrl-c mid-xfstests a few times.)
> >>
> >> actimeo >= 0 seems to play a role as well, because things can get quite
> >> complicated (unsure if problematic, though) with a call chain such as:
> >>
> >> open_cached_dir
> >> -> path_to_dentry
> >> -> lookup_positive_unlocked
> >> -> lookup_dcache
> >> -> cifs_d_revalidate (dentry->d_op->d_revalidate)
> >> -> cifs_revalidate_dentry
> >> -> cifs_revalidate_dentry_attr
> >> -> cifs_dentry_needs_reval
> >> -> open_cached_dir_by_dentry
> >>
> >>
> >> Anyway, thanks a lot for your patches, Paul. Like I said, I'll be
> >> testing + reviewing them soon.
> >>
> >>
> >> Cheers,
> >>
> >> Enzo
> >>
> >> >On 2024-11-08 16:39:03 -0600, Steve French wrote:
> >> >>The perf improvement allowing caching of directories (helping "ls"
> >> >>then "ls" again for the same dir), not just the perf improvement with
> >> >>"ls" then "stat", could be very important.
> >> >>
> >> >>Did this series also address Ralph's point about not requesting the
> >> >>"H" (handle caching) flag when requesting directory leases from the
> >> >>server?
> >> >>
> >> >>On Fri, Nov 8, 2024 at 4:35 PM Paul Aurich <paul@xxxxxxxxxxxxxx> wrote:
> >> >>>
> >> >>>The SMB client cached directory functionality has a few problems around
> >> >>>flaky/lost server connections, which manifest as a pair of BUGs when
> >> >>>eventually unmounting the server connection:
> >> >>>
> >> >>> [18645.013550] BUG: Dentry ffff888140590ba0{i=1000000000080,n=/} still in use (2) [unmount of cifs cifs]
> >> >>> [18645.789274] VFS: Busy inodes after unmount of cifs (cifs)
> >> >>>
> >> >>>Based on bisection, these issues started with the cached directory lease
> >> >>>handling introduced in commit ebe98f1447bb ("cifs: enable caching of
> >> >>>directories for which a lease is held"), and they go away if I mount with
> >> >>>'nohandlecache'. I started seeing these on the Debian Bookworm stable
> >> >>>kernel (v6.1.x), but the issues persist even in current git versions. I
> >> >>>think the situation was improved (the occurrence frequency went down) by
> >> >>>commit 5c86919455c1 ("smb: client: fix use-after-free in
> >> >>>smb2_query_info_compound()").
> >> >>>
> >> >>>
> >> >>>I'm able to reproduce the "Dentry still in use" errors by connecting to an
> >> >>>actively-used SMB share (the server organically generates lease breaks) and
> >> >>>leaving these running for 'a while':
> >> >>>
> >> >>>- while true; do cd ~; sleep 1; for i in {1..3}; do cd /mnt/test/subdir; echo $PWD; sleep 1; cd ..; echo $PWD; sleep 1; done; echo ...; done
> >> >>>- while true; do iptables -F OUTPUT; mount -t cifs -a; for _ in {0..2}; do ls /mnt/test/subdir/ | wc -l; done; iptables -I OUTPUT -p tcp --dport 445 -j DROP; sleep 10; echo "unmounting"; umount -l -t cifs -a; echo "done unmounting"; sleep 20; echo "recovering"; iptables -F OUTPUT; sleep 10; done
> >> >>>
> >> >>>('a while' is anywhere from 10 minutes to overnight. Also, it's not the
> >> >>>cleanest reproducer, but I stopped iterating once I had something that was
> >> >>>even remotely reliable for me...)
> >> >>>
> >> >>>This series attempts to fix these, as well as a use-after-free that could
> >> >>>occur because open_cached_dir() explicitly frees the cached_fid, rather than
> >> >>>relying on reference counting.
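> >> >>>
> >> >>>As a rough sketch of that refcount-based approach (field names are taken
> >> >>>from the crash output above; the release helper and its name are
> >> >>>hypothetical, and the real cleanup in fs/smb/client/cached_dir.c does
> >> >>>more work than this):
> >> >>>
> >> >>>    #include <linux/bug.h>
> >> >>>    #include <linux/kref.h>
> >> >>>    #include <linux/list.h>
> >> >>>    #include <linux/slab.h>
> >> >>>
> >> >>>    /* Trimmed-down view of struct cached_fid */
> >> >>>    struct cached_fid {
> >> >>>            struct list_head entry;
> >> >>>            bool on_list;
> >> >>>            const char *path;
> >> >>>            struct kref refcount;
> >> >>>            /* ... */
> >> >>>    };
> >> >>>
> >> >>>    /* Runs exactly once, when the last reference is dropped, so no
> >> >>>     * path (open_cached_dir() error handling, a lease break, unmount)
> >> >>>     * can free the cfid while another holder is still using it. */
> >> >>>    static void cfid_release(struct kref *ref)
> >> >>>    {
> >> >>>            struct cached_fid *cfid =
> >> >>>                    container_of(ref, struct cached_fid, refcount);
> >> >>>
> >> >>>            WARN_ON(cfid->on_list);  /* must already be off the cfids list */
> >> >>>            kfree(cfid->path);
> >> >>>            kfree(cfid);
> >> >>>    }
> >> >>>
> >> >>>    /* Every dropper then uses kref_put(&cfid->refcount, cfid_release)
> >> >>>     * instead of kfree()ing the cfid directly. */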
> >> >>>
> >> >>>
> >> >>>The last patch in this series ("smb: During umount, flush any pending lease
> >> >>>breaks for cached dirs") is a work in progress, and should not be taken
> >> >>>as-is. The issue there:
> >> >>>
> >> >>>Unmounting an SMB share (cifs_kill_sb) can race with a queued lease break from
> >> >>>the server for a cached directory. When this happens, the cfid is removed
> >> >>>from the linked list, so close_all_cached_dirs() cannot drop the dentry. If
> >> >>>cifs_kill_sb continues on before the queued work puts the dentry, we trigger
> >> >>>the "Dentry still in use" BUG splat. Flushing the cifsiod_wq seems to help
> >> >>>with this, but some thoughts:
> >> >>>
> >> >>>1. cifsiod_wq is a global workqueue, rather than per-mount. Flushing the
> >> >>>   entire workqueue is potentially doing more work than necessary. Should
> >> >>>   there be a workqueue that's more appropriately scoped?
> >> >>>2. With an unresponsive server, this causes umount (even umount -l) to hang
> >> >>>   (waiting for SMB2_close calls), and when I test with backports on a 6.1
> >> >>>   kernel, it appears to cause a deadlock between kill_sb and some cifs
> >> >>>   reconnection code calling iterate_supers_type. (I'm pretty sure the
> >> >>>   deadlock was addressed by changes to fs/super.c, so it's not really an
> >> >>>   SMB problem, but it is an indication that flush_workqueue isn't the
> >> >>>   right solution.)
> >> >>>3. Should cached_dir_lease_break() drop the dentry before queueing work
> >> >>>   (and if so, is it OK to do this under the spinlock, or should the spinlock
> >> >>>   be dropped first)?
> >> >>>4. Related to #3 -- shouldn't close_all_cached_dirs() be holding the spinlock
> >> >>>   while looping?
> >> >>>
> >> >>>
> >> >>>Lastly, patches 2, 3, and 5 (in its final form) are beneficial going back to
> >> >>>v6.1 for stable, but it's not a clean backport because some other important
> >> >>>fixes (e.g. commit 5c86919455c1 ("smb: client: fix use-after-free in
> >> >>>smb2_query_info_compound()")) weren't picked up.
> >> >>>
> >> >>>Paul Aurich (5):
> >> >>>  smb: cached directories can be more than root file handle
> >> >>>  smb: Don't leak cfid when reconnect races with open_cached_dir
> >> >>>  smb: prevent use-after-free due to open_cached_dir error paths
> >> >>>  smb: No need to wait for work when cleaning up cached directories
> >> >>>  smb: During umount, flush any pending lease breaks for cached dirs
> >> >>>
> >> >>> fs/smb/client/cached_dir.c | 106 ++++++++++++++++---------------------
> >> >>> 1 file changed, 47 insertions(+), 59 deletions(-)
> >> >>>
> >> >>>--
> >> >>>2.45.2
> >> >>>
> >> >>>
> >> >>
> >> >>
> >> >>--
> >> >>Thanks,
> >> >>
> >> >>Steve
> >> >
> >> >
> >
> >
> >
> >--
> >Thanks,
> >
> >Steve
>
>

--
Thanks,

Steve