On Mon, Oct 31, 2016 at 02:32:56PM -0400, Mike Marshall wrote:
> Hello everyone.

[adding Nicolai to thread...]

> I wrote the Orangefs debugfs code. Recently my coworker
> Martin refactored it to clean up the cut-and-pastey parts
> I had put in. The refactor seemed to trigger dan.carpenter@xxxxxxxxxx's
> static tester to find a possible double-free in the code.
>
> I think the possible-double-free will be easy to fix, but
> while in there, I'm looking for other "bad places".
>
> Our debugfs code results in three files in /sys/kernel/debug/orangefs.
> One of the files gets deleted (debugfs_remove'd) and re-created
> (debugfs_create_file'd) the first time someone fires up the
> user-space part of Orangefs after a reboot.
>
> We wondered what awful things might happen if someone was
> reading the file across the delete/re-create, so I wrote a
> program that opens the file, sleeps ten seconds and then
> starts reading, and I fired up the Orangefs userspace part
> during the sleep. I didn't see any problems there, we get
> EIO when the read happens.
>
> But... really bad things happen if someone unloads the Orangefs
> module after my test program does the open and before the read
> starts. So I picked another debugfs-using-filesystem (f2fs) and
> pointed my tester-program at /sys/kernel/debug/f2fs/status, and
> the same bad thing happens there.
>
> I was hoping that f2fs, or some other debugfs-using-filesystem, would be
> able to handle my rmmod test and then I could look at their code for
> inspiration, but no such luck so far. Is there something that me and the
> f2fs guys aren't doing right or is this just something about debugfs
> that's fragile?

debugfs, before 4.8, used to be very fragile with this very problem, but
4.8 should have resolved this with Nicolai's patches.
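
For anyone who wants to reproduce this, the open/sleep/read tester
described above could be sketched roughly as follows. This is not Mike's
actual program: the function name, buffer size, and return convention
are assumptions made for illustration. Only the open, ten-second sleep,
then read sequence comes from the description.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper (name and signature are assumptions): open 'path'
 * read-only, sleep for 'delay_secs' so that a module unload or a
 * debugfs_remove() can happen in the window, then attempt a single read.
 *
 * Returns the number of bytes read (0 at end of file), or a negative
 * errno value if open() or read() failed.
 */
long open_sleep_read(const char *path, unsigned int delay_secs)
{
	char buf[4096];
	ssize_t n;
	long ret;
	int fd;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -errno;

	/* Window in which the file's owner can go away. */
	sleep(delay_secs);

	n = read(fd, buf, sizeof(buf));
	/* Save errno before close() has a chance to overwrite it. */
	ret = (n < 0) ? -errno : (long)n;

	close(fd);
	return ret;
}
```

Called with /sys/kernel/debug/f2fs/status and a delay of 10, with the
rmmod done during the sleep, this follows the sequence from the report.
Note that the oops below is in full_proxy_release(), reached from
__fput() at process exit, so the fault is seen on the release path.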

> [ 1240.133703] BUG: unable to handle kernel paging request at ffffffffa0307430
> [ 1240.134109] IP: [<ffffffff8132a224>] full_proxy_release+0x24/0x90
> [ 1240.134434] PGD 1c0f067 [ 1240.134560] PUD 1c10063
> PMD 3c8d0067 [ 1240.134793] PTE 0
> [ 1240.134905]
> [ 1240.134988] Oops: 0000 [#1]
> [ 1240.135137] Modules linked in: ip6t_rpfilter bnep ip6t_REJECT
> nf_reject_ipv6 bluetooth rfkill nf_conntrack_ipv6 nf_defrag_ipv6
> nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_nat
> ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_mangle
> ip6table_security ip6table_raw ip6table_filter ip6_tables
> iptable_mangle iptable_security iptable_raw ppdev parport_pc parport
> 8139too serio_raw i2c_piix4 virtio_balloon virtio_console pvpanic
> uinput qxl drm_kms_helper ttm drm virtio_pci 8139cp i2c_core
> ata_generic virtio virtio_ring mii pata_acpi [last unloaded: f2fs]
> [ 1240.138209] CPU: 0 PID: 1178 Comm: dhs Not tainted
> 4.9.0-rc1-00002-g804b173-dirty #3
> [ 1240.138605] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
> [ 1240.138968] task: ffff88003e166040 task.stack: ffffc900006d4000
> [ 1240.139275] RIP: 0010:[<ffffffff8132a224>] [<ffffffff8132a224>]
> full_proxy_release+0x24/0x90
> [ 1240.139721] RSP: 0018:ffffc900006d7db8 EFLAGS: 00010286
> [ 1240.140002] RAX: ffffffff8132a200 RBX: ffff88001fc3fa80 RCX: 0000000000000000
> [ 1240.140369] RDX: ffff88001fc3fc08 RSI: ffff88001fc3fa80 RDI: ffff880015097bc0
> [ 1240.140749] RBP: ffffc900006d7de0 R08: 0000000000000000 R09: 0000000000000000
> [ 1240.141126] R10: ffff880015097bc0 R11: ffff88001fc3fa90 R12: ffffffffa03073c0
> [ 1240.141494] R13: ffff88001506a7e0 R14: ffff88003ab0e300 R15: ffff88001506a7e0
> [ 1240.141864] FS: 0000000000000000(0000) GS:ffffffff81c39000(0000)
> knlGS:0000000000000000
> [ 1240.142279] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1240.142577] CR2: ffffffffa0307430 CR3: 000000001fd97000 CR4: 00000000000006f0
> [ 1240.142968] Stack:
> [ 1240.143078] ffff88001fc3fa80 0000000000000010 ffff880015097bc0
> ffff8800369d68e0
> [ 1240.143490] ffff88001506a7e0 ffffc900006d7e28 ffffffff8122907f
> ffff880015097bc0
> [ 1240.143904] ffff88001fc3fa90 ffff88003e166568 ffffffff81f09330
> ffff88001fc3f540
> [ 1240.144316] Call Trace:
> [ 1240.144450] [<ffffffff8122907f>] __fput+0xdf/0x1d0
> [ 1240.144704] [<ffffffff812291ae>] ____fput+0xe/0x10
> [ 1240.144962] [<ffffffff810b97de>] task_work_run+0x8e/0xc0
> [ 1240.145243] [<ffffffff8109b98e>] do_exit+0x2ae/0xae0
> [ 1240.145507] [<ffffffff8113927e>] ? __audit_syscall_entry+0xae/0x100
> [ 1240.145840] [<ffffffff810034da>] ? syscall_trace_enter+0x1ca/0x310
> [ 1240.146164] [<ffffffff8109c244>] do_group_exit+0x44/0xc0
> [ 1240.146445] [<ffffffff8109c2d4>] SyS_exit_group+0x14/0x20
> [ 1240.146742] [<ffffffff81003a61>] do_syscall_64+0x61/0x150
> [ 1240.147049] [<ffffffff817f1fc4>] entry_SYSCALL64_slow_path+0x25/0x25
> [ 1240.147391] Code: 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5
> 41 57 41 56 4c 8b 76 28 41 55 4c 8b 6e 18 41 54 53 4d 8b a5 d8 00 00
> 00 48 89 f3 <49> 8b 44 24 70 48 85 c0 74 4e ff d0 41 89 c7 48 8b 43 28
> 48 85
> [ 1240.148919] RIP [<ffffffff8132a224>] full_proxy_release+0x24/0x90
> [ 1240.149248] RSP <ffffc900006d7db8>
> [ 1240.149432] CR2: ffffffffa0307430
> [ 1240.149609] ---[ end trace f22ae883fa3ea6b8 ]---
> [ 1240.149922] Fixing recursive fault but reboot is needed!

Nicolai, any thoughts here?

thanks,

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html