Re: debugfs question...

Greg KH <greg@xxxxxxxxx> · Mon, 31 Oct 2016 13:38:23 -0600

On Mon, Oct 31, 2016 at 02:32:56PM -0400, Mike Marshall wrote:
> Hello everyone.

[adding Nicolai to thread...]

> I wrote the Orangefs debugfs code. Recently my coworker
> Martin refactored it to clean up the cut-and-pastey parts
> I had put in. The refactor seemed to trigger dan.carpenter@xxxxxxxxxx's
> static tester to find a possible double-free in the code.
> 
> I think the possible-double-free will be easy to fix, but
> while in there, I'm looking for other "bad places".
> 
> Our debugfs code results in three files in /sys/kernel/debug/orangefs.
> One of the files gets deleted (debugfs_remove'd) and re-created
> (debugfs_create_file'd) the first time someone fires up the
> user-space part of Orangefs after a reboot.
> 
> We wondered what awful things might happen if someone was
> reading the file across the delete/re-create, so I wrote a
> program that opens the file, sleeps ten seconds and then
> starts reading, and I fired up the Orangefs userspace part
> during the sleep. I didn't see any problems there; we get
> EIO when the read happens.
> 
> But... really bad things happen if someone unloads the Orangefs
> module after my test program does the open and before the read
> starts. So I picked another debugfs-using-filesystem (f2fs) and
> pointed my tester-program at /sys/kernel/debug/f2fs/status, and
> the same bad thing happens there.
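The test described above can be sketched in user space roughly like this. It is a minimal sketch, not Mike's actual program: the helper name open_sleep_read() and its "-errno on failure" return convention are hypothetical. The idea is to open the debugfs file, hold it open long enough to run rmmod from another shell, then read and drop the file.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical reproducer helper: open a (debugfs) file, keep it open
 * for delay_s seconds so the owning module can be unloaded in the
 * meantime, then try to read from it.
 *
 * Returns the number of bytes read, or -errno on failure.
 */
ssize_t open_sleep_read(const char *path, unsigned int delay_s,
			char *buf, size_t len)
{
	int fd = open(path, O_RDONLY);
	if (fd < 0)
		return -errno;

	/* Window in which to run e.g. "rmmod f2fs" from another shell. */
	if (delay_s > 0)
		sleep(delay_s);

	ssize_t n = read(fd, buf, len);
	int saved = errno;

	/*
	 * Dropping the last reference (close(), or process exit as in the
	 * trace) is what eventually runs the file's ->release() handler;
	 * in the oops that is full_proxy_release().
	 */
	close(fd);
	return n < 0 ? -saved : n;
}
```

A small main() calling open_sleep_read("/sys/kernel/debug/f2fs/status", 10, buf, sizeof buf) while `rmmod f2fs` runs in another terminal should exercise the same open / unload / read / release ordering as in the report.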
> 
> I was hoping that f2fs, or some other debugfs-using-filesystem, would be
> able to handle my rmmod test and then I could look at their code for
> inspiration, but no such luck so far. Is there something that the f2fs
> guys and I aren't doing right, or is this just something about debugfs
> that's fragile?

Before 4.8, debugfs was very fragile with respect to exactly this problem,
but Nicolai's patches in 4.8 should have resolved it.

> [ 1240.133703] BUG: unable to handle kernel paging request at ffffffffa0307430
> [ 1240.134109] IP: [<ffffffff8132a224>] full_proxy_release+0x24/0x90
> [ 1240.134434] PGD 1c0f067 [ 1240.134560] PUD 1c10063 PMD 3c8d0067 [ 1240.134793] PTE 0
> [ 1240.134905]
> [ 1240.134988] Oops: 0000 [#1]
> [ 1240.135137] Modules linked in: ip6t_rpfilter bnep ip6t_REJECT nf_reject_ipv6 bluetooth rfkill nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_mangle iptable_security iptable_raw ppdev parport_pc parport 8139too serio_raw i2c_piix4 virtio_balloon virtio_console pvpanic uinput qxl drm_kms_helper ttm drm virtio_pci 8139cp i2c_core ata_generic virtio virtio_ring mii pata_acpi [last unloaded: f2fs]
> [ 1240.138209] CPU: 0 PID: 1178 Comm: dhs Not tainted 4.9.0-rc1-00002-g804b173-dirty #3
> [ 1240.138605] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
> [ 1240.138968] task: ffff88003e166040 task.stack: ffffc900006d4000
> [ 1240.139275] RIP: 0010:[<ffffffff8132a224>]  [<ffffffff8132a224>] full_proxy_release+0x24/0x90
> [ 1240.139721] RSP: 0018:ffffc900006d7db8  EFLAGS: 00010286
> [ 1240.140002] RAX: ffffffff8132a200 RBX: ffff88001fc3fa80 RCX: 0000000000000000
> [ 1240.140369] RDX: ffff88001fc3fc08 RSI: ffff88001fc3fa80 RDI: ffff880015097bc0
> [ 1240.140749] RBP: ffffc900006d7de0 R08: 0000000000000000 R09: 0000000000000000
> [ 1240.141126] R10: ffff880015097bc0 R11: ffff88001fc3fa90 R12: ffffffffa03073c0
> [ 1240.141494] R13: ffff88001506a7e0 R14: ffff88003ab0e300 R15: ffff88001506a7e0
> [ 1240.141864] FS:  0000000000000000(0000) GS:ffffffff81c39000(0000) knlGS:0000000000000000
> [ 1240.142279] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1240.142577] CR2: ffffffffa0307430 CR3: 000000001fd97000 CR4: 00000000000006f0
> [ 1240.142968] Stack:
> [ 1240.143078]  ffff88001fc3fa80 0000000000000010 ffff880015097bc0 ffff8800369d68e0
> [ 1240.143490]  ffff88001506a7e0 ffffc900006d7e28 ffffffff8122907f ffff880015097bc0
> [ 1240.143904]  ffff88001fc3fa90 ffff88003e166568 ffffffff81f09330 ffff88001fc3f540
> [ 1240.144316] Call Trace:
> [ 1240.144450]  [<ffffffff8122907f>] __fput+0xdf/0x1d0
> [ 1240.144704]  [<ffffffff812291ae>] ____fput+0xe/0x10
> [ 1240.144962]  [<ffffffff810b97de>] task_work_run+0x8e/0xc0
> [ 1240.145243]  [<ffffffff8109b98e>] do_exit+0x2ae/0xae0
> [ 1240.145507]  [<ffffffff8113927e>] ? __audit_syscall_entry+0xae/0x100
> [ 1240.145840]  [<ffffffff810034da>] ? syscall_trace_enter+0x1ca/0x310
> [ 1240.146164]  [<ffffffff8109c244>] do_group_exit+0x44/0xc0
> [ 1240.146445]  [<ffffffff8109c2d4>] SyS_exit_group+0x14/0x20
> [ 1240.146742]  [<ffffffff81003a61>] do_syscall_64+0x61/0x150
> [ 1240.147049]  [<ffffffff817f1fc4>] entry_SYSCALL64_slow_path+0x25/0x25
> [ 1240.147391] Code: 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 4c 8b 76 28 41 55 4c 8b 6e 18 41 54 53 4d 8b a5 d8 00 00 00 48 89 f3 <49> 8b 44 24 70 48 85 c0 74 4e ff d0 41 89 c7 48 8b 43 28 48 85
> [ 1240.148919] RIP  [<ffffffff8132a224>] full_proxy_release+0x24/0x90
> [ 1240.149248]  RSP <ffffc900006d7db8>
> [ 1240.149432] CR2: ffffffffa0307430
> [ 1240.149609] ---[ end trace f22ae883fa3ea6b8 ]---
> [ 1240.149922] Fixing recursive fault but reboot is needed!
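One possible reading of the register dump follows. It assumes the x86-64 layout of struct file_operations around v4.9, where every member is pointer-sized and ->release is the 15th member. Under that assumption:

- The faulting address in CR2 is exactly R12 + 0x70.
- The faulting Code: bytes (<49> 8b 44 24 70) decode to a load from 0x70(%r12).
- 0x70 is where ->release sits.

That suggests R12 holds the real, module-owned file_operations, and that full_proxy_release() reads it after the module providing it is gone (note "[last unloaded: f2fs]"). The struct below is a hypothetical userspace mirror of only those leading members, used just to check the arithmetic.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical userspace mirror of the leading members of the kernel's
 * struct file_operations (~v4.9). Only the member order matters here;
 * every member is pointer-sized on x86-64, so void * stands in for the
 * real function-pointer types.
 */
struct fops_prefix {
	void *owner;
	void *llseek;
	void *read;
	void *write;
	void *read_iter;
	void *write_iter;
	void *iterate;
	void *iterate_shared;
	void *poll;
	void *unlocked_ioctl;
	void *compat_ioctl;
	void *mmap;
	void *open;
	void *flush;
	void *release;	/* 15th member: offset 14 * 8 = 0x70 on x86-64 */
};

/* Values copied from the register dump in the oops. */
#define OOPS_R12 0xffffffffa03073c0ULL	/* presumed real_fops */
#define OOPS_CR2 0xffffffffa0307430ULL	/* faulting address */

/* Offset of the faulting load relative to the presumed real_fops. */
static uint64_t fault_offset(void)
{
	return OOPS_CR2 - OOPS_R12;
}
```

If the offsets line up, the crash is in the proxy's release path rather than in the f2fs or Orangefs handlers themselves.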

Nicolai, any thoughts here?

thanks,

greg k-h