Re: automount/kernel crashes

Nick Piggin <npiggin@xxxxxxxxx> · Fri, 20 Apr 2012 21:38:23 +1000

On Mon, Apr 16, 2012 at 10:24:54AM +0800, Ian Kent wrote:
> On Sun, 2012-04-15 at 14:05 -0700, Jan Sanislo wrote:
> > We are seeing occasional (approx. weekly) automount/kernel crashes using
> > kernel version 3.1.7 and autofs version 5.0.5-39.  The log files show
> > the following traceback:
> 
> Nick,
> 
> Can you have a look at my fs/autofs4/expire.c:get_next_positive_subdir()
> function please.
> 
> It looks like my assignment of "p = q" in the "if (!simple_positive(q))
> {}" block is incorrect. My thinking is that if q goes away while
> waiting on the d_lock then it will have been removed from the child list
> so I should just "goto again" with p as is. q itself will not actually
> be freed until function exit since the autofs sbi->lookup_lock will
> block in ->d_release(). Can you see any other problem with it and is
> there a similar problem with
> fs/autofs4/expire.c:get_next_positive_dentry()?

Hi Ian,

Firstly, what's the lock ordering on your d_lock of the dentries?
Do you ensure that the vfs never locks two dentries at once, and
you have your own lock order?

Secondly, it seems like d_release won't be called until after the
dentry has been removed from the d_child list. Couldn't that cause
a corruption here?

Thanks,
Nick

> 
> > 
> > ===========================
> > 
> > kernel: general protection fault: 0000 [#1] SMP 
> > kernel: CPU 1 
> > kernel: Modules linked in: binfmt_misc xt_tcpudp iptable_filter ip_tables ipt_ULOG x_tables nfsd dm_snapshot dm_mirror dm_region_hash dm_log sg bnx2 rng_core ipv6 ext4 jbd2 crc16 usbhid sd_mod sr_mod cdrom ata_piix libata megaraid_sas ehci_hcd uhci_hcd scsi_mod button radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core cfbcopyarea cfbimgblt cfbfillrect dm_mod [last unloaded: scsi_wait_scan]
> > kernel: 
> > kernel: Pid: 12716, comm: automount Not tainted 3.1.7-0cse.1 #6 Dell Inc. PowerEdge 2950/0CU542
> > kernel: RIP: 0010:[<ffffffff8139c5b9>]  [<ffffffff8139c5b9>] _raw_spin_lock+0x9/0x20
> > kernel: RSP: 0018:ffff880009ef9d48  EFLAGS: 00010283
> > kernel: RAX: 0000000000000100 RBX: ffff880424297240 RCX: dead0000001000cc
> > kernel: RDX: ffff8803d7bdd840 RSI: ffff880421eb3d00 RDI: dead0000001000cc
> > kernel: RBP: ffff880009ef9d48 R08: 0000000000000001 R09: 00007f516fbfad20
> > kernel: R10: 0000000000000000 R11: 0000000000000246 R12: dead000000100070
> > kernel: R13: ffff880414436480 R14: dead000000100100 R15: ffff8804242972a8
> > kernel: FS:  00007f516fbfb700(0000) GS:ffff88043fc40000(0000) knlGS:0000000000000000
> > kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > kernel: CR2: 00007f516fbfad30 CR3: 000000016ba69000 CR4: 00000000000006e0
> > kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > kernel: Process automount (pid: 12716, threadinfo ffff880009ef8000, task ffff880071d14410)
> > kernel: Stack:
> > kernel: ffff880009ef9dd8 ffffffff811b72b3 ffff8802e407ca80 dead0000001000cc
> > kernel: ffff8803d7bdd840 ffff880009ef9f28 00000000000124f8 0000000000000000
> > kernel: ffff880421eb3d00 ffff8804144364dc ffff880414436520 ffff880424297240
> > kernel: Call Trace:
> > kernel: [<ffffffff811b72b3>] autofs4_expire_indirect+0xd3/0x440
> > kernel: [<ffffffff811b78a5>] autofs4_do_expire_multi+0xc5/0x110
> > kernel: [<ffffffff811b7c90>] ? autofs_dev_ioctl_askumount+0x30/0x30
> > kernel: [<ffffffff811b7caa>] autofs_dev_ioctl_expire+0x1a/0x20
> > kernel: [<ffffffff811b8253>] _autofs_dev_ioctl+0x273/0x360
> > kernel: [<ffffffff810ee9f6>] ? __d_free+0x46/0x70
> > kernel: [<ffffffff811b834e>] autofs_dev_ioctl+0xe/0x20
> > kernel: [<ffffffff810eb166>] do_vfs_ioctl+0x96/0x550
> > kernel: [<ffffffff810f6a7a>] ? mntput+0x1a/0x30
> > kernel: [<ffffffff810dbc4f>] ? fput+0x16f/0x210
> > kernel: [<ffffffff810eb66a>] sys_ioctl+0x4a/0x80
> > kernel: [<ffffffff813a277b>] system_call_fastpath+0x16/0x1b
> > kernel: Code: 00 75 05 f0 66 0f b1 17 0f 94 c2 0f b6 c2 85 c0 0f 95 c0 0f b6 c0 5d c3 66 2e 0f 1f 84 00 00 00 00 00 55 b8 00 01 00 00 48 89 e5 <f0> 66 0f c1 07 38 e0 74 06 f3 90 8a 07 eb f6 5d c3 66 0f 1f 44 
> > kernel: RIP  [<ffffffff8139c5b9>] _raw_spin_lock+0x9/0x20
> > kernel: RSP <ffff880009ef9d48>
> > kernel: ---[ end trace e45ee0e39b72b82b ]---
> > 
> > ===========================
> > 
> > Note that the register dump contains numerous values like
> > 	R14: dead000000100100
> > 
> > which seem to indicate some sort of list corruption/locking problem. The
> > actual fault instruction seems to be from a call to _raw_spin_lock contained
> > in the inline expansion of the fs/autofs4/expire.c[get_next_positive_subdir]
> > call in the while loop of expire.c[autofs4_expire_indirect].
> > 
> > Is this a known problem?  Anybody else seeing these faults?
> > --
> > To unsubscribe from this list: send the line "unsubscribe autofs" in
> > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 