Re: automount/kernel crashes

Ian Kent <raven@xxxxxxxxxx> · Mon, 16 Apr 2012 10:24:54 +0800

On Sun, 2012-04-15 at 14:05 -0700, Jan Sanislo wrote:
> We are seeing occasional (approx. weekly) automount/kernel crashes using
> kernel version 3.1.7 and autofs version 5.0.5-39.  The log files show
> the following traceback:

Nick,

Can you have a look at my fs/autofs4/expire.c:get_next_positive_subdir()
function please.

It looks like my assignment of "p = q" in the "if (!simple_positive(q))
{}" block is incorrect. My thinking is that if q goes away while we are
waiting on its d_lock, it will have been removed from the child list,
so I should just "goto again" with p as is. q itself will not actually
be freed until function exit, since ->d_release() will block on the
autofs sbi->lookup_lock, which this function holds. Can you see any
other problem with it? And is there a similar problem with
fs/autofs4/expire.c:get_next_positive_dentry()?

> 
> ===========================
> 
> kernel: general protection fault: 0000 [#1] SMP 
> kernel: CPU 1 
> kernel: Modules linked in: binfmt_misc xt_tcpudp iptable_filter ip_tables ipt_ULOG x_tables nfsd dm_snapshot dm_mirror dm_region_hash dm_log sg bnx2 rng_core ipv6 ext4 jbd2 crc16 usbhid sd_mod sr_mod cdrom ata_piix libata megaraid_sas ehci_hcd uhci_hcd scsi_mod button radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core cfbcopyarea cfbimgblt cfbfillrect dm_mod [last unloaded: scsi_wait_scan]
> kernel: 
> kernel: Pid: 12716, comm: automount Not tainted 3.1.7-0cse.1 #6 Dell Inc. PowerEdge 2950/0CU542
> kernel: RIP: 0010:[<ffffffff8139c5b9>]  [<ffffffff8139c5b9>] _raw_spin_lock+0x9/0x20
> kernel: RSP: 0018:ffff880009ef9d48  EFLAGS: 00010283
> kernel: RAX: 0000000000000100 RBX: ffff880424297240 RCX: dead0000001000cc
> kernel: RDX: ffff8803d7bdd840 RSI: ffff880421eb3d00 RDI: dead0000001000cc
> kernel: RBP: ffff880009ef9d48 R08: 0000000000000001 R09: 00007f516fbfad20
> kernel: R10: 0000000000000000 R11: 0000000000000246 R12: dead000000100070
> kernel: R13: ffff880414436480 R14: dead000000100100 R15: ffff8804242972a8
> kernel: FS:  00007f516fbfb700(0000) GS:ffff88043fc40000(0000) knlGS:0000000000000000
> kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> kernel: CR2: 00007f516fbfad30 CR3: 000000016ba69000 CR4: 00000000000006e0
> kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> kernel: Process automount (pid: 12716, threadinfo ffff880009ef8000, task ffff880071d14410)
> kernel: Stack:
> kernel: ffff880009ef9dd8 ffffffff811b72b3 ffff8802e407ca80 dead0000001000cc
> kernel: ffff8803d7bdd840 ffff880009ef9f28 00000000000124f8 0000000000000000
> kernel: ffff880421eb3d00 ffff8804144364dc ffff880414436520 ffff880424297240
> kernel: Call Trace:
> kernel: [<ffffffff811b72b3>] autofs4_expire_indirect+0xd3/0x440
> kernel: [<ffffffff811b78a5>] autofs4_do_expire_multi+0xc5/0x110
> kernel: [<ffffffff811b7c90>] ? autofs_dev_ioctl_askumount+0x30/0x30
> kernel: [<ffffffff811b7caa>] autofs_dev_ioctl_expire+0x1a/0x20
> kernel: [<ffffffff811b8253>] _autofs_dev_ioctl+0x273/0x360
> kernel: [<ffffffff810ee9f6>] ? __d_free+0x46/0x70
> kernel: [<ffffffff811b834e>] autofs_dev_ioctl+0xe/0x20
> kernel: [<ffffffff810eb166>] do_vfs_ioctl+0x96/0x550
> kernel: [<ffffffff810f6a7a>] ? mntput+0x1a/0x30
> kernel: [<ffffffff810dbc4f>] ? fput+0x16f/0x210
> kernel: [<ffffffff810eb66a>] sys_ioctl+0x4a/0x80
> kernel: [<ffffffff813a277b>] system_call_fastpath+0x16/0x1b
> kernel: Code: 00 75 05 f0 66 0f b1 17 0f 94 c2 0f b6 c2 85 c0 0f 95 c0 0f b6 c0 5d c3 66 2e 0f 1f 84 00 00 00 00 00 55 b8 00 01 00 00 48 89 e5 <f0> 66 0f c1 07 38 e0 74 06 f3 90 8a 07 eb f6 5d c3 66 0f 1f 44 
> kernel: RIP  [<ffffffff8139c5b9>] _raw_spin_lock+0x9/0x20
> kernel: RSP <ffff880009ef9d48>
> kernel: ---[ end trace e45ee0e39b72b82b ]---
> 
> ===========================
> 
> Note that the register dump contains numerous values like
> 	R14: dead000000100100
> 
> which seems to indicate some sort of list corruption/locking problem. The
> actual fault instruction seems to be from a call to _raw_spin_lock contained
> in the inline expansion of the fs/autofs4/expire.c[get_next_positive_subdir]
> call in the while loop of expire.c[autofs4_expire_indirect].
> 
> Is this a known problem?  Anybody else seeing these faults?
> --
> To unsubscribe from this list: send the line "unsubscribe autofs" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
