On Mon, Apr 16, 2012 at 10:24:54AM +0800, Ian Kent wrote:
> On Sun, 2012-04-15 at 14:05 -0700, Jan Sanislo wrote:
> > We are seeing occasional (approx. weekly) automount/kernel crashes using
> > kernel version 3.1.7 and autofs version 5.0.5-39. The log files show
> > the following traceback:
>
> Nick,
>
> Can you have a look at my fs/autofs4/expire.c:get_next_positive_subdir()
> function please.
>
> It looks like my assignment of "p = q" in the "if (!simple_positive(q))
> {}" block is incorrect. My thinking is that if q goes away while
> waiting on the d_lock then it will have been removed from the child list
> so I should just "goto again" with p as is. q itself will not actually
> be freed until function exit since the autofs sbi->lookup_lock will
> block in ->d_release(). Can you see any other problem with it and is
> there a similar problem with
> fs/autofs4/expire.c:get_next_positive_dentry()?

Hi Ian,

Firstly, what's the lock ordering on your d_lock of the dentries? Do you
ensure that the vfs never locks two dentries at once, and that you have
your own lock order?

Secondly, it seems like d_release won't be called until after the dentry
has been removed from the d_child list. Couldn't that cause a corruption
here?

Thanks,
Nick

> >
> > ===========================
> >
> > kernel: general protection fault: 0000 [#1] SMP
> > kernel: CPU 1
> > kernel: Modules linked in: binfmt_misc xt_tcpudp iptable_filter ip_tables ipt_ULOG x_tables nfsd dm_snapshot dm_mirror dm_region_hash dm_log sg bnx2 rng_core ipv6 ext4 jbd2 crc16 usbhid sd_mod sr_mod cdrom ata_piix libata megaraid_sas ehci_hcd uhci_hcd scsi_mod button radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core cfbcopyarea cfbimgblt cfbfillrect dm_mod [last unloaded: scsi_wait_scan]
> > kernel:
> > kernel: Pid: 12716, comm: automount Not tainted 3.1.7-0cse.1 #6 Dell Inc. PowerEdge 2950/0CU542
> > kernel: RIP: 0010:[<ffffffff8139c5b9>] [<ffffffff8139c5b9>] _raw_spin_lock+0x9/0x20
> > kernel: RSP: 0018:ffff880009ef9d48 EFLAGS: 00010283
> > kernel: RAX: 0000000000000100 RBX: ffff880424297240 RCX: dead0000001000cc
> > kernel: RDX: ffff8803d7bdd840 RSI: ffff880421eb3d00 RDI: dead0000001000cc
> > kernel: RBP: ffff880009ef9d48 R08: 0000000000000001 R09: 00007f516fbfad20
> > kernel: R10: 0000000000000000 R11: 0000000000000246 R12: dead000000100070
> > kernel: R13: ffff880414436480 R14: dead000000100100 R15: ffff8804242972a8
> > kernel: FS: 00007f516fbfb700(0000) GS:ffff88043fc40000(0000) knlGS:0000000000000000
> > kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > kernel: CR2: 00007f516fbfad30 CR3: 000000016ba69000 CR4: 00000000000006e0
> > kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > kernel: Process automount (pid: 12716, threadinfo ffff880009ef8000, task ffff880071d14410)
> > kernel: Stack:
> > kernel: ffff880009ef9dd8 ffffffff811b72b3 ffff8802e407ca80 dead0000001000cc
> > kernel: ffff8803d7bdd840 ffff880009ef9f28 00000000000124f8 0000000000000000
> > kernel: ffff880421eb3d00 ffff8804144364dc ffff880414436520 ffff880424297240
> > kernel: Call Trace:
> > kernel: [<ffffffff811b72b3>] autofs4_expire_indirect+0xd3/0x440
> > kernel: [<ffffffff811b78a5>] autofs4_do_expire_multi+0xc5/0x110
> > kernel: [<ffffffff811b7c90>] ? autofs_dev_ioctl_askumount+0x30/0x30
> > kernel: [<ffffffff811b7caa>] autofs_dev_ioctl_expire+0x1a/0x20
> > kernel: [<ffffffff811b8253>] _autofs_dev_ioctl+0x273/0x360
> > kernel: [<ffffffff810ee9f6>] ? __d_free+0x46/0x70
> > kernel: [<ffffffff811b834e>] autofs_dev_ioctl+0xe/0x20
> > kernel: [<ffffffff810eb166>] do_vfs_ioctl+0x96/0x550
> > kernel: [<ffffffff810f6a7a>] ? mntput+0x1a/0x30
> > kernel: [<ffffffff810dbc4f>] ? fput+0x16f/0x210
> > kernel: [<ffffffff810eb66a>] sys_ioctl+0x4a/0x80
> > kernel: [<ffffffff813a277b>] system_call_fastpath+0x16/0x1b
> > kernel: Code: 00 75 05 f0 66 0f b1 17 0f 94 c2 0f b6 c2 85 c0 0f 95 c0 0f b6 c0 5d c3 66 2e 0f 1f 84 00 00 00 00 00 55 b8 00 01 00 00 48 89 e5 <f0> 66 0f c1 07 38 e0 74 06 f3 90 8a 07 eb f6 5d c3 66 0f 1f 44
> > kernel: RIP [<ffffffff8139c5b9>] _raw_spin_lock+0x9/0x20
> > kernel: RSP <ffff880009ef9d48>
> > kernel: ---[ end trace e45ee0e39b72b82b ]---
> >
> > ===========================
> >
> > Note that the register dump contains numerous values like
> > R14: dead000000100100
> >
> > which seems to indicate some sort of list corruption/locking problem. The
> > actual fault instruction seems to be from a call to _raw_spin_lock contained
> > in the inline expansion of the fs/autofs4/expire.c[get_next_positive_subdir]
> > call in the while loop of expire.c[autofs4_expire_indirect].
> >
> > Is this a known problem? Anybody else seeing these faults?
> > --
> > To unsubscribe from this list: send the line "unsubscribe autofs" in
> > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
>
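A note on the poison values in Jan's register dump. On x86_64, R14 (dead000000100100) is exactly LIST_POISON1, i.e. 0x00100100 plus the POISON_POINTER_DELTA of 0xdead000000000000, which is what list_del() writes into an unlinked entry's ->next. R12 (dead000000100070) and RCX/RDI (dead0000001000cc) sit at small offsets from it. That is consistent with, though not proof of, a list_entry()/container_of() computed from a poisoned d_child pointer followed by a spin_lock() on a field of the bogus "dentry": a walk that advanced from a dentry already unlinked from its parent's child list. The user-space C sketch below only illustrates that mechanism. It re-implements a minimal list instead of using the kernel's list.h, assumes a 64-bit build, and `struct fake_dentry` and `stale_next_entry_addr()` are hypothetical names, not autofs4 or VFS code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-bit build; constants mirror x86_64 kernels, where the
 * delta comes from CONFIG_ILLEGAL_POINTER_VALUE. */
_Static_assert(sizeof(uintptr_t) == 8, "sketch assumes 64-bit pointers");

#define POISON_POINTER_DELTA 0xdead000000000000ULL
#define LIST_POISON1 ((struct list_head *)(uintptr_t)(0x00100100ULL + POISON_POINTER_DELTA))
#define LIST_POISON2 ((struct list_head *)(uintptr_t)(0x00200200ULL + POISON_POINTER_DELTA))

struct list_head {
    struct list_head *next, *prev;
};

/* Hypothetical stand-in for a dentry linked on its parent's child list. */
struct fake_dentry {
    int id;
    struct list_head d_child;
};

static void list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Append node n just before head, i.e. at the tail of the list. */
static void list_add_tail(struct list_head *n, struct list_head *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* Unlink an entry and poison its pointers, as the kernel's list_del() does. */
static void list_del(struct list_head *e)
{
    /* Debug-style check: a poisoned entry must not be deleted twice. */
    assert(e->next != LIST_POISON1 && e->prev != LIST_POISON2);
    e->next->prev = e->prev;
    e->prev->next = e->next;
    e->next = LIST_POISON1;
    e->prev = LIST_POISON2;
}

/* Address that list_entry()/container_of() would yield for the "next"
 * entry if a walker advanced from an already-unlinked cursor.
 * Computed as an integer only; it is never dereferenced. */
static uintptr_t stale_next_entry_addr(const struct fake_dentry *cursor)
{
    return (uintptr_t)cursor->d_child.next - offsetof(struct fake_dentry, d_child);
}
```

With two entries a and b on a list, deleting a leaves the head pointing at b, while a.d_child.next now holds 0xdead000000100100. A walker that still uses a as its cursor computes a "next dentry" just below the poison value, and the first field access, such as taking its d_lock, faults on a non-canonical address rather than touching a real dentry. The assert in list_del() is only a debugging aid, similar in spirit to the checks CONFIG_DEBUG_LIST enables; comparing against poison values is not a substitute for holding the right locks while walking d_subdirs.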