On Sun, 2012-04-15 at 14:05 -0700, Jan Sanislo wrote:
> We are seeing occasional (approx. weekly) automount/kernel crashes using
> kernel version 3.1.7 and autofs version 5.0.5-39. The log files show
> the following traceback:

Nick,

Can you have a look at my fs/autofs4/expire.c:get_next_positive_subdir()
function please.

It looks like my assignment of "p = q" in the "if (!simple_positive(q)) {}"
block is incorrect. My thinking is that if q goes away while waiting on
the d_lock then it will have been removed from the child list, so I should
just "goto again" with p as is. q itself will not actually be freed until
function exit since the autofs sbi->lookup_lock will block in ->d_release().

Can you see any other problem with it, and is there a similar problem with
fs/autofs4/expire.c:get_next_positive_dentry()?

>
> ===========================
>
> kernel: general protection fault: 0000 [#1] SMP
> kernel: CPU 1
> kernel: Modules linked in: binfmt_misc xt_tcpudp iptable_filter ip_tables ipt_ULOG x_tables nfsd dm_snapshot dm_mirror dm_region_hash dm_log sg bnx2 rng_core ipv6 ext4 jbd2 crc16 usbhid sd_mod sr_mod cdrom ata_piix libata megaraid_sas ehci_hcd uhci_hcd scsi_mod button radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core cfbcopyarea cfbimgblt cfbfillrect dm_mod [last unloaded: scsi_wait_scan]
> kernel:
> kernel: Pid: 12716, comm: automount Not tainted 3.1.7-0cse.1 #6 Dell Inc. PowerEdge 2950/0CU542
> kernel: RIP: 0010:[<ffffffff8139c5b9>] [<ffffffff8139c5b9>] _raw_spin_lock+0x9/0x20
> kernel: RSP: 0018:ffff880009ef9d48 EFLAGS: 00010283
> kernel: RAX: 0000000000000100 RBX: ffff880424297240 RCX: dead0000001000cc
> kernel: RDX: ffff8803d7bdd840 RSI: ffff880421eb3d00 RDI: dead0000001000cc
> kernel: RBP: ffff880009ef9d48 R08: 0000000000000001 R09: 00007f516fbfad20
> kernel: R10: 0000000000000000 R11: 0000000000000246 R12: dead000000100070
> kernel: R13: ffff880414436480 R14: dead000000100100 R15: ffff8804242972a8
> kernel: FS: 00007f516fbfb700(0000) GS:ffff88043fc40000(0000) knlGS:0000000000000000
> kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> kernel: CR2: 00007f516fbfad30 CR3: 000000016ba69000 CR4: 00000000000006e0
> kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> kernel: Process automount (pid: 12716, threadinfo ffff880009ef8000, task ffff880071d14410)
> kernel: Stack:
> kernel: ffff880009ef9dd8 ffffffff811b72b3 ffff8802e407ca80 dead0000001000cc
> kernel: ffff8803d7bdd840 ffff880009ef9f28 00000000000124f8 0000000000000000
> kernel: ffff880421eb3d00 ffff8804144364dc ffff880414436520 ffff880424297240
> kernel: Call Trace:
> kernel: [<ffffffff811b72b3>] autofs4_expire_indirect+0xd3/0x440
> kernel: [<ffffffff811b78a5>] autofs4_do_expire_multi+0xc5/0x110
> kernel: [<ffffffff811b7c90>] ? autofs_dev_ioctl_askumount+0x30/0x30
> kernel: [<ffffffff811b7caa>] autofs_dev_ioctl_expire+0x1a/0x20
> kernel: [<ffffffff811b8253>] _autofs_dev_ioctl+0x273/0x360
> kernel: [<ffffffff810ee9f6>] ? __d_free+0x46/0x70
> kernel: [<ffffffff811b834e>] autofs_dev_ioctl+0xe/0x20
> kernel: [<ffffffff810eb166>] do_vfs_ioctl+0x96/0x550
> kernel: [<ffffffff810f6a7a>] ? mntput+0x1a/0x30
> kernel: [<ffffffff810dbc4f>] ? fput+0x16f/0x210
> kernel: [<ffffffff810eb66a>] sys_ioctl+0x4a/0x80
> kernel: [<ffffffff813a277b>] system_call_fastpath+0x16/0x1b
> kernel: Code: 00 75 05 f0 66 0f b1 17 0f 94 c2 0f b6 c2 85 c0 0f 95 c0 0f b6 c0 5d c3 66 2e 0f 1f 84 00 00 00 00 00 55 b8 00 01 00 00 48 89 e5 <f0> 66 0f c1 07 38 e0 74 06 f3 90 8a 07 eb f6 5d c3 66 0f 1f 44
> kernel: RIP [<ffffffff8139c5b9>] _raw_spin_lock+0x9/0x20
> kernel: RSP <ffff880009ef9d48>
> kernel: ---[ end trace e45ee0e39b72b82b ]---
>
> ===========================
>
> Note that the register dump contains numerous values like
> R14: dead000000100100
>
> which seems to indicate some sort of list corruption/locking problem. The
> actual fault instruction seems to be from a call to _raw_spin_lock contained
> in the inline expansion of the fs/autofs4/expire.c[get_next_positive_subdir]
> call in the while loop of expire.c[autofs4_expire_indirect].
>
> Is this a known problem? Anybody else seeing these faults?
> --
> To unsubscribe from this list: send the line "unsubscribe autofs" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html