While testing the patches that add the new infrastructure to test whether gssd is running, I found a way to apparently reliably reproduce some stack corruption:

[ 7535.626147] RPC: AUTH_GSS upcall timed out.
[ 7535.626147] Please check user daemon is running.
[ 7535.643063] BUG: unable to handle kernel paging request at 000000037fea6be0
[ 7535.644041] IP: [<ffffffff810aca07>] cpuacct_charge+0x27/0x40
[ 7535.644041] PGD 0
[ 7535.644041] Thread overran stack, or stack corrupted
[ 7535.644041] Oops: 0000 [#1] SMP
[ 7535.644041] Modules linked in: cts rpcsec_gss_krb5(OF) nfsv4 dns_resolver nfs fscache kvm virtio_balloon virtio_net i2c_piix4 serio_raw nfsd auth_rpcgss(OF) nfs_acl lockd sunrpc(OF) cirrus drm_kms_helper virtio_blk ttm drm i2c_core virtio_pci virtio_ring virtio ata_generic pata_acpi
[ 7535.644041] CPU: 0 PID: 1419 Comm: mount.nfs Tainted: GF O 3.12.0-2.fc21.x86_64 #1
[ 7535.644041] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 7535.644041] task: ffff88007bef6300 ti: ffff88007be86000 task.ti: ffff88007be86000
[ 7535.644041] RIP: 0010:[<ffffffff810aca07>] [<ffffffff810aca07>] cpuacct_charge+0x27/0x40
[ 7535.644041] RSP: 0018:ffff88007fc03d88 EFLAGS: 00010046
[ 7535.644041] RAX: 000000000000e6a0 RBX: ffff88007bef6368 RCX: 000000007fc36200
[ 7535.644041] RDX: ffffffff81c48d20 RSI: 00000000000c1c64 RDI: ffff88007bef6300
[ 7535.644041] RBP: ffff88007fc03d88 R08: 00000000000000f0 R09: 0000000000000000
[ 7535.644041] R10: 0000000000000001 R11: ffffea0001edf800 R12: ffff8800375d0000
[ 7535.644041] R13: ffff88007bef6300 R14: 00000000000c1c64 R15: 0000000000000000
[ 7535.644041] FS: 00007fdd196208c0(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
[ 7535.644041] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 7535.644041] CR2: 000000037fea6be0 CR3: 000000007aaa7000 CR4: 00000000000006f0
[ 7535.644041] Stack:
[ 7535.644041] ffff88007fc03dc8 ffffffff810a18ac 000000008389773d ffff88007bef6368
[ 7535.644041] 0000000000000000 ffff8800375d0000 ffff88007fc14540 0000000000000000
[ 7535.644041] ffff88007fc03e28 ffffffff810a3259 ffff88007fc03e00 ffffffff8109dc08
[ 7535.644041] Call Trace:
[ 7535.644041] <IRQ>
[ 7535.644041] [<ffffffff810a18ac>] update_curr+0xcc/0x160
[ 7535.644041] [<ffffffff810a3259>] task_tick_fair+0x2b9/0x680
[ 7535.644041] [<ffffffff8109dc08>] ? sched_clock_cpu+0xa8/0x100
[ 7535.644041] [<ffffffff81099b81>] scheduler_tick+0x61/0xe0
[ 7535.644041] [<ffffffff81076dc6>] update_process_times+0x66/0x80
[ 7535.644041] [<ffffffff810cab95>] tick_sched_handle.isra.15+0x25/0x60
[ 7535.644041] [<ffffffff810cac11>] tick_sched_timer+0x41/0x60
[ 7535.644041] [<ffffffff8108e6e4>] __run_hrtimer+0x74/0x1d0
[ 7535.644041] [<ffffffff810cabd0>] ? tick_sched_handle.isra.15+0x60/0x60
[ 7535.644041] [<ffffffff8108eef7>] hrtimer_interrupt+0xf7/0x240
[ 7535.644041] [<ffffffff81041ab7>] local_apic_timer_interrupt+0x37/0x60
[ 7535.644041] [<ffffffff8167323f>] smp_apic_timer_interrupt+0x3f/0x60
[ 7535.644041] [<ffffffff81671bdd>] apic_timer_interrupt+0x6d/0x80
[ 7535.644041] <EOI>
[ 7535.644041] Code: 5d eb d7 90 66 66 66 66 90 48 8b 47 08 55 48 89 e5 48 63 48 18 48 8b 87 b8 06 00 00 48 8b 50 48 0f 1f 40 00 48 8b 82 88 00 00 00 <48> 03 04 cd e0 5b cf 81 48 01 30 48 8b 52 40 48 85 d2 75 e5 5d
[ 7535.644041] RIP [<ffffffff810aca07>] cpuacct_charge+0x27/0x40
[ 7535.644041] RSP <ffff88007fc03d88>
[ 7535.644041] CR2: 000000037fea6be0

This happens with or without the patchset I proposed earlier. I've also seen it double fault, and spontaneously reboot. Here's how I'm able to reproduce it:

I have this in /etc/fstab:

    server:/scratch /mnt/nfs nfs sec=krb5,noauto 0 0

...start with rpc.gssd running.

    # mount /mnt/nfs
    # umount /mnt/nfs
    # service rpcgssd stop
    # mount /mnt/nfs

...at this point, the mount command will hang as expected due to gssd being down, but it then continues hanging even after printing this message:

    RPC: AUTH_GSS upcall timed out.

...a little while later, I either get the stack trace above, or one reporting a double fault, or a spontaneous reboot.

Perhaps we've got something on the stack (maybe a timer?) and then aren't cancelling it before returning from the function that owns it?

-- 
Jeff Layton <jlayton@xxxxxxxxxx>
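
The hypothesis above (an on-stack object such as a timer that is not cancelled before its owning function returns) can be illustrated with a small userspace sketch. This is only a toy model of that failure pattern, not the sunrpc code and not a confirmed diagnosis: `struct toy_timer`, `toy_timer_arm()`, `toy_timer_cancel()`, `toy_timers_pending()` and `wait_for_upcall()` are hypothetical names invented for the illustration, and the global `pending` list merely stands in for whatever structure would keep a reference to the on-stack object.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a kernel timer; NOT the real struct timer_list. */
struct toy_timer {
    struct toy_timer *next;
    void (*fn)(void *);
    void *arg;
};

/* Global list of armed timers; it outlives every stack frame. */
static struct toy_timer *pending;

/* Link t into the global list. From here on, the list points at t's storage. */
static void toy_timer_arm(struct toy_timer *t, void (*fn)(void *), void *arg)
{
    assert(t != NULL && fn != NULL);
    t->fn = fn;
    t->arg = arg;
    t->next = pending;
    pending = t;
}

/* Unlink t if it is still armed. Returns 1 if it was removed, 0 otherwise. */
static int toy_timer_cancel(struct toy_timer *t)
{
    struct toy_timer **pp;

    for (pp = &pending; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == t) {
            *pp = t->next;
            t->next = NULL;
            return 1;
        }
    }
    return 0;
}

/* Count the timers that are still armed. */
static size_t toy_timers_pending(void)
{
    size_t n = 0;
    struct toy_timer *t;

    for (t = pending; t != NULL; t = t->next)
        n++;
    return n;
}

/* Timeout callback: sets the int flag that arg points to. */
static void on_timeout(void *arg)
{
    *(int *)arg = 1;
}

/*
 * Owns a timer that lives in its own stack frame, so it must cancel the
 * timer before returning. Without the cancel, `pending` would keep pointing
 * into a dead frame, and a later expiry would scribble on whatever occupies
 * that stack memory by then.
 */
static int wait_for_upcall(void)
{
    struct toy_timer t;
    int timed_out = 0;

    toy_timer_arm(&t, on_timeout, &timed_out);
    /* ... real code would sleep here, waiting for the daemon to answer ... */
    toy_timer_cancel(&t);
    return timed_out;
}
```

Calling `wait_for_upcall()` and then checking that `toy_timers_pending()` is zero confirms that nothing still refers to the returned frame. The point of the sketch is the ordering: the cancel has to happen on every return path of the function that owns the object.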