Re: NULL pointer dereference in 3.14 cephfs kernel client

On Tue, 6 Jan 2015, Markus Blank-Burian wrote:
> > Yeah, definitely.  We haven't been aggressive about sending cephfs fixes
> > to stable but will start doing so soon.
> This would be very welcome!
> 
> On a related note, I saw many RCU stalls like the one below. Looking
> through the commit logs, I found these possibly related fixes:
> 03974e8177b36d672eb59658f976f03cb77c1129   ceph: make sure request isn't in any waiting list when kicking request.
> 656e4382948d4b2c81bdaf707f1400f53eff2625   ceph: protect kick_requests() with mdsc->mutex
> 282c105225ec3229f344c5fced795b9e1e634440   ceph: fix kick_requests()
> 
> They also apply cleanly with an offset to 3.14 and have all been
> included since at least 3.18. They may also be good candidates for
> inclusion in stable, if I haven't missed some hidden dependency on
> another patch.

I'm copying Ilya as he's been tracking these.

Thanks-
sage


> 
> 
> ------
> 
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087230] INFO:
> rcu_sched self-detected stall on CPU { 56}  (t=10731918 jiffies
> g=3574092 c=3574091 q=0)
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087250] sending NMI
> to all CPUs:
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087279] NMI
> backtrace for cpu 56
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087286] CPU: 56 PID:
> 1276 Comm: kworker/56:2 Tainted: P        W  O 3.14.26-gentoo #1
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087288] Hardware
> name: Supermicro H8QG6/H8QG6, BIOS 3.00       09/04/2012
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087314] Workqueue:
> ceph-msgr con_work [libceph]
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087317] task:
> ffff883ff9f0b1e0 ti: ffff883d72e42000 task.ti: ffff883d72e42000
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087319] RIP:
> 0010:[<ffffffff8102750a>]  [<ffffffff8102750a>]
> default_send_IPI_mask_sequence_phys+0x4e/0x68
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087332] RSP:
> 0000:ffff884026c03dd0  EFLAGS: 00000087
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087334] RAX:
> ffff884026c40000 RBX: 0000000000000039 RCX: 0000000000000039
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087336] RDX:
> fe00000000000000 RSI: 0000000000000002 RDI: fe00000000000000
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087338] RBP:
> ffff884026c03df8 R08: 0000000000000000 R09: ffffffff81886dc0
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087340] R10:
> ffff884026c03f00 R11: 0000000000000000 R12: 0000000000000096
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087342] R13:
> ffffffff81886dc0 R14: 0000000000000002 R15: 000000000000a10a
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087345] FS:
> 00007f545ab9f840(0000) GS:ffff884026c00000(0000)
> knlGS:0000000000000000
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087347] CS:  0010
> DS: 0000 ES: 0000 CR0: 000000008005003b
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087349] CR2:
> 00007fd749494288 CR3: 000000000180b000 CR4: 00000000000407e0
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087350] Stack:
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087352]
> 0000000000002710 ffffffff818389c0 0000000000000038 ffffffff818385c0
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087355]
> ffff884026c0c100 ffff884026c03e08 ffffffff8102aa14 ffff884026c03e20
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087358]
> ffffffff81027671 ffff884026c0c8c0 ffff884026c03e78 ffffffff8107dd8e
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087361] Call Trace:
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087363]  <IRQ>
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087366]
> [<ffffffff8102aa14>] physflat_send_IPI_all+0x12/0x14
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087374]
> [<ffffffff81027671>] arch_trigger_all_cpu_backtrace+0x4d/0x80
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087380]
> [<ffffffff8107dd8e>] rcu_check_callbacks+0x1d1/0x4e0
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087385]
> [<ffffffff81041d77>] update_process_times+0x38/0x60
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087389]
> [<ffffffff8108650a>] tick_sched_handle+0x35/0x37
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087392]
> [<ffffffff810869c5>] tick_sched_timer+0x35/0x53
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087397]
> [<ffffffff81052d05>] __run_hrtimer.isra.25+0x72/0xcb
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087401]
> [<ffffffff810533fc>] hrtimer_interrupt+0xe6/0x1c8
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087404]
> [<ffffffff81026453>] local_apic_timer_interrupt+0x4f/0x52
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087407]
> [<ffffffff81026695>] smp_apic_timer_interrupt+0x2b/0x3c
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087414]
> [<ffffffff813ccd8a>] apic_timer_interrupt+0x6a/0x70
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087415]  <EOI>
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087417]
> [<ffffffff811bdee9>] ? rb_next+0x2d/0x3d
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087435]
> [<ffffffffa0327dff>] kick_requests+0x2f5/0x38d [libceph]
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087448]
> [<ffffffffa0328c91>] ceph_osdc_handle_map+0x2f7/0x4cc [libceph]
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087459]
> [<ffffffffa032541f>] dispatch+0x588/0x5d2 [libceph]
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087470]
> [<ffffffffa032541f>] ? dispatch+0x588/0x5d2 [libceph]
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087480]
> [<ffffffffa0321b1a>] con_work+0xdb5/0x2374 [libceph]
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087486]
> [<ffffffff8105db76>] ? vtime_common_task_switch+0x25/0x28
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087491]
> [<ffffffff8104ba9f>] process_one_work+0x154/0x221
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087494]
> [<ffffffff8104c1e2>] worker_thread+0x13e/0x1d7
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087497]
> [<ffffffff8104c0a4>] ? cancel_delayed_work_sync+0x10/0x10
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087500]
> [<ffffffff81050cc5>] kthread+0xb2/0xba
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087503]
> [<ffffffff81050c13>] ? __kthread_parkme+0x62/0x62
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087506]
> [<ffffffff813cc13c>] ret_from_fork+0x7c/0xb0
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087509]
> [<ffffffff81050c13>] ? __kthread_parkme+0x62/0x62
> 
> 
> On Tue, Jan 6, 2015 at 3:32 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> > On Tue, 6 Jan 2015, Markus Blank-Burian wrote:
> >> Hi,
> >>
> >> As discussed in http://tracker.ceph.com/issues/10450, the 3.14 kernel
> >> sometimes hits a NULL pointer dereference if the MDS server crashes.
> >> The corresponding fix is commit
> >> 00bd8edb861eb41d274938cfc0338999d9c593a3, which only adds a list_empty
> >> check. The patch applies cleanly with a -1 offset to the 3.14 tree and
> >> has been included in the mainline kernel since 3.15.
> >> Can this patch be included in one of the next stable releases?
> >
> > backport.
> >
> > Greg, do you need a patch sent to stable@ or is the sha1 above enough?
> >
> > Thanks!
> > sage
> >
> 
> 
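For readers unfamiliar with the pattern, the "list_empty check" mentioned in the original report can be illustrated with a minimal userspace sketch. This is not the code from commit 00bd8edb861eb41d274938cfc0338999d9c593a3. The list helpers below are simplified stand-ins for the kernel's list API, and struct demo_msg and first_msg_or_null() are hypothetical names invented for this example. It only shows why taking the first entry of a possibly empty list needs a guard.

```c
#include <stddef.h>

/* Simplified stand-in for the kernel's circular doubly linked list. */
struct list_head {
    struct list_head *next, *prev;
};

/* An empty list is a head that points at itself. */
static void init_list_head(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

static int list_is_empty(const struct list_head *h)
{
    return h->next == h;
}

/* Link node n in just before the head, i.e. at the tail of the list. */
static void list_add_tail_entry(struct list_head *n, struct list_head *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

/* Hypothetical payload type, invented for this sketch. */
struct demo_msg {
    int id;
    struct list_head node;
};

/* Recover the containing demo_msg from a pointer to its embedded node. */
#define demo_entry(ptr) \
    ((struct demo_msg *)((char *)(ptr) - offsetof(struct demo_msg, node)))

/*
 * Return the first message on the list, or NULL if the list is empty.
 * Without the emptiness guard, h->next would be the head itself, and
 * demo_entry() would yield a pointer to memory that is not a demo_msg.
 */
static struct demo_msg *first_msg_or_null(struct list_head *h)
{
    if (list_is_empty(h))
        return NULL;
    return demo_entry(h->next);
}
```

A caller would then test the result against NULL before touching any fields, which is the general shape of guard the report describes.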