Re: NULL pointer dereference in 3.14 cephfs kernel client

> Yeah, definitely.  We haven't been aggressive about sending cephfs fixes
> to stable but will start doing so soon.
This would be very welcome!

On a related note, I have seen many RCU stalls like the one below. Looking
through the commit logs, I found these possibly related fixes:
03974e8177b36d672eb59658f976f03cb77c1129   ceph: make sure request isn't in any waiting list when kicking request.
656e4382948d4b2c81bdaf707f1400f53eff2625   ceph: protect kick_requests() with mdsc->mutex
282c105225ec3229f344c5fced795b9e1e634440   ceph: fix kick_requests()

They also apply cleanly with an offset to 3.14 and have all been included
in mainline since at least 3.18. They may also be good candidates for
inclusion in stable, provided I haven't missed a hidden dependency on
another patch.


------

2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087230] INFO:
rcu_sched self-detected stall on CPU { 56}  (t=10731918 jiffies
g=3574092 c=3574091 q=0)
2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087250] sending NMI
to all CPUs:
2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087279] NMI
backtrace for cpu 56
2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087286] CPU: 56 PID:
1276 Comm: kworker/56:2 Tainted: P        W  O 3.14.26-gentoo #1
2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087288] Hardware
name: Supermicro H8QG6/H8QG6, BIOS 3.00       09/04/2012
2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087314] Workqueue:
ceph-msgr con_work [libceph]
2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087317] task:
ffff883ff9f0b1e0 ti: ffff883d72e42000 task.ti: ffff883d72e42000
2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087319] RIP:
0010:[<ffffffff8102750a>]  [<ffffffff8102750a>]
default_send_IPI_mask_sequence_phys+0x4e/0x68
2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087332] RSP:
0000:ffff884026c03dd0  EFLAGS: 00000087
2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087334] RAX:
ffff884026c40000 RBX: 0000000000000039 RCX: 0000000000000039
2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087336] RDX:
fe00000000000000 RSI: 0000000000000002 RDI: fe00000000000000
2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087338] RBP:
ffff884026c03df8 R08: 0000000000000000 R09: ffffffff81886dc0
2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087340] R10:
ffff884026c03f00 R11: 0000000000000000 R12: 0000000000000096
2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087342] R13:
ffffffff81886dc0 R14: 0000000000000002 R15: 000000000000a10a
2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087345] FS:
00007f545ab9f840(0000) GS:ffff884026c00000(0000)
knlGS:0000000000000000
2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087347] CS:  0010
DS: 0000 ES: 0000 CR0: 000000008005003b
2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087349] CR2:
00007fd749494288 CR3: 000000000180b000 CR4: 00000000000407e0
2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087350] Stack:
2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087352]
0000000000002710 ffffffff818389c0 0000000000000038 ffffffff818385c0
2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087355]
ffff884026c0c100 ffff884026c03e08 ffffffff8102aa14 ffff884026c03e20
2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087358]
ffffffff81027671 ffff884026c0c8c0 ffff884026c03e78 ffffffff8107dd8e
2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087361] Call Trace:
2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087363]  <IRQ>
2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087366]
[<ffffffff8102aa14>] physflat_send_IPI_all+0x12/0x14
2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087374]
[<ffffffff81027671>] arch_trigger_all_cpu_backtrace+0x4d/0x80
2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087380]
[<ffffffff8107dd8e>] rcu_check_callbacks+0x1d1/0x4e0
2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087385]
[<ffffffff81041d77>] update_process_times+0x38/0x60
2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087389]
[<ffffffff8108650a>] tick_sched_handle+0x35/0x37
2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087392]
[<ffffffff810869c5>] tick_sched_timer+0x35/0x53
2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087397]
[<ffffffff81052d05>] __run_hrtimer.isra.25+0x72/0xcb
2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087401]
[<ffffffff810533fc>] hrtimer_interrupt+0xe6/0x1c8
2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087404]
[<ffffffff81026453>] local_apic_timer_interrupt+0x4f/0x52
2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087407]
[<ffffffff81026695>] smp_apic_timer_interrupt+0x2b/0x3c
2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087414]
[<ffffffff813ccd8a>] apic_timer_interrupt+0x6a/0x70
2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087415]  <EOI>
2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087417]
[<ffffffff811bdee9>] ? rb_next+0x2d/0x3d
2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087435]
[<ffffffffa0327dff>] kick_requests+0x2f5/0x38d [libceph]
2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087448]
[<ffffffffa0328c91>] ceph_osdc_handle_map+0x2f7/0x4cc [libceph]
2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087459]
[<ffffffffa032541f>] dispatch+0x588/0x5d2 [libceph]
2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087470]
[<ffffffffa032541f>] ? dispatch+0x588/0x5d2 [libceph]
2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087480]
[<ffffffffa0321b1a>] con_work+0xdb5/0x2374 [libceph]
2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087486]
[<ffffffff8105db76>] ? vtime_common_task_switch+0x25/0x28
2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087491]
[<ffffffff8104ba9f>] process_one_work+0x154/0x221
2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087494]
[<ffffffff8104c1e2>] worker_thread+0x13e/0x1d7
2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087497]
[<ffffffff8104c0a4>] ? cancel_delayed_work_sync+0x10/0x10
2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087500]
[<ffffffff81050cc5>] kthread+0xb2/0xba
2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087503]
[<ffffffff81050c13>] ? __kthread_parkme+0x62/0x62
2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087506]
[<ffffffff813cc13c>] ret_from_fork+0x7c/0xb0
2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087509]
[<ffffffff81050c13>] ? __kthread_parkme+0x62/0x62


On Tue, Jan 6, 2015 at 3:32 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> On Tue, 6 Jan 2015, Markus Blank-Burian wrote:
>> Hi,
>>
>> as discussed in http://tracker.ceph.com/issues/10450, the 3.14 kernel
>> sometimes hits a NULL pointer dereference if the MDS crashes.
>> The corresponding fix is commit
>> 00bd8edb861eb41d274938cfc0338999d9c593a3, which only adds a list_empty
>> check. The patch applies cleanly with a -1 offset to the 3.14 tree and
>> has been included in the mainline kernel since 3.15.
>> Can this patch be included in one of the next stable releases?
>
> backport.
>
> Greg, do you need a patch sent to stable@ or is the sha1 above enough?
>
> Thanks!
> sage
>
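
The quoted message describes the fix only as adding a list_empty check.
As a rough userspace illustration of why such a check matters (this is
not the code from that commit; struct msg and first_msg() are
hypothetical names, and the list helpers are simplified stand-ins for
the kernel's <linux/list.h>): on an empty circular list, head->next
points back at the head itself, so converting it to an entry without
checking yields a pointer to something that is not an entry at all.

```c
#include <stddef.h>

/* Minimal stand-in for the kernel's circular doubly linked list. */
struct list_head {
    struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

/* An empty list is one whose head points back at itself. */
static int list_empty(const struct list_head *h)
{
    return h->next == h;
}

static void list_add_tail(struct list_head *n, struct list_head *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

/* Recover the containing structure from its embedded list_head. */
#define list_entry(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical queued item, for illustration only. */
struct msg {
    int id;
    struct list_head item;
};

/*
 * Return the first queued message, or NULL when the queue is empty.
 * Without the list_empty() guard, list_entry(queue->next, ...) on an
 * empty queue would compute an address derived from the list head,
 * not from a real struct msg.
 */
static struct msg *first_msg(struct list_head *queue)
{
    if (list_empty(queue))
        return NULL;
    return list_entry(queue->next, struct msg, item);
}
```

A caller would initialise the queue with init_list_head(), add items
with list_add_tail(), and treat a NULL return from first_msg() as
"nothing queued" rather than dereferencing the result unconditionally.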


