On Tue, 6 Jan 2015, Markus Blank-Burian wrote:
> > Yeah, definitely. We haven't been aggressive about sending cephfs fixes
> > to stable but will start doing so soon.
> This would be very welcome!
>
> On a related note, I saw many RCU stalls like the one below. Looking
> through the commit logs I stumbled upon these maybe related fixes:
> 03974e8177b36d672eb59658f976f03cb77c1129 ceph: make sure request
> isn't in any waiting list when kicking request.
> 656e4382948d4b2c81bdaf707f1400f53eff2625 ceph: protect
> kick_requests() with mdsc->mutex
> 282c105225ec3229f344c5fced795b9e1e634440 ceph: fix kick_requests()
>
> They also apply cleanly with an offset to 3.14 and are all included
> since at least 3.18. Maybe they are also good candidates for inclusion
> in stable, if i haven't missed some hidden dependency on another
> patch.

I'm copying Ilya as he's been tracking these.

Thanks-
sage

>
>
> ------
>
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087230] INFO: rcu_sched self-detected stall on CPU { 56} (t=10731918 jiffies g=3574092 c=3574091 q=0)
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087250] sending NMI to all CPUs:
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087279] NMI backtrace for cpu 56
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087286] CPU: 56 PID: 1276 Comm: kworker/56:2 Tainted: P W O 3.14.26-gentoo #1
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087288] Hardware name: Supermicro H8QG6/H8QG6, BIOS 3.00 09/04/2012
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087314] Workqueue: ceph-msgr con_work [libceph]
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087317] task: ffff883ff9f0b1e0 ti: ffff883d72e42000 task.ti: ffff883d72e42000
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087319] RIP: 0010:[<ffffffff8102750a>] [<ffffffff8102750a>] default_send_IPI_mask_sequence_phys+0x4e/0x68
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087332] RSP: 0000:ffff884026c03dd0 EFLAGS: 00000087
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087334] RAX: ffff884026c40000 RBX: 0000000000000039 RCX: 0000000000000039
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087336] RDX: fe00000000000000 RSI: 0000000000000002 RDI: fe00000000000000
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087338] RBP: ffff884026c03df8 R08: 0000000000000000 R09: ffffffff81886dc0
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087340] R10: ffff884026c03f00 R11: 0000000000000000 R12: 0000000000000096
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087342] R13: ffffffff81886dc0 R14: 0000000000000002 R15: 000000000000a10a
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087345] FS: 00007f545ab9f840(0000) GS:ffff884026c00000(0000) knlGS:0000000000000000
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087347] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087349] CR2: 00007fd749494288 CR3: 000000000180b000 CR4: 00000000000407e0
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087350] Stack:
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087352] 0000000000002710 ffffffff818389c0 0000000000000038 ffffffff818385c0
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087355] ffff884026c0c100 ffff884026c03e08 ffffffff8102aa14 ffff884026c03e20
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087358] ffffffff81027671 ffff884026c0c8c0 ffff884026c03e78 ffffffff8107dd8e
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087361] Call Trace:
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087363] <IRQ>
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087366] [<ffffffff8102aa14>] physflat_send_IPI_all+0x12/0x14
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087374] [<ffffffff81027671>] arch_trigger_all_cpu_backtrace+0x4d/0x80
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087380] [<ffffffff8107dd8e>] rcu_check_callbacks+0x1d1/0x4e0
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087385] [<ffffffff81041d77>] update_process_times+0x38/0x60
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087389] [<ffffffff8108650a>] tick_sched_handle+0x35/0x37
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087392] [<ffffffff810869c5>] tick_sched_timer+0x35/0x53
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087397] [<ffffffff81052d05>] __run_hrtimer.isra.25+0x72/0xcb
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087401] [<ffffffff810533fc>] hrtimer_interrupt+0xe6/0x1c8
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087404] [<ffffffff81026453>] local_apic_timer_interrupt+0x4f/0x52
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087407] [<ffffffff81026695>] smp_apic_timer_interrupt+0x2b/0x3c
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087414] [<ffffffff813ccd8a>] apic_timer_interrupt+0x6a/0x70
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087415] <EOI>
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087417] [<ffffffff811bdee9>] ? rb_next+0x2d/0x3d
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087435] [<ffffffffa0327dff>] kick_requests+0x2f5/0x38d [libceph]
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087448] [<ffffffffa0328c91>] ceph_osdc_handle_map+0x2f7/0x4cc [libceph]
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087459] [<ffffffffa032541f>] dispatch+0x588/0x5d2 [libceph]
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087470] [<ffffffffa032541f>] ? dispatch+0x588/0x5d2 [libceph]
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087480] [<ffffffffa0321b1a>] con_work+0xdb5/0x2374 [libceph]
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087486] [<ffffffff8105db76>] ? vtime_common_task_switch+0x25/0x28
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087491] [<ffffffff8104ba9f>] process_one_work+0x154/0x221
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087494] [<ffffffff8104c1e2>] worker_thread+0x13e/0x1d7
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087497] [<ffffffff8104c0a4>] ? cancel_delayed_work_sync+0x10/0x10
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087500] [<ffffffff81050cc5>] kthread+0xb2/0xba
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087503] [<ffffffff81050c13>] ? __kthread_parkme+0x62/0x62
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087506] [<ffffffff813cc13c>] ret_from_fork+0x7c/0xb0
> 2014-12-23T00:44:39+01:00 kaa-103 kernel: [252114.087509] [<ffffffff81050c13>] ? __kthread_parkme+0x62/0x62
>
>
> On Tue, Jan 6, 2015 at 3:32 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> > On Tue, 6 Jan 2015, Markus Blank-Burian wrote:
> >> Hi,
> >>
> >> as discussed in http://tracker.ceph.com/issues/10450 the 3.14 kernel
> >> sometimes hits a NULL pointer dereference if the MDS server crashes.
> >> The corresponding fix is in commit
> >> 00bd8edb861eb41d274938cfc0338999d9c593a3 which only adds a list_empty
> >> check. The patch applies cleanly with a -1 offset to the 3.14 tree and
> >> is included in mainline kernel since 3.15.
> >> Can this patch be included in one of the next stable releases?
> >
> > backport.
> >
> > Greg, do you need a patch sent to stable@ or is the sha1 above enough?
> >
> > Thanks!
> > sage
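
For anyone triaging a similar stall, the call trace quoted above can be condensed to its symbol names to see which module the CPU was spinning in. The following is only a rough sketch: the names `FRAME_RE` and `parse_frames` are made up for illustration, and it assumes the trace has been unwrapped so that each frame's address, symbol and module sit together without mail-quote markers in between.

```python
import re

# Matches one frame of a kernel call trace, for example:
#   [<ffffffffa0327dff>] kick_requests+0x2f5/0x38d [libceph]
# A "?" before the symbol marks a frame the unwinder is not sure about.
# FRAME_RE and parse_frames are hypothetical names, not part of any tool.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_frames(log_text):
    """Return a list of (symbol, module, reliable) tuples found in log_text."""
    frames = []
    for match in FRAME_RE.finditer(log_text):
        frames.append((
            match.group("symbol"),
            match.group("module"),              # None for built-in symbols
            match.group("unreliable") is None,  # False when prefixed by "?"
        ))
    return frames


if __name__ == "__main__":
    # A few frames copied from the trace in this thread.
    sample = """\
[<ffffffff811bdee9>] ? rb_next+0x2d/0x3d
[<ffffffffa0327dff>] kick_requests+0x2f5/0x38d [libceph]
[<ffffffffa0328c91>] ceph_osdc_handle_map+0x2f7/0x4cc [libceph]
[<ffffffff8104ba9f>] process_one_work+0x154/0x221
"""
    for symbol, module, reliable in parse_frames(sample):
        print(symbol, module or "(built-in)", "" if reliable else "(unreliable)")
```

Note that the pattern also matches the `RIP:` line of an oops, since that line uses the same `[<addr>] symbol+off/len` layout; filter on the text after `Call Trace:` if only stack frames are wanted.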