Yes, you need those lkml patches. I added them to our custom 4.4 kernel too to prevent this.

Stefan

Excuse my typos; sent from my mobile phone.
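For nodes that cannot be rebooted onto a patched kernel right away, a possible stopgap (an assumption here, not something tested in this thread) is to disable automatic NUMA balancing, since the crash below is raised from that path. The runtime knob is /proc/sys/kernel/numa_balancing, present when the kernel is built with CONFIG_NUMA_BALANCING. A minimal C sketch:

/*
 * Workaround sketch (not from this thread): turn off automatic NUMA
 * balancing at runtime. Equivalent to:
 *   echo 0 > /proc/sys/kernel/numa_balancing
 * The file only exists on kernels built with CONFIG_NUMA_BALANCING,
 * and writing it requires root.
 */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    FILE *f = fopen("/proc/sys/kernel/numa_balancing", "w");

    if (!f) {
        perror("open /proc/sys/kernel/numa_balancing");
        return EXIT_FAILURE;
    }
    if (fputs("0\n", f) == EOF || fclose(f) == EOF) {
        perror("write /proc/sys/kernel/numa_balancing");
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}

The setting does not persist across reboots; it only keeps the NUMA balancing code from running until a patched kernel is deployed.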
After upgrading to kernel 4.4.13 on Ubuntu, we are seeing a few of these issues where an OSD would fail with the stack below. I logged a bug at https://bugzilla.kernel.org/show_bug.cgi?id=121101 and there is a similar description at https://lkml.org/lkml/2016/6/22/102, but the odd part is we have turned off CFQ and blk-mq/scsi-mq and are using just the noop scheduler.

Does the ceph kernel code somehow use the fair scheduler code block?

Thanks
--
Alex Gorbachev
Storcium

Jun 28 09:46:41 roc04r-sca090 kernel: [137912.684974] CPU: 30 PID: 10403 Comm: ceph-osd Not tainted 4.4.13-040413-generic #201606072354
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.684991] Hardware name: Supermicro X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.2 03/04/2015
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685009] task: ffff880f79df8000 ti: ffff880f79fb8000 task.ti: ffff880f79fb8000
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685024] RIP: 0010:[<ffffffff810b416e>]  [<ffffffff810b416e>] task_numa_find_cpu+0x22e/0x6f0
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685051] RSP: 0018:ffff880f79fbb818  EFLAGS: 00010206
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685063] RAX: 0000000000000000 RBX: ffff880f79fbb8b8 RCX: 0000000000000000
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685076] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8810352d4800
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685107] RBP: ffff880f79fbb880 R08: 00000001020cf87c R09: 0000000000ff00ff
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685150] R10: 0000000000000009 R11: 0000000000000006 R12: ffff8807c3adc4c0
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685194] R13: 0000000000000006 R14: 000000000000033e R15: fffffffffffffec7
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685238] FS:  00007f30e46b8700(0000) GS:ffff88105f580000(0000) knlGS:0000000000000000
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685283] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685310] CR2: 000000001321a000 CR3: 0000000853598000 CR4: 00000000000406e0
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685354] Stack:
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685374]  ffffffff813d050f 000000000000000d 0000000000000045 ffff880f79df8000
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685426]  000000000000033f 0000000000000000 0000000000016b00 000000000000033f
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685477]  ffff880f79df8000 ffff880f79fbb8b8 00000000000001f4 0000000000000054
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685528] Call Trace:
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685555]  [<ffffffff813d050f>] ? cpumask_next_and+0x2f/0x40
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685584]  [<ffffffff810b4a6e>] task_numa_migrate+0x43e/0x9b0
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685613]  [<ffffffff810b3acc>] ? update_cfs_shares+0xbc/0x100
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685642]  [<ffffffff810b5059>] numa_migrate_preferred+0x79/0x80
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685672]  [<ffffffff810b9b94>] task_numa_fault+0x7f4/0xd40
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685700]  [<ffffffff813d9634>] ? timerqueue_del+0x24/0x70
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685729]  [<ffffffff810b9205>] ? should_numa_migrate_memory+0x55/0x130
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685762]  [<ffffffff811bd590>] handle_mm_fault+0xbc0/0x1820
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685793]  [<ffffffff810edc00>] ? __hrtimer_init+0x90/0x90
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685822]  [<ffffffff810c211d>] ? remove_wait_queue+0x4d/0x60
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685853]  [<ffffffff8121e20a>] ? poll_freewait+0x4a/0xa0
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685882]  [<ffffffff8106a537>] __do_page_fault+0x197/0x400
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685910]  [<ffffffff8106a7c2>] do_page_fault+0x22/0x30
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685939]  [<ffffffff8180a878>] page_fault+0x28/0x30
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685967]  [<ffffffff813e4c5f>] ? copy_page_to_iter_iovec+0x5f/0x300
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685997]  [<ffffffff810b2795>] ? select_task_rq_fair+0x625/0x700
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.686026]  [<ffffffff813e4f16>] copy_page_to_iter+0x16/0xa0
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.686056]  [<ffffffff816f02ad>] skb_copy_datagram_iter+0x14d/0x280
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.686087]  [<ffffffff8174a503>] tcp_recvmsg+0x613/0xbe0
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.686117]  [<ffffffff8177844e>] inet_recvmsg+0x7e/0xb0
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.686146]  [<ffffffff816e0d3b>] sock_recvmsg+0x3b/0x50
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.686173]  [<ffffffff816e0f91>] SYSC_recvfrom+0xe1/0x160
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.686202]  [<ffffffff810f36f5>] ? ktime_get_ts64+0x45/0xf0
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.686230]  [<ffffffff816e239e>] SyS_recvfrom+0xe/0x10
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.686259]  [<ffffffff818086f2>] entry_SYSCALL_64_fastpath+0x16/0x71
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.686287] Code: 55 b0 4c 89 f7 e8 53 cd ff ff 48 8b 55 b0 49 8b 4e 78 48 8b 82 d8 01 00 00 48 83 c1 01 31 d2 49 0f af 86 b0 00 00 00 4c 8b 73 78 <48> f7 f1 48 8b 4b 20 49 89 c0 48 29 c1 48 8b 45 d0 4c 03 43 48
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.686512] RIP  [<ffffffff810b416e>] task_numa_find_cpu+0x22e/0x6f0
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.686544]  RSP <ffff880f79fbb818>
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.686896] ---[ end trace 544cb9f68cb55c93 ]---
Jun 28 09:52:15 roc04r-sca090 kernel: [138246.669713] mpt2sas_cm0: log_info(0x30030101): originator(IOP), code(0x03), sub_code(0x0101)
Jun 28 09:55:01 roc0
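On the scheduler question above: task_numa_find_cpu(), task_numa_migrate(), and task_numa_fault() in the trace belong to CFS's automatic NUMA balancing, which is entered from the page-fault path (here while ceph-osd was in tcp_recvmsg). It is independent of the block I/O scheduler, so noop vs. CFQ and the blk-mq/scsi-mq settings do not come into play. With RCX = 0 at the faulting div instruction in the Code dump, this looks like a divide-by-zero, which matches the lkml report linked above. A minimal C sketch of that failure class, with hypothetical names rather than the real kernel code:

/*
 * Illustration only -- made-up names, not kernel source. The pattern:
 * a per-group capacity value that can transiently be zero is used as
 * a divisor. Unguarded, the CPU raises a divide error (SIGFPE in user
 * space, an oops like the one above in the kernel).
 */
#include <stdio.h>

struct group_stats {
    unsigned long load;     /* aggregate runnable load */
    unsigned long capacity; /* may be 0 in a transient state */
};

/* Unguarded division: traps when capacity == 0. */
static unsigned long imbalance_unsafe(const struct group_stats *g)
{
    return g->load / g->capacity;
}

/* Guarded variant -- the general shape of such a fix. */
static unsigned long imbalance_safe(const struct group_stats *g)
{
    return g->capacity ? g->load / g->capacity : 0;
}

int main(void)
{
    struct group_stats g = { .load = 1024, .capacity = 0 };

    printf("safe: %lu\n", imbalance_safe(&g)); /* prints 0, no trap */
    (void)imbalance_unsafe; /* calling it here would divide by zero */
    return 0;
}

Whether the actual lkml patches take exactly this guarded form is not confirmed here; the sketch only shows why a zero divisor in that path brings down the task that happened to trigger the fault, in this case the OSD.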
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com