Hi Sebastian,

On Monday, June 06, 2016 12:02 AM, Sebastian Andrzej Siewior wrote:
> On 06/03/2016 07:15 PM, David Hauck wrote:
> Hi David,
>
>> On Fri, 3 Jun 2016 at 09:38:00, Sebastian Andrzej Siewior wrote:
>>> I am not aware of any lockup on v3.18-RT tree. I just tried a few boot
>>> up on two of machines and it looks good. Don't have currently any
>>> control on anything >4 cores.
>>
>> Thx. We've done further testing and see that v3.18.9 does not suffer
>> the same problem.
>>
>> I also have some dump information (all "unable to handle kernel
>> paging request") and was wondering what the best way to pass this along to
>> the list might be? Would a compressed archive of the (4) log files be
>> OK to send along?
>
> That "unable to handle kernel paging request" shouldn't be much.
> Please send it to the list. The first BUG backtrace is the important one.

Thx, here's one - hope this might be helpful:

[    1.352165] BUG: unable to handle kernel
[    1.352167] paging request
[    1.352169] at a93c2560
[    1.352172] IP:
[    1.352178] [<c107c248>] can_migrate_task+0x58/0x220
[    1.352182] *pde = 00000000
[    1.352183]
[    1.352187] Oops: 0000 [#1]
[    1.352189] PREEMPT
[    1.352190] SMP
[    1.352191]
[    1.352194] Modules linked in:
[    1.352194]
[    1.352198] CPU: 5 PID: 238 Comm: kthreadd Not tainted 3.18.29-rt30 #2
[    1.352201] Hardware name: Default string Default string/HEP8225, BIOS HEPHF107 05/20/2016
[    1.352205] task: db7d1d40 ti: db27e000 task.ti: db27e000
[    1.352208] EIP: 0060:[<c107c248>] EFLAGS: 00010086 CPU: 5
[    1.352212] EIP is at can_migrate_task+0x58/0x220
[    1.352215] EAX: 00000005 EBX: db27fe10 ECX: 1830b404 EDX: a93c2560
[    1.352217] ESI: dc508000 EDI: 00000002 EBP: db27fdbc ESP: db27fdb0
[    1.352220] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[    1.352223] CR0: 80050033 CR2: a93c2560 CR3: 01a3a000 CR4: 001407d0
[    1.352225] Stack:
[    1.352228] dc50805c
[    1.352230] 00000367
[    1.352232] dcb53b88
[    1.352234] db27fe60
[    1.352236] c1083f81
[    1.352238] 00000000
[    1.352239] 00000000
[    1.352241] 00000005
[    1.352242]
[    1.352245] dcae5b30
[    1.352246] dbe5d608
[    1.352248] 00000000
[    1.352250] dbe5d600
[    1.352252] c1a30660
[    1.352253] c1a30660
[    1.352255] 00000000
[    1.352257] 00000082
[    1.352258]
[    1.352261] dcb53660
[    1.352263] db27fe60
[    1.352265] dbe3a1c0
[    1.352266] dcb53b88
[    1.352268] dcb53660
[    1.352270] dc508000
[    1.352272] 000003ef
[    1.352274] 0000019b
[    1.352275]
[    1.352277] Call Trace:
[    1.352284] [<c1083f81>] load_balance+0x321/0x8b0
[    1.352293] [<c1084ad8>] pick_next_task_fair+0x5c8/0xb10
[    1.352300] [<c1072bd1>] ? dequeue_task+0x91/0xc0
[    1.352307] [<c16bce2a>] __schedule+0xfa/0xae0
[    1.352313] [<c16c06c7>] ? _raw_spin_unlock_irqrestore+0x17/0x50
[    1.352320] [<c107699b>] ? try_to_wake_up+0x5b/0x550
[    1.352324] [<c1076f5f>] ? wake_up_state+0xf/0x20
[    1.352330] [<c108b0cc>] ? __swait_wake_locked+0x3c/0x80
[    1.352336] [<c1065930>] ? process_one_work+0x410/0x410
[    1.352342] [<c16bd83b>] schedule+0x2b/0x90
[    1.352347] [<c1069e73>] kthread+0x73/0xb0
[    1.352354] [<c1060000>] ? SyS_olduname+0x100/0x180
[    1.352360] [<c16c1081>] ret_from_kernel_thread+0x21/0x30
[    1.352365] [<c1069e00>] ? kthread_worker_fn+0x160/0x160
[    1.352368] Code:
[    1.352370] 00
[    1.352372] 8b
[    1.352374] be
[    1.352376] 54
[    1.352378] 02
[    1.352379] 00
[    1.352381] 00
[    1.352383] 8d
[    1.352385] 96
[    1.352387] 60
[    1.352389] 02
[    1.352390] 00
[    1.352392] 00
[    1.352394] 85
[    1.352396] ff
[    1.352398] 74
[    1.352400] 1a
[    1.352402] 8b
[    1.352403] 56
[    1.352405] 08
[    1.352407] 8b
[    1.352409] 4a
[    1.352410] 10
[    1.352412] 89
[    1.352414] ca
[    1.352415] 83
[    1.352417] e2
[    1.352419] 1f
[    1.352420] c1
[    1.352422] e9
[    1.352424] 05
[    1.352426] 8d
[    1.352427] 14
[    1.352429] 95
[    1.352431] 24
[    1.352432] d9
[    1.352434] 6c
[    1.352436] c1
[    1.352437] c1
[    1.352439] e1
[    1.352441] 02
[    1.352443] 29
[    1.352445] ca
[    1.352447] <0f>
[    1.352448] a3
[    1.352450] 02
[    1.352452] 19
[    1.352454] c0
[    1.352455] 85
[    1.352457] c0
[    1.352459] 0f
[    1.352461] 85
[    1.352462] c3
[    1.352464] 00
[    1.352466] 00
[    1.352467] 00
[    1.352469] 83
[    1.352471] 86
[    1.352472] 00
[    1.352474] 01
[    1.352476] 00
[    1.352478] 00
[    1.352480] 01
[    1.352481] 83
[    1.352482]
[    1.352485] EIP: [<c107c248>]
[    1.352489] can_migrate_task+0x58/0x220
[    1.352490] SS:ESP 0068:db27fdb0
[    1.352493] CR2: 00000000a93c2560
[   71.711659] ---[ end trace 0000000000000001 ]---
[   71.711661] note: kthreadd[238] exited with preempt_count 2
[   71.711666] WARNING: CPU: 5 PID: 238 at kernel/smp.c:293 smp_call_function_single+0xb4/0xe0()
[   71.711667] Modules linked in:
[   71.711669] CPU: 5 PID: 238 Comm: kthreadd Tainted: G D 3.18.29-rt30 #2
[   71.711669] Hardware name: Default string Default string/HEP8225, BIOS HEPHF107 05/20/2016
[   71.711671] 00000000 00000000 db27fb60 c16bb56f 00000000 db27fb94 c104f078 c185a058
[   71.711673] 00000005 000000ee c18504dc 00000125 c10bf314 00000125 c10bf314 ffffffff
[   71.711675] 00000005 c110fac0 db27fba4 c104f140 00000009 00000000 db27fbc4 c10bf314
[   71.711675] Call Trace:
[   71.711678] [<c16bb56f>] dump_stack+0x46/0x5c
[   71.711680] [<c104f078>] warn_slowpath_common+0x88/0xb0
[   71.711681] [<c10bf314>] ? smp_call_function_single+0xb4/0xe0
[   71.711682] [<c10bf314>] ? smp_call_function_single+0xb4/0xe0
[   71.711685] [<c110fac0>] ? cpu_clock_event_add+0x20/0x20
[   71.711686] [<c104f140>] warn_slowpath_null+0x20/0x30
[   71.711687] [<c10bf314>] smp_call_function_single+0xb4/0xe0
[   71.711689] [<c110fbc0>] ? perf_event_disable+0x90/0x90
[   71.711691] [<c110e9fc>] task_function_call+0x3c/0x50
[   71.711692] [<c1114fe0>] ? perf_cgroup_switch+0x1f0/0x1f0
[   71.711694] [<c110fbdf>] perf_cgroup_exit+0x1f/0x30
[   71.711696] [<c10cefd3>] cgroup_exit+0xb3/0x100
[   71.711698] [<c105084a>] do_exit+0x32a/0x9c0
[   71.711699] [<c16bab81>] ? printk+0x1c/0x1e
[   71.711702] [<c1099d8b>] ? kmsg_dump+0xcb/0xd0
[   71.711704] [<c1005eff>] oops_end+0x8f/0xd0
[   71.711707] [<c1041430>] no_context+0xf0/0x230
[   71.711709] [<c1041625>] __bad_area_nosemaphore+0xb5/0x150
[   71.711711] [<c10834cd>] ? update_sd_lb_stats+0x12d/0x3d0
[   71.711713] [<c10416d7>] bad_area_nosemaphore+0x17/0x20
[   71.711714] [<c1041bbb>] __do_page_fault+0x9b/0x620
[   71.711716] [<c10837a9>] ? find_busiest_group+0x39/0x4f0
[   71.711719] [<c1042140>] ? __do_page_fault+0x620/0x620
[   71.711720] [<c104214b>] do_page_fault+0xb/0x10
[   71.711721] [<c16c1e3a>] error_code+0x5a/0x60
[   71.711723] [<c1042140>] ? __do_page_fault+0x620/0x620
[   71.711725] [<c107c248>] ? can_migrate_task+0x58/0x220
[   71.711726] [<c1083f81>] load_balance+0x321/0x8b0
[   71.711729] [<c1084ad8>] pick_next_task_fair+0x5c8/0xb10
[   71.711731] [<c1072bd1>] ? dequeue_task+0x91/0xc0
[   71.711733] [<c16bce2a>] __schedule+0xfa/0xae0
[   71.711734] [<c16c06c7>] ? _raw_spin_unlock_irqrestore+0x17/0x50
[   71.711736] [<c107699b>] ? try_to_wake_up+0x5b/0x550
[   71.711737] [<c1076f5f>] ? wake_up_state+0xf/0x20
[   71.711738] [<c108b0cc>] ? __swait_wake_locked+0x3c/0x80
[   71.711740] [<c1065930>] ? process_one_work+0x410/0x410
[   71.711741] [<c16bd83b>] schedule+0x2b/0x90
[   71.711743] [<c1069e73>] kthread+0x73/0xb0
[   71.711744] [<c1060000>] ? SyS_olduname+0x100/0x180
[   71.711746] [<c16c1081>] ret_from_kernel_thread+0x21/0x30
[   71.711747] [<c1069e00>] ? kthread_worker_fn+0x160/0x160
[   71.711748] ---[ end trace 0000000000000002 ]---

> Also if you say that the v3.18.9 based RT tree worked could please try
> v3.18.13-rt10? If so then you could the git tree
>
> https://git.kernel.org/cgit/linux/kernel/git/rt/linux-stable-rt.git/
> and start a bisect between v3.18.13-rt10 and v3.18.29-rt30?

Great, thx, we'll get started on this this week.

Thanks again,
-David

>> -David
>>
>>>> Thanks in advance,
>>>> -David
>
> Sebastian