Hi,

I am also facing a similar issue on an x86 target while testing
3.10.105-rt119. The issue is seen during boot-up, when USB/SCSI
enumeration starts. Below is the log from my console:

scsi 0:0:0:0: Direct-Access Linux scsi_debug 0004 PQ: 0 ANSI: 5
------------[ cut here ]------------
------------[ cut here ]------------
WARNING: at kernel/sched/core.c:3052 migrate_disable+0xee/0x100()
Modules linked in:
CPU: 3 PID: 7 Comm: kworker/u16:0 Not tainted 3.10.107-rt120+ #2
Hardware name: Intel Corporation S1200RP_SE/S1200RP_SE, BIOS S1200RP.86B.02.02.0005.102320140911 10/23/2014
Workqueue: events_unbound async_run_entry_fn
 0000000000000000 ffff880244927338 ffffffff8168b2f0 0000000000000000
 0000000000000009 ffff880244927370 ffffffff8105ef8c ffff8802448fb540
 0000000000000025 0000000000000004 0000000000000025 ffffffff81d9810c
Call Trace:
 [<ffffffff8168b2f0>] dump_stack+0x4f/0x65
 [<ffffffff8105ef8c>] warn_slowpath_common+0x5c/0xa0
 [<ffffffff8105f085>] warn_slowpath_null+0x15/0x20
 [<ffffffff8109355e>] migrate_disable+0xee/0x100
 [<ffffffff810600af>] call_console_drivers.constprop.14+0x4f/0xd0
 [<ffffffff81061241>] console_unlock+0x2a1/0x470
 [<ffffffff810616e2>] vprintk_emit+0x2d2/0x550
 [<ffffffff8168eb49>] ? _raw_spin_unlock_irqrestore+0x19/0x50
 [<ffffffff810936ce>] ? migrate_enable+0x15e/0x1f0
 [<ffffffff816892d3>] printk+0x4a/0x52
 [<ffffffff810936ce>] ? migrate_enable+0x15e/0x1f0
 [<ffffffff8105ef5a>] warn_slowpath_common+0x2a/0xa0
 [<ffffffff8105f085>] warn_slowpath_null+0x15/0x20
 [<ffffffff810936ce>] migrate_enable+0x15e/0x1f0
 [<ffffffff810fce40>] get_page_from_freelist+0x630/0xb90
 [<ffffffff8168e32a>] ? rt_spin_lock_slowlock+0x2ca/0x310
 [<ffffffff810fe36d>] __alloc_pages_nodemask+0x13d/0x9e0
 [<ffffffff810fce72>] ? get_page_from_freelist+0x662/0xb90
 [<ffffffff81133dd0>] alloc_pages_current+0xb0/0x150
 [<ffffffff81138e05>] new_slab+0x2b5/0x380
 [<ffffffff8113b67a>] __slab_alloc.isra.18+0x58a/0x670
 [<ffffffff813d3f40>] ? scsi_pool_alloc_command+0x20/0x70
 [<ffffffff81133dd0>] ? alloc_pages_current+0xb0/0x150
 [<ffffffff8113b956>] kmem_cache_alloc+0xd6/0x100
 [<ffffffff813d3f40>] ? scsi_pool_alloc_command+0x20/0x70
 [<ffffffff813d3f40>] scsi_pool_alloc_command+0x20/0x70
 [<ffffffff813d492e>] scsi_host_alloc_command.isra.1+0x1e/0x80
 [<ffffffff813d49b0>] __scsi_get_command+0x20/0xc0
 [<ffffffff813d4a83>] scsi_get_command+0x33/0xc0
 [<ffffffff813dad1a>] scsi_get_cmd_from_req+0x4a/0x60
 [<ffffffff813db6cb>] scsi_setup_blk_pc_cmnd+0x2b/0xf0
 [<ffffffff813db8fc>] scsi_prep_fn+0x3c/0x50
 [<ffffffff812c9ef3>] blk_peek_request+0xf3/0x1c0
 [<ffffffff813db960>] scsi_request_fn+0x50/0x570
 [<ffffffff812c6c6e>] __blk_run_queue+0x2e/0x40
 [<ffffffff812cdde0>] blk_execute_rq_nowait+0x70/0x100
 [<ffffffff812cdef8>] blk_execute_rq+0x88/0xe0
sd 0:0:0:0: Attached scsi generic sg0 type 0
 [<ffffffff812ca040>] ? blk_rq_bio_prep+0x60/0xc0
 [<ffffffff812cdcf0>] ? blk_rq_map_kern+0xf0/0x170
 [<ffffffff812c86c0>] ? blk_get_request+0x60/0xe0
 [<ffffffff813da050>] scsi_execute+0xf0/0x150
 [<ffffffff813da182>] scsi_execute_req_flags+0x82/0xf0
 [<ffffffff8145d87f>] read_capacity_16+0xcf/0x520
 [<ffffffff8145e060>] sd_revalidate_disk+0x350/0x1bd0
 [<ffffffff8145f9a4>] sd_probe_async+0xc4/0x1d0
 [<ffffffff8108e7c2>] async_run_entry_fn+0x32/0x130
 [<ffffffff8107f5a5>] process_one_work+0x145/0x420
 [<ffffffff81080903>] worker_thread+0x163/0x470
 [<ffffffff8168d91c>] ? preempt_schedule+0x4c/0x70
 [<ffffffff810807a0>] ? manage_workers.isra.7+0x2d0/0x2d0
 [<ffffffff8108735f>] kthread+0xbf/0xd0
 [<ffffffff810872a0>] ? kthread_worker_fn+0x1a0/0x1a0
 [<ffffffff8168f6be>] ret_from_fork+0x4e/0x80
 [<ffffffff810872a0>] ? kthread_worker_fn+0x1a0/0x1a0
---[ end trace 0000000000000001 ]---
------------[ cut here ]------------
WARNING: at kernel/sched/core.c:3087 migrate_enable+0x15e/0x1f0()
Modules linked in:
CPU: 3 PID: 7 Comm: kworker/u16:0 Tainted: G W 3.10.1

Test case to reproduce:
1. Enable PXE boot and mount the file system on a USB stick.
2. Continuously reboot the system with the USB stick connected.
3. We generally see the issue once every 3 to 5 hours.

Looking at the issue, it appears that some piece of code calls
migrate_disable() with interrupts off, enables interrupts, and then
calls migrate_enable().

With instrumentation I observed that for some SCSI-layer calls (calls
from get_requests) the above condition does not evaluate to true, so
control reaches buffered_rmqueue with irqs disabled.

Consider the following call chain:

  buffered_rmqueue -> local_spin_lock_irqsave -> local_lock_irqsave ->
  spin_lock -> rt_spin_lock -> rt_spin_lock_fastlock ->
  rt_spin_lock_slowlock

In the normal case, when rt_spin_lock_slowlock is entered with irqs
disabled, it returns with the same irq state through this branch:

	if (__try_to_take_rt_mutex(lock, self, NULL, STEAL_LATERAL)) {
		raw_spin_unlock(&lock->wait_lock);
		return;
	}

But in some cases the above condition is not met, and control reaches
the following code in the same function:

	pi_lock(&self->pi_lock);
	self->saved_state = self->state;
	__set_current_state(TASK_UNINTERRUPTIBLE);
	pi_unlock(&self->pi_lock);

pi_lock and pi_unlock disable and enable irqs respectively, so in this
special case the irq state is not preserved on exit from
rt_spin_lock_slowlock, and this results in the crash.

Could you please help resolve this issue?

Regards,
Sam

On Fri, Nov 17, 2017 at 11:08 PM, Julia Cartwright <julia@xxxxxx> wrote:
> On Thu, Nov 16, 2017 at 05:08:37PM +0100, Sebastian Andrzej Siewior wrote:
>> + Steven & Julia
>>
>> On 2017-11-07 12:47:27 [+0300], Pavel V. Panteleev wrote:
>> > Thanks, it works.
>>
>> Okay, good to hear.
>>
>> Steven + Julia:
>> We need to decide what are going to do about this stable-wise. The bug
>> was reported against 3.14.79-rt85 and the devel tree is not affected*.
>> The thread starts at
>> https://www.spinics.net/lists/linux-rt-users/msg17560.html
>
> Your proposed patch seems reasonable to me to pull back into the
> relevant releases.
> Can you send a proper patch against the latest
> affected tree (4.9?) and the stable team will pull it back? It looks
> like it will need some minor massaging on it's way back, but that
> shouldn't be a problem.
>
> Thanks,
> Julia
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html