Hi,

Many thanks for your kind response. I will put your patch through a long-run test and update you with the results. In the meantime, could you please look at my two queries below?

1.) I had derived and tried a patch of my own based on the following analysis. (I referred to this upstream commit while deriving it:
https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git/commit/?h=v4.9.47-rt37-rebase&id=7a347757f027190c95a363a491c18156a926a370 )

In some cases the pi_lock handling in rt_spin_lock_slowlock() does not preserve the irq state when the function exits. This causes a problem for migrate_disable()/migrate_enable(), which must be symmetrical with respect to the status of interrupts. To fix this, pi_lock()/pi_unlock() in rt_spin_lock_slowlock() are replaced with plain raw_spin_lock()/raw_spin_unlock(), which leave the irq state untouched, and the wait_lock acquisitions in the same function are converted to raw_spin_lock_irqsave()/raw_spin_unlock_irqrestore().
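To make the asymmetry concrete, here is a small user-space simulation (illustrative only: lock_irq(), lock_irqsave() and the other helpers below are stand-ins I made up, not the real kernel APIs). An unlock that unconditionally re-enables interrupts loses the caller's irq state, while the save/restore pair hands it back:

/* User-space sketch: irqs_enabled stands in for the CPU interrupt flag. */
#include <stdbool.h>
#include <stdio.h>

static bool irqs_enabled;

/* _irq style: unlock unconditionally re-enables interrupts */
static void lock_irq(void)   { irqs_enabled = false; }
static void unlock_irq(void) { irqs_enabled = true; }

/* irqsave/irqrestore style: unlock restores the saved caller state */
static void lock_irqsave(bool *flags)
{
        *flags = irqs_enabled;
        irqs_enabled = false;
}

static void unlock_irqrestore(bool flags)
{
        irqs_enabled = flags;
}

int main(void)
{
        bool flags;

        irqs_enabled = false;   /* caller enters with irqs already off */
        lock_irq();
        unlock_irq();
        printf("_irq pair:    irqs_enabled=%d (caller state lost)\n", irqs_enabled);

        irqs_enabled = false;   /* caller enters with irqs already off */
        lock_irqsave(&flags);
        unlock_irqrestore(flags);
        printf("irqsave pair: irqs_enabled=%d (caller state kept)\n", irqs_enabled);

        return 0;
}

The _irq pair exits with interrupts enabled even though the caller entered with them disabled; that mismatch is exactly what migrate_disable()/migrate_enable() trip over. The patch: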
 kernel/rtmutex.c | 25 ++++++++++++-------------
 1 file changed, 12 insertions(+), 13 deletions(-)

diff --git a/kernel/rtmutex.c b/kernel/rtmutex.c
index 7cf4b8b..9c67d80 100644
--- a/kernel/rtmutex.c
+++ b/kernel/rtmutex.c
@@ -1191,8 +1191,6 @@ static int adaptive_wait(struct rt_mutex *lock,
 }
 #endif
 
-# define pi_lock(lock)		raw_spin_lock_irq(lock)
-# define pi_unlock(lock)	raw_spin_unlock_irq(lock)
 
 /*
  * Slow path lock function spin_lock style: this variant is very
@@ -1206,14 +1204,15 @@ static void noinline __sched rt_spin_lock_slowlock(struct rt_mutex *lock)
 	struct task_struct *lock_owner, *self = current;
 	struct rt_mutex_waiter waiter, *top_waiter;
 	int ret;
+	unsigned long flags;
 
 	rt_mutex_init_waiter(&waiter, true);
 
-	raw_spin_lock(&lock->wait_lock);
+	raw_spin_lock_irqsave(&lock->wait_lock, flags);
 	init_lists(lock);
 
 	if (__try_to_take_rt_mutex(lock, self, NULL, STEAL_LATERAL)) {
-		raw_spin_unlock(&lock->wait_lock);
+		raw_spin_unlock_irqrestore(&lock->wait_lock, flags);
 		return;
 	}
 
@@ -1225,10 +1224,10 @@ static void noinline __sched rt_spin_lock_slowlock(struct rt_mutex *lock)
 	 * as well. We are serialized via pi_lock against wakeups. See
 	 * try_to_wake_up().
 	 */
-	pi_lock(&self->pi_lock);
+	raw_spin_lock(&self->pi_lock);
 	self->saved_state = self->state;
 	__set_current_state(TASK_UNINTERRUPTIBLE);
-	pi_unlock(&self->pi_lock);
+	raw_spin_unlock(&self->pi_lock);
 
 	ret = task_blocks_on_rt_mutex(lock, &waiter, self, RT_MUTEX_MIN_CHAINWALK);
 	BUG_ON(ret);
@@ -1241,18 +1240,18 @@ static void noinline __sched rt_spin_lock_slowlock(struct rt_mutex *lock)
 		top_waiter = rt_mutex_top_waiter(lock);
 		lock_owner = rt_mutex_owner(lock);
 
-		raw_spin_unlock(&lock->wait_lock);
+		raw_spin_unlock_irqrestore(&lock->wait_lock, flags);
 
 		debug_rt_mutex_print_deadlock(&waiter);
 
 		if (top_waiter != &waiter || adaptive_wait(lock, lock_owner))
 			schedule_rt_mutex(lock);
 
-		raw_spin_lock(&lock->wait_lock);
+		raw_spin_lock_irqsave(&lock->wait_lock, flags);
 
-		pi_lock(&self->pi_lock);
+		raw_spin_lock(&self->pi_lock);
 		__set_current_state(TASK_UNINTERRUPTIBLE);
-		pi_unlock(&self->pi_lock);
+		raw_spin_unlock(&self->pi_lock);
 	}
 
 	/*
@@ -1262,10 +1261,10 @@ static void noinline __sched rt_spin_lock_slowlock(struct rt_mutex *lock)
 	 * happened while we were blocked. Clear saved_state so
 	 * try_to_wakeup() does not get confused.
 	 */
-	pi_lock(&self->pi_lock);
+	raw_spin_lock(&self->pi_lock);
 	__set_current_state(self->saved_state);
 	self->saved_state = TASK_RUNNING;
-	pi_unlock(&self->pi_lock);
+	raw_spin_unlock(&self->pi_lock);
 
 	/*
 	 * try_to_take_rt_mutex() sets the waiter bit
@@ -1276,7 +1275,7 @@ static void noinline __sched rt_spin_lock_slowlock(struct rt_mutex *lock)
 	BUG_ON(rt_mutex_has_waiters(lock) && &waiter == rt_mutex_top_waiter(lock));
 	BUG_ON(!plist_node_empty(&waiter.list_entry));
 
-	raw_spin_unlock(&lock->wait_lock);
+	raw_spin_unlock_irqrestore(&lock->wait_lock, flags);
 
 	debug_rt_mutex_free_waiter(&waiter);
 }
-- 
2.7.4

We were testing the above patch on multiple targets; after two days, one remote target got stuck. I am not sure what really happens there. It may be an issue with trying to schedule while interrupts are disabled. The systems I tested locally ran for seven days without problems, after which I stopped the test.

2.) With your patch, irqs will be in the enabled state during slab allocations. Will there be any side effects from enabling irqs at such an early stage? I am sorry if my question does not seem logical.

Regards,
Sam

On Fri, Nov 24, 2017 at 3:07 PM, Sebastian Andrzej Siewior
<bigeasy@xxxxxxxxxxxxx> wrote:
> On 2017-11-24 12:09:16 [+0530], Sam Kappen wrote:
>> Hi,
> Hi,
>
>> I am also facing a similar kind of issue on an x86 target while
>> testing 3.10.105-rt119.
>> The issue is seen during boot-up when USB/SCSI enumeration starts.
>>
>> Below is the log from my console.
>
> Can you try the patch I posted and see if it solves that? From the
> callchain it looks like the same thing.
>
> Sebastian