On 5/2/2018 3:43 PM, Kohli, Gaurav wrote:
> On 5/2/2018 1:50 PM, Peter Zijlstra wrote:
>> On Wed, May 02, 2018 at 10:45:52AM +0530, Kohli, Gaurav wrote:
>>> On 5/1/2018 6:49 PM, Peter Zijlstra wrote:
>>>> - complete(&kthread->parked), which we can do inside schedule(); this
>>>>   solves the problem because then kthread_park() will not return early
>>>>   and the task really is blocked.
>>>
>>> I think complete will not help, as problem is like below :
>>>
>>> Control Thread                  CPUHP thread
>>>                                 cpuhp_thread_fun
>>>                                 Wake control thread
>>>                                 complete(&st->done);
>>> takedown_cpu
>>> kthread_park
>>> set_bit(KTHREAD_SHOULD_PARK
>>>                                 Here cpuhp is looping,
>>>                                 //success case
>>>                                 Generally when issue is not coming
>>>                                 it schedule out by below :
>>>                                 ht->thread_should_run(td->cpu
>>>                                 scheduler
>>>                                 //failure case
>>>                                 before schedule
>>>                                 loop check
>>>                                 (kthread_should_park()
>>>                                 enter here as PARKED set
>>> wake_up_process(k)
>>
>> If k has TASK_PARKED, then wake_up_process() which uses TASK_NORMAL will
>> no-op, because:
>>
>>   TASK_PARKED & TASK_NORMAL == 0
>>
>>>                                 __kthread_parkme
>>>                                 complete(&self->parked);
>>>                                 SETS RUNNING
>>>                                 schedule
>>
>> But suppose, you do get that store, and we get to schedule with
>> TASK_RUNNING, then schedule will no-op and we'll go around the loop and
>> not complete.
>>
>> See also:
>>
>>   lkml.kernel.org/r/20180430111744.GE4082@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
>>
>> Either TASK_RUNNING gets set before we do schedule() and we go around
>> again, re-set TASK_PARKED, resched the condition and re-call schedule(),
>> or we schedule() first and ttwu() will not issue the TASK_RUNNING store.
>>
>> In either case, we'll eventually hit schedule() with TASK_PARKED. Then,
>> and only then will the complete() happen.
>>
>>> wait_for_completion(&kthread->parked);
>>
>> The point is, we'll only ever complete ^ that completion when we've
>> scheduled out the task in TASK_PARKED state. If the task didn't get
>> parked, no completion.
>
> Thanks for the detailed explanation, yes in all cases unpark will observe
> parked state only.
>
>> And that is the reason I like this approach above the others. It
>> guarantees the task really is parked when we ask for it. We don't have
>> to deal with the task still running and getting migrated to another CPU
>> nonsense.

Hi Peter,

We have tested with the new patch and are still seeing the same issue. These dumps don't have debug traces, but from code review it seems a race still exists. Can you please check it once:

Controller Thread               CPUHP Thread
takedown_cpu
kthread_park
kthread_parkme
Set KTHREAD_SHOULD_PARK
                                smpboot_thread_fn
                                set Task interruptible
wake_up_process
                                Kthread_parkme
                                SET TASK_PARKED
                                schedule
                                raw_spin_lock(&rq->lock)
                                context_switch
                                finish_lock_switch
                                Case TASK_PARKED
                                kthread_park_complete
                                SET TASK_INTERRUPTIBLE

And also seeing the same warning during unpark of cpuhp from controller:

	if (!wait_task_inactive(p, state)) {
		WARN_ON(1);
		return;
	}

[ 325.065893] [<ffffff8920ed0200>] kthread_unpark+0x80/0xd8
[ 325.065902] [<ffffff8920eab754>] bringup_cpu+0xa0/0x12c
[ 325.065910] [<ffffff8920eaae90>] cpuhp_invoke_callback+0xb4/0x5c8
[ 325.065917] [<ffffff8920eabd98>] cpuhp_up_callbacks+0x3c/0x154
[ 325.065924] [<ffffff8920ead220>] _cpu_up+0x134/0x208
[ 325.065931] [<ffffff8920ead45c>] do_cpu_up+0x168/0x1a0
[ 325.065938] [<ffffff8920ead4b8>] cpu_up+0x24/0x30
[ 325.065948] [<ffffff89215b1408>] cpu_subsys_online+0x20/0x2c
[ 325.065956] [<ffffff89215aac64>] device_online+0x70/0xb4
[ 325.065962] [<ffffff89215aad78>] online_store+0xd0/0xdc
[ 325.065971] [<ffffff89215a7424>] dev_attr_store+0x40/0x54
[ 325.065982] [<ffffff89210d8a98>] sysfs_kf_write+0x5c/0x74
[ 325.065988] [<ffffff89210d7b9c>] kernfs_fop_write+0xcc/0x1ec
[ 325.065999] [<ffffff8921049288>] vfs_write+0xb4/0x1d0
[ 325.066006] [<ffffff892104a858>] SyS_write+0x60/0xc0
[ 325.066014] [<ffffff8920e83770>] el0_svc_naked+0x24/0x28

And after this the same crash occurred:

[ 325.521307] [<ffffff8920ed4aac>] smpboot_thread_fn+0x26c/0x2c8
[ 325.527295] [<ffffff8920ecfb24>] kthread+0xf4/0x108

I will put more debug ftraces to check what is going on exactly.

Regards
Gaurav

-- 
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.