On 6/21/21 7:46 PM, Steven Rostedt wrote: > On Mon, 21 Jun 2021 18:14:36 +0200 > Daniel Bristot de Oliveira <bristot@xxxxxxxxxx> wrote: > >>>> Yep! I tried to take the trace_type_lock here, and got the lockdep info about >>>> this problem. >>>> >>>>> The only thing I could think of is to wake up a worker thread to do the >>>>> work. That is, this just wakes the worker thread, then the worker grabs >>>>> the trace_types_lock, iterates through the cpu mask of expect running >>>>> threads, and then starts or kills them depending on the hwlat_busy >>>>> value. >>>> So, it will not wait for the kworker to run? >>> What wont wait? >> >> For example, at the shutdown, should the hotplug callback wait for the workqueue >> to run & kill the thread, or not? > > Doing that won't help the deadlock situation. yep, that is why I asked... :-( > CPU 1 CPU 2 > ----- ----- > Start shutdown > down online_cpus() > > mutex_lock(trace_types_lock); > get_online_cpus() > [BLOCK] > > wake_up_thread; > [schedule worker] > > mutex_lock(trace_types_lock); > > [ DEADLOCK ] > > > Make all access to save_cpumask and hwlat_per_cpu_data inside the > get_online_cpus() protection. (like in move_to_next_cpu(), > start_single_thread() expand the get_online_cpus()). > > Then in the cpu going down case, we can simply kill the thread and > update the save_cpumask, as it will be protected by the > get_online_cpus() code. > > That is, don't even check if hwlat_busy is set or not. Just simply do: > > > CPU_DOWN: > > stop_cpu_kthead(cpu); > > That will stop the kthread if it is running. But we should update > that function to also set per_cpu(hwlat_per_cpu_data).kthread = NULL; > Like stop_single_kthread() does. > > But for CPU_UP, we should do the work via a worker thread. > > CPU_UP: > schedule_work_on(&update_kthreads, cpu); > > Which in the work function for that update_kthreads work queue: > > mutex_lock(&trace_types_lock); > if (!hwlat_busy || hwlat_data.thread_mode != MODE_PER_CPU) > goto out_unlock; > > get_online_cpus(); > if (!this_cpu(hwlat_per_cpu_data).kthread) > start_per_cpu_kthread(smp_processor_id()); > put_online_cpus(); > > out_unlock: > mutex_unlock(&trace_types_lock); > > Or something like that. It works! I will send a v5 with all the fixes requested, including this one. Thanks Steven! -- Daniel