I know you said that schedule_work is not NMI-safe, which is the first issue. Perhaps that can be fixed by using irq_work_queue instead. But even with irq_work_queue there would still be a deadlock, because slave CPU1 is still holding the runqueue lock of master CPU0 and has not released it.
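To make the second point concrete, here is a rough userspace sketch of the lock state I mean. It is not kernel code: cpu0_rq_lock, rq_trylock, rq_unlock and the two scenario functions are hypothetical names made up for illustration, and a trylock is used so that the model reports the problem instead of actually hanging.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for CPU0's runqueue lock (not the kernel's type). */
static atomic_flag cpu0_rq_lock = ATOMIC_FLAG_INIT;

/* Returns true if the lock was free and is now held by the caller. */
static bool rq_trylock(void)
{
	return !atomic_flag_test_and_set(&cpu0_rq_lock);
}

static void rq_unlock(void)
{
	atomic_flag_clear(&cpu0_rq_lock);
}

/*
 * Models the wakeup step that needs CPU0's runqueue lock.
 * A real spinlock would spin here forever; the model only
 * reports whether the lock could be taken.
 */
static bool wakeup_can_proceed(void)
{
	if (!rq_trylock())
		return false;
	rq_unlock();
	return true;
}

/* CPU1 takes the lock and is then parked; CPU0 tries the wakeup. */
bool wakeup_while_slave_parked(void)
{
	bool ok;

	rq_trylock();			/* CPU1 holds CPU0's rq lock, then stops */
	ok = wakeup_can_proceed();	/* CPU0 attempts the wakeup meanwhile */
	rq_unlock();			/* CPU1 is resumed and drops the lock */
	return ok;
}

/* CPU1 has already been resumed and dropped the lock. */
bool wakeup_after_slave_released(void)
{
	rq_trylock();			/* CPU1 takes the lock ... */
	rq_unlock();			/* ... and releases it normally */
	return wakeup_can_proceed();
}
```

In this model the wakeup only succeeds once CPU1 has been let go. How the wakeup is queued makes no difference as long as it runs while CPU1 is still stopped with the lock held.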