On Sun, Jan 8, 2023 at 3:49 PM Hillf Danton <hdanton@xxxxxxxx> wrote:
>
> On 8 Jan 2023 14:25:48 -0800 PM Munehisa Kamata <kamatam@xxxxxxxxxx> wrote:
> >
> > That patch survived the repro in my original post, however, the waker
> > (rmdir) was getting stuck until a file descriptor of the epoll instance or
> > the pressure file got closed. So, if the following modified repro runs
> > with the patch, the waker never returns (unless the sleeper gets killed)
> > while holding cgroup_mutex. This doesn't seem to be what you expected to
> > see with the patch, does it? Even wake_up_all() does not appear to empty
> > the queue, but wake_up_pollfree() does.
>
> Thanks for your testing. And the debugging completes.
>
> Mind sending a patch with wake_up_pollfree() folded?

I finally had some time to look into this issue. I don't think
delaying destruction in psi_trigger_destroy() because there are still
users of the trigger as Hillf suggested is a good way to go. Before
[1] correct trigger destruction was handled using a
psi_trigger.refcount. For some reason I thought it's not needed
anymore when we placed one-trigger-per-file restriction in that patch,
so I removed it. Obviously that was a wrong move, so I think the
cleanest way would be to bring back the refcounting. That way the last
user of the trigger (either psi_trigger_poll() or psi_fop_release())
will free the trigger. I'll check once more to make sure I did not
miss anything and if there are no objections, will post a fix.

[1] https://lore.kernel.org/lkml/20220111232309.1786347-1-surenb@xxxxxxxxxx/

Thanks,
Suren.

>
> Hillf
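
To illustrate the "last user frees the trigger" idea from the reply above,
here is a rough userspace sketch. It is not the proposed fix and not kernel
code: struct trigger, trigger_create(), trigger_get() and trigger_put() are
made-up names, and C11 atomics stand in for what the kernel would normally
express with struct kref and kref_get()/kref_put(). The freed_flag member
exists only so the behaviour can be observed from outside.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a trigger object; NOT the kernel's struct psi_trigger. */
struct trigger {
	atomic_int refcount;	/* number of users currently holding the object */
	bool *freed_flag;	/* illustration only: set to true when the object is freed */
};

/* Allocate a trigger holding one reference, owned by the creator (e.g. the open file). */
struct trigger *trigger_create(bool *freed_flag)
{
	struct trigger *t = malloc(sizeof(*t));

	if (!t)
		return NULL;
	atomic_init(&t->refcount, 1);
	t->freed_flag = freed_flag;
	if (freed_flag)
		*freed_flag = false;
	return t;
}

/* Take an extra reference; the caller must already hold a valid one. */
void trigger_get(struct trigger *t)
{
	atomic_fetch_add_explicit(&t->refcount, 1, memory_order_relaxed);
}

/*
 * Drop one reference. Whoever drops the last one frees the object.
 * Returns true if this call performed the free.
 */
bool trigger_put(struct trigger *t)
{
	int old = atomic_fetch_sub_explicit(&t->refcount, 1, memory_order_acq_rel);

	assert(old > 0);	/* catches an unbalanced put */
	if (old != 1)
		return false;	/* other users remain; leave the object alone */
	if (t->freed_flag)
		*t->freed_flag = true;
	free(t);
	return true;
}
```

In this sketch, a poll-like path would call trigger_get() before touching the
object and trigger_put() when done, while a release-like path would only call
trigger_put() to drop the creator's reference. Whichever of the two finishes
last ends up freeing the object, so neither side has to wait for the other.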