On Mon, Jan 9, 2023 at 5:33 PM Suren Baghdasaryan <surenb@xxxxxxxxxx> wrote:
>
> On Sun, Jan 8, 2023 at 3:49 PM Hillf Danton <hdanton@xxxxxxxx> wrote:
> >
> > On 8 Jan 2023 14:25:48 -0800 PM Munehisa Kamata <kamatam@xxxxxxxxxx> wrote:
> > >
> > > That patch survived the repro in my original post, however, the waker
> > > (rmdir) was getting stuck until a file descriptor of the epoll instance or
> > > the pressure file got closed. So, if the following modified repro runs
> > > with the patch, the waker never returns (unless the sleeper gets killed)
> > > while holding cgroup_mutex. This doesn't seem to be what you expected to
> > > see with the patch, does it? Even wake_up_all() does not appear to empty
> > > the queue, but wake_up_pollfree() does.
> >
> > Thanks for your testing. And the debugging completes.
> >
> > Mind sending a patch with wake_up_pollfree() folded?
>
> I finally had some time to look into this issue. I don't think
> delaying destruction in psi_trigger_destroy() because there are still
> users of the trigger as Hillf suggested is a good way to go. Before
> [1] correct trigger destruction was handled using a
> psi_trigger.refcount. For some reason I thought it's not needed
> anymore when we placed one-trigger-per-file restriction in that patch,
> so I removed it. Obviously that was a wrong move, so I think the
> cleanest way would be to bring back the refcounting. That way the last
> user of the trigger (either psi_trigger_poll() or psi_fop_release())
> will free the trigger.
> I'll check once more to make sure I did not miss anything and if there
> are no objections, will post a fix.

Uh, I recalled now why refcounting was not helpful here. I'm making
the same mistake of thinking that poll_wait() blocks until the call to
wake_up() which is not the case. Let me think if there is anything
better than wake_up_pollfree() for this case.

>
> [1] https://lore.kernel.org/lkml/20220111232309.1786347-1-surenb@xxxxxxxxxx/
>
> Thanks,
> Suren.
>
> >
> > Hillf