On Mon, Jan 9, 2023 at 7:06 PM Suren Baghdasaryan <surenb@xxxxxxxxxx> wrote: > > On Mon, Jan 9, 2023 at 5:33 PM Suren Baghdasaryan <surenb@xxxxxxxxxx> wrote: > > > > On Sun, Jan 8, 2023 at 3:49 PM Hillf Danton <hdanton@xxxxxxxx> wrote: > > > > > > On 8 Jan 2023 14:25:48 -0800 PM Munehisa Kamata <kamatam@xxxxxxxxxx> wrote: > > > > > > > > That patch survived the repro in my original post, however, the waker > > > > (rmdir) was getting stuck until a file descriptor of the epoll instance or > > > > the pressure file got closed. So, if the following modified repro runs > > > > with the patch, the waker never returns (unless the sleeper gets killed) > > > > while holding cgroup_mutex. This doesn't seem to be what you expected to > > > > see with the patch, does it? Even wake_up_all() does not appear to empty > > > > the queue, but wake_up_pollfree() does. > > > > > > Thanks for your testing. And the debugging completes. > > > > > > Mind sending a patch with wake_up_pollfree() folded? > > > > I finally had some time to look into this issue. I don't think > > delaying destruction in psi_trigger_destroy() because there are still > > users of the trigger as Hillf suggested is a good way to go. Before > > [1] correct trigger destruction was handled using a > > psi_trigger.refcount. For some reason I thought it's not needed > > anymore when we placed one-trigger-per-file restriction in that patch, > > so I removed it. Obviously that was a wrong move, so I think the > > cleanest way would be to bring back the refcounting. That way the last > > user of the trigger (either psi_trigger_poll() or psi_fop_release()) > > will free the trigger. > > I'll check once more to make sure I did not miss anything and if there > > are no objections, will post a fix. > > Uh, I recalled now why refcounting was not helpful here. I'm making > the same mistake of thinking that poll_wait() blocks until the call to > wake_up() which is not the case. Let me think if there is anything > better than wake_up_pollfree() for this case. Hi Munehisa, Sorry for the delay. I was trying to reproduce the issue but even after adding a delay before ep_remove_wait_queue() it did not happen. One thing about wake_up_pollfree() solution that does not seem right to me is this comment at https://elixir.bootlin.com/linux/latest/source/include/linux/wait.h#L253: `In the very rare cases where a ->poll() implementation uses a waitqueue whose lifetime is tied to a task rather than to the 'struct file' being polled, this function must be called before the waitqueue is freed...` In our case we free the waitqueue from cgroup_pressure_release(), which is the handler for `release` operation on cgroup psi files. The other place calling psi_trigger_destroy() is psi_fop_release(), which is also tied to the lifetime to the psi files. Therefore the lifetime of the trigger's waitqueue is tied to the lifetime of the files and IIUC, we should not be required to use wake_up_pollfree(). Could you please post your .config file? I might be missing some configuration which prevents the issue from happening on my side. Thanks, Suren. > > > > > > [1] https://lore.kernel.org/lkml/20220111232309.1786347-1-surenb@xxxxxxxxxx/ > > > > Thanks, > > Suren. > > > > > > > > Hillf