On Thu, 2023-01-12 22:01:24 +0000, Suren Baghdasaryan <surenb@xxxxxxxxxx> wrote:
>
> On Mon, Jan 9, 2023 at 7:06 PM Suren Baghdasaryan <surenb@xxxxxxxxxx> wrote:
> >
> > On Mon, Jan 9, 2023 at 5:33 PM Suren Baghdasaryan <surenb@xxxxxxxxxx> wrote:
> > >
> > > On Sun, Jan 8, 2023 at 3:49 PM Hillf Danton <hdanton@xxxxxxxx> wrote:
> > > >
> > > > On 8 Jan 2023 14:25:48 -0800 PM Munehisa Kamata <kamatam@xxxxxxxxxx> wrote:
> > > > >
> > > > > That patch survived the repro in my original post, however, the waker
> > > > > (rmdir) was getting stuck until a file descriptor of the epoll instance or
> > > > > the pressure file got closed. So, if the following modified repro runs
> > > > > with the patch, the waker never returns (unless the sleeper gets killed)
> > > > > while holding cgroup_mutex. This doesn't seem to be what you expected to
> > > > > see with the patch, does it? Even wake_up_all() does not appear to empty
> > > > > the queue, but wake_up_pollfree() does.
> > > >
> > > > Thanks for your testing. And the debugging completes.
> > > >
> > > > Mind sending a patch with wake_up_pollfree() folded?
> > >
> > > I finally had some time to look into this issue. I don't think
> > > delaying destruction in psi_trigger_destroy() because there are still
> > > users of the trigger as Hillf suggested is a good way to go. Before
> > > [1] correct trigger destruction was handled using a
> > > psi_trigger.refcount. For some reason I thought it's not needed
> > > anymore when we placed one-trigger-per-file restriction in that patch,
> > > so I removed it. Obviously that was a wrong move, so I think the
> > > cleanest way would be to bring back the refcounting. That way the last
> > > user of the trigger (either psi_trigger_poll() or psi_fop_release())
> > > will free the trigger.
> > > I'll check once more to make sure I did not miss anything and if there
> > > are no objections, will post a fix.
> >
> > Uh, I recalled now why refcounting was not helpful here. I'm making
> > the same mistake of thinking that poll_wait() blocks until the call to
> > wake_up() which is not the case. Let me think if there is anything
> > better than wake_up_pollfree() for this case.
>
> Hi Munehisa,
> Sorry for the delay. I was trying to reproduce the issue but even
> after adding a delay before ep_remove_wait_queue() it did not happen.

Hi Suren,

Thank you for your help here. Just in case, do you have KASAN enabled in
your config? If not, this may just silently corrupt some memory location
without being immediately followed by an obvious message or a noticeable
event like an oops.

> One thing about wake_up_pollfree() solution that does not seem right
> to me is this comment at
> https://elixir.bootlin.com/linux/latest/source/include/linux/wait.h#L253:
>
> `In the very rare cases where a ->poll() implementation uses a
> waitqueue whose lifetime is tied to a task rather than to the 'struct
> file' being polled, this function must be called before the waitqueue
> is freed...`
>
> In our case we free the waitqueue from cgroup_pressure_release(),
> which is the handler for `release` operation on cgroup psi files. The
> other place calling psi_trigger_destroy() is psi_fop_release(), which
> is also tied to the lifetime to the psi files. Therefore the lifetime
> of the trigger's waitqueue is tied to the lifetime of the files and
> IIUC, we should not be required to use wake_up_pollfree().
> Could you please post your .config file? I might be missing some
> configuration which prevents the issue from happening on my side.

Sure, here is my config.

https://gist.github.com/kamatam9/a078bdd9f695e7a0767b061c60e48d50

I confirmed that it's reliably reproducible with v6.2-rc3 as shown below.

https://gist.github.com/kamatam9/096a79cf59d8ed8785c4267e917b8675


Regards,
Munehisa

> Thanks,
> Suren.
>
> >
> > >
> > > [1] https://lore.kernel.org/lkml/20220111232309.1786347-1-surenb@xxxxxxxxxx/
> > >
> > > Thanks,
> > > Suren.
> > >
> > > >
> > > > Hillf
> >
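
Illustrative note on the quoted remark that poll_wait() does not block
until wake_up(): the toy userspace model below sketches why a refcount
held only for the duration of a ->poll() call may not keep the trigger
alive for a wait entry that is still queued. This is not kernel code.
All names (toy_trigger, toy_poll, toy_release, toy_demo) are
hypothetical, and the model assumes the poll handler takes a reference,
registers a wait entry, and drops the reference before returning.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a trigger; NOT the kernel's struct psi_trigger. */
struct toy_trigger {
    int refcount;            /* references held by active callers */
    int queued_waiters;      /* wait entries still linked on the queue */
    bool freed;              /* set instead of calling free() so the
                                state can still be inspected */
    bool freed_with_waiters; /* true if "freed" while entries were queued */
};

static void toy_get(struct toy_trigger *t)
{
    t->refcount++;
}

static void toy_put(struct toy_trigger *t)
{
    if (--t->refcount == 0) {
        /* Last reference dropped: the trigger would be freed here. */
        t->freed = true;
        t->freed_with_waiters = t->queued_waiters > 0;
    }
}

/*
 * Models a ->poll() handler under the assumption stated above: it
 * registers a wait entry and returns at once, so its reference is
 * dropped while the entry is still queued.
 */
static void toy_poll(struct toy_trigger *t)
{
    toy_get(t);
    t->queued_waiters++;     /* register only, do not block */
    toy_put(t);
}

/* Models the file release path dropping the creator's reference. */
static void toy_release(struct toy_trigger *t)
{
    toy_put(t);
}

/* Returns 1 if the trigger was "freed" with a wait entry still queued. */
int toy_demo(void)
{
    struct toy_trigger t = { .refcount = 1 }; /* creator's reference */

    toy_poll(&t);     /* entry queued, poll's reference already gone */
    toy_release(&t);  /* refcount hits zero while the entry remains */
    return t.freed_with_waiters ? 1 : 0;
}
```

In this model toy_demo() returns 1: the refcount reaches zero while a
wait entry is still queued, so refcounting alone leaves a stale entry
behind. Whether the real code paths behave this way is exactly what the
thread is trying to establish.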