Re: another use-after-free in ep_remove_wait_queue()

Suren Baghdasaryan <surenb@xxxxxxxxxx> · Mon, 9 Jan 2023 17:33:45 -0800

On Sun, Jan 8, 2023 at 3:49 PM Hillf Danton <hdanton@xxxxxxxx> wrote:
>
> On 8 Jan 2023 14:25:48 -0800 PM Munehisa Kamata <kamatam@xxxxxxxxxx> wrote:
> >
> > That patch survived the repro in my original post, however, the waker
> > (rmdir) was getting stuck until a file descriptor of the epoll instance or
> > the pressure file got closed. So, if the following modified repro runs
> > with the patch, the waker never returns (unless the sleeper gets killed)
> > while holding cgroup_mutex. This doesn't seem to be what you expected to
> > see with the patch, does it? Even wake_up_all() does not appear to empty
> > the queue, but wake_up_pollfree() does.
>
> Thanks for your testing. And the debugging completes.
>
> Mind sending a patch with wake_up_pollfree() folded?

I finally had some time to look into this issue. I don't think
delaying destruction in psi_trigger_destroy() while the trigger still
has users, as Hillf suggested, is a good way to go. Before [1], trigger
destruction was handled correctly using psi_trigger.refcount. For some
reason I thought it was no longer needed once we added the
one-trigger-per-file restriction in that patch, so I removed it.
Obviously that was the wrong move, so I think the cleanest way would be
to bring back the refcounting. That way the last user of the trigger
(either psi_trigger_poll() or psi_fop_release()) will free the trigger.
I'll check once more to make sure I did not miss anything and, if there
are no objections, will post a fix.
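
To illustrate the "last user frees" idea, here is a minimal userspace
sketch. It is not the kernel patch: struct trigger, trigger_create(),
trigger_get() and trigger_put() are hypothetical names, and it uses C11
atomics where kernel code would more likely use something like kref.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct psi_trigger; not the kernel definition. */
struct trigger {
	atomic_int refcount;
};

/* Counts frees so a caller can observe when the object is released. */
static int triggers_freed;

static struct trigger *trigger_create(void)
{
	struct trigger *t = malloc(sizeof(*t));

	if (t)
		atomic_init(&t->refcount, 1);	/* reference held by the creator */
	return t;
}

/* Take an extra reference, e.g. for the duration of a poll call. */
static void trigger_get(struct trigger *t)
{
	/* Taking a reference on an already released object is a bug. */
	assert(atomic_load(&t->refcount) > 0);
	atomic_fetch_add(&t->refcount, 1);
}

/* Returns true if this call dropped the last reference and freed the trigger. */
static bool trigger_put(struct trigger *t)
{
	/* atomic_fetch_sub() returns the value held before the decrement. */
	if (atomic_fetch_sub(&t->refcount, 1) != 1)
		return false;
	free(t);
	triggers_freed++;
	return true;
}
```

With this shape it does not matter whether the poll path or the release
path finishes last: whichever one drops the final reference does the
free, and nobody has to wait for the other.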

[1] https://lore.kernel.org/lkml/20220111232309.1786347-1-surenb@xxxxxxxxxx/

Thanks,
Suren.

>
> Hillf