Re: [PATCH 1/2] kernfs: add kernfs_ops.free operation to free resources tied to the file

Tejun Heo <tj@xxxxxxxxxx> · Tue, 27 Jun 2023 15:54:22 -1000

Hello,

On Tue, Jun 27, 2023 at 02:58:08PM -0700, Suren Baghdasaryan wrote:
> Ok in kernfs_generic_poll() we are using kernfs_open_node.poll
> waitqueue head for polling and kernfs_open_node is freed from inside
> kernfs_unlink_open_file() which is called from kernfs_fop_release().
> So, it is destroyed only when the last fput() is done, unlike the
> ops->release() operation which we are using for destroying PSI
> trigger's waitqueue. So, it seems we still need an operation which
> would indicate that the file is truly going away.

If we want to stay consistent with how kernfs behaves w.r.t. severing, the
right thing to do would be preventing any future polling at severing and
waking up everyone currently waiting, which sounds fine from cgroup behavior
POV too.

Now, the challenge is designing an interface which is difficult to make
mistake with. IOW, it'd be great if kernfs wraps poll call so that severing
is implemented without kernfs users doing anything, or at least make it
pretty obvious what the correct usage pattern is.

> Christian's suggestion to rename current ops->release() operation into
> ops->drain() (or ops->flush() per Matthew's request) and introduce a
> "new" ops->release() which is called only when the last fput() is done
> seems sane to me. Would everyone be happy with that approach?

I'm not sure I'd go there. The contract is that once ->release() is called,
the code backing that file can go away (e.g. rmmod'd). It really should
behave just like the last put from kernfs users' POV. For this specific fix,
it's safe because we know the ops is always built into the kernel and won't
go away but it'd be really bad if the interface says "this is a normal thing
to do". We'd be calling into rmmod'd text pages in no time.

So, I mean, even for temporary fix, we have to make it abundantly clear that
this is not for usual usage and can only be used if the code backing the ops
is built into the kernel and so on.

Thanks.

-- 
tejun