Re: [PATCH 1/2] Allow a kthread to declare that it calls task_work_run()

"NeilBrown" <neilb@xxxxxxx> · Wed, 06 Dec 2023 10:23:37 +1100

On Wed, 06 Dec 2023, NeilBrown wrote:
> On Wed, 06 Dec 2023, Jens Axboe wrote:
> > On 12/5/23 2:58 PM, Jens Axboe wrote:
> > > On 12/5/23 2:28 PM, NeilBrown wrote:
> > >> On Tue, 05 Dec 2023, Christian Brauner wrote:
> > >>> On Mon, Dec 04, 2023 at 03:09:44PM -0700, Jens Axboe wrote:
> > >>>> On 12/4/23 2:02 PM, NeilBrown wrote:
> > >>>>> It isn't clear to me what _GPL is appropriate, but maybe the rules
> > >>>>> changed since last I looked..... are there rules?
> > >>>>>
> > >>>>> My reasoning was that the call is effectively part of the user-space
> > >>>>> ABI.  A user-space process can call this trivially by invoking any
> > >>>>> system call.  The user-space ABI is explicitly a boundary which the GPL
> > >>>>> does not cross.  So it doesn't seem appropriate to prevent non-GPL
> > >>>>> kernel code from doing something that non-GPL user-space code can
> > >>>>> trivially do.
> > >>>>
> > >>>> By that reasoning, basically everything in the kernel should be non-GPL
> > >>>> marked. And while task_work can get used by the application, it happens
> > >>>> only indirectly or implicitly. So I don't think this reasoning is sound
> > >>>> at all, it's not an exported ABI or API by itself.
> > >>>>
> > >>>> For me, the more core of an export it is, the stronger the reason it
> > >>>> should be GPL. FWIW, I don't think exporting task_work functionality is
> > > 
> > >>>
> > >>> Yeah, I'm not too fond of that part as well. I don't think we want to
> > >>> give modules the ability to mess with task work. This is just asking for
> > >>> trouble.
> > >>>
> > >>
> > >> Ok, maybe we need to reframe the problem then.
> > >>
> > >> Currently fput(), and hence filp_close(), take control away from kernel
> > >> threads in that they cannot be sure that a "close" has actually
> > >> completed.
> > >>
> > >> This is already a problem for nfsd.  When renaming a file, nfsd needs to
> > >> ensure any cached "open" that it has on the file is closed (else when
> > >> re-exporting an NFS filesystem it can result in a silly-rename).
> > >>
> > >> nfsd currently handles this case by calling flush_delayed_fput().  I
> > >> suspect you are no more happy about exporting that than you are about
> > >> exporting task_work_run(), but this solution isn't actually 100%
> > >> reliable.  If some other thread calls flush_delayed_fput() between nfsd
> > >> calling filp_close() and that same nfsd calling flush_delayed_fput(),
> > >> then the second flush can return before the first flush (in the other
> > >> thread) completes all the work it took on.
> > >>
> > >> What we really need - both for handling renames and for avoiding
> > >> possible memory exhaustion - is for nfsd to be able to reliably wait for
> > >> any fput() that it initiated to complete.
> > >>
> > >> How would you like the VFS to provide that service?
> > > 
> > > Since task_work happens in the context of your task already, why not
> > > just have a way to get it stashed into a list when final fput is done?
> > > This avoids all of this "let's expose task_work" and using the task list
> > > for that, which seems kind of pointless as you're just going to run it
> > > later on manually anyway.
> > > 
> > > In semi pseudo code:
> > > 
> > > bool fput_put_ref(struct file *file)
> > > {
> > > 	return atomic_dec_and_test(&file->f_count);
> > > }
> > > 
> > > void fput(struct file *file)
> > > {
> > > 	if (fput_put_ref(file)) {
> > > 		...
> > > 	}
> > > }
> > > 
> > > and then your nfsd_file_free() could do:
> > > 
> > > ret = filp_flush(file, id);
> > > if (fput_put_ref(file))
> > > 	llist_add(&file->f_llist, &l->to_free_llist);
> > > 
> > > or something like that, where l->to_free_llist is where ever you'd
> > > otherwise punt the actual freeing to.
> > 
> > Should probably have the put_ref or whatever helper also init the
> > task_work, and then reuse the list in the callback_head there. Then
> > whoever flushes it has to call ->func() and avoid exposing ____fput() to
> > random users. But you get the idea.
> 
> Interesting ideas - thanks.
> 
> So maybe the new API would be
> 
>  fput_queued(struct file *f, struct llist_head *q)
> and
>  flush_fput_queue(struct llist_head *q)
> 
> with the meaning being that fput_queued() is just like fput() except
> that any file needing __fput() is added to the 'q'; and that
> flush_fput_queue() calls __fput() on any files in 'q'.
> 
> So to close a file nfsd would:
> 
>   fget(f);
>   flip_close(f);
>   fput_queued(f, &my_queue);
> 
> though possibly we could have a
>   filp_close_queued(f, q)
> as well.
> 
> I'll try that out - but am happy to hear alternate suggestions for names :-)
> 

Actually ....  I'm beginning to wonder if we should just use
__fput_sync() in nfsd.
It has a big warning about not doing that blindly, but the detail in the
warning doesn't seem to apply to nfsd...

NeilBrown