On Wed, 06 Dec 2023, Jens Axboe wrote: > On 12/5/23 2:58 PM, Jens Axboe wrote: > > On 12/5/23 2:28 PM, NeilBrown wrote: > >> On Tue, 05 Dec 2023, Christian Brauner wrote: > >>> On Mon, Dec 04, 2023 at 03:09:44PM -0700, Jens Axboe wrote: > >>>> On 12/4/23 2:02 PM, NeilBrown wrote: > >>>>> It isn't clear to me what _GPL is appropriate, but maybe the rules > >>>>> changed since last I looked..... are there rules? > >>>>> > >>>>> My reasoning was that the call is effectively part of the user-space > >>>>> ABI. A user-space process can call this trivially by invoking any > >>>>> system call. The user-space ABI is explicitly a boundary which the GPL > >>>>> does not cross. So it doesn't seem appropriate to prevent non-GPL > >>>>> kernel code from doing something that non-GPL user-space code can > >>>>> trivially do. > >>>> > >>>> By that reasoning, basically everything in the kernel should be non-GPL > >>>> marked. And while task_work can get used by the application, it happens > >>>> only indirectly or implicitly. So I don't think this reasoning is sound > >>>> at all, it's not an exported ABI or API by itself. > >>>> > >>>> For me, the more core of an export it is, the stronger the reason it > >>>> should be GPL. FWIW, I don't think exporting task_work functionality is > > > >>> > >>> Yeah, I'm not too fond of that part as well. I don't think we want to > >>> give modules the ability to mess with task work. This is just asking for > >>> trouble. > >>> > >> > >> Ok, maybe we need to reframe the problem then. > >> > >> Currently fput(), and hence filp_close(), take control away from kernel > >> threads in that they cannot be sure that a "close" has actually > >> completed. > >> > >> This is already a problem for nfsd. When renaming a file, nfsd needs to > >> ensure any cached "open" that it has on the file is closed (else when > >> re-exporting an NFS filesystem it can result in a silly-rename). > >> > >> nfsd currently handles this case by calling flush_delayed_fput(). I > >> suspect you are no more happy about exporting that than you are about > >> exporting task_work_run(), but this solution isn't actually 100% > >> reliable. If some other thread calls flush_delayed_fput() between nfsd > >> calling filp_close() and that same nfsd calling flush_delayed_fput(), > >> then the second flush can return before the first flush (in the other > >> thread) completes all the work it took on. > >> > >> What we really need - both for handling renames and for avoiding > >> possible memory exhaustion - is for nfsd to be able to reliably wait for > >> any fput() that it initiated to complete. > >> > >> How would you like the VFS to provide that service? > > > > Since task_work happens in the context of your task already, why not > > just have a way to get it stashed into a list when final fput is done? > > This avoids all of this "let's expose task_work" and using the task list > > for that, which seems kind of pointless as you're just going to run it > > later on manually anyway. > > > > In semi pseudo code: > > > > bool fput_put_ref(struct file *file) > > { > > return atomic_dec_and_test(&file->f_count); > > } > > > > void fput(struct file *file) > > { > > if (fput_put_ref(file)) { > > ... > > } > > } > > > > and then your nfsd_file_free() could do: > > > > ret = filp_flush(file, id); > > if (fput_put_ref(file)) > > llist_add(&file->f_llist, &l->to_free_llist); > > > > or something like that, where l->to_free_llist is where ever you'd > > otherwise punt the actual freeing to. > > Should probably have the put_ref or whatever helper also init the > task_work, and then reuse the list in the callback_head there. Then > whoever flushes it has to call ->func() and avoid exposing ____fput() to > random users. But you get the idea. Interesting ideas - thanks. So maybe the new API would be fput_queued(struct file *f, struct llist_head *q) and flush_fput_queue(struct llist_head *q) with the meaning being that fput_queued() is just like fput() except that any file needing __fput() is added to the 'q'; and that flush_fput_queue() calls __fput() on any files in 'q'. So to close a file nfsd would: fget(f); flip_close(f); fput_queued(f, &my_queue); though possibly we could have a filp_close_queued(f, q) as well. I'll try that out - but am happy to hear alternate suggestions for names :-) Thanks, NeilBrown