On 12/5/23 2:58 PM, Jens Axboe wrote: > On 12/5/23 2:28 PM, NeilBrown wrote: >> On Tue, 05 Dec 2023, Christian Brauner wrote: >>> On Mon, Dec 04, 2023 at 03:09:44PM -0700, Jens Axboe wrote: >>>> On 12/4/23 2:02 PM, NeilBrown wrote: >>>>> It isn't clear to me what _GPL is appropriate, but maybe the rules >>>>> changed since last I looked..... are there rules? >>>>> >>>>> My reasoning was that the call is effectively part of the user-space >>>>> ABI. A user-space process can call this trivially by invoking any >>>>> system call. The user-space ABI is explicitly a boundary which the GPL >>>>> does not cross. So it doesn't seem appropriate to prevent non-GPL >>>>> kernel code from doing something that non-GPL user-space code can >>>>> trivially do. >>>> >>>> By that reasoning, basically everything in the kernel should be non-GPL >>>> marked. And while task_work can get used by the application, it happens >>>> only indirectly or implicitly. So I don't think this reasoning is sound >>>> at all, it's not an exported ABI or API by itself. >>>> >>>> For me, the more core of an export it is, the stronger the reason it >>>> should be GPL. FWIW, I don't think exporting task_work functionality is > >>> >>> Yeah, I'm not too fond of that part as well. I don't think we want to >>> give modules the ability to mess with task work. This is just asking for >>> trouble. >>> >> >> Ok, maybe we need to reframe the problem then. >> >> Currently fput(), and hence filp_close(), take control away from kernel >> threads in that they cannot be sure that a "close" has actually >> completed. >> >> This is already a problem for nfsd. When renaming a file, nfsd needs to >> ensure any cached "open" that it has on the file is closed (else when >> re-exporting an NFS filesystem it can result in a silly-rename). >> >> nfsd currently handles this case by calling flush_delayed_fput(). I >> suspect you are no more happy about exporting that than you are about >> exporting task_work_run(), but this solution isn't actually 100% >> reliable. If some other thread calls flush_delayed_fput() between nfsd >> calling filp_close() and that same nfsd calling flush_delayed_fput(), >> then the second flush can return before the first flush (in the other >> thread) completes all the work it took on. >> >> What we really need - both for handling renames and for avoiding >> possible memory exhaustion - is for nfsd to be able to reliably wait for >> any fput() that it initiated to complete. >> >> How would you like the VFS to provide that service? > > Since task_work happens in the context of your task already, why not > just have a way to get it stashed into a list when final fput is done? > This avoids all of this "let's expose task_work" and using the task list > for that, which seems kind of pointless as you're just going to run it > later on manually anyway. > > In semi pseudo code: > > bool fput_put_ref(struct file *file) > { > return atomic_dec_and_test(&file->f_count); > } > > void fput(struct file *file) > { > if (fput_put_ref(file)) { > ... > } > } > > and then your nfsd_file_free() could do: > > ret = filp_flush(file, id); > if (fput_put_ref(file)) > llist_add(&file->f_llist, &l->to_free_llist); > > or something like that, where l->to_free_llist is where ever you'd > otherwise punt the actual freeing to. Should probably have the put_ref or whatever helper also init the task_work, and then reuse the list in the callback_head there. Then whoever flushes it has to call ->func() and avoid exposing ____fput() to random users. But you get the idea. -- Jens Axboe