Jens Axboe <axboe@xxxxxxxxx> writes: > On 11/2/20 1:31 PM, Jens Axboe wrote: >> On 11/2/20 1:12 PM, Eric W. Biederman wrote: >>> Jens Axboe <axboe@xxxxxxxxx> writes: >>> >>>> On 11/2/20 12:27 PM, Eric W. Biederman wrote: >>>>> Jens Axboe <axboe@xxxxxxxxx> writes: >>>>> >>>>>> On 10/30/20 4:22 PM, Al Viro wrote: >>>>>>> On Fri, Oct 30, 2020 at 02:33:11PM -0600, Jens Axboe wrote: >>>>>>>> On 10/30/20 12:49 PM, Al Viro wrote: >>>>>>>>> On Fri, Oct 30, 2020 at 12:46:26PM -0600, Jens Axboe wrote: >>>>>>>>> >>>>>>>>>> See other reply, it's being posted soon, just haven't gotten there yet >>>>>>>>>> and it wasn't ready. >>>>>>>>>> >>>>>>>>>> It's a prep patch so we can call do_renameat2 and pass in a filename >>>>>>>>>> instead. The intent is not to have any functional changes in that prep >>>>>>>>>> patch. But once we can pass in filenames instead of user pointers, it's >>>>>>>>>> usable from io_uring. >>>>>>>>> >>>>>>>>> You do realize that pathname resolution is *NOT* offloadable to helper >>>>>>>>> threads, I hope... >>>>>>>> >>>>>>>> How so? If we have all the necessary context assigned, what's preventing >>>>>>>> it from working? >>>>>>> >>>>>>> Semantics of /proc/self/..., for starters (and things like /proc/mounts, etc. >>>>>>> *do* pass through that, /dev/stdin included) >>>>>> >>>>>> Don't we just need ->thread_pid for that to work? >>>>> >>>>> No. You need ->signal. >>>>> >>>>> You need ->signal->pids[PIDTYPE_TGID]. It is only for /proc/thread-self >>>>> that ->thread_pid is needed. >>>>> >>>>> Even more so than ->thread_pid, it is a kernel invariant that ->signal >>>>> does not change. >>>> >>>> I don't care about the pid itself, my suggestion was to assign ->thread_pid >>>> over the lookup operation to ensure that /proc/self/ worked the way that >>>> you'd expect. >>> >>> I understand that. >>> >>> However /proc/self/ refers to the current process not to the current >>> thread. So ->thread_pid is not what you need to assign to make that >>> happen. What the code looks at is: ->signal->pids[PIDTYPE_TGID]. >>> >>> It will definitely break invariants to assign to ->signal. >>> >>> Currently only exchange_tids assigns ->thread_pid and it is nasty. It >>> results in code that potentially results in infinite loops in >>> kernel/signal.c >>> >>> To my knowledge nothing assigns ->signal->pids[PIDTYPE_TGID]. At best >>> it might work but I expect the it would completely confuse something in >>> the pid to task or pid to process mappings. Which is to say even if it >>> does work it would be an extremely fragile solution. >> >> Thanks Eric, that's useful. Sounds to me like we're better off, at least >> for now, to just expressly forbid async lookup of /proc/self/. Which >> isn't really the end of the world as far as I'm concerned. > > Alternatively, we just teach task_pid_ptr() where to look for an > alternate, if current->flags & PF_IO_WORKER is true. Then we don't have > to assign anything that's visible in task_struct, and in fact the async > worker can retain this stuff on the stack. As all requests are killed > before a task is allowed to exit, that should be safe. That seems assumes task_pid_ptr is always called on current. When you are looking at the task through the proc filesystem you want things like /proc/<pid>/stat and /proc/<pid>/status to be able to display the pids without problem. More than that it is desirable that readdir does not get the view for the PF_IO_WORKER. > diff --git a/kernel/pid.c b/kernel/pid.c > index 74ddbff1a6ba..5fd421a4864c 100644 > --- a/kernel/pid.c > +++ b/kernel/pid.c > @@ -42,6 +42,7 @@ > #include <linux/sched/signal.h> > #include <linux/sched/task.h> > #include <linux/idr.h> > +#include <linux/io_uring.h> > #include <net/sock.h> > #include <uapi/linux/pidfd.h> > > @@ -320,6 +321,12 @@ EXPORT_SYMBOL_GPL(find_vpid); > > static struct pid **task_pid_ptr(struct task_struct *task, enum pid_type type) > { > + if ((task->flags & PF_IO_WORKER) && task->io_uring) { > + return (type == PIDTYPE_PID) ? > + &task->io_uring->thread_pid : > + &task->io_uring->pids[type]; > + } > + > return (type == PIDTYPE_PID) ? > &task->thread_pid : > &task->signal->pids[type]; The only thing I can think of that might work convincingly is to split get_current() into two functions get_context() and get_task(). Maybe accessed as current_context and current_task. With get_context() returning just a pointer to the fields that are safe to use in io_uring, and get_task returning the other fields. With exit and exec invaliding the pending work on the contexts it should be safe to just return a pointer to the context that invoked io_uring. Data in the context would either need to be read-only or be modified and read in a multi-thread safe way. The rest of the data in the task_struct by default could be assume it is only modified by the task. That would give type-safety and something avoids playing whack-a-mole with every new piece of context that userspace accesses. Eric