On 10/1/20 9:19 AM, Thomas Gleixner wrote: > On Thu, Oct 01 2020 at 08:29, Jens Axboe wrote: >> This adds TIF_TASKWORK for x86, which if set, will return true on >> checking for pending signals. That in turn causes tasks to restart the >> system call, which will run the added task_work. > > Huch? The syscall restart does not cause the task work to run. Yeah that's poorly worded, I'll make that more accurate. >> If TIF_TASKWORK is available, we'll use that for notification when >> TWA_SIGNAL is specified. If it isn't available, the existing >> TIF_SIGPENDING path is used. > > Bah. Yet another TIF flag just because. ... >> --- a/fs/io_uring.c >> +++ b/fs/io_uring.c >> @@ -1767,7 +1767,7 @@ static int io_req_task_work_add(struct io_kiocb *req, struct callback_head *cb, >> notify = TWA_SIGNAL; >> >> ret = task_work_add(tsk, cb, notify); >> - if (!ret) >> + if (!ret && !notify) > > !notify assumes that TWA_RESUME == 0. Fun to debug if that ever changes. Agree, I'll make that if (!ret && notify != TWA_SIGNAL) instead, that's more sane. >> wake_up_process(tsk); >> --- a/kernel/task_work.c >> +++ b/kernel/task_work.c >> @@ -28,7 +28,6 @@ int >> task_work_add(struct task_struct *task, struct callback_head *work, int notify) >> { >> struct callback_head *head; >> - unsigned long flags; >> >> do { >> head = READ_ONCE(task->task_works); >> @@ -41,7 +40,10 @@ task_work_add(struct task_struct *task, struct callback_head *work, int notify) >> case TWA_RESUME: >> set_notify_resume(task); >> break; >> - case TWA_SIGNAL: >> + case TWA_SIGNAL: { >> +#ifndef TIF_TASKWORK >> + unsigned long flags; >> + >> /* >> * Only grab the sighand lock if we don't already have some >> * task_work pending. This pairs with the smp_store_mb() >> @@ -53,7 +55,12 @@ task_work_add(struct task_struct *task, struct callback_head *work, int notify) >> signal_wake_up(task, 0); >> unlock_task_sighand(task, &flags); >> } >> +#else >> + set_tsk_thread_flag(task, TIF_TASKWORK); >> + wake_up_process(task); >> +#endif > > This is really a hack. TWA_SIGNAL is a misnomer with the new > functionality and combined with the above > > if (!ret && !notify) > wake_up_process(tsk); > > there is not really a big difference between TWA_RESUME and TWA_SIGNAL > anymore. Just the delivery mode and the syscall restart magic. Agree, maybe it'd make more sense to rename TWA_SIGNAL to TWA_RESTART or something like that. The only user of this is io_uring, so it's not like it's a lot of churn to do so. >> static unsigned long exit_to_user_mode_loop(struct pt_regs *regs, >> unsigned long ti_work) >> { >> + bool restart_sys = false; >> + >> /* >> * Before returning to user space ensure that all pending work >> * items have been completed. >> @@ -157,8 +159,13 @@ static unsigned long exit_to_user_mode_loop(struct pt_regs *regs, >> if (ti_work & _TIF_PATCH_PENDING) >> klp_update_patch_state(current); >> >> + if (ti_work & _TIF_TASKWORK) { >> + task_work_run(); >> + restart_sys = true; >> + } >> + >> if (ti_work & _TIF_SIGPENDING) >> - arch_do_signal(regs); >> + restart_sys |= !arch_do_signal(regs); >> >> if (ti_work & _TIF_NOTIFY_RESUME) { >> clear_thread_flag(TIF_NOTIFY_RESUME); >> @@ -178,6 +185,9 @@ static unsigned long exit_to_user_mode_loop(struct pt_regs *regs, >> ti_work = READ_ONCE(current_thread_info()->flags); >> } >> >> + if (restart_sys) >> + arch_restart_syscall(regs); >> + > > How is that supposed to work? > > Assume that both TIF_TASKWORK and TIF_SIGPENDING are set, i.e. after > running task work and requesting syscall restart there is an actual > signal to be delivered. (Also see v2 for the restart change) > This needs a lot more thoughts. Definitely, which is why I'm posting it as an RFC. It fixes a real performance regression, and there's no reliable way to use TWA_RESUME that I can tell. What kind of restart behavior do we need? Before this change, everytime _TIF_SIGPENDING is set and we don't deliver a signal in the loop, we go through the syscall restart code. After this change, we only do so at the end. I'm assuming that's your objection? For _TIF_TASKWORK, we'll always want to restat the system call, if we were currently doing one. For signals, only if we didn't deliver a signal. So we'll want to retain the restart inside signal delivery? -- Jens Axboe