On 3/25/21 8:02 AM, Jens Axboe wrote:
> On 3/25/21 7:56 AM, Stefan Metzmacher wrote:
>> On 25.03.21 at 14:38, Jens Axboe wrote:
>>> On 3/25/21 6:11 AM, Stefan Metzmacher wrote:
>>>>
>>>> On 25.03.21 at 13:04, Eric W. Biederman wrote:
>>>>> Stefan Metzmacher <metze@xxxxxxxxx> writes:
>>>>>
>>>>>> On 25.03.21 at 12:24, Sasha Levin wrote:
>>>>>>> From: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx>
>>>>>>>
>>>>>>> [ Upstream commit 4db4b1a0d1779dc159f7b87feb97030ec0b12597 ]
>>>>>>>
>>>>>>> Just like we don't allow normal signals to IO threads, don't deliver a
>>>>>>> STOP to a task that has PF_IO_WORKER set. The IO threads don't take
>>>>>>> signals in general, and have no means of flushing out a stop either.
>>>>>>>
>>>>>>> Longer term, we may want to look into allowing stop of these threads,
>>>>>>> as it relates to eg process freezing. For now, this prevents a spin
>>>>>>> issue if a SIGSTOP is delivered to the parent task.
>>>>>>>
>>>>>>> Reported-by: Stefan Metzmacher <metze@xxxxxxxxx>
>>>>>>> Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
>>>>>>> Signed-off-by: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx>
>>>>>>> Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>
>>>>>>> ---
>>>>>>>  kernel/signal.c | 3 ++-
>>>>>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/kernel/signal.c b/kernel/signal.c
>>>>>>> index 55526b941011..00a3840f6037 100644
>>>>>>> --- a/kernel/signal.c
>>>>>>> +++ b/kernel/signal.c
>>>>>>> @@ -288,7 +288,8 @@ bool task_set_jobctl_pending(struct task_struct *task, unsigned long mask)
>>>>>>>  			JOBCTL_STOP_SIGMASK | JOBCTL_TRAPPING));
>>>>>>>  	BUG_ON((mask & JOBCTL_TRAPPING) && !(mask & JOBCTL_PENDING_MASK));
>>>>>>>
>>>>>>> -	if (unlikely(fatal_signal_pending(task) || (task->flags & PF_EXITING)))
>>>>>>> +	if (unlikely(fatal_signal_pending(task) ||
>>>>>>> +		     (task->flags & (PF_EXITING | PF_IO_WORKER))))
>>>>>>>  		return false;
>>>>>>>
>>>>>>>  	if (mask & JOBCTL_STOP_SIGMASK)
>>>>>>>
>>>>>>
>>>>>> Again, why is this proposed for 5.11 and 5.10 already?
>>>>>
>>>>> Has the bit about the io worker kthreads been backported?
>>>>> If so this isn't horrible. If not this is nonsense.
>>>
>>> No, not yet - my plan is to do that, but not until we're 100% satisfied
>>> with it.
>>
>> Do you understand why the patches were autoselected for 5.11 and 5.10?
>
> As far as I know, selections like these (AUTOSEL) are done by some bot
> that uses whatever criteria to see if they should be applied for earlier
> revisions. I'm sure Sasha can expand on that :-)
>
> Hence it's reasonable to expect that sometimes it'll pick patches that
> should not go into stable, at least not just yet. It's important to
> understand that this message is just a notice that it's queued up for
> stable -rc, not that it's _in_ stable just yet. There's time to object.
>
>>>> I don't know, I hope not...
>>>>
>>>> But I just tested v5.12-rc4 and attaching to
>>>> an application with iothreads with gdb is still not possible,
>>>> it still loops forever trying to attach to the iothreads.
>>>
>>> I do see the looping, gdb apparently doesn't give up when it gets
>>> -EPERM trying to attach to the threads. Which isn't really a kernel
>>> thing, but:
>>
>> Maybe we need to remove the iothreads from /proc/pid/tasks/
>
> Is that how it finds them? It's arguably a bug in gdb that it just
> keeps retrying, but it would be nice if we can ensure that it just
> ignores them. Because if gdb triggers something like that, probably
> others too...
>
>>>> And I tested 'kill -9 $pidofiothread', and it froze the whole
>>>> machine...
>>>
>>> That sounds very strange, I haven't seen anything like that running
>>> the exact same scenario.
>>>
>>>> So there's still work to do in order to get 5.12 stable.
>>>>
>>>> I'm short on time currently, but I hope to send more details soon.
>>>
>>> Thanks! I'll play with it this morning and see if I can provoke
>>> something odd related to STOP/attach.
>>
>> Thanks!
>>
>> Somehow I have the impression that your same_thread_group_account patch
>> may fix a lot of things...
>
> Maybe? I'll look closer.

It needs a bit more love than that. If you have threads already in your
app, then we just want to skip over the PF_IO_WORKER threads. We can't
just terminate the loop. Something like the below works for me.
diff --git a/fs/proc/base.c b/fs/proc/base.c
index 3851bfcdba56..abff2fe10bfa 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -3723,7 +3723,7 @@ static struct task_struct *first_tid(struct pid *pid, int tid, loff_t f_pos,
 	 */
 	pos = task = task->group_leader;
 	do {
-		if (!nr--)
+		if (same_thread_group(task, pos) && !nr--)
 			goto found;
 	} while_each_thread(task, pos);
 fail:
@@ -3744,16 +3744,22 @@ static struct task_struct *first_tid(struct pid *pid, int tid, loff_t f_pos,
  */
 static struct task_struct *next_tid(struct task_struct *start)
 {
-	struct task_struct *pos = NULL;
+	struct task_struct *tmp, *pos = NULL;
+
 	rcu_read_lock();
-	if (pid_alive(start)) {
-		pos = next_thread(start);
-		if (thread_group_leader(pos))
-			pos = NULL;
-		else
-			get_task_struct(pos);
+	if (!pid_alive(start))
+		goto no_thread;
+	list_for_each_entry_rcu(tmp, &start->thread_group, thread_group) {
+		if (!thread_group_leader(tmp) && same_thread_group(start, tmp)) {
+			get_task_struct(tmp);
+			pos = tmp;
+			break;
+		}
 	}
+no_thread:
 	rcu_read_unlock();
+	if (!pos)
+		return NULL;
 	put_task_struct(start);
 	return pos;
 }
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 3f6a0fcaa10c..4f621e386abf 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -668,11 +668,18 @@ static inline bool thread_group_leader(struct task_struct *p)
 }
 
 static inline
-bool same_thread_group(struct task_struct *p1, struct task_struct *p2)
+bool same_thread_group_account(struct task_struct *p1, struct task_struct *p2)
 {
 	return p1->signal == p2->signal;
 }
 
+static inline
+bool same_thread_group(struct task_struct *p1, struct task_struct *p2)
+{
+	return same_thread_group_account(p1, p2) &&
+		!((p1->flags | p2->flags) & PF_IO_WORKER);
+}
+
 static inline struct task_struct *next_thread(const struct task_struct *p)
 {
 	return list_entry_rcu(p->thread_group.next,
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 5f611658eeab..625110cacc2a 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -307,7 +307,7 @@ void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
 	 * those pending times and rely only on values updated on tick or
 	 * other scheduler action.
 	 */
-	if (same_thread_group(current, tsk))
+	if (same_thread_group_account(current, tsk))
 		(void) task_sched_runtime(current);
 
 	rcu_read_lock();

-- 
Jens Axboe