"Dr. Thomas Orgis" <thomas.orgis@xxxxxxxxxxxxxx> writes: > Am Tue, 22 Feb 2022 17:53:12 -0600 > schrieb "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx>: > >> How do you figure? > > I admit that I am struggling with understanding where exit codes come > from in the non-usual cases. During my taskstats tests, I played with > writing a multithreaded application that does call pthread_exit() in > the main thread (pid==tgid), for example. I slowly had to learn just > how messy this can be … > > Is it clearly defined what the exitcode of a task as part of a process > is/should/can mean, as opposed to the process as a whole? In the code it is clearly defined. The decoding is exactly the same as from an entire process and for a single threaded process there is no difference. Linux has a system 2 system calls "exit(2)" and "exit_group(2)" if a thread exits by itself whatever is passed to exit(2) is the exit code. What pthread_exit passes to exit(2) I don't know. I have not been able to trace glibc that far, and I have not instrumented up a kernel to see. For threads that are alive when exit_group(2) is called they all get the same final exit code. >> For single-threaded processes ac_exitcode would always be reasonable, >> and be what userspace passed to exit(3). > > Yes. That is the one case where we all know what we are dealing with;-) > >> For multi-threaded processes ac_exitcode before my change was set to >> some completely arbitrary value for the thread whose tgid == tid. > > Isn't the only place where it really makes sense to set the exitcode > when the last task of the process exits? I guess that was the intention > of the earlier code — with the same wrong assumption that I fell victim > to for quite some time: That the group leader (first task, tgid == pid) > always exits last. > > I do not know in which cases group member threads have meaningful exit > codes different from the last one (which is the one returned for the > process in whole … ?). 
> I'd love to see the exact reasoning on how
> multithreading got mapped onto kernel tasks, which used to track only
> single-threaded processes before.

The internal model in the kernel is that there are tasks (which pthreads
are mapped to in a 1-1 fashion). These tasks were the original process
abstraction. In the case of CLONE_THREAD these tasks are glued together
into a POSIX process, with shared signal handling.

So from a kernel standpoint, as a task is basically the original process
abstraction, it is well defined what happens when an individual task
exits.

>> With my change the value returned
>> is at least well defined.
>
> But defined to what?

See above.

>> Now maybe it would have been better to flag the bug fix with a version
>> number. Unfortunately I did not even realize taskstats had a version
>> number. I just know the code made no sense.
>
> Well, fixing a bug that has been there from the beginning (of adding
> multithreading, at least) is a significant change that one might want
> to know about. And I do think it makes sense to thoroughly fix these
> issues that relate to identifying threads and processes (the shameless
> plug of my taskstats patch that I have been working on since 2018, and
> only got right in 2022, finally, I hope) while at it.

It looks like the bug was introduced in commit f3cef7a99469 ("[PATCH]
csa: basic accounting over taskstats") in 2006, in 2.6.19-rc1, when
taskstats were added. That is long after CLONE_THREAD support was added
in the 2.5 development kernel.

I have been working to get a single place where code can look to find
the process exit status, i.e. so that the code can always set
SIGNAL_GROUP_EXIT and look at signal->group_exit_code. Fixing this was
just part of sorting out the misconceptions, and I didn't realize there
was anyone paying attention who cared.

I will see if I can find some time to give your taskstats patch a review.

Eric
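The distinction Eric draws between exit(2), which ends a single task, and exit_group(2), which ends every thread with the same code, can be sketched in C. This is a minimal sketch, assuming Linux with glibc (2.34 or later, or linking with -pthread). The helper names run_child, group_exit_worker and thread_exit_worker are hypothetical. The workers call the raw system calls through syscall(2) rather than pthread_exit(), since what pthread_exit passes to exit(2) is left open in the thread. The parent decodes the wait status with the ordinary WIFEXITED/WEXITSTATUS macros, just as it would for a single-threaded process.

```c
#define _GNU_SOURCE /* for the syscall() declaration in <unistd.h> */
#include <assert.h> /* used by the usage checks */
#include <pthread.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical worker: ends the WHOLE thread group via exit_group(2).
 * Every thread still alive gets this same final exit code. */
static void *group_exit_worker(void *arg)
{
    syscall(SYS_exit_group, (int)(intptr_t)arg);
    return NULL; /* not reached */
}

/* Hypothetical worker: ends only ITS OWN task via exit(2). The code is
 * recorded for this task alone and the other threads keep running.
 * The raw syscall skips glibc's thread cleanup (TLS destructors,
 * cancellation handlers), which is tolerable only in a demo. */
static void *thread_exit_worker(void *arg)
{
    syscall(SYS_exit, (int)(intptr_t)arg);
    return NULL; /* not reached */
}

/* Hypothetical helper: fork a child whose main thread (tgid == tid)
 * starts one worker, waits for it, then exits with main_code.
 * Returns the decoded exit status, or -1 on error or death by signal. */
static int run_child(void *(*worker)(void *), int worker_code, int main_code)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        pthread_t t;
        if (pthread_create(&t, NULL, worker,
                           (void *)(intptr_t)worker_code) != 0)
            _exit(127);
        /* Returns only if the worker ended just itself: the kernel clears
         * the thread's tid on exit (CLONE_CHILD_CLEARTID), and glibc's
         * pthread_join waits for that. */
        pthread_join(t, NULL);
        _exit(main_code); /* glibc's _exit() issues exit_group(2) */
    }
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With this sketch, run_child(group_exit_worker, 7, 3) should report 7, because the worker's exit_group(2) ends every thread with that code before the main thread can exit. run_child(thread_exit_worker, 7, 3) should report 3: the worker's exit(2) ends only its own task, and the group leader's later exit supplies the status the parent sees.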