On 25.03.21 at 21:55, Eric W. Biederman wrote:
> Oleg Nesterov <oleg@xxxxxxxxxx> writes:
>
>> On 03/25, Linus Torvalds wrote:
>>>
>>> The whole "signals are very special for IO threads" thing has caused
>>> so many problems, that maybe the solution is simply to _not_ make them
>>> special?
>>
>> Or may be IO threads should not abuse CLONE_THREAD?
>>
>> Why does create_io_thread() abuse CLONE_THREAD ?
>>
>> One reason (I think) is that this implies SIGKILL when the process exits/execs,
>> anything else?
>
> A lot.
>
> The io workers perform work on behave of the ordinary userspace threads.
> Some of that work is opening files. For things like rlimits to work
> properly you need to share the signal_struct. But odds are if you find
> anything in signal_struct (not counting signals) there will be an
> io_uring code path that can exercise it as io_uring can traverse the
> filesystem, open files and read/write files. So io_uring can exercise
> all of proc.
>
> Using create_io_thread with CLONE_THREAD is the least problematic way
> (including all of the signal and ptrace problems we are looking at right
> now) to implement the io worker threads.
>
> They _really_ are threads of the process that just never execute any
> code in userspace.

So they should look like a userspace thread sitting in something like
epoll_pwait() with all signals blocked, which will never return to
userspace again?

I think that would be useful, but I also think that userspace should see:

- /proc/$tidofiothread/cmdline as empty (in order to let ps and top
  show [iou-wrk-$tidofuserspacethread])
- /proc/$tidofiothread/exe as a symlink to a target that does not exist
- everything in /proc/$tidofiothread/ owned by root.root, and entries
  that still appear writable, such as /proc/$tidofiothread/comm and
  similar files with rw permissions, should still disallow modifications.

For the other kernel threads, e.g. "[cryptd]", I see the following:

LANG=C ls -l /proc/653 | grep rw
ls: cannot read symbolic link '/proc/653/exe': No such file or directory
-rw-r--r-- 1 root root 0 Mar 25 22:09 autogroup
-rw-r--r-- 1 root root 0 Mar 25 22:09 comm
-rw-r--r-- 1 root root 0 Mar 25 22:09 coredump_filter
lrwxrwxrwx 1 root root 0 Mar 25 22:09 cwd -> /
lrwxrwxrwx 1 root root 0 Mar 25 22:09 exe
-rw-r--r-- 1 root root 0 Mar 25 22:09 gid_map
-rw-r--r-- 1 root root 0 Mar 25 22:09 loginuid
-rw------- 1 root root 0 Mar 25 22:09 mem
-rw-r--r-- 1 root root 0 Mar 25 22:09 oom_adj
-rw-r--r-- 1 root root 0 Mar 25 22:09 oom_score_adj
-rw-r--r-- 1 root root 0 Mar 25 22:09 projid_map
lrwxrwxrwx 1 root root 0 Mar 25 22:09 root -> /
-rw-r--r-- 1 root root 0 Mar 25 22:09 sched
-rw-r--r-- 1 root root 0 Mar 25 22:09 setgroups
-rw-r--r-- 1 root root 0 Mar 25 22:09 timens_offsets
-rw-rw-rw- 1 root root 0 Mar 25 22:09 timerslack_ns
-rw-r--r-- 1 root root 0 Mar 25 22:09 uid_map

And this:

LANG=C echo "bla" > /proc/653/comm
-bash: echo: write error: Invalid argument

LANG=C echo "bla" > /proc/653/gid_map
-bash: echo: write error: Operation not permitted

Can't we do the same for iothreads regarding /proc?
Just make things read-only there and leave "cmdline"/"exe" empty?

Maybe I'm too naive, but that's what I'd assume as a userspace
developer/admin.

Does at least part of it make any sense?

metze