On Tue, Feb 04, 2025 at 05:05:06PM +0100, Paolo Bonzini wrote: > On Tue, Feb 4, 2025 at 3:19 PM Christian Brauner <brauner@xxxxxxxxxx> wrote: > > > > On Mon, Jan 27, 2025 at 04:15:01PM +0100, Paolo Bonzini wrote: > > > On Mon, Jan 27, 2025 at 3:10 PM Oleg Nesterov <oleg@xxxxxxxxxx> wrote: > > > > On 01/26, Linus Torvalds wrote: > > > > > On Sun, 26 Jan 2025 at 10:54, Oleg Nesterov <oleg@xxxxxxxxxx> wrote: > > > > > > > > > > > > I don't think we even need to detect the /proc/self/ or /proc/self-thread/ > > > > > > case, next_tid() can just check same_thread_group, > > > > > > > > > > That was my thinking yes. > > > > > > > > > > If we exclude them from /proc/*/task entirely, I'd worry that it would > > > > > hide it from some management tool and be used for nefarious purposes > > > > > > > > Agreed, > > > > > > > > > (even if they then show up elsewhere that the tool wouldn't look at). > > > > > > > > Even if we move them from /proc/*/task to /proc ? > > > > > > Indeed---as long as they show up somewhere, it's not worse than it > > > used to be. The reason why I'd prefer them to stay in /proc/*/task is > > > that moving them away at least partly negates the benefits of the > > > "workers are children of the starter" model. For example it > > > complicates measuring their cost within the process that runs the VM. > > > Maybe it's more of a romantic thing than a real practical issue, > > > because in the real world resource accounting for VMs is done via > > > cgroups. But unlike the lazy creation in KVM, which is overall pretty > > > self-contained, I am afraid the ugliness in procfs would be much worse > > > compared to the benefit, if there's a benefit at all. > > > > > > > Perhaps, I honestly do not know what will/can confuse userspace more. > > > > > > At the very least, marking workers as "Kthread: 1" makes sense and You mean in /proc/<pid>/status? Yeah, we can do that. This expands the definition of Kthread a bit. It would now mean anything that the kernel spawned for userspace. But that is probably fine. But it won't help with the problem of just checking /proc/<pid>/task/ to figure out whether the caller is single-threaded or not. If the caller has more than 1 entry in there they need to walk through all of them and parse through /proc/<pid>/status to discount them if they're kernel threads. > > > should not cause too much confusion. I wouldn't go beyond that unless > > > we get more reports of similar issues, and I'm not even sure how > > > common it is for userspace libraries to check for single-threadedness. > > > > Sorry, just saw this thread now. > > > > What if we did what Linus suggests and hide (odd) user workers from > > /proc/<pid>/task/* but also added /proc/<pid>/workers/*. The latter > > would only list the workers that got spawned by the kernel for that > > particular task? This would acknowledge their somewhat special status > > and allow userspace to still detect them as "Hey, I didn't actually > > spawn those, they got shoved into my workload by the kernel for me.". > > Wouldn't the workers then disappear completely from ps, top or other > tools that look at /proc/$PID/task? That seems a bit too underhanded > towards userspace... So maybe, but then there's also the possibility to do: - Have /proc/<pid>/status list all tasks. - Have /proc/<pid>/worker only list user workers spawned by the kernel for userspace. count(/proc/<pid>/status) - count(/proc/<pid>/workers) == 1 => (userspace) single threaded My wider point is that I would prefer we add something that is consistent and doesn't have to give the caller a different view than a foreign task. I think that will just create confusion in the long run. Btw, checking whether single-threaded this way: fn is_single_threaded() -> io::Result<bool> { match count_dir_entries("/proc/self/task") { Ok(1) => Ok(true), Ok(_) => Ok(false), Err(e) => Err(e), } } can be simplified. It should be sufficient to do: stat("/proc/self/task", &st); if ((st->st_nlink - 2) == 1) // single threaded Since procfs adds the number of tasks to st_nlink (Which is a bit weird given that /proc/<pid>/fd puts the number of file descriptors in st->st_size.).