On 12/17/21 1:26 PM, Eric W. Biederman wrote: > Mike Christie <michael.christie@xxxxxxxxxx> writes: > >> The following patches made over Linus's tree, allow the vhost layer to do >> a copy_process on the thread that does the VHOST_SET_OWNER ioctl like how >> io_uring does a copy_process against its userspace app. This allows the >> vhost layer's worker threads to inherit cgroups, namespaces, address >> space, etc and this worker thread will also be accounted for against that >> owner/parent process's RLIMIT_NPROC limit. >> >> If you are not familiar with qemu and vhost here is more detailed >> problem description: >> >> Qemu will create vhost devices in the kernel which perform network, SCSI, >> etc IO and management operations from worker threads created by the >> kthread API. Because the kthread API does a copy_process on the kthreadd >> thread, the vhost layer has to use kthread_use_mm to access the Qemu >> thread's memory and cgroup_attach_task_all to add itself to the Qemu >> thread's cgroups. >> >> The problem with this approach is that we then have to add new functions/ >> args/functionality for every thing we want to inherit. I started doing >> that here: >> >> https://urldefense.com/v3/__https://lkml.org/lkml/2021/6/23/1233__;!!ACWV5N9M2RV99hQ!eIaEe9V8mCgGU6vyvaWTKGi3Zlnz0rgk5Y-0nsBXRbsuVZsM8lYfHr8I8GRuoLYPYrOB$ >> >> for the RLIMIT_NPROC check, but it seems it might be easier to just >> inherit everything from the beginning, becuase I'd need to do something >> like that patch several times. > > I read through the code and I don't see why you want to make these > almost threads of a process not actually threads of that process > (like the io_uring threads are). > > As a separate process there are many things that will continue to be > disjoint. If the thread changes cgroups for example your new process > won't follow. > > If you want them to be actually processes with an lifetime independent > of the creating process I expect you want to reparent them to the local > init process. Just so they don't confuse the process tree. Plus init > processes know how to handle unexpected children. > > What are the semantics you are aiming for? > Hi Eric, Right now, for vhost we need the thread being created: 1. added to the caller's cgroup. 2. to share the mm struct with the caller. 3. to be accounted for under the caller's nproc rlimit value. For 1 and 2, we have cgroup_attach_task_all and get_task_mm already. This patchset started with me just trying to handle #3 by modifying kthreads like here: https://lkml.org/lkml/2021/6/23/1234 So we can use kthreads and the existing helpers and add: A. a ucounts version of the above patches in the link or B. a helper that does something like copy_process's use of is_ucounts_overlimit and vhost can call that. instead of this patchset. Before we even get to the next section below, do you consider items 1 - 3 something we need an API based on copy_process for? Do you think I should just do A or B above, or do you have another idea? If so can we get agreement on that from everyone? I thought my patches in that link were a little hacky in how they passed around the user/creds info. I thought maybe it shouldn't be passed around like that, so switched to the copy_process based approach which did everything for me. And I thought io_uring needed something similar as us so I made it generic. I don't have a preference. You and Christian are the experts, so I'll leave it to you guys. > I can see sense in generalizing some of the pieces of create_io_thread > but I think generalizing create_io_thread itself is premature. The code > lives in kernel/fork.c because it is a very special thing that we want > to keep our eyes on. > > Some of your generalization makes it much more difficult to tell what > your code is going to use because you remove hard codes that are there > to simplify the analysis of the situation. > > My gut says adding a new create_vhost_worker and putting that in > kernel/fork.c is a lot safer and will allow much better code analysis. > > If there a really are commonalities between creating a userspace process > that runs completely in the kernel and creating an additional userspace > thread we refactor the code and simplify things. > > I am especially nervous about generalizing the io_uring code as it's > signal handling just barely works, and any generalization will cause it > to break. So you are in the process of generalizing code that simply > can not handle the general case. That scares me _______________________________________________ Virtualization mailing list Virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx https://lists.linuxfoundation.org/mailman/listinfo/virtualization