On Fri, Sep 10, 2021 at 10:55 AM Mathieu Desnoyers
<mathieu.desnoyers@xxxxxxxxxxxx> wrote:
>
> ----- On Sep 10, 2021, at 1:48 PM, Peter Oskolkov posk@xxxxxxxxxx wrote:
>
> > On Fri, Sep 10, 2021 at 10:33 AM Mathieu Desnoyers
> > <mathieu.desnoyers@xxxxxxxxxxxx> wrote:
> >>
> >> ----- On Sep 10, 2021, at 12:37 PM, Florian Weimer fweimer@xxxxxxxxxx wrote:
> >>
> >> > * Peter Oskolkov:
> >> >
> >> >> In short, due to the need to read/write to the userspace from
> >> >> non-sleepable contexts in the kernel it seems that we need to have some
> >> >> form of per task/thread kernel/userspace shared memory that is pinned,
> >> >> similar to what your sys_task_getshared does.
> >> >
> >> > In glibc, we'd also like to have this for PID and TID. Eventually,
> >> > rt_sigprocmask without kernel roundtrip in most cases would be very nice
> >> > as well. For performance and simplicity in userspace, it would be best
> >> > if the memory region could be at the same offset from the TCB for all
> >> > threads.
> >> >
> >> > For KTLS, the idea was that the auxiliary vector would contain size and
> >> > alignment of the KTLS. Userspace would reserve that memory, register it
> >> > with the kernel like rseq (or the robust list pointers), and pass its
> >> > address to the vDSO functions that need them. The last part ensures
> >> > that the vDSO functions do not need non-global data to determine the
> >> > offset from the TCB. Registration is still needed for the caches.
> >> >
> >> > I think previous discussions (in the KTLS and rseq context) did not have
> >> > the pinning constraint.
> >>
> >> If this data is per-thread, and read from user-space, why is it relevant
> >> to update this data from non-sleepable kernel context rather than update it as
> >> needed on return-to-userspace ? When returning to userspace, sleeping due to a
> >> page fault is entirely acceptable. This is what we currently do for rseq.
> >>
> >> In short, the data could be accessible from the task struct.
> >> Flags in the task struct can let return-to-userspace know that it has
> >> outdated ktls data. So before returning to userspace, the kernel can copy
> >> the relevant data from the task struct to the shared memory area, without
> >> requiring any pinning.
> >>
> >> What am I missing ?
> >
> > I can't speak about other use cases, but in the context of userspace
> > scheduling, the information that a task has blocked in the kernel and
> > is going to be removed from its runqueue cannot wait to be delivered
> > to the userspace until the task wakes up, as the userspace scheduler
> > needs to know of the event when it happened so that it can schedule
> > another task in place of the blocked one. See the discussion here:
> >
> > https://lore.kernel.org/lkml/CAG48ez0mgCXpXnqAUsa0TcFBPjrid-74Gj=xG8HZqj2n+OPoKw@xxxxxxxxxxxxxx/
>
> OK, just to confirm my understanding, so the use-case here is per-thread
> state which can be read by other threads (in this case the userspace scheduler) ?

Yes, exactly! And sometimes these other threads have to read/write the
state while they are themselves in preempt_disabled regions in the
kernel. There could be a way to do that asynchronously (e.g. via
workpools), but this will add latency and complexity.

>
> Thanks,
>
> Mathieu
>
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com