On Tue, Sep 14, 2021, at 11:11 AM, Peter Zijlstra wrote:
> On Tue, Sep 14, 2021 at 09:52:08AM -0700, Andy Lutomirski wrote:
> > With a custom mapping, you don’t need to pin pages at all, I think.
> > As long as you can reconstruct the contents of the shared page and
> > you’re willing to do some slightly careful synchronization, you can
> > detect that the page is missing when you try to update it and skip the
> > update. The vm_ops->fault handler can repopulate the page the next
> > time it’s accessed.
>
> The point is that the moment we know we need to do this user-poke, is
> schedule(), which could be called while holding mmap_sem (it being a
> preemptable lock). Which means we cannot go and do faults.

That’s fine. The page would be in one of two states: present and
writable by the kernel, or completely gone. If it’s present, the
scheduler writes it. If it’s gone, the scheduler skips the write and
the next fault fills it in.

>
> > All that being said, I feel like I’m missing something. The point of
> > this is to send what the old M:N folks called “scheduler activations”,
> > right? Wouldn’t it be more efficient to explicitly wake something
> > blockable/pollable and write the message into a more efficient data
> > structure? Polling one page per task from userspace seems like it
> > will have inherently high latency due to the polling interval and will
> > also have very poor locality. Or am I missing something?
>
> The idea was to link the user structures together in a (single) linked
> list. The server structure gets a list of all the blocked tasks. This
> avoids having to do a full N iteration (like Java, they're talking stupid
> number of N).
>
> Polling should not happen, once we run out of runnable tasks, the server
> task gets ran again and it can instantly pick up all the blocked
> notifications.
>

How does the server task know when to read the linked list? And what’s
wrong with a ring buffer or a syscall?
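
For reference, here is a rough userspace sketch of the "present or completely gone" scheme described above. It is only an illustration under assumptions: struct task, struct shared_page, sched_poke(), fault_in() and reclaim() are hypothetical names, not existing kernel interfaces, and malloc()/free() stand in for page allocation and reclaim. The sketch is single-threaded, so the "slightly careful synchronization" between the scheduler and reclaim is deliberately not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical layout of the per-task page shared with userspace. */
struct shared_page {
	uint32_t state;		/* e.g. 0 = running, 1 = blocked */
};

/* Hypothetical per-task bookkeeping. */
struct task {
	uint32_t state;			/* authoritative, kernel-private copy */
	struct shared_page *page;	/* NULL means "completely gone" */
};

/*
 * Scheduler path: must never fault or sleep.
 * Returns true if the shared page was present and got written.
 */
static bool sched_poke(struct task *t, uint32_t new_state)
{
	assert(t != NULL);

	t->state = new_state;		/* always keep the private copy current */
	if (!t->page)
		return false;		/* page gone: skip, the next fault fills it in */

	t->page->state = new_state;
	return true;
}

/*
 * Stand-in for a vm_ops->fault handler: rebuild the page contents
 * from the private copy the next time the page is accessed.
 */
static struct shared_page *fault_in(struct task *t)
{
	struct shared_page *p = t->page;

	if (p)
		return p;		/* already present, nothing to do */

	p = malloc(sizeof(*p));
	if (!p)
		return NULL;

	p->state = t->state;		/* reconstruct contents */
	t->page = p;
	return p;
}

/* Stand-in for the page being reclaimed while the task is not looking. */
static void reclaim(struct task *t)
{
	free(t->page);
	t->page = NULL;
}
```

The sketch only works because the shared page can always be reconstructed from t->state. A skipped write loses nothing, which is why the scheduler path never needs to take a fault.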