On Wed, Jan 19, 2022 at 1:00 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> On Tue, Jan 18, 2022 at 10:19:21AM -0800, Peter Oskolkov wrote:
>
> > =========== signals and the general approach
> >
> > My version of the patchset has all of these things working. What it
> > does not have,
> > compared to the new approach we are discussing here, is runqueues per server
> > and proper signal handling (and potential integration with proxy execution).
> >
> > Runqueues per server, in the LAZY mode, are easy to emulate in my patchset:
> > nothing prevents the userspace to partition workers among servers, and have
> > servers that "own" their workers to be pointed at by idle_server_tid_ptr.
> >
> > The only thing that is missing is proper treating of signals. But my patchset
> > does ensure a single running worker per server, has pagefaults and preemptions
> > sorted out, etc. Basically, everything works except signals. This patchset
> > has issues with pagefaults,
>
> Already fixed pagefaults per:
>
>   YeGvovSckivQnKX8@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Could you, please, post an updated RFC when you have a chance? Thanks!

>
> > worker timeouts
>
> I still have no clear answer as to what you actually want there.
>
> > , worker-to-worker context
> > switches (do workers move runqueues when they context switch?), etc.
>
> Not in kernel, if they need to be migrated, userspace needs to do that.
>
> > And my patchset now actually looks smaller and simpler, on the kernel side,
> > than what this patchset is shaping up to be.
> >
> > What if I fix signals in my patchset? I think the way you deal with signals
> > will work in my approach equally well; I'll also use umcg_kick() to preempt
> > workers instead of sending them a signal.
> >
> > What do you think?
>
> I still absolutely hate how long you do page pinning, it *will* wreck
> things like CMA which are somewhat latency critical for silly things
> like Android camera apps and who knows what else.
>
> You've also forgotten about this:
>
>   YcWutpu7BDeG+dQ2@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
>
> That's not optional given how you're using page-pinning. Also, I think
> we need at least one direct access to the page after getting the pin in
> order to make it work.
>
> That also very much limits it to Anon pages.

I can use the same mm/page pinning strategy as you do. But then our patchsets
will be quite similar, I guess, with the difference being server wakeups with
RUNNING workers vs "lazy" idle_server_tid_ptr.

So OK, let's continue with your approach. If you could post a new RFC with
the memory/paging fixes in it, I'll then add worker timeouts, as I outlined
in a separate email ~ 30min ago, and continue with my integration/testing.
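
For readers following the thread: the idea of userspace partitioning workers
among servers, with any migration done in userspace rather than in the kernel,
can be pictured with a minimal sketch like the one below. It is plain
bookkeeping only and makes no UMCG syscalls. The names server_rq, rq_push,
rq_pop, rq_owner, rq_migrate and RQ_CAP are hypothetical and do not come from
either patchset; the modulo partitioning rule is likewise just an assumption
for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define RQ_CAP 8  /* hypothetical per-server queue capacity */

/* Hypothetical userspace run queue owned by one server thread. */
struct server_rq {
    int server_tid;        /* tid of the server that owns these workers */
    int workers[RQ_CAP];   /* ring buffer of runnable worker tids */
    size_t head;
    size_t count;
};

/* Append a runnable worker; returns 0 on success, -1 if the queue is full. */
int rq_push(struct server_rq *rq, int worker_tid)
{
    if (rq->count == RQ_CAP)
        return -1;
    rq->workers[(rq->head + rq->count) % RQ_CAP] = worker_tid;
    rq->count++;
    return 0;
}

/* Remove and return the oldest runnable worker, or -1 if the queue is empty. */
int rq_pop(struct server_rq *rq)
{
    int tid;

    if (rq->count == 0)
        return -1;
    tid = rq->workers[rq->head];
    rq->head = (rq->head + 1) % RQ_CAP;
    rq->count--;
    return tid;
}

/* Assumed static partitioning: worker number i belongs to server i % nservers. */
struct server_rq *rq_owner(struct server_rq *rqs, size_t nservers,
                           size_t worker_idx)
{
    assert(nservers > 0);
    return &rqs[worker_idx % nservers];
}

/* Userspace migration: move the oldest worker from one server's queue to
 * another. Returns the migrated tid, or -1 if nothing was moved. */
int rq_migrate(struct server_rq *from, struct server_rq *to)
{
    int tid;

    if (from->count == 0 || to->count == RQ_CAP)
        return -1;
    tid = rq_pop(from);
    rq_push(to, tid);
    return tid;
}
```

In this sketch each server only ever pops from its own queue, which is one
way to read "servers that own their workers"; a worker changes queues only
when userspace calls rq_migrate explicitly.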