On Fri, May 05, 2023 at 11:32:11AM -0700, Anish Moorthy wrote:
> Peter, I'm afraid that isolating cores and splitting them into groups
> is new to me. Do you mind explaining exactly what you did here?

So far I think the most important pinning is the vcpu thread pinning; we
should always test with that in this case, so that the vcpu load overhead
scales with the number of cores/vcpus.

What I did was (1) isolate cores (using isolcpus=xxx), then (2) manually
pin the userfault threads to some other isolated cores.  But maybe this
is not needed.

> Also, I finally got some of my own perf traces for the self test: [1]
> shows what happens with 32 vCPUs faulting on a single uffd with 32
> reader threads, with the contention clearly being a huge issue, and
> [2] shows the effect of demand paging through memory faults on that
> configuration. Unfortunately the export-to-svg functionality on our
> internal tool seems broken, so I could only grab pngs :(
>
> [1] https://drive.google.com/file/d/1YWiZTjb2FPmqj0tkbk4cuH0Oq8l65nsU/view?usp=drivesdk
> [2] https://drive.google.com/file/d/1P76_6SSAHpLxNgDAErSwRmXBLkuDeFoA/view?usp=drivesdk

Understood.  What I tested was without -a, so it was using more than one
uffd.

I explained why I think it could be useful to test this in my reply to
Nadav; does it make sense to you?  E.g. compare (1) 32 vcpus + 32 uffd
threads and (2) 64 vcpus + 64 uffd threads, making sure the vcpu threads
are pinned (using -c) this time.  It would be nice to pin the uffd
threads too, but I'm not sure whether it'll make a huge difference.

Thanks,

--
Peter Xu
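
P.S. In case the manual pinning part is unclear: one way to do it is to
set each thread's cpu affinity from inside the test itself.  A minimal
sketch (the helper name and the exact affinity call are illustrative,
not necessarily what the selftest does; each vcpu/uffd thread would pass
its own isolated core):

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

/* Pin the calling thread to a single core, e.g. one reserved at boot
 * via isolcpus= on the kernel cmdline.  Returns 0 on success. */
static int pin_self_to_core(int core)
{
        cpu_set_t set;

        CPU_ZERO(&set);
        CPU_SET(core, &set);
        return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}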