On Fri, May 12, 2017 at 11:57:15AM -0500, Christoph Lameter wrote:
> On Fri, 12 May 2017, Marcelo Tosatti wrote:
>
> > > What exactly is the issue you are seeing and want to address? I think we
> > > have similar aims and as far as I know the current situation is already
> > > good enough for what you may need. You may just not be aware of how to
> > > configure this.
> >
> > I want to disable the vmstat worker thread completely on an isolated CPU,
> > because it adds overhead to a latency target, and for that target,
> > the lower the better.
>
> NOHZ already does that. I wanted to know what your problem is that you
> see. The latency issue has already been solved as far as I can tell.
> Please tell me why the existing solutions are not sufficient for you.

We don't want vmstat_worker to execute on a given CPU, even if the local
CPU updates vm-statistics.

Because:

vmstat_worker increases the latency of the application. (I can measure
it if you want: on a given CPU, how many nanoseconds the following
sequence takes.)

    schedule_out(qemu-kvm-vcpu)
    schedule_in(kworker_thread)
    execute function to drain local vmstat counters to global counters
    schedule_out(kworker_thread)
    schedule_in(qemu-kvm-vcpu)
    x86 instruction to enter guest
                                                      (*)

But you can see right away, without numbers, that the sequence above is
undesirable.

Why the existing solutions are not sufficient:

1) The task-isolation patchset seems too heavy for our use case (we do
want IPIs, signals, etc.).

2) With upstream linux-2.6.git, if DPDK running inside a guest happens
to trigger any vmstat update (say, for example, during migration), we
want the statistics transferred directly from the point where they are
generated, and not through the sequence (*).

> > > I doubt that doing inline updates will do much good compared to what we
> > > already have and what the dataplane mode can do.
> >
> > Can the dataplane mode disable the vmstat worker thread completely on a
> > given CPU?
>
> That already occurs when you call quiet_vmstat() and is used by the NOHZ
> logic.
> Configure that correctly and you should be fine.

quiet_vmstat() is not called by anyone today (upstream code).

Are you talking about the task isolation patches? Those seem a little
heavy to me, for example:

1) "Each time through the loop of TIF work to do, if TIF_TASK_ISOLATION
is set, we call the new task_isolation_enter() routine. This takes any
actions that might avoid a future interrupt to the core, such as a
worker thread being scheduled that could be quiesced now (e.g. the
vmstat worker) or a future IPI to the core to clean up some state that
could be cleaned up now (e.g. the mm lru per-cpu cache). In addition,
it requests rescheduling if the scheduler dyntick is still running."

For example, what about

	static void do_sync_core(void *data)

	on_each_cpu(do_sync_core, NULL, 1);

You can't enable tracing with this feature?

"Prior to returning to userspace, isolated tasks will arrange that no
future kernel activity will interrupt the task while the task is running
in userspace. By default, attempting to re-enter the kernel while in
this mode will cause the task to be terminated with a signal; you must
explicitly use prctl() to disable task isolation before resuming normal
use of the kernel."

2) A qemu-kvm-vcpu thread (a process which runs on the host system)
executes guest code through:

	ioctl(KVM_RUN) --> vcpu_enter_guest --> x86 instruction to execute guest code

So the "isolation period where task does not want to be interrupted"
contains kernel code.

3) Before using any service of the operating system through a syscall,
the application has to clear the TIF_TASK_ISOLATION flag, do the
syscall, and set the flag again when returning to userspace.

Now, what guarantees regarding a low number of interrupts do you provide
while this task is in kernel mode?

4) "We also support a new "task_isolation_debug" flag which forces
the console stack to be dumped out regardless. We try to catch the
original source of the interrupt, e.g.
if an IPI is dispatched to a task-isolation task, we dump the backtrace
of the remote core that is sending the IPI, rather than just dumping out
a trace showing the core received an IPI from somewhere."

KVM uses IPIs, for example, to send virtual interrupts and to update the
guest clock under certain conditions (for example, after VM migration).

So this seems a little heavy for our use case.