On Mon, Jun 05, 2023 at 09:55:57AM +0200, Michal Hocko wrote:
> On Fri 02-06-23 15:57:59, Marcelo Tosatti wrote:
> > The interruption caused by vmstat_update is undesirable
> > for certain applications:
> >
> > oslat 1094.456862: sys_mlock(start: 7f7ed0000b60, len: 1000)
> > oslat 1094.456971: workqueue_queue_work: ... function=vmstat_update ...
> > oslat 1094.456974: sched_switch: prev_comm=oslat ... ==> next_comm=kworker/5:1 ...
> > kworker 1094.456978: sched_switch: prev_comm=kworker/5:1 ==> next_comm=oslat ...
> >
> > The example above shows an additional 7us for the
> >
> > oslat -> kworker -> oslat
> >
> > switches. In the case of a virtualized CPU, and the vmstat_update
> > interruption in the host (of a qemu-kvm vcpu), the latency penalty
> > observed in the guest is higher than 50us, violating the acceptable
> > latency threshold.
>
> I personally find the above problem description insufficient. I have
> asked several times and only got piece by piece information each time.
> Maybe there is a reason to be secretive but it would be great to get at
> least some basic expectations described and what they are based on.

There is no reason to be secretive.

>
> E.g. workloads are running on isolated cpus with nohz full mode to
> shield off any kernel interruption. Yet there are operations that update
> counters (like mlock, but not mlock alone) that update per cpu counters
> that will eventually get flushed and that will cause some interference.
> Now the host/guest transition and interference. How that happens when
> the guest is running on an isolated and dedicated cpu?

Follows the updated changelog. Does it contain the information
requested?

----

Performance details for the kworker interruption:

With workloads that are running on isolated cpus with nohz full mode to
shield off any kernel interruption. For example, a VM running a time
sensitive application with a 50us maximum acceptable interruption
(use case: soft PLC).

oslat 1094.456862: sys_mlock(start: 7f7ed0000b60, len: 1000)
oslat 1094.456971: workqueue_queue_work: ... function=vmstat_update ...
oslat 1094.456974: sched_switch: prev_comm=oslat ... ==> next_comm=kworker/5:1 ...
kworker 1094.456978: sched_switch: prev_comm=kworker/5:1 ==> next_comm=oslat ...

The example above shows an additional 7us for the

oslat -> kworker -> oslat

switches. In the case of a virtualized CPU, and the vmstat_update
interruption in the host (of a qemu-kvm vcpu), the latency penalty
observed in the guest is higher than 50us, violating the acceptable
latency threshold.

The isolated vCPU can perform operations that modify per-CPU page
counters, for example to complete I/O operations:

CPU 11/KVM-9540 [001] dNh1. 2314.248584: mod_zone_page_state <-__folio_end_writeback
CPU 11/KVM-9540 [001] dNh1. 2314.248585: <stack trace>
 => 0xffffffffc042b083
 => mod_zone_page_state
 => __folio_end_writeback
 => folio_end_writeback
 => iomap_finish_ioend
 => blk_mq_end_request_batch
 => nvme_irq
 => __handle_irq_event_percpu
 => handle_irq_event
 => handle_edge_irq
 => __common_interrupt
 => common_interrupt
 => asm_common_interrupt
 => vmx_do_interrupt_nmi_irqoff
 => vmx_handle_exit_irqoff
 => vcpu_enter_guest
 => vcpu_run
 => kvm_arch_vcpu_ioctl_run
 => kvm_vcpu_ioctl
 => __x64_sys_ioctl
 => do_syscall_64
 => entry_SYSCALL_64_after_hwframe

> > Skip periodic updates for nohz full CPUs. Any callers who
> > need precise values should use a snapshot of the per-CPU
> > counters, or use the global counters with measures to
> > handle errors up to thresholds (see calculate_normal_threshold).
>
> I would rephrase this paragraph.
> In kernel users of vmstat counters either require the precise value and
> they are using zone_page_state_snapshot interface or they can live with
> an imprecision as the regular flushing can happen at arbitrary time and
> cumulative error can grow (see calculate_normal_threshold).
>
> From that POV the regular flushing can be postponed for CPUs that have
> been isolated from the kernel interference without critical
> infrastructure ever noticing. Skip regular flushing from vmstat_shepherd
> for all isolated CPUs to avoid interference with the isolated workload.
>
> > Suggested by Michal Hocko.
> >
> > Signed-off-by: Marcelo Tosatti <mtosatti@xxxxxxxxxx>
>
> Acked-by: Michal Hocko <mhocko@xxxxxxxx>

OK, updated comment, thanks.
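
For anyone following the thread, the reasoning in the rephrased
paragraph can be sketched as a small userspace model. This is only an
illustration, not the patch: struct model, THRESHOLD, NR_MODEL_CPUS and
the model_* helpers are all hypothetical names, and the fixed threshold
stands in for whatever calculate_normal_threshold would pick. It shows
per-CPU deltas that are folded into a global value either when they
exceed the threshold or when a shepherd pass visits the CPU, with the
shepherd skipping isolated CPUs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define NR_MODEL_CPUS 4	/* hypothetical: size of the toy machine */
#define THRESHOLD 8	/* hypothetical: fixed per-CPU delta limit */

/* Hypothetical userspace model; none of these are kernel symbols. */
struct model {
	long global;			/* stands in for the global counter */
	long pcp_diff[NR_MODEL_CPUS];	/* per-CPU deltas not yet folded */
	bool isolated[NR_MODEL_CPUS];	/* stands in for an isolation check */
};

/* Record a change on one CPU; fold it only once it exceeds THRESHOLD. */
static void model_mod_state(struct model *m, int cpu, long delta)
{
	assert(cpu >= 0 && cpu < NR_MODEL_CPUS);
	m->pcp_diff[cpu] += delta;
	if (labs(m->pcp_diff[cpu]) > THRESHOLD) {
		m->global += m->pcp_diff[cpu];
		m->pcp_diff[cpu] = 0;
	}
}

/* Periodic pass: fold pending deltas, leaving isolated CPUs alone.
 * Returns how many CPUs were flushed. */
static int model_shepherd(struct model *m)
{
	int flushed = 0;

	for (int cpu = 0; cpu < NR_MODEL_CPUS; cpu++) {
		if (m->isolated[cpu] || m->pcp_diff[cpu] == 0)
			continue;
		m->global += m->pcp_diff[cpu];
		m->pcp_diff[cpu] = 0;
		flushed++;
	}
	return flushed;
}

/* Cheap read: may be off by up to THRESHOLD per CPU. */
static long model_read_global(const struct model *m)
{
	return m->global;
}

/* Precise read: global value plus every pending per-CPU delta. */
static long model_read_snapshot(const struct model *m)
{
	long sum = m->global;

	for (int cpu = 0; cpu < NR_MODEL_CPUS; cpu++)
		sum += m->pcp_diff[cpu];
	return sum;
}
```

In this model the snapshot read stays exact whether or not the shepherd
ever visits an isolated CPU, while the global read is off by at most
THRESHOLD per CPU, which is the property the changelog relies on.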