On Wed, Apr 19, 2023 at 08:14:09AM -0300, Marcelo Tosatti wrote:
> On Tue, Apr 18, 2023 at 03:02:00PM -0700, Andrew Morton wrote:
> > On Mon, 20 Mar 2023 15:03:32 -0300 Marcelo Tosatti <mtosatti@xxxxxxxxxx> wrote:
> >
> > > This patch series addresses the following two problems:
> > >
> > > 1. A customer provided evidence indicating that a process
> > >    was stalled in direct reclaim:
> > >
> > > ...
> > >
> > > 2. With a task that busy loops on a given CPU,
> > >    the kworker interruption to execute vmstat_update
> > >    is undesired and may exceed latency thresholds
> > >    for certain applications.
> > >
> >
> > I don't think I'll be sending this upstream in the next merge window.
> > Because it isn't clear that the added complexity in vmstat handling is
> > justified.
>
> From my POV this is an incorrect statement (that the complexity in
> vmstat handling is not justified).
>
> Andrew, this is the 3rd attempt to fix this problem:
>
> First try: https://lore.kernel.org/lkml/20220127173037.318440631@fedora.localdomain/
>
> Second try: https://patchew.org/linux/20230105125218.031928326@xxxxxxxxxx/
>
> Third try: syncing vmstats remotely from vmstat_shepherd (this
> patchset).
>
> And also, can you please explain: what is so complicated about the
> vmstat handling? cmpxchg has been around and is used all over the
> kernel, and nobody considers "excessively complicated".
>
> > - Michal's request for more clarity on the end-user requirements
> >   seems reasonable.
>
> And i explained to Michal in great detail where the end-user
> requirements come from. For virtualized workloads, there are two
> types of use-cases:
>
> 1) For example, for the MAC scheduler processing must occur every 1ms,
> and a certain amount of computation takes place (and must finish before
> the next 1ms timeframe). A > 50us latency spike as observed by cyclictest
> is considered a "failure".
>
> I showed him a 7us trace caused by, and explained that will extend to > 50us
> in the case of virtualized vCPU.
>
> 2) PLCs. These workloads will also suffer > 50us latency spikes
> which is undesirable.
>
> Can you please explain what additional clarity is required?
>
> RH's performance team, for example, has been performing packet
> latency tests and waiting for this issue to be fixed for about 2
> years now.
>
> Andrew Theurer, can you please explain what problem is the vmstat_work
> interruption causing in your testing?

+CC Andrew.