On Fri, Apr 19, 2024 at 3:17 AM Jesper Dangaard Brouer <hawk@xxxxxxxxxx> wrote:
>
>
> On 18/04/2024 23.00, Yosry Ahmed wrote:
> > On Thu, Apr 18, 2024 at 4:00 AM Jesper Dangaard Brouer<hawk@xxxxxxxxxx> wrote:
> >> On 18/04/2024 04.21, Yosry Ahmed wrote:
> >>> On Tue, Apr 16, 2024 at 10:51 AM Jesper Dangaard Brouer<hawk@xxxxxxxxxx> wrote:
> >>>> This patch aims to reduce userspace-triggered pressure on the global
> >>>> cgroup_rstat_lock by introducing a mechanism to limit how often reading
> >>>> stat files causes cgroup rstat flushing.
> >>>>
> [...]
>
> > Taking a step back, I think this series is trying to address two
> > issues in one go: interrupt handling latency and lock contention.
>
> Yes, patch 2 and 3 are essentially independent and address two different
> aspects.
>
> > While both are related because reducing flushing reduces irq
> > disablement, I think it would be better if we can fix that issue
> > separately with a more fundamental solution (e.g. using a mutex or
> > dropping the lock at each CPU boundary).
> >
> > After that, we can more clearly evaluate the lock contention problem
> > with data purely about flushing latency, without taking into
> > consideration the irq handling problem.
> >
> > Does this make sense to you?
>
> Yes, make sense.
>
> So, you are suggesting we start with the mutex change? (patch 2)
> (which still needs some adjustments/tuning)

Yes. Let's focus on (what I assume to be) the more important problem,
IRQ serving latency. Once this is fixed, let's consider the tradeoffs
for improving lock contention separately.

Thanks!

>
> This make sense to me, as there are likely many solutions to reducing
> the pressure on the lock.
>
> With the tracepoint patch in-place, I/we can measure the pressure on the
> lock, and I plan to do this across our CF fleet. Then we can slowly
> work on improving lock contention and evaluate this on our fleets.
>
> --Jesper
> p.s.
> Setting expectations:
> - Going on vacation today, so will resume work after 29th April.