On Tue, Dec 10, 2024 at 05:20:51PM +1100, Dave Chinner wrote:
> On Thu, Dec 05, 2024 at 02:59:16PM +0800, Long Li wrote:
> > On Tue, Nov 26, 2024 at 08:03:43AM +1100, Dave Chinner wrote:
> >
> > Sorry for replying so late; I wanted to make the problem as clear
> > as possible, but some doubts remain.
> >
> > > On Mon, Nov 25, 2024 at 09:52:58AM +0800, Long Li wrote:
> > > > There is a race condition between the inodegc queue and the
> > > > inodegc worker where the cpumask bit may not be set when
> > > > concurrent operations occur.
> > >
> > > What problems does this cause? i.e. how do we identify systems
> > > hitting this issue?
> >
> > I haven't encountered any actual issues, but while reviewing
> > 62334fab4762 ("xfs: use per-mount cpumask to track nonempty percpu
> > inodegc lists"), I noticed a potential problem.
> >
> > When the gc worker runs on a CPU other than the specified one due
> > to load balancing,
>
> How? inodegc uses get_cpu() to pin the task to the cpu while it
> processes the queue and then queues the work to be run on that CPU.
> The per-CPU inodegc queue is then processed by a single CPU-affine
> worker thread. The whole point of this setup is that scheduler load
> balancing, etc, cannot disturb the cpu affinity of the queues and
> the worker threads that service them.
>
> How does load balancing break explicit CPU affine kernel task
> scheduling?

Sorry, I misunderstood earlier. Load balancing cannot interfere with
the CPU affinity of the queues: the inodegc workqueue is not
WQ_UNBOUND, so its work items normally execute on their designated
CPUs. However, during CPU hotplug the offline path unbinds the
workers, allowing their work items to execute on other CPUs. If we
queue an inode on a CPU that is about to go offline, in the window
after workqueue_offline_cpu() has run but before the CPU is actually
marked offline, the following concurrent sequence might occur (though
I have not been able to reproduce it in practice); a userspace sketch
of the window follows below.
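To make the shape of the race concrete, here is a minimal userspace
model of the window (a sketch under assumed semantics, not the actual
XFS code; gc_queue, queue_side() and worker_side() are made-up names
standing in for the per-cpu gc structure, xfs_inodegc_queue() and an
unbound xfs_inodegc_worker()):

/*
 * Minimal userspace model of the race window (hypothetical names,
 * not XFS code): the "queue" side adds an item and then makes sure a
 * per-queue flag is set; the "worker" side drains the list and then
 * clears the flag.  If the queue side runs between the worker's
 * drain and its clear, we end with a non-empty list whose flag is
 * clear, so nothing records that the list still needs processing.
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <unistd.h>

struct gc_queue {
	atomic_int  items;       /* stand-in for the gc->list llist */
	atomic_bool cpumask_bit; /* stand-in for the cpumask bit */
};

static struct gc_queue gc;

/* Models xfs_inodegc_queue(): add work, then make sure the bit is set. */
static void *queue_side(void *arg)
{
	(void)arg;
	atomic_fetch_add(&gc.items, 1);
	atomic_store(&gc.cpumask_bit, true);
	return NULL;
}

/* Models an unbound xfs_inodegc_worker(): drain, then clear the bit. */
static void *worker_side(void *arg)
{
	(void)arg;
	atomic_exchange(&gc.items, 0);        /* llist_del_all()      */
	usleep(100000);                       /* widen window for demo */
	atomic_store(&gc.cpumask_bit, false); /* cpumask_clear_cpu()  */
	return NULL;
}

int main(void)
{
	pthread_t q, w;

	atomic_store(&gc.items, 1);           /* one inode already queued */
	atomic_store(&gc.cpumask_bit, true);

	pthread_create(&w, NULL, worker_side, NULL);
	pthread_create(&q, NULL, queue_side, NULL);
	pthread_join(w, NULL);
	pthread_join(q, NULL);

	if (atomic_load(&gc.items) > 0 && !atomic_load(&gc.cpumask_bit))
		printf("lost item: list non-empty but cpumask bit clear\n");
	return 0;
}

Build with e.g. "gcc -pthread race.c" and run: the usleep() only
widens the window so the bad interleaving shows up reliably. The
kernel-side analogue, as far as I can tell, would be an inode left on
the per-cpu list while its bit in the per-mount cpumask is clear, so
anything that walks the cpumask could miss that queue.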