[..]
> > > Couple of possibly related thoughts:
> > >
> > > 1) I have been thinking some more about the purpose of this per-cpu
> > > acomp_ctx mutex. It appears that the main benefit is that it causes
> > > task blocked errors (which are useful to detect problems) if any
> > > computes in the code section it covers take a long duration. Other
> > > than that, it does not protect a resource, nor prevent cpu offlining
> > > from deleting its containing structure.
> >
> > It does protect resources. Consider this case:
> > - Process A runs on CPU #1, gets the acomp_ctx, and locks it, then is
> >   migrated to CPU #2.
> > - Process B runs on CPU #1, gets the same acomp_ctx, and tries to lock
> >   it, then waits for process A. Without the mutex they would be using
> >   the same acomp_ctx concurrently.
> >
> > It is also possible that process A simply gets preempted while running
> > on CPU #1 by another task that also tries to use the acomp_ctx. The
> > mutex also protects against this case.
>
> Got it, thanks for the explanations. It seems with this patch, the mutex
> would be redundant in the first example. Would this also be true of the
> second example, where process A gets preempted?

I think the mutex is still required in the second example. Disabling
migration does not prevent other processes from running on the same CPU
and attempting to use the same acomp_ctx.

> If not, is it worth
> figuring out a solution that works for both migration and preemption?

Not sure exactly what you mean here. I suspect you mean having a single
mechanism to protect against both concurrent usage and CPU hotunplug,
rather than disabling migration and having a mutex. Yeah, that would be
ideal, but probably not for a hotfix.

> > > 2) Seems like the overall problem is applicable to any per-cpu data
> > > that is being used by a process, vis-a-vis cpu hotunplug. Could it
> > > be that a solution in cpu hotunplug can safeguard more generally?
> > > Really not sure about the specifics of any solution, but it occurred
> > > to me that this may not be a problem unique to zswap.
> >
> > Not really. Static per-CPU data and data allocated with alloc_percpu()
> > should be available for all possible CPUs, regardless of whether they
> > are online or not, so CPU hotunplug is not relevant. It is relevant
> > here because we allocate the memory dynamically for online CPUs only
> > to save memory. I am not sure how important this is, as I am not aware
> > what the difference between the number of online and possible CPUs can
> > be in real-life deployments.
>
> Thought I would clarify what I meant: the problem of per-cpu data that
> gets allocated dynamically using cpu hotplug, and deleted by cpu
> hotunplug even while in use, may not be unique to zswap. If so, I was
> wondering if a more generic solution in the cpu hotunplug code would be
> feasible/worth exploring.

I didn't look too closely. If there's something out there, or something
that can be easily developed, I'd be open to updating the zswap code
accordingly, but I don't have time to look into it tbh, and it's too late
in the release cycle to get creative imo.
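
To make the two scenarios above concrete, here is a minimal sketch of the
acquire/release pattern being discussed. This is illustrative only, not
the actual zswap code: acomp_ctx_get()/acomp_ctx_put() and the struct
layout are made up for the example, but the migrate_disable()/mutex
pairing mirrors the reasoning above.

#include <linux/mutex.h>
#include <linux/percpu.h>
#include <linux/preempt.h>
#include <linux/types.h>

struct acomp_ctx {
        struct mutex mutex;     /* serializes users of this CPU's ctx */
        u8 *buffer;             /* scratch state those users share */
};

static struct acomp_ctx *acomp_ctx_get(struct acomp_ctx __percpu *ctxs)
{
        struct acomp_ctx *ctx;

        /*
         * Pin the task to this CPU so the ctx picked below remains the
         * ctx of the CPU we run on (rules out the migration scenario).
         */
        migrate_disable();
        ctx = this_cpu_ptr(ctxs);

        /*
         * Still required: another task on this same CPU (e.g. after
         * preempting us) picks the same ctx, so serialize access with
         * the mutex (the preemption scenario).
         */
        mutex_lock(&ctx->mutex);
        return ctx;
}

static void acomp_ctx_put(struct acomp_ctx *ctx)
{
        mutex_unlock(&ctx->mutex);
        migrate_enable();
}

With this shape, disabling migration makes the "migrated to CPU #2" case
impossible, while the mutex remains the only thing serializing two tasks
time-sharing CPU #1.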
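
On the online vs. possible CPUs point, here is roughly what the two
allocation styles look like, for anyone following along. The foo_* names
and the choice of hotplug state are made up for illustration; the point is
only that alloc_percpu() backs every possible CPU, while buffers hung off
hotplug callbacks are freed on hotunplug, which is what opens the window
this thread is about.

#include <linux/cpuhotplug.h>
#include <linux/init.h>
#include <linux/mm.h>
#include <linux/percpu.h>
#include <linux/slab.h>
#include <linux/topology.h>
#include <linux/types.h>

struct foo_ctx {
        u8 *buffer;
};

static struct foo_ctx __percpu *foo_ctxs;

/* Called when a CPU comes online: allocate that CPU's buffer only. */
static int foo_cpu_prepare(unsigned int cpu)
{
        struct foo_ctx *ctx = per_cpu_ptr(foo_ctxs, cpu);

        ctx->buffer = kmalloc_node(PAGE_SIZE, GFP_KERNEL, cpu_to_node(cpu));
        return ctx->buffer ? 0 : -ENOMEM;
}

/* Called on hotunplug: frees the buffer, hence the potential
 * use-after-free if some task is still using it at that point. */
static int foo_cpu_dead(unsigned int cpu)
{
        struct foo_ctx *ctx = per_cpu_ptr(foo_ctxs, cpu);

        kfree(ctx->buffer);
        ctx->buffer = NULL;
        return 0;
}

static int __init foo_init(void)
{
        int ret;

        /* The per-CPU slots themselves cover every possible CPU... */
        foo_ctxs = alloc_percpu(struct foo_ctx);
        if (!foo_ctxs)
                return -ENOMEM;

        /* ...but each buffer exists only while its CPU is online. */
        ret = cpuhp_setup_state(CPUHP_BP_PREPARE_DYN, "mm/foo:prepare",
                                foo_cpu_prepare, foo_cpu_dead);
        return ret < 0 ? ret : 0;
}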