On Thu, Oct 24, 2024 at 11:50 AM Kairui Song <ryncsn@xxxxxxxxx> wrote:
>
> On Thu, Oct 24, 2024 at 11:29 AM Huang, Ying <ying.huang@xxxxxxxxx> wrote:
> >
> > Hi, Kairui,
> >
> > While reading the new swap cluster allocation code, I noticed
> > something. Sorry, I didn't find time to review your patch in the
> > first place. I found that in cluster_alloc_swap_entry() the
> > following code path is possible,
> >
> > cluster = this_cpu_ptr(si->percpu_cluster);
> > offset = alloc_swap_scan_cluster();
> > ...
> > spin_unlock(&ci->lock);
> > spin_unlock(&si->lock);
> > /* migrate to another cpu */
> > spin_lock(&ci->lock);
> > spin_lock(&si->lock);
> > cluster->next[order] = offset;
> >
> > That is, the per-CPU cluster of one CPU may be changed on another
> > CPU. I guess this will not cause any functionality issue. However,
> > it makes the code harder to reason about. Is it possible to avoid
> > unlocking before changing the per-CPU cluster? Or to give up
> > updating the per-CPU cluster if we need to unlock?
>
> Hi Ying,
>
> Yes, I noticed this while working on the new si->lock series. It may
> cause fragmentation, but only if the CPU the current allocation
> migrated away from requests another swap allocation in the meantime;
> in that case one per-CPU cluster may be wasted.
>
> No functionality or leak issue, but it does make things harder to
> reason about, as you mentioned.
>
> In the new series I use a local lock to prevent migration, so the CPU
> is stable during the whole process. However, that is not doable here
> unless si->lock is removed from the allocation path; otherwise there
> are lock dependency issues.
>
> Do we need to fix this separately in the current tree? The
> fragmentation is overall better with these patches, so I think that is

Typo: I mean I think these patches are not causing new issues.

> causing any issues. Before these patches, the per-CPU cluster could
> also be stolen by other CPUs, so it was never guaranteed to be
> accurate.
>
> One easy fix would be to set cluster->next[order] to 0 before scanning
> for a new cluster to use, and check whether it is still 0 before
> setting next[order] again.
>
> That may still cause fragmentation, as migration will cause extra
> clusters to be scanned and used, which is hard to avoid without
> changing too much code (like the things being done in the new si->lock
> series).
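
A minimal sketch of that next[order] idea, for illustration only: the
signature of alloc_swap_scan_cluster() is simplified here and does not
match the real function, and the surrounding locking is elided.

/*
 * Sketch, not actual kernel code: clear the per-CPU slot before the
 * scan (which may drop si->lock and let the task migrate), then
 * publish the new offset only if no other allocation re-populated
 * the slot in the meantime.
 */
static unsigned long cluster_alloc_sketch(struct swap_info_struct *si,
					  int order)
{
	struct percpu_cluster *cluster = this_cpu_ptr(si->percpu_cluster);
	unsigned long offset;

	/* Mark this order's slot as "scan in progress". */
	cluster->next[order] = 0;

	/* May unlock ci->lock and si->lock internally; migration possible. */
	offset = alloc_swap_scan_cluster(si, order);

	/* Re-read: we may be running on a different CPU by now. */
	cluster = this_cpu_ptr(si->percpu_cluster);

	/* Only update if nobody set the slot while we were scanning. */
	if (!cluster->next[order])
		cluster->next[order] = offset;

	return offset;
}

As the mail notes, this keeps the per-CPU slot consistent but can still
waste clusters when migration happens mid-scan.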
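For comparison, the local-lock approach mentioned for the new si->lock
series could look roughly like this. All names below are hypothetical,
and it assumes si->lock has been removed from the allocation path as
described above; local_lock() itself is the real kernel API from
<linux/local_lock.h> and prevents the task from migrating until the
matching local_unlock().

#include <linux/local_lock.h>
#include <linux/percpu.h>

/* Hypothetical per-CPU state guarded by a local lock. */
struct percpu_cluster_sketch {
	local_lock_t lock;
	unsigned int next[SWAP_NR_ORDERS];
};

static DEFINE_PER_CPU(struct percpu_cluster_sketch, percpu_cluster) = {
	.lock = INIT_LOCAL_LOCK(lock),
};

static unsigned long cluster_alloc_locked_sketch(struct swap_info_struct *si,
						 int order)
{
	struct percpu_cluster_sketch *cluster;
	unsigned long offset;

	/* Migration is disabled until local_unlock(): the CPU is stable. */
	local_lock(&percpu_cluster.lock);
	cluster = this_cpu_ptr(&percpu_cluster);

	/* Assumes the scan no longer takes si->lock (new series). */
	offset = alloc_swap_scan_cluster(si, order);
	cluster->next[order] = offset;

	local_unlock(&percpu_cluster.lock);
	return offset;
}

With the CPU pinned for the whole allocation, no other CPU can observe
or overwrite a half-updated next[order] slot, which is the property the
thread is after.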