On Thu, Oct 24, 2024 at 11:29 AM Huang, Ying <ying.huang@xxxxxxxxx> wrote:
>
> Hi, Kairui,
>
> When reading new swap cluster allocation code. Sorry, I didn't find
> time to review your patch at the first place. I found that in
> cluster_alloc_swap_entry(), the following code path is possible,
>
>   cluster = this_cpu_ptr(si->percpu_cluster);
>   offset = alloc_swap_scan_cluster();
>   ...
>   spin_unlock(&ci->lock);
>   spin_unlock(&si->lock);
>   /* migrate to another cpu */
>   spin_lock(&ci->lock);
>   spin_lock(&si->lock);
>   cluster->next[order] = offset;
>
> That is, the per cpu cluster of a CPU may be changed on another CPU. I
> guess that this will not cause some functionality issue. However, this
> makes code harder to be reasoned. Is it possible to avoid unlock before
> changing per cpu cluster? Or, give up updating per cpu cluster if we
> need to unlock.

Hi Ying,

Yes, I noticed this while working on the new si->lock series. It can
cause fragmentation, but only if the CPU that the current allocation
migrated away from requests another swap allocation in the meantime; in
that case one per cpu cluster may be wasted. There is no functionality
or leak issue, but it does make things harder to reason about, as you
mentioned.

In the new series I use a local lock to prevent migration, so the CPU
stays stable during the whole process. However, that is not doable here
unless si->lock is removed from the allocation path, or there will be
lock dependency issues.

Do we need to fix this separately in the current tree? Fragmentation is
overall better with these patches, so I don't think this is causing any
issues. Before these patches, a per CPU cluster could also be stolen by
other CPUs, so it was never guaranteed to be accurate anyway.

One easy thing we can do is set cluster->next[order] to 0 before
scanning for a new cluster to use, and check whether it is still 0
before setting next[order] again; see the sketch at the end of this
mail. That may still cause some fragmentation, as migration will cause
extra clusters to be scanned and used, which is hard to avoid without
changing too much code (like the things being done in the new si->lock
series).
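
Roughly, reusing the pseudo code path you quoted above (just a rough
sketch of the idea, not a tested patch; the real calls take more
arguments and locking than shown here):

  cluster = this_cpu_ptr(si->percpu_cluster);
  /* Drop our claim before the locks may be released below. */
  cluster->next[order] = 0;
  offset = alloc_swap_scan_cluster();
  ...
  spin_unlock(&ci->lock);
  spin_unlock(&si->lock);
  /* migrate to another cpu */
  spin_lock(&ci->lock);
  spin_lock(&si->lock);
  /*
   * "cluster" may still point to the CPU we started on.  Only
   * record the new offset if that CPU has not picked a cluster
   * of its own while we were scanning.
   */
  if (!cluster->next[order])
          cluster->next[order] = offset;

That way a migrated allocation never overwrites a next[order] value the
original CPU has set for itself in the meantime; the worst case is that
the scanned cluster is simply not remembered, which is just the
fragmentation mentioned above.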