On Mon, Nov 4, 2024 at 11:06 AM Huang, Ying <ying.huang@xxxxxxxxx> wrote: > > Kairui Song <ryncsn@xxxxxxxxx> writes: > > I think that the logic of SWP_SCANNING is correct. > > As for SWP_WRITEOK, we should avoid to allocate swap entries when other > task clears it ASAP. So, the above checking may be too late, because we > have already allocated the swap entry. I think that we can check it in > cluster_reclaim_range() and refuse to allocate swap entry there. Thanks for the feedback. Not sure if this is a big issue. With the above patch SWP_WRITEOK will prevents any new allocations from using the device, already ongoing allocations may still use this device indeed, but in the end swapoff will still succeed, maybe very slightly slower as swapoff may have to unuse a few more entries. I can add another SWP_WRITEOK check to make it better, will post a patch. > Combined with the per cpu cluster (cluster->next[]) changing issue we > discussed in another email of the thread. I think the long term > solution could be separate the swap entry allocation and reclaiming a > bit more. For example, we could try to allocate some swap entries with > si->lock held firstly, record possible reclaim offset if any, change > cluster->next[], then unlock, reclaim, lock. > > -- > Best Regards, > Huang, Ying