Re: [RFC PATCH 0/2] remove SWAP_MAP_SHMEM

Yosry Ahmed <yosryahmed@xxxxxxxxxx> · Tue, 24 Sep 2024 11:11:45 -0700

[..]
> > > > > Apparently __swap_duplicate() does not currently handle increasing the
> > > > > swap count for multiple swap entries by 1 (i.e. usage == 1) because it
> > > > > does not handle rolling back count increases when
> > > > > swap_count_continued() fails.
> > > > >
> > > > > I guess this voids my Reviewed-by until we sort this out. Technically
> > > > > swap_count_continued() won't ever be called for shmem because we only
> > > > > ever increment the count by 1, but there is no way to know this in
> > > > > __swap_duplicate() without SWAP_HAS_SHMEM.
> > >
> > > Ah this is my bad. I compiled with CONFIG_THP_SWAP, but forgot to
> > > remove the swapfile check (that's another can of worms, but I need
> > > data before submitting the patch to remove it...)
> > >
> > > One thing we can do is instead of warning here, we can handle it in
> > > the for loop check, where we have access to count - that's the point
> > > of having that for-loop check anyway? :)
> > >
> > > There's a couple of ways to go about it:
> > >
> > > 1. VM_WARN_ON(usage == 1 && nr > 1 && count != 0 );
> >
> > Hmm that should work, although it's a bit complicated tbh.
> >
> > > (or more accurately, (count & ~COUNT_CONTINUED) >= SWAP_MAP_MAX))
> >
> > I think this will make the warning very hard to hit if there's a
> > misuse of __swap_duplicate(). It will only be hit when an entry needs
> > count continuation, which I am not sure is very common. If there's a
> > bug, the warning will potentially catch it too late, if ever.
> >
> > The side effect here is failing to decrement the swap count of some
> > swap entries which will lead to them never being freed, essentially
> > leaking swap capacity slowly over time. I am not sure if there are
> > more detrimental effects.
> >
> > >
> > > 2. Alternatively, instead of warning here, we can simply return
> > > -ENOMEM. Then, at shmem callsite, have a VM_WARN_ON/VM_BUG_ON(), since
> > > this MUST succeed.
> >
> > We still fail to rollback incremented counts though when we return
> > -ENOMEM, right? Maybe I didn't get what you mean.
>
> My understanding now is that there are two for loops. One for loop
> that checks the entry's states, and one for loop that does the actual
> incrementing work (or state modification).
>
> We can check in the first for loop, if it is safe to proceed:
>
> if (!count && !has_cache) {
>     err = -ENOENT;
> } else if (usage == SWAP_HAS_CACHE) {
> if (has_cache)
>     err = -EEXIST;
> } else if ((count & ~COUNT_CONTINUED) > SWAP_MAP_MAX) {
>     err = -EINVAL;
> } else if (usage == 1 && nr > 1 && (count & ~COUNT_CONTINUED) >=
> SWAP_MAP_MAX)) {
>     /* the batched variants currently do not support rollback */
>     err = -ENOMEM;
> }

Hmm yeah I think something like this should work and is arguably
better than just warning, although this needs cleaning up:
- We already know usage != SWAP_HAS_CACHE, so no need to check if usage == 1.
- We already know (count & ~COUNT_CONTINUED) is larger than
SWAP_MAP_MAX, so we should check if it's equal to SWAP_MAP_MAX.
- We should probably just calculate count & ~COUNT_CONTINUED above the
if conditions at this point.

I would also like to hear what Barry thinks since he added this (and I
just realized he is not CC'd).