Re: [RFC PATCH 0/2] remove SWAP_MAP_SHMEM

Barry Song <21cnbao@xxxxxxxxx> · Wed, 25 Sep 2024 14:26:52 +0800

On Wed, Sep 25, 2024 at 2:12 AM Yosry Ahmed <yosryahmed@xxxxxxxxxx> wrote:
>
> [..]
> > > > > > Apparently __swap_duplicate() does not currently handle increasing the
> > > > > > swap count for multiple swap entries by 1 (i.e. usage == 1) because it
> > > > > > does not handle rolling back count increases when
> > > > > > swap_count_continued() fails.
> > > > > >
> > > > > > I guess this voids my Reviewed-by until we sort this out. Technically
> > > > > > swap_count_continued() won't ever be called for shmem because we only
> > > > > > ever increment the count by 1, but there is no way to know this in
> > > > > > __swap_duplicate() without SWAP_HAS_SHMEM.
> > > >
> > > > Ah this is my bad. I compiled with CONFIG_THP_SWAP, but forgot to
> > > > remove the swapfile check (that's another can of worms, but I need
> > > > data before submitting the patch to remove it...)
> > > >
> > > > One thing we can do is instead of warning here, we can handle it in
> > > > the for loop check, where we have access to count - that's the point
> > > > of having that for-loop check anyway? :)
> > > >
> > > > There's a couple of ways to go about it:
> > > >
> > > > 1. VM_WARN_ON(usage == 1 && nr > 1 && count != 0 );
> > >
> > > Hmm that should work, although it's a bit complicated tbh.
> > >
> > > > (or more accurately, (count & ~COUNT_CONTINUED) >= SWAP_MAP_MAX))
> > >
> > > I think this will make the warning very hard to hit if there's a
> > > misuse of __swap_duplicate(). It will only be hit when an entry needs
> > > count continuation, which I am not sure is very common. If there's a
> > > bug, the warning will potentially catch it too late, if ever.
> > >
> > > The side effect here is failing to decrement the swap count of some
> > > swap entries which will lead to them never being freed, essentially
> > > leaking swap capacity slowly over time. I am not sure if there are
> > > more detrimental effects.
> > >
> > > >
> > > > 2. Alternatively, instead of warning here, we can simply return
> > > > -ENOMEM. Then, at shmem callsite, have a VM_WARN_ON/VM_BUG_ON(), since
> > > > this MUST succeed.
> > >
> > > We still fail to rollback incremented counts though when we return
> > > -ENOMEM, right? Maybe I didn't get what you mean.
> >
> > My understanding now is that there are two for loops. One for loop
> > that checks the entry's states, and one for loop that does the actual
> > incrementing work (or state modification).
> >
> > We can check in the first for loop, if it is safe to proceed:
> >
> > if (!count && !has_cache) {
> >     err = -ENOENT;
> > } else if (usage == SWAP_HAS_CACHE) {
> > if (has_cache)
> >     err = -EEXIST;
> > } else if ((count & ~COUNT_CONTINUED) > SWAP_MAP_MAX) {
> >     err = -EINVAL;
> > } else if (usage == 1 && nr > 1 && (count & ~COUNT_CONTINUED) >=
> > SWAP_MAP_MAX)) {
> >     /* the batched variants currently do not support rollback */
> >     err = -ENOMEM;
> > }
>
> Hmm yeah I think something like this should work and is arguably
> better than just warning, although this needs cleaning up:
> - We already know usage != SWAP_HAS_CACHE, so no need to check if usage == 1.
> - We already know (count & ~COUNT_CONTINUED) is larger than
> SWAP_MAP_MAX, so we should check if it's equal to SWAP_MAP_MAX.
> - We should probably just calculate count & ~COUNT_CONTINUED above the
> if conditions at this point.
>
> I would also like to hear what Barry thinks since he added this (and I
> just realized he is not CC'd).

I am perfectly fine with the approach, in the first loop, if we find all entries
don't need CONTINUED, we can run the 2nd loop even for usage==1
and nr > 1. this is almost always true for a real product where anon folios
are unlikely to be fork-shared by so many processes.

but we need fall back to iterating nr times if this really happens:

int swap_duplicate_nr(swp_entry_t entry, int nr)
{
   ....
   if (nr > 1 and ENOMEM) {
   for(nr entries) {
    __swap_duplicate(entry, 1, 1);
    entry = next_entry;
  }
}

seems a bit ugly?

maybe we can keep the swap_duplicate(swp_entry_t entry)
there? then avoid __swap_duplicate(entry, 1, 1);?

Thanks
Barry