On Wed, Sep 25, 2024 at 2:12 AM Yosry Ahmed <yosryahmed@xxxxxxxxxx> wrote: > > [..] > > > > > > Apparently __swap_duplicate() does not currently handle increasing the > > > > > > swap count for multiple swap entries by 1 (i.e. usage == 1) because it > > > > > > does not handle rolling back count increases when > > > > > > swap_count_continued() fails. > > > > > > > > > > > > I guess this voids my Reviewed-by until we sort this out. Technically > > > > > > swap_count_continued() won't ever be called for shmem because we only > > > > > > ever increment the count by 1, but there is no way to know this in > > > > > > __swap_duplicate() without SWAP_HAS_SHMEM. > > > > > > > > Ah this is my bad. I compiled with CONFIG_THP_SWAP, but forgot to > > > > remove the swapfile check (that's another can of worms, but I need > > > > data before submitting the patch to remove it...) > > > > > > > > One thing we can do is instead of warning here, we can handle it in > > > > the for loop check, where we have access to count - that's the point > > > > of having that for-loop check anyway? :) > > > > > > > > There's a couple of ways to go about it: > > > > > > > > 1. VM_WARN_ON(usage == 1 && nr > 1 && count != 0 ); > > > > > > Hmm that should work, although it's a bit complicated tbh. > > > > > > > (or more accurately, (count & ~COUNT_CONTINUED) >= SWAP_MAP_MAX)) > > > > > > I think this will make the warning very hard to hit if there's a > > > misuse of __swap_duplicate(). It will only be hit when an entry needs > > > count continuation, which I am not sure is very common. If there's a > > > bug, the warning will potentially catch it too late, if ever. > > > > > > The side effect here is failing to decrement the swap count of some > > > swap entries which will lead to them never being freed, essentially > > > leaking swap capacity slowly over time. I am not sure if there are > > > more detrimental effects. > > > > > > > > > > > 2. Alternatively, instead of warning here, we can simply return > > > > -ENOMEM. Then, at shmem callsite, have a VM_WARN_ON/VM_BUG_ON(), since > > > > this MUST succeed. > > > > > > We still fail to rollback incremented counts though when we return > > > -ENOMEM, right? Maybe I didn't get what you mean. > > > > My understanding now is that there are two for loops. One for loop > > that checks the entry's states, and one for loop that does the actual > > incrementing work (or state modification). > > > > We can check in the first for loop, if it is safe to proceed: > > > > if (!count && !has_cache) { > > err = -ENOENT; > > } else if (usage == SWAP_HAS_CACHE) { > > if (has_cache) > > err = -EEXIST; > > } else if ((count & ~COUNT_CONTINUED) > SWAP_MAP_MAX) { > > err = -EINVAL; > > } else if (usage == 1 && nr > 1 && (count & ~COUNT_CONTINUED) >= > > SWAP_MAP_MAX)) { > > /* the batched variants currently do not support rollback */ > > err = -ENOMEM; > > } > > Hmm yeah I think something like this should work and is arguably > better than just warning, although this needs cleaning up: > - We already know usage != SWAP_HAS_CACHE, so no need to check if usage == 1. > - We already know (count & ~COUNT_CONTINUED) is larger than > SWAP_MAP_MAX, so we should check if it's equal to SWAP_MAP_MAX. > - We should probably just calculate count & ~COUNT_CONTINUED above the > if conditions at this point. > > I would also like to hear what Barry thinks since he added this (and I > just realized he is not CC'd). I am perfectly fine with the approach, in the first loop, if we find all entries don't need CONTINUED, we can run the 2nd loop even for usage==1 and nr > 1. this is almost always true for a real product where anon folios are unlikely to be fork-shared by so many processes. but we need fall back to iterating nr times if this really happens: int swap_duplicate_nr(swp_entry_t entry, int nr) { .... if (nr > 1 and ENOMEM) { for(nr entries) { __swap_duplicate(entry, 1, 1); entry = next_entry; } } seems a bit ugly? maybe we can keep the swap_duplicate(swp_entry_t entry) there? then avoid __swap_duplicate(entry, 1, 1);? Thanks Barry