[..] > > > > > Apparently __swap_duplicate() does not currently handle increasing the > > > > > swap count for multiple swap entries by 1 (i.e. usage == 1) because it > > > > > does not handle rolling back count increases when > > > > > swap_count_continued() fails. > > > > > > > > > > I guess this voids my Reviewed-by until we sort this out. Technically > > > > > swap_count_continued() won't ever be called for shmem because we only > > > > > ever increment the count by 1, but there is no way to know this in > > > > > __swap_duplicate() without SWAP_HAS_SHMEM. > > > > > > Ah this is my bad. I compiled with CONFIG_THP_SWAP, but forgot to > > > remove the swapfile check (that's another can of worms, but I need > > > data before submitting the patch to remove it...) > > > > > > One thing we can do is instead of warning here, we can handle it in > > > the for loop check, where we have access to count - that's the point > > > of having that for-loop check anyway? :) > > > > > > There's a couple of ways to go about it: > > > > > > 1. VM_WARN_ON(usage == 1 && nr > 1 && count != 0 ); > > > > Hmm that should work, although it's a bit complicated tbh. > > > > > (or more accurately, (count & ~COUNT_CONTINUED) >= SWAP_MAP_MAX)) > > > > I think this will make the warning very hard to hit if there's a > > misuse of __swap_duplicate(). It will only be hit when an entry needs > > count continuation, which I am not sure is very common. If there's a > > bug, the warning will potentially catch it too late, if ever. > > > > The side effect here is failing to decrement the swap count of some > > swap entries which will lead to them never being freed, essentially > > leaking swap capacity slowly over time. I am not sure if there are > > more detrimental effects. > > > > > > > > 2. Alternatively, instead of warning here, we can simply return > > > -ENOMEM. Then, at shmem callsite, have a VM_WARN_ON/VM_BUG_ON(), since > > > this MUST succeed. > > > > We still fail to rollback incremented counts though when we return > > -ENOMEM, right? Maybe I didn't get what you mean. > > My understanding now is that there are two for loops. One for loop > that checks the entry's states, and one for loop that does the actual > incrementing work (or state modification). > > We can check in the first for loop, if it is safe to proceed: > > if (!count && !has_cache) { > err = -ENOENT; > } else if (usage == SWAP_HAS_CACHE) { > if (has_cache) > err = -EEXIST; > } else if ((count & ~COUNT_CONTINUED) > SWAP_MAP_MAX) { > err = -EINVAL; > } else if (usage == 1 && nr > 1 && (count & ~COUNT_CONTINUED) >= > SWAP_MAP_MAX)) { > /* the batched variants currently do not support rollback */ > err = -ENOMEM; > } Hmm yeah I think something like this should work and is arguably better than just warning, although this needs cleaning up: - We already know usage != SWAP_HAS_CACHE, so no need to check if usage == 1. - We already know (count & ~COUNT_CONTINUED) is larger than SWAP_MAP_MAX, so we should check if it's equal to SWAP_MAP_MAX. - We should probably just calculate count & ~COUNT_CONTINUED above the if conditions at this point. I would also like to hear what Barry thinks since he added this (and I just realized he is not CC'd).