On Thu, Sep 27, 2018 at 09:34:36AM +0800, Huang, Ying wrote: > Daniel Jordan <daniel.m.jordan@xxxxxxxxxx> writes: > > On Wed, Sep 26, 2018 at 08:55:59PM +0800, Huang, Ying wrote: > >> Daniel Jordan <daniel.m.jordan@xxxxxxxxxx> writes: > >> > On Tue, Sep 25, 2018 at 03:13:30PM +0800, Huang Ying wrote: > >> >> /* > >> >> * Increase reference count of swap entry by 1. > >> >> - * Returns 0 for success, or -ENOMEM if a swap_count_continuation is required > >> >> - * but could not be atomically allocated. Returns 0, just as if it succeeded, > >> >> - * if __swap_duplicate() fails for another reason (-EINVAL or -ENOENT), which > >> >> - * might occur if a page table entry has got corrupted. > >> >> + * > >> >> + * Return error code in following case. > >> >> + * - success -> 0 > >> >> + * - swap_count_continuation is required but could not be atomically allocated. > >> >> + * *entry is used to return swap entry to call add_swap_count_continuation(). > >> >> + * -> ENOMEM > >> >> + * - otherwise same as __swap_duplicate() > >> >> */ > >> >> -int swap_duplicate(swp_entry_t entry) > >> >> +int swap_duplicate(swp_entry_t *entry, int entry_size) > >> >> { > >> >> int err = 0; > >> >> > >> >> - while (!err && __swap_duplicate(entry, 1) == -ENOMEM) > >> >> - err = add_swap_count_continuation(entry, GFP_ATOMIC); > >> >> + while (!err && > >> >> + (err = __swap_duplicate(entry, entry_size, 1)) == -ENOMEM) > >> >> + err = add_swap_count_continuation(*entry, GFP_ATOMIC); > >> >> return err; > >> > > >> > Now we're returning any error we get from __swap_duplicate, apparently to > >> > accommodate ENOTDIR later in the series, which is a change from the behavior > >> > introduced in 570a335b8e22 ("swap_info: swap count continuations"). This might > >> > belong in a separate patch given its potential for side effects. > >> > >> I have checked all the calls of the function and found there will be no > >> bad effect. Do you have any side effect? > > > > Before I was just being vaguely concerned about any unintended side effects, > > but looking again, yes I do. > > > > Now when swap_duplicate returns an error in copy_one_pte, copy_one_pte returns > > a (potentially nonzero) entry.val, which copy_pte_range interprets > > unconditionally as 'try adding a swap count continuation.' Not what we want > > for returns other than -ENOMEM. > > Thanks for pointing this out! Before the change in the patchset, the > behavior is, > > Something wrong is detected in swap_duplicate(), but the error is > ignored. Then copy_one_pte() will think everything is OK, so that it > can proceed to call set_pte_at(). The system will be in inconsistent > state and some data may be polluted! Yes, the part about page table corruption in the comment above swap_duplicate. > But this doesn't cause any problem in practical. Per my understanding, > because if other part of the kernel works correctly, it's impossible for > swap_duplicate() return any error except -ENOMEM before the change in > this patchset. I agree with that, but it's not what I'm trying to explain. I didn't go into enough detail, let me try again. Hopefully I'm understanding this right. While running with these patches, say we're at copy_pte_range copy_one_pte swap_duplicate __swap_duplicate __swap_duplicate_locked And say __swap_duplicate_locked returns an error that isn't -ENOMEM, such as -EEXIST. That means __swap_duplicate and swap_duplicate also return -EEXIST. copy_one_pte returns entry.val, which can be and usually is nonzero, so we break out of the loop in copy_pte_range and then--erroneously--call add_swap_count_continuation. The add_swap_count_continuation call was added in 570a335b8e22 and relies on the assumption that callers can only get -ENOMEM from swap_duplicate. This patch changes that assumption. Not a big deal: the continuation call just returns early, no harm done, but it allocs and frees a page needlessly, so we should fix it. One way is to change copy_one_pte's return to int so we can just pass the error code back to copy_pte_range so it knows whether to try adding the continuation. The other swap_duplicate caller, try_to_unmap_one, seems ok. > But the error may be possible during development, and it > may serve as some kind of document too. So I suggest to add > > VM_BUG_ON(error != -ENOMEM); > > in swap_duplicate(). What do you think about that? That doesn't seem necessary. > > So it might make sense to have a separate patch that changes swap_duplicate's > > return and makes callers handle it. > > Thanks for your help to take a deep look at this. I want to try to fix > all potential problems firstly, because the number of the caller is > quite limited. Do you agree? Yes, makes sense to me. Daniel