On Mon, Oct 05, 2020 at 07:28:38AM -0700, Dave Hansen wrote: > On 10/5/20 7:11 AM, Jarkko Sakkinen wrote: > > + unsigned long count = 0; > ... > > + xas_lock(&xas); > > + xas_for_each(&xas, page, idx_end) { > > + if (++count % XA_CHECK_SCHED) > > + continue; > > Let's slow down and think through the loop, please. > > First time through the loop, count=0, XA_CHECK_SCHED=4096, it will do a > ++count, and now count=1. (count % XA_CHECK_SCHED) == 1. It will > continue. It skips the page->vm_max_prot_bits checks. > > Next time through the loop, count=1, XA_CHECK_SCHED=4096, it will do a > ++count, and now count=2. (count % XA_CHECK_SCHED) == 2. It will > continue. It skips the page->vm_max_prot_bits checks. > > ... > > It will do this until it hits count=4095 where it will actually fall > into the rest of the loop, doing the page->vm_max_prot_bits checks. Uh oh, how stupid of me. > So, in the end the loop only does what it's supposed to be doing 1/4096 > times. Not great. Don't we have tests that will notice breakage like this? It would not be hard to have two counts and WARN_ON in the end to compare for mismatch. > The XA_CHECK_SCHED needs to be stuck near the *end* of the loop, just > before the lock dropping and resched stuff. Referring to Matthew's suggestion: https://lore.kernel.org/linux-sgx/20201005111139.GK20115@xxxxxxxxxxxxxxxxxxxx/ I think that the order is just right. It takes the page, does the processing, and in the tail does check before rescheduling. The only thing I'd do for clarity would be to change the tail in that like this: /* Move to the next iteration. */ count++; /* Surpassed the modulus, reschedule. */ if (!(count % XA_CHECK_SCHED)) { xas_pause(&xas); xas_unlock(&xas); cond_resched(); xas_lock(&xas); } } xas_unlock(&xas); I think this is just a bit more readable way to express the same thing. Matthew, can you agree with this small twist? /Jarkko