On Wed, Dec 18, 2019 at 10:25:45AM -0800, Ben Gardon wrote:
> On Mon, Dec 2, 2019 at 6:15 PM Sean Christopherson
> <sean.j.christopherson@xxxxxxxxx> wrote:
> > > +static bool direct_walk_iterator_next_pte(struct direct_walk_iterator *iter)
> > > +{
> > > +	/*
> > > +	 * This iterator could be iterating over a large number of PTEs, such
> > > +	 * that if this thread did not yield, it would cause scheduler
> > > +	 * problems. To avoid this, yield if needed. Note the check on
> > > +	 * MMU_LOCK_MAY_RESCHED in direct_walk_iterator_cond_resched. This
> > > +	 * iterator will not yield unless that flag is set in its lock_mode.
> > > +	 */
> > > +	direct_walk_iterator_cond_resched(iter);
> >
> > This looks very fragile, e.g. one of the future patches even has to avoid
> > problems with this code by limiting the number of PTEs it processes.
>
> With this, functions either need to limit the number of PTEs they
> process or pass MMU_LOCK_MAY_RESCHED to the iterator. It would
> probably be safer to invert the flag and make it
> MMU_LOCK_MAY_NOT_RESCHED for functions that can self-regulate the
> number of PTEs they process or have weird synchronization
> requirements. For example, the page fault handler can't reschedule and
> we know it won't process many entries, so we could pass
> MMU_LOCK_MAY_NOT_RESCHED in there.

That doesn't address the underlying fragility of the iterator, i.e.
relying on callers to self-regulate.  Especially since the threshold is
completely arbitrary: in zap_direct_gfn_range(), what guarantees that
walks at PDPE level and below are always safe, e.g. if should_resched()
becomes true at the very start of the walk?

The direct comparison to zap_direct_gfn_range() is
slot_handle_level_range(), which supports rescheduling regardless of
what function is being invoked.  What prevents the TDP iterator from
doing the same?  E.g. what's the worst case scenario if a reschedule
pops up at an inopportune time?
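
For reference, the rescheduling logic in slot_handle_level_range() looks
roughly like the sketch below (paraphrased from memory, the exact code in
mmu.c may differ slightly).  The key point is that any pending TLB flush
is performed before yielding, which is what makes dropping mmu_lock at an
arbitrary point in the walk safe:

	static __always_inline bool
	slot_handle_level_range(struct kvm *kvm, struct kvm_memory_slot *memslot,
				slot_level_handler fn, int start_level, int end_level,
				gfn_t start_gfn, gfn_t end_gfn, bool lock_flush_tlb)
	{
		struct slot_rmap_walk_iterator iterator;
		bool flush = false;

		for_each_slot_rmap_range(memslot, start_level, end_level,
					 start_gfn, end_gfn, &iterator) {
			if (iterator.rmap)
				flush |= fn(kvm, iterator.rmap);

			/*
			 * Yield if the scheduler wants the CPU or there is
			 * contention on mmu_lock, flushing TLBs first so
			 * that no vCPU can observe a stale mapping while
			 * the lock is dropped.
			 */
			if (need_resched() || spin_needbreak(&kvm->mmu_lock)) {
				if (flush && lock_flush_tlb) {
					kvm_flush_remote_tlbs(kvm);
					flush = false;
				}
				cond_resched_lock(&kvm->mmu_lock);
			}
		}

		return flush;
	}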