On Tue, Sep 17, 2024 at 11:36 AM Lance Yang <ioworker0@xxxxxxxxx> wrote:
>
> On Mon, Sep 16, 2024 at 9:25 PM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
> >
> > On Fri, Sep 13, 2024 at 02:49:02PM +0530, Dev Jain wrote:
> > > We use pte_range_none() to determine whether contiguous PTEs are empty
> > > for an mTHP allocation. Instead of iterating the while loop for every
> > > order, use some information, which is the first set PTE found, from the
> > > previous iteration, to eliminate some cases. The key to understanding
> > > the correctness of the patch is that the ranges we want to examine
> > > form a strictly decreasing sequence of nested intervals.
> >
> > This is a lot more complicated. Do you have any numbers that indicate
> > that it's faster? Yes, it's fewer memory references, but you've gone
> > from a simple linear scan that's easy to prefetch to an exponential scan
> > that might confuse the prefetchers.
>
> +1
>
> I'm not sure if multiple mthp sizes will be enabled for common cases ;)
> If not, this could be a bit more complicated, IMO.
>
> @Barry, could you share whether OPPO typically uses multiple mthp sizes
> in their scenarios?

Not at all. I actually doubt we really wanted to enable multiple sizes at
the same time. But somehow, I like the idea of making pte_range_none()
non-bool (to be consistent with all the other places where we check
pte/swap_xxx_batch() != nr_pages).

Personally, I don't care too much about the overhead of PTE scanning. So
I'd like to have patch 1, which converts it to non-bool; as for patch 2,
which might or might not reduce overhead, I don't care too much.

>
> Thanks,
> Lance

Thanks
Barry
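
To make the "non-bool" idea concrete, here is a minimal userspace sketch, not the actual patch. It assumes a toy model in which a PTE is just an unsigned long and 0 means "none"; the names model_pte_t, model_pte_range_none() and model_range_is_empty() are hypothetical and exist only for illustration. The point is that the helper returns how many leading entries are empty, and callers compare that count against nr_pages, in the same way the pte/swap_xxx_batch() callers do.

```c
#include <stddef.h>

/* Hypothetical model: one "PTE" is an unsigned long, and 0 means none. */
typedef unsigned long model_pte_t;

/*
 * Count the leading empty entries in [pte, pte + nr_pages).
 * Returns nr_pages when the whole range is empty; otherwise the
 * return value is the index of the first set entry.
 */
static size_t model_pte_range_none(const model_pte_t *pte, size_t nr_pages)
{
	size_t i;

	for (i = 0; i < nr_pages; i++) {
		if (pte[i] != 0)
			break;
	}
	return i;
}

/* Caller-side check, mirroring the "!= nr_pages" pattern. */
static int model_range_is_empty(const model_pte_t *pte, size_t nr_pages)
{
	return model_pte_range_none(pte, nr_pages) == nr_pages;
}
```

With this shape, the old boolean answer is still available as a comparison against nr_pages, and the index of the first set entry is there for any caller that wants it, which is the information patch 2 proposes to reuse across orders.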