Hi Catalin,

I have sent v4 of this series [1], which combines the two functions into a single loop. See the code for details.

[1] https://lore.kernel.org/linux-arm-kernel/20200601144713.2222-1-yezhenyu2@xxxxxxxxxx/

On 2020/5/21 1:08, Catalin Marinas wrote:
>> This optimization is only effective when the range is a multiple of 256KB
>> (when the page size is 4KB), and I'm worried about the performance
>> of ilog2(). I traced the __flush_tlb_range() last year and found that in
>> most cases the range is less than 256K (see details in [1]).
>
> THP or hugetlbfs would exercise bigger strides but I guess it depends on
> the use-case. ilog2() should be reduced to a few instructions on arm64
> AFAICT (haven't tried but it should use the CLZ instruction).
>

Being bigger than 256K is not enough: the range must be an integer multiple of 256KB for the ilog2() approach to help, so I still start from scale 0.

Thanks,
Zhenyu
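To illustrate why starting from scale 0 is sufficient, here is a minimal userspace sketch of a single loop that walks the scale upward. It assumes the ARMv8.4-TLBI range encoding, where one range operation covers (num + 1) << (5 * scale + 1) pages with num in [0, 31] and scale in [0, 3]. The names RANGE_PAGES, RANGE_NUM, MAX_RANGE_PAGES and count_tlbi_ops are hypothetical, and this is a model of the idea rather than the v4 patch itself.

```c
#include <assert.h>

/*
 * Hypothetical model of the ARMv8.4-TLBI range encoding: one range
 * operation covers (num + 1) << (5 * scale + 1) pages, num in [0, 31],
 * scale in [0, 3]. Names are illustrative, not the kernel's.
 */
#define RANGE_PAGES(num, scale) \
	((unsigned long)((num) + 1) << (5 * (scale) + 1))
/* 5-bit chunk of 'pages' handled at this scale, minus 1 (-1 if empty) */
#define RANGE_NUM(pages, scale) \
	((int)(((pages) >> (5 * (scale) + 1)) & 0x1f) - 1)
#define MAX_RANGE_PAGES RANGE_PAGES(31, 3)

/*
 * Count the TLBI operations needed to invalidate 'pages' pages,
 * walking scale upward from 0 in a single loop. Returns -1 when the
 * range is too large, where a caller would fall back to a full flush.
 */
static int count_tlbi_ops(unsigned long pages)
{
	int scale = 0, ops = 0;

	if (pages >= MAX_RANGE_PAGES)
		return -1;

	while (pages > 0) {
		if (pages % 2 == 1) {
			/* Odd remainder: one non-range TLBI for a single page. */
			pages -= 1;
			ops++;
			continue;
		}

		assert(scale <= 3); /* guaranteed by the MAX_RANGE_PAGES check */

		int num = RANGE_NUM(pages, scale);
		if (num >= 0) {
			/* One range TLBI clears this 5-bit chunk of 'pages'. */
			pages -= RANGE_PAGES(num, scale);
			ops++;
		}
		/* An empty chunk (num == -1) simply moves on to the next scale. */
		scale++;
	}
	return ops;
}
```

With 4KB pages, count_tlbi_ops(64) (a 256KB range) returns 1: scale 0 yields num == -1 and the loop just moves on to scale 1, so no ilog2() is needed to pick a starting scale. count_tlbi_ops(100) returns 2 (36 pages at scale 0, then 64 pages at scale 1), and count_tlbi_ops(65) returns 2 (one single-page TLBI, then one range TLBI).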