On 9/15/22 20:01, Nadav Amit wrote:
>
>
>> On Sep 14, 2022, at 11:42 PM, Barry Song <21cnbao@xxxxxxxxx> wrote:
>>
>>>
>>> The very idea behind TLB deferral is the opportunity it (might) provide
>>> to accumulate address ranges and cpu masks so that individual TLB flushes
>>> can be replaced with a more cost-effective range-based TLB flush. Hence
>>> I guess unless an address range or cpumask based cost-effective TLB flush
>>> is available, deferral does not improve the unmap performance as much.
>>
>>
>> After sending tlbi, if we wait for the completion of tlbi, we have to get an ack
>> from all CPUs in the system, so tlbi is not scalable. The point here is that we
>> avoid waiting for each individual TLBI. Instead, they are batched. If
>> you read the benchmark in the commit log, you can see the large decrease
>> in the cost of swapping out a page.
>
> Just a minor correction: arch_tlbbatch_flush() does not collect ranges.
> On x86 it only accumulates a CPU mask.

Thanks, Nadav, for the clarification.
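To make the distinction concrete, here is a minimal user-space sketch of mask-only batching, assuming a simplified model in which every flush costs one round of "send invalidation, wait for all acks". The struct and function names (`struct tlb_batch`, `flush_cpus`, `unmap_page_deferred`, `batch_flush`, `simulate_unmap`) are hypothetical and are not the kernel's `arch_tlbbatch_add_pending()`/`arch_tlbbatch_flush()` implementation. As Nadav notes, the batch records only which CPUs need flushing, not which addresses, so the saving comes from paying the wait-for-completion cost once per batch rather than once per page.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical batch state: a CPU mask only, no address ranges. */
struct tlb_batch {
    uint64_t cpumask;      /* one bit per CPU that may hold stale entries */
    unsigned int pending;  /* number of unmaps deferred into this batch */
};

/* Counts simulated "invalidate + wait for every ack" rounds. */
static unsigned int flush_rounds;

/* Simulated flush: one round to all CPUs in mask, then wait for completion. */
static void flush_cpus(uint64_t mask)
{
    (void)mask;
    flush_rounds++;
}

/* Unbatched: each unmapped page pays a full flush-and-wait round. */
static void unmap_page_sync(uint64_t cpus_that_used_mm)
{
    flush_cpus(cpus_that_used_mm);
}

/* Deferred: only OR the CPUs into the pending mask; nothing is waited on. */
static void unmap_page_deferred(struct tlb_batch *b, uint64_t cpus_that_used_mm)
{
    b->cpumask |= cpus_that_used_mm;
    b->pending++;
}

/* Hypothetical analogue of arch_tlbbatch_flush(): one round for the whole batch. */
static void batch_flush(struct tlb_batch *b)
{
    if (b->pending == 0)
        return;
    assert(b->cpumask != 0); /* something pending implies some CPU to flush */
    flush_cpus(b->cpumask);
    b->cpumask = 0;
    b->pending = 0;
}

/* Returns the number of flush rounds needed to unmap npages pages that were
 * all mapped on the CPUs in cpus, either synchronously or with deferral. */
static unsigned int simulate_unmap(unsigned int npages, uint64_t cpus, int deferred)
{
    struct tlb_batch batch = {0, 0};

    flush_rounds = 0;
    for (unsigned int i = 0; i < npages; i++) {
        if (deferred)
            unmap_page_deferred(&batch, cpus);
        else
            unmap_page_sync(cpus);
    }
    batch_flush(&batch);
    return flush_rounds;
}
```

For example, `simulate_unmap(512, 0xF, 0)` returns 512 while `simulate_unmap(512, 0xF, 1)` returns 1. The model deliberately ignores the per-page cost of issuing each invalidation and counts only the synchronisation rounds, which is the cost Barry describes as not scalable.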