On 2024-08-16 16:29:11, Sean Christopherson wrote:
> On Mon, Aug 12, 2024, Vipin Sharma wrote:
> > +	list_for_each_entry(sp, &kvm->arch.possible_nx_huge_pages, possible_nx_huge_page_link) {
> > +		if (i++ >= max)
> > +			break;
> > +		if (is_tdp_mmu_page(sp) == tdp_mmu)
> > +			return sp;
> > +	}
>
> This is silly and wasteful.  E.g. in the (unlikely) case there's one TDP MMU
> page amongst hundreds/thousands of shadow MMU pages, this will walk the list
> until @max, and then move on to the shadow MMU.
>
> Why not just use separate lists?

Before this patch, NX huge page recovery calculates "to_zap" and then
zaps the first "to_zap" pages from the common list. This series is
trying to maintain that invariant.

If we use two separate lists, then we have to decide how many pages
should be zapped from the TDP MMU list and how many from the shadow MMU
list. A few options I can think of:

1. Zap "to_zap" pages from both the TDP MMU and shadow MMU lists
   separately. Effectively, this might double the work for the recovery
   thread.
2. Try zapping "to_zap" pages from one list and, if there are not enough
   pages to zap, zap the remainder from the other list. This can cause
   starvation of the second list.
3. Zap half of "to_zap" from one list and the other half from the other
   list. This can lead to situations where only half of the work is
   done by the recovery worker thread.

Option (1) above seems more reasonable to me.
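
To make the trade-offs between the three options concrete, here is a
rough userspace sketch, not kernel code. It assumes each list can be
reduced to a count of zappable pages, and that "one list" in options
(2) and (3) means the TDP MMU list goes first; the thread does not
specify either. The names zap_result, zap_each_full,
zap_first_then_other and zap_half_each are made up for illustration.

```c
#include <assert.h>

/* Hypothetical model: each list is reduced to a count of zappable pages. */
struct zap_result {
	unsigned long tdp;	/* pages zapped from the TDP MMU list */
	unsigned long shadow;	/* pages zapped from the shadow MMU list */
};

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/* Option 1: apply the full to_zap budget to each list independently. */
static struct zap_result zap_each_full(unsigned long to_zap,
				       unsigned long nr_tdp,
				       unsigned long nr_shadow)
{
	struct zap_result r;

	r.tdp = min_ul(to_zap, nr_tdp);
	r.shadow = min_ul(to_zap, nr_shadow);
	return r;
}

/*
 * Option 2: spend the budget on one list first (assumed here to be the
 * TDP MMU list), then spend whatever is left on the other list.
 */
static struct zap_result zap_first_then_other(unsigned long to_zap,
					      unsigned long nr_tdp,
					      unsigned long nr_shadow)
{
	struct zap_result r;

	r.tdp = min_ul(to_zap, nr_tdp);
	assert(r.tdp <= to_zap);	/* the subtraction below cannot underflow */
	r.shadow = min_ul(to_zap - r.tdp, nr_shadow);
	return r;
}

/* Option 3: give each list a fixed half of the budget. */
static struct zap_result zap_half_each(unsigned long to_zap,
				       unsigned long nr_tdp,
				       unsigned long nr_shadow)
{
	unsigned long half = to_zap / 2;
	struct zap_result r;

	r.tdp = min_ul(half, nr_tdp);
	/* The second list gets the remainder so odd budgets are not rounded away. */
	r.shadow = min_ul(to_zap - half, nr_shadow);
	return r;
}
```

With to_zap = 10 and 1000 pages on each list, option (1) zaps 20 pages
in total (double the work), and option (2) zaps 10 TDP MMU pages and 0
shadow MMU pages (starvation). With to_zap = 10, 1 TDP MMU page and
1000 shadow MMU pages, option (3) zaps only 1 + 5 = 6 pages, while
option (2) zaps 1 + 9 = 10.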