On 2024-08-19 11:31:22, Sean Christopherson wrote:
> On Mon, Aug 19, 2024, David Matlack wrote:
> > On Mon, Aug 19, 2024 at 10:20 AM Vipin Sharma <vipinsh@xxxxxxxxxx> wrote:
> > >
> > > On 2024-08-16 16:29:11, Sean Christopherson wrote:
> > > > Why not just use separate lists?
> > >
> > > Before this patch, NX huge page recovery calculates "to_zap" and then
> > > zaps the first "to_zap" pages from the common list. This series is
> > > trying to maintain that invariant.
>
> I wouldn't try to maintain any specific behavior in the existing code, AFAIK
> it's 100% arbitrary and wasn't written with any meaningful sophistication.
> E.g. FIFO is little more than blindly zapping pages and hoping for the best.
>
> > > If we use two separate lists then we have to decide how many pages
> > > should be zapped from the TDP MMU and shadow MMU lists. A few options
> > > I can think of:
> > >
> > > 1. Zap "to_zap" pages from both the TDP MMU and shadow MMU lists
> > >    separately. Effectively, this might double the work for the
> > >    recovery thread.
> > > 2. Try zapping "to_zap" pages from one list and, if there are not
> > >    enough pages to zap, zap from the other list. This can cause
> > >    starvation.
> > > 3. Do half of "to_zap" from one list and the other half from the
> > >    other list. This can lead to situations where only half the work
> > >    is being done by the recovery worker thread.
> > >
> > > Option (1) above seems more reasonable to me.
> >
> > I vote each should zap 1/nx_huge_pages_recovery_ratio of their
> > respective list, i.e. calculate to_zap separately for each list.
>
> Yeah, I don't have a better idea since this is effectively a quick and dirty
> solution to reduce guest jitter. We can at least add a counter so that the
> zap is proportional to the number of pages on each list, e.g. this, and then
> do the necessary math in the recovery paths.

Okay, I will work on v2, which will create two separate lists for NX huge
pages, add a specific counter for the TDP MMU list, and zap based on that.
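For reference, roughly the per-list math I have in mind; this is a sketch
only, not the v2 patch. The per-list counter names (tdp_mmu_nx_pages,
shadow_mmu_nx_pages) are placeholders, and it applies the existing
DIV_ROUND_UP()/READ_ONCE() helpers the same way the current single-list
calculation does:

	/*
	 * Compute a separate "to_zap" per list so that each list is zapped
	 * in proportion to its own size, mirroring the existing
	 * to_zap = DIV_ROUND_UP(nx_lpage_splits, ratio) calculation.
	 */
	static unsigned long nx_huge_pages_to_zap(unsigned long list_pages,
						  unsigned long ratio)
	{
		/* A ratio of 0 means NX huge page recovery is disabled. */
		return ratio ? DIV_ROUND_UP(list_pages, ratio) : 0;
	}

	...

	ratio = READ_ONCE(nx_huge_pages_recovery_ratio);
	/* Hypothetical per-list counters; actual field names TBD in v2. */
	tdp_mmu_to_zap    = nx_huge_pages_to_zap(tdp_mmu_nx_pages, ratio);
	shadow_mmu_to_zap = nx_huge_pages_to_zap(shadow_mmu_nx_pages, ratio);

This keeps each list's zap rate tied to that list's size, so neither MMU's
pages starve the other's recovery.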