On Wed, Jun 17, 2020 at 02:26:19PM +0900, js1304@xxxxxxxxx wrote:
> From: Joonsoo Kim <iamjoonsoo.kim@xxxxxxx>
>
> In current implementation, newly created or swap-in anonymous page
> is started on active list. Growing active list results in rebalancing
> active/inactive list so old pages on active list are demoted to inactive
> list. Hence, the page on active list isn't protected at all.
>
> Following is an example of this situation.
>
> Assume that 50 hot pages on active list. Numbers denote the number of
> pages on active/inactive list (active | inactive).
>
> 1. 50 hot pages on active list
> 50(h) | 0
>
> 2. workload: 50 newly created (used-once) pages
> 50(uo) | 50(h)
>
> 3. workload: another 50 newly created (used-once) pages
> 50(uo) | 50(uo), swap-out 50(h)
>
> This patch tries to fix this issue.
> Like as file LRU, newly created or swap-in anonymous pages will be
> inserted to the inactive list. They are promoted to active list if
> enough reference happens. This simple modification changes the above
> example as following.
>
> 1. 50 hot pages on active list
> 50(h) | 0
>
> 2. workload: 50 newly created (used-once) pages
> 50(h) | 50(uo)
>
> 3. workload: another 50 newly created (used-once) pages
> 50(h) | 50(uo), swap-out 50(uo)
>
> As you can see, hot pages on active list would be protected.
>
> Note that, this implementation has a drawback that the page cannot
> be promoted and will be swapped-out if re-access interval is greater than
> the size of inactive list but less than the size of total(active+inactive).
> To solve this potential issue, following patch will apply workingset
> detection that is applied to file LRU some day before.
>
> v6: Before this patch, all anon pages (inactive + active) are considered
> as workingset. However, with this patch, only active pages are considered
> as workingset. So, file refault formula which uses the number of all
> anon pages is changed to use only the number of active anon pages.

I can see that also from the code, but it doesn't explain why.

And I'm not sure this is correct. I can see two problems with it.

After your patch series, there is still one difference between anon
and file: cache trim mode. If the "use-once" anon pages dominate most
of memory and you have a small set of heavily thrashing files, it
would not get recognized. File refaults *have* to compare their
distance to the *entire* anon set, or we could get trapped in cache
trimming mode even as file pages with access frequencies <= RAM are
thrashing.

On the anon side, there is no cache trimming mode. But even if we're
not in cache trimming mode and active file is already being reclaimed,
we have to recognize thrashing on the anon side when reuse frequencies
are within available RAM. Otherwise we treat an inactive file that is
not being reused as having the same value as an anon page that is
being reused. And then we may reclaim file and anon at the same rate
even as anon is thrashing and file is not. That's not right.

We need to activate everything with a reuse frequency <= RAM. Reuse
frequency is refault distance plus size of the inactive list the page
was on. This means anon distances should be compared to active anon +
inactive file + active file, and file distances should be compared to
active file + inactive anon + active anon.

workingset_size should basically always be everything except the
inactive list the page is refaulting from, as that represents the
delta between total RAM and the amount of space this page had
available.
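
To make the comparison concrete, here is a rough userspace sketch in
plain C. It is not the actual mm/workingset.c code: struct lru_sizes,
workingset_size_for() and should_activate() are hypothetical names
made up for illustration, and the list sizes are assumed to be
available as simple page counts.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the four LRU list sizes, in pages. */
struct lru_sizes {
	unsigned long active_anon;
	unsigned long inactive_anon;
	unsigned long active_file;
	unsigned long inactive_file;
};

/*
 * Everything except the inactive list the page is refaulting from.
 * @file: true for a file refault, false for an anon refault.
 */
static unsigned long workingset_size_for(const struct lru_sizes *s, bool file)
{
	/* Both active lists always count toward the workingset size. */
	unsigned long size = s->active_anon + s->active_file;

	if (file)
		size += s->inactive_anon;	/* leave out inactive file */
	else
		size += s->inactive_file;	/* leave out inactive anon */

	return size;
}

/*
 * Activate when refault distance + own inactive list <= total RAM,
 * which is the same as refault distance <= workingset size.
 */
static bool should_activate(const struct lru_sizes *s, bool file,
			    unsigned long refault_distance)
{
	return refault_distance <= workingset_size_for(s, file);
}
```

With this shape, a file refault is measured against the entire anon
set plus active file, and an anon refault against the entire file set
plus active anon, so neither side can hide thrashing behind the
other's use-once pages.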