On Wed 22-11-23 02:39:15, Yosry Ahmed wrote:
> On Wed, Nov 22, 2023 at 2:09 AM Michal Hocko <mhocko@xxxxxxxx> wrote:
> >
> > On Wed 22-11-23 09:52:42, Michal Hocko wrote:
> > > On Tue 21-11-23 22:44:32, Yosry Ahmed wrote:
> > > > On Tue, Nov 21, 2023 at 10:41 PM Liu Shixin <liushixin2@xxxxxxxxxx> wrote:
> > > > >
> > > > >
> > > > > On 2023/11/21 21:00, Michal Hocko wrote:
> > > > > > On Tue 21-11-23 17:06:24, Liu Shixin wrote:
> > > > > >
> > > > > > However, in swapcache_only mode, the scan count still increased when scan
> > > > > > non-swapcache pages because there are large number of non-swapcache pages
> > > > > > and rare swapcache pages in swapcache_only mode, and if the non-swapcache
> > > > > > is skipped and do not count, the scan of pages in isolate_lru_folios() can
> > > > > > eventually lead to hung task, just as Sachin reported [2].
> > > > > > I find this paragraph really confusing! I guess what you meant to say is
> > > > > > that a real swapcache_only is problematic because it can end up not
> > > > > > making any progress, correct?
> > > > > This paragraph is going to explain why checking swapcache_only after scan += nr_pages;
> > > > > >
> > > > > > AFAIU you have addressed that problem by making swapcache_only anon LRU
> > > > > > specific, right? That would be certainly more robust as you can still
> > > > > > reclaim from file LRUs. I cannot say I like that because swapcache_only
> > > > > > is a bit confusing and I do not think we want to grow more special
> > > > > > purpose reclaim types. Would it be possible/reasonable to instead put
> > > > > > swapcache pages on the file LRU instead?
> > > > > It looks like a good idea, but I'm not sure if it's possible. I can try it, is there anything to
> > > > > pay attention to?
> > > >
> > > > I think this might be more intrusive than we think. Every time a page
> > > > is added to or removed from the swap cache, we will need to move it
> > > > between LRUs.
> > > > All pages on the anon LRU will need to go through the
> > > > file LRU before being reclaimed. I think this might be too big of a
> > > > change to achieve this patch's goal.
> > >
> > > TBH I am not really sure how complex that might turn out to be.
> > > Swapcache tends to be full of subtle issues. So you might be right but
> > > it would be better to know _why_ this is not possible before we end up
> > > fishing for a couple of swapcache pages on a potentially huge anon LRU to
> > > isolate them. Think of TB sized machines in this context.
> >
> > Forgot to mention that it is not really far fetched to compare this
> > to MADV_FREE pages. Those are anonymous but we do not want to keep them
> > on the anon LRU because we want to age them independent of the swap
> > availability as they are just dropped during reclaim. Not too much
> > different from swapcache pages. There are more constraints on those but
> > fundamentally this is the same problem, no?
>
> I agree it's not a first, but swap cache pages are more complicated
> because they can go back and forth, unlike MADV_FREE pages which
> usually go on a one way ticket AFAICT.

Yes, swapcache pages are indeed more complicated but most of the time
they just go away as well, no? MADV_FREE pages can be reinstated if they
are written to as well. So fundamentally they are not that different.

> Also pages going into the swap
> cache can be much more common than MADV_FREE pages for a lot of
> workloads. I am not sure how different reclaim heuristics will react
> to such mobility between the LRUs, and the fact that all pages will
> now only get evicted through the file LRU. The anon LRU will
> essentially become an LRU that feeds the file LRU. Also, the more
> pages we move between LRUs, the more ordering violations we introduce,
> as we may put colder pages in front of hotter pages or vice versa.

Well, traditionally the file LRU has been maintaining page cache or
easily disposable pages like MADV_FREE (which can be considered a cache
as well). Swapcache is a form of a page cache as well.

> All in all, I am not saying it's a bad idea or not possible, I am just
> saying it's probably more complicated than MADV_FREE, and adding more
> cases where pages move between LRUs could introduce problems (or make
> existing problems more visible).

Do we want to start adding a filtered anon scan for a certain type of
pages? Because this is the question here AFAICS. This might seem an
easier solution but I would argue that it is a less predictable one. It
is not unusual that a huge anon LRU would contain only very few
swapcache pages.

That being said, I might be missing some obvious or less obvious reasons
why this is a completely bad idea. Swapcache is indeed subtle.

-- 
Michal Hocko
SUSE Labs
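
To make the trade-off behind a filtered anon scan concrete, here is a toy
model in Python. It is only a sketch under stated assumptions and not kernel
code: Folio, filtered_scan() and count_skipped are hypothetical names, and the
real isolate_lru_folios() does considerably more. The model only shows how
charging (or not charging) skipped non-swapcache pages against the scan budget
changes how much of a large anon LRU gets walked.

```python
from dataclasses import dataclass


@dataclass
class Folio:
    """Toy stand-in for a page on the anon LRU (hypothetical, not struct folio)."""
    in_swapcache: bool


def filtered_scan(lru, nr_to_scan, count_skipped):
    """Walk `lru` in order and isolate only swapcache folios.

    `nr_to_scan` is the scan budget. If `count_skipped` is True, skipped
    non-swapcache folios are charged against the budget as well.
    Returns (isolated folios, number of entries actually examined).
    """
    isolated = []
    scanned = 0   # what is charged against the budget
    examined = 0  # entries actually walked
    for folio in lru:
        if scanned >= nr_to_scan:
            break
        examined += 1
        if not folio.in_swapcache:
            if count_skipped:
                scanned += 1
            continue
        scanned += 1
        isolated.append(folio)
    return isolated, examined


# A large anon LRU with a single swapcache folio at the far end.
lru = [Folio(False) for _ in range(99_999)] + [Folio(True)]

bounded = filtered_scan(lru, nr_to_scan=32, count_skipped=True)
unbounded = filtered_scan(lru, nr_to_scan=32, count_skipped=False)

print(len(bounded[0]), bounded[1])      # 0 32
print(len(unbounded[0]), unbounded[1])  # 1 100000
```

In this model, counting skipped pages keeps each pass bounded but may isolate
nothing, while not counting them finds the lone swapcache page only after
walking the entire list, which is the kind of unbounded work a TB sized
machine would make painful.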