On Fri, Jan 07, 2022 at 10:38:18AM +0100, Michal Hocko wrote: > On Tue 04-01-22 13:30:00, Yu Zhao wrote: > [...] > > Hi Andrew, Linus, > > > > Can you please take a look at this patchset and let me know if it's > > 5.17 material? > > I am still not done with the review and have seen at least few problems > that would need to be addressed. > > But more fundamentally I believe there are really some important > questions to be answered. First and foremost this is a major addition > to the memory reclaim and there should be a wider consensus that we > really want to go that way. The patchset doesn't have a single ack nor > reviewed-by AFAICS. I haven't seen a lot of discussion since v2 > (http://lkml.kernel.org/r/20210413065633.2782273-1-yuzhao@xxxxxxxxxx) > nor do I see any clarification on how concerns raised there have been > addressed or at least how they are planned to be addressed. > > Johannes has made some excellent points > http://lkml.kernel.org/r/YHcpzZYD2fQyWvEQ@xxxxxxxxxxx. Let me quote > for reference part of it I find the most important: > : Realistically, I think incremental changes are unavoidable to get this > : merged upstream. > : > : Not just in the sense that they need to be smaller changes, but also > : in the sense that they need to replace old code. It would be > : impossible to maintain both, focus development and testing resources, > : and provide a reasonably stable experience with both systems tugging > : at a complicated shared code base. > : > : On the other hand, the existing code also has billions of hours of > : production testing and tuning. We can't throw this all out overnight - > : it needs to be surgical and the broader consequences of each step need > : to be well understood. > : > : We also have millions of servers relying on being able to do upgrades > : for drivers and fixes in other subsystems that we can't put on hold > : until we stabilized a new reclaim implementation from scratch. > > Fully agreed on all points here. > > I do appreciate there is a lot of work behind this patchset and I > also do understand it has gained a considerable amount of testing as > well. Your numbers are impressive but my experience tells me that it is > equally important to understand the worst case behavior and there is not > really much mentioned about those in changelogs. > > We also shouldn't ignore costs the code is adding. One of them would be > a further page flags depletion. We have been hitting problems on that > front for years and many features had to be reworked to bypass a lack of > space in page->flags. > > I will be looking more into the code (especially the memcg side of it) > but I really believe that a consensus on above Johannes' points need to > be found first before this work can move forward. Thanks for the summary. I appreciate your time and I agree your assessment is fair. So I've acknowledged your concerns, and you've acknowledged my numbers (the performance improvements) are impressive. Now we are in agreement, cheers. Next, I argue that the benefits of this patchset outweigh its risks, because, drawing from my past experience, 1. There have been many larger and/or riskier patchsets taken; I'll assemble a list if you disagree. And this patchset is fully guarded by #ifdef; Linus has also assessed on this point. 2. There have been none that came with the testing/benchmarking coverage as this one did. Please point me to some if I'm mistaken, and I'll gladly match them. The numbers might not materialize in the real world; the code is not perfect; and many other risks... But all the top eight open source memory hogs were covered, which is unprecedented; memcached and fio showed significant improvements and it only takes a few commands to see for yourselves. Regarding the acks and the reviewed-bys, I certainly can ask people who have reaped the benefits of this patchset to do them, if it's required. But I see less fun in that. I prefer to provide empirical evidence and convince people who are on the other side of the aisle.