> > > 2. There have been none that came with the testing/benchmarking
> > > coverage as this one did. Please point me to some if I'm mistaken,
> > > and I'll gladly match them.
> >
> > I do appreciate your numbers but you should realize that this is an area
> > that is really hard to get any conclusive testing for.
>
> Fully agreed. That's why we started a new initiative, and we hope more
> people will follow these practices:
> 1. All results in this area should be reported with at least standard
> deviations, or preferably confidence intervals.
> 2. Real applications should be benchmarked (driven by synthetic load
> generators), not just synthetic benchmarks that exercise no real
> application.
> 3. A wide range of devices should be covered, i.e., servers, desktops,
> laptops and phones.
>
> I'm confident in saying our benchmark reports were held to the highest
> standards. We have worked with MariaDB (the company), EnterpriseDB
> (Postgres), Redis (the company), etc. on these reports. They have copies
> of these reports (PDF version):
> https://linux-mm.googlesource.com/benchmarks/
>
> We welcome any expert in those applications to examine our reports,
> and we'll be happy to run any other benchmarks, or the same benchmarks
> with different configurations, that anybody thinks are important and
> we've missed.

I really think this gets at the heart of the issue with mm development,
and is one of the reasons it's been extra frustrating not to have an MM
conf for the past couple of years; I think sorting out how we measure &
proceed on changes would be easier done f2f. E.g. concluding with a
consensus that if something doesn't regress on X, Y, and Z, and has
reasonably maintainable and readable code, we should merge it and try
it out.

But since f2f isn't an option until 2052 at the earliest...

I understand the desire for an "incremental approach that gets us from
A->B". In the abstract it sounds great. However, with a change like
this one, I think it's highly likely that such a path would be littered
with regressions both large and small, and would probably be more
difficult to reason about than the relatively clean design of MGLRU.

On top of that, I don't think we'll get the kind of user feedback we
need for something like this *without* merging it. Yu has done a
tremendous job collecting data here (and the results are really
incredible), but I think we can all agree that without extensive
testing in the field with all sorts of weird codes, we're not going to
find the problematic behaviors we're concerned about.

So unless we want to eschew big mm changes entirely (we shouldn't!
look at net or scheduling for how important big rewrites are to
progress), I think we should be open to experimenting with new stuff.
We can always revert if things get too unwieldy.

None of this is to say that there may not be lots more comments on the
code or potential fixes/changes to incorporate before merging; I'm
mainly arguing about the mindset we should have toward changes like
this, not all the stuff the community is already really good at (i.e.,
testing and reviewing code on a nuts & bolts level).

Thanks,
Jesse