On Tue, Mar 17, 2015 at 12:06 AM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>
> To close the loop here, now I'm back home and can run tests:
>
> config                        3.19   4.0-rc1  4.0-rc4
> defaults                      8m08s  9m34s    9m14s
> -o ag_stride=-1               4m04s  4m38s    4m11s
> -o bhash=101073               6m04s  17m43s   7m35s
> -o ag_stride=-1,bhash=101073  4m54s  9m58s    7m50s
>
> It's better, but there are still significant regressions, especially
> for the large memory footprint cases. I haven't had a chance to look
> at any stats or profiles yet, so I don't know yet whether this is
> still page fault related or some other problem....

Ok. I'd love to see some data on what changed between 3.19 and rc4 in
the profiles, just to see whether it's "more page faults due to extra
COW", or whether it's due to "more TLB flushes because of the
pte_write() vs pte_dirty()" differences.

I'm *guessing* that a lot of the remaining issues are due to extra page
fault overhead, because I'd expect write/dirty to be fairly 1:1, but
there could be differences due to shared memory use and/or just
writebacks of dirty pages that become clean.

I guess you can also see in vmstat.mm_migrate_pages whether it's
because of excessive migration (due to bad grouping) or not. So not
just profile data.

At the same time, I feel fairly happy about the situation - we at
least understand what is going on, and the "3x worse performance" case
is at least gone, even if that last case still looks horrible.

So it's still a bad performance regression, but at the same time I
think your test setup (a big 500 TB filesystem, but then a fake-numa
thing with just 4GB per node) is specialized and unrealistic enough
that I don't feel it's all that relevant from a *real-world*
standpoint, and so I wouldn't be uncomfortable saying "ok, the page
table handling cleanup caused some issues, but we know about them and
how to fix them longer-term". So I don't consider this a 4.0
showstopper or a "we need to revert for now" issue.
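[Editor's note: one quick way to test the excessive-migration theory is to snapshot the NUMA balancing and migration counters in /proc/vmstat around the workload. This is only a sketch; the exact counter names (numa_hint_faults, numa_pages_migrated, pgmigrate_success, ...) depend on kernel version and on CONFIG_NUMA_BALANCING being enabled.]

```shell
# Sample the NUMA/migration counters before and after the test run and
# compare the deltas; a large jump in numa_pages_migrated or
# pgmigrate_success points at migration churn rather than plain
# page-fault overhead.  Falls back to a message if the kernel has no
# such counters.
grep -E 'numa_|pgmigrate' /proc/vmstat || echo "no NUMA balancing counters (kernel config?)"
```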
If it's a case of "we take a lot more page faults because we handle
the NUMA fault and then have a COW fault almost immediately", then the
fix is likely to do the same early-COW that the normal non-NUMA-fault
case does.

In fact, my gut feel is that we should try to unify that NUMA/regular
fault handling path a bit more, but that would be a pretty invasive
patch.

                     Linus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>