On Sun, Mar 1, 2015 at 5:04 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>
> Across the board the 4.0-rc1 numbers are much slower, and the
> degradation is far worse when using the large memory footprint
> configs. Perf points straight at the cause - this is from 4.0-rc1
> on the "-o bhash=101073" config:
>
> -   56.07%   56.07%  [kernel]  [k] default_send_IPI_mask_sequence_phys
>    - 99.99% physflat_send_IPI_mask
>       - 99.37% native_send_call_func_ipi ..
>
> And the same profile output from 3.19 shows:
>
> -    9.61%    9.61%  [kernel]  [k] default_send_IPI_mask_sequence_phys
>    - 99.98% physflat_send_IPI_mask
>       - 96.26% native_send_call_func_ipi ...
>
> So either there's been a massive increase in the number of IPIs
> being sent, or the cost per IPI has greatly increased. Either way,
> the result is a pretty significant performance degradation.

And on Mon, Mar 2, 2015 at 11:17 AM, Matt <jackdachef@xxxxxxxxx> wrote:
>
> Linus already posted a fix to the problem, however I can't seem to
> find the matching commit in his tree (searching for "TLC regression"
> or "TLB cache").

That was commit f045bbb9fa1b, which was then refined by commit
721c21c17ab9, because it turned out that ARM64 had a very subtle
relationship with tlb->end and fullmm.

But both of those hit 3.19, so none of this should affect 4.0-rc1.
There's something else going on.

I assume it's the mm queue from Andrew, so adding him to the cc.
There are changes to the page migration etc, which could explain it.

There are also a fair number of APIC changes in 4.0-rc1, so I guess it
really could be just that the IPI sending itself has gotten much
slower. Adding Ingo for that, although I don't think
default_send_IPI_mask_sequence_phys() itself has actually changed,
only other things around the apic.

So I'd be inclined to blame the mm changes.

Obviously bisection would find it..

                 Linus
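
A call-graph profile like the one Dave quotes is typically gathered
with perf; this is only a minimal sketch, since the thread doesn't say
which events or options he used, and <workload> below is a placeholder
for whatever reproduces the slowdown:

    # sample kernel call graphs system-wide while the slow workload runs;
    # <workload> stands in for whatever reproduces the regression
    perf record -a -g -- <workload>

    # fold the samples into the Children/Self caller tree quoted above
    perf report --children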
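
The bisection Linus suggests would look roughly like this, assuming
v3.19 is the last known-good kernel and that each candidate is built,
booted and benchmarked by hand (no automated test script is implied
by the thread):

    git bisect start
    git bisect bad v4.0-rc1      # the slow kernel
    git bisect good v3.19        # the fast kernel
    # build, boot and benchmark each candidate kernel, then mark it:
    #   git bisect good    # performance matches 3.19
    #   git bisect bad     # performance matches 4.0-rc1
    # repeat until git names the first bad commit
    git bisect reset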