Any chance you could redo the reports with --no-children --call-graph=fractal added? The mode that includes child overheads unfortunately makes the output hard to interpret/compare.
Of course. Not sure if it matters, but I upgraded perf for that (because the --no-children option was introduced in ~3.16), so perf record and perf report were done with different perf versions.
<pg94_perf_report.txt.gz>
<pg95_perf_report.txt.gz>
<pg96_perf_report.txt.gz>
Also, I've done the same test on the same host (RHEL 6) but with the 4.6 kernel/perf, writing the perf data to /dev/shm to avoid losing events. The perf report output is also attached, but the important thing is that the regression is not so significant:
Andres, is there any chance that you would find time to look at those results? Are they actually useful?
The results from pg9?_perf_report.txt are attached. Note that in all cases some events were lost, i.e.:
root@pgload05g ~ # perf report -g -i pg94_all.data >/tmp/pg94_perf_report.txt
Failed to open [vsyscall], continuing without symbols
Warning: Processed 537137 events and lost 7846 chunks!
You can reduce the overhead by reducing the sampling frequency, e.g. by specifying -F 300.
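For reference, the suggestions made in this thread (a lower sampling frequency, writing the data file to /dev/shm, and reporting without child overheads) could be combined roughly as below. This is only a sketch: the PID, the 60-second duration, and the output file name are made-up placeholders, and the script only prints the commands instead of running them.

```shell
#!/bin/sh
# Placeholder values, not taken from the thread: adjust for your setup.
PG_PID=12345                 # hypothetical PID of the backend to profile
OUT=/dev/shm/pg_perf.data    # tmpfs path, to reduce the chance of lost events
FREQ=300                     # sampling frequency suggested above (-F 300)

# Record call graphs from the given PID for as long as `sleep 60` runs.
RECORD_CMD="perf record -F $FREQ -g -p $PG_PID -o $OUT -- sleep 60"

# Report self overhead only (needs a perf version that has --no-children).
REPORT_CMD="perf report --no-children --call-graph=fractal -i $OUT"

# Print the commands so they can be reviewed before being run by hand.
echo "$RECORD_CMD"
echo "$REPORT_CMD"
```

If lost-chunk warnings still show up with these settings, lowering FREQ further is the first thing to try.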