On Mon, 2018-03-19 at 19:04 +0100, Greg Kroah-Hartman wrote:
> 4.4-stable review patch.  If anyone has any objections, please let me know.
>
> ------------------
>
> From: Stephane Eranian <eranian@xxxxxxxxxx>
>
>
> [ Upstream commit 88b897a30c525c2eee6e7f16e1e8d0f18830845e ]
>
> This patch significantly improves the execution time of
> perf_event__synthesize_mmap_events() when running perf record on systems
> where processes have lots of threads.
>
> It just happens that cat /proc/pid/maps support uses an O(N^2) algorithm to
> generate each map line in the maps file. If you have 1000 threads, then you
> necessarily have 1000 stacks. For each vma, you need to check if it
> corresponds to a thread's stack. With a large number of threads, this can
> take a very long time. I have seen latencies >> 10 minutes.
>
> As of today, perf does not use the fact that a mapping is a stack, therefore
> we can work around the issue by using /proc/pid/tasks/pid/maps. This entry
> does not try to map a vma to a stack and is thus much faster with no loss of
> functionality.
>
> The proc-map-timeout logic is kept in case users still want some upper limit.
>
> In V2, we fix the file path from /proc/pid/tasks/pid/maps to the actual
> /proc/pid/task/pid/maps, tasks -> task. Thanks Arnaldo for catching this.
>
> Committer note:
>
> This problem seems to have been eliminated in the kernel since commit
> b18cb64ead40 ("fs/proc: Stop trying to report thread stacks").
[...]

I don't think so.  It looks like this was fixed by commit 65376df58217
("proc: revert /proc/<pid>/maps [stack:TID] annotation"), which we
already have in 4.4-stable.  But older branches (3.16, 3.18, 4.1) don't
have that and probably should do.

It looks like commit b18cb64ead40 ("fs/proc: Stop trying to report
thread stacks") is also a candidate for stable.

Ben.

-- 
Ben Hutchings
Software Developer, Codethink Ltd.