On Wed, Jul 8, 2020 at 1:08 PM Catalin Marinas <catalin.marinas@xxxxxxx> wrote:
>
> On Wed, Jul 08, 2020 at 09:17:37AM +0200, Dmitry Vyukov wrote:
> > On Tue, Jul 7, 2020 at 8:17 PM Catalin Marinas <catalin.marinas@xxxxxxx> wrote:
> > > Kmemleak never performs well under heavy load. Normally you'd need to
> > > let the system settle for a bit before checking whether the leaks are
> > > still reported. The issue is caused by the memory scanning not stopping
> > > the whole machine, so pointers may be hidden in registers on different
> > > CPUs (list insertion/deletion for example causes transient kmemleak
> > > confusion).
> > >
> > > I think the syzkaller guys tried a year or so ago to run it in parallel
> > > with kmemleak and gave up shortly. The proposal was to add a "stopscan"
> > > command to kmemleak which would do this under stop_machine(). However,
> > > no-one got to implementing it.
> > >
> > > So, in this case, does the leak still appear with the reproducer, once
> > > the system went idle?
> >
> > This report came from syzbot, so obviously we did not give up :)
>
> That's good to know ;).
>
> > We don't run scanning in parallel with fuzzing and do a very intricate
> > multi-step dance to overcome false positives:
> > https://github.com/google/syzkaller/blob/5962a2dc88f6511b77100acdf687c1088f253f6b/executor/common_linux.h#L3407-L3478
> > and only report leaks that are reproducible.
> > So far I have not seen any noticeable amount of false positives, and
> > you can see 70 already fixed leaks here:
> > https://syzkaller.appspot.com/upstream/fixed?manager=ci-upstream-gce-leak
> > https://syzkaller.appspot.com/upstream?manager=ci-upstream-gce-leak
>
> Thanks for the information and the good work here. If you have time, you
> could implement the stop_machine() kmemleak scan as well ;).

stop_machine() will only help with pointers that are held in registers
or are in the middle of being moved between memory locations.
But there may be other sources of false positives, such as pointers
hidden by hashing, stored as offsets, or carrying reused low/high bits.
Doing several scans and checking a CRC checksum of object contents
helps with these as well, and is orthogonal to stop_machine().

So now I wonder whether using stop_machine() would actually solve all
the problems. If not, then doing this work and still having to do
several scans and checksums anyway seems kind of pointless.
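
To make the "several scans plus checksum" idea concrete, here is a
minimal userspace sketch in Python. It is not the kmemleak or syzkaller
code; the function name stable_leaks() and the input format (one dict
per scan, mapping an object address to its contents) are assumptions
made up for illustration. An object is kept as a leak candidate only if
every scan found it unreferenced and its CRC32 never changed.

```python
import zlib


def stable_leaks(scans):
    """Return addresses that look like leaks in every scan.

    `scans` is a hypothetical input format: a list with one dict per scan,
    mapping the address of each object found unreferenced in that scan to
    the object's contents as bytes.
    """
    if not scans:
        return set()

    # Start from the first scan and remember each candidate's checksum.
    candidates = {addr: zlib.crc32(data) for addr, data in scans[0].items()}

    for scan in scans[1:]:
        # Drop a candidate if a later scan found a reference to it (it is
        # missing from that scan) or if its contents changed, which
        # suggests the object is still in use.
        candidates = {
            addr: crc
            for addr, crc in candidates.items()
            if addr in scan and zlib.crc32(scan[addr]) == crc
        }

    return set(candidates)


if __name__ == "__main__":
    example = [
        {0x1000: b"abc", 0x2000: b"xyz", 0x3000: b"q"},
        {0x1000: b"abc", 0x2000: b"xyZ"},
        {0x1000: b"abc", 0x2000: b"xyZ"},
    ]
    print(sorted(hex(a) for a in stable_leaks(example)))
```

In the example, 0x3000 is dropped because later scans find it
referenced, and 0x2000 is dropped because its contents changed between
scans. Neither filter depends on whether the other CPUs were stopped
during a scan, which is the sense in which this is orthogonal to
stop_machine().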