On Mon, Feb 29, 2016 at 11:22 AM, Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote: > On Mon, Feb 29, 2016 at 12:47 AM, Catalin Marinas > <catalin.marinas@xxxxxxx> wrote: >> On Sun, Feb 28, 2016 at 05:42:24PM +0100, Dmitry Vyukov wrote: >>> On Mon, Feb 15, 2016 at 11:42 AM, Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote: >>> > When I am running the following program in a parallel loop, kmemleak >>> > starts reporting memory leaks of objects allocated in >>> > tty_register_driver during boot. These leaks start popping up >>> > chaotically and as you can see they originate in different drivers >>> > (synclinkmp_init, isdn_init, chr_dev_init, sysfs_init). >>> > >>> > On commit 388f7b1d6e8ca06762e2454d28d6c3c55ad0fe95 (4.5-rc3). >> [...] >>> > unreferenced object 0xffff88006708dc20 (size 8): >>> > comm "swapper/0", pid 1, jiffies 4294672590 (age 930.839s) >>> > hex dump (first 8 bytes): >>> > 74 74 79 53 4c 4d 38 00 ttySLM8. >>> > backtrace: >>> > [<ffffffff81765d10>] __kmalloc_track_caller+0x1b0/0x320 mm/slub.c:4068 >>> > [<ffffffff816b37a9>] kstrdup+0x39/0x70 mm/util.c:53 >>> > [<ffffffff816b3826>] kstrdup_const+0x46/0x60 mm/util.c:74 >>> > [<ffffffff8194e5bb>] __kernfs_new_node+0x2b/0x2b0 fs/kernfs/dir.c:536 >>> > [<ffffffff81951c70>] kernfs_new_node+0x80/0xe0 fs/kernfs/dir.c:572 >>> > [<ffffffff81957223>] kernfs_create_link+0x33/0x150 fs/kernfs/symlink.c:32 >>> > [<ffffffff81959c4b>] sysfs_do_create_link_sd.isra.2+0x8b/0x120 >> [...] >>> +Catalin (kmemleak maintainer) >>> >>> I am noticed a weird thing. I am not 100% sure but it seems that the >>> leaks are reported iff I run leak checking concurrently with the >>> programs running. And if I run the program several thousand times and >>> then run leak checking, then no leaks reported. >>> >>> Catalin, it is possible that it is a kmemleak false positive? >> >> Yes, it's possible. If you run kmemleak scanning continuously (or at >> very short intervals) and especially in parallel with some intensive >> tasks, it will miss pointers that may be stored in registers (on other >> CPUs) or moved between task stacks, other memory locations. Linked lists >> are especially prone to such false positives. >> >> Kmemleak tries to work around this by checksumming each object, so it >> will only be reported if it hasn't changed on two consecutive scans. >> Since the default scanning is 10min, it is very unlikely to trigger >> false positives in such scenarios. However, if you reduce the scanning >> time (or trigger it manually in a loop), you can hit this condition. >> >>> I see that kmemleak just scans thread stacks one-by-one. I would >>> expect that kmemleak should stop all threads, then scan all stacks and >>> all registers of all threads, and then restart threads. If it does not >>> scan registers or does not stop threads, then I think it should be >>> possible that a pointer value can sneak off kmemleak. Does it make >>> sense? >> >> Given how long it takes to scan the memory, stopping the threads is not >> really feasible. You could do something like stop_machine() only for >> scanning the current stack on all CPUs but it still wouldn't catch >> pointers being moved around in memory unless you stop the system >> completely for a full scan. The heuristic about periodic scanning and >> checksumming seems to work fine in normal usage scenarios. >> >> For your tests, I would recommend that you run the tests for a long(ish) >> time and only do two kmemleak scans at the end after they finished (and >> with a few seconds delay between them). Continuous scanning is less >> reliable. > > > Thanks for the explanation, Catalin! > > Let me describe my usage scenario first. I am running automatic > testing 24x7. Currently a VM executes a dozen of small programs (a > dozen of syscalls each), then I run manual leak scanning. I can't run > significantly more programs between scans, because then I won't be > able to restore reproducers for bugs and they will be unactionable. I > could run leak checking after each program, but it will increase > overhead significantly. So a dozen of programs is a trade-off. And I > disable automatic scanning. > > False positives are super unpleasant in automatic testing. If a tool > false positive rate if high, I just disable it, it is unusable. It is > not that bad for leak checking. But each false positive consumes human > (my) time. > > So I need to run scanning twice, because the first one never reports leaks. > > For the false positives due to registers/pointer jumping, will it help > if I run scanning one more time if leaks are detected? I mean: run > scanning twice, if leaks are found sleep for several seconds and run > scanning third time. Since leaks are usually not detected I can afford > to sleep more and do one or two additional scans. The question here: > will kmemleak _remove_ an object for leaked objects, if it discovered > reachable or contents change on subsequent scans? > > Regarding stopping all threads and doing proper scan, why is not it > feasible? Will kernel break if we stall all CPUs for seconds? In > automatic testing scenarios a stalled for several seconds machine is > not a problem. But on the other hand, absence of false positives is a > must. And it would improve testing bandwidth, because we don't need > sleep and second scan. Paul, regarding this particular leak, let's consider it kmemleak false positive (until proven otherwise). -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>