On Tue, Mar 01, 2016 at 04:27:28PM +0100, Dmitry Vyukov wrote: > On Mon, Feb 29, 2016 at 12:34 PM, Catalin Marinas > <catalin.marinas@xxxxxxx> wrote: > > On Mon, Feb 29, 2016 at 11:22:58AM +0100, Dmitry Vyukov wrote: > >> Regarding stopping all threads and doing proper scan, why is not it > >> feasible? Will kernel break if we stall all CPUs for seconds? In > >> automatic testing scenarios a stalled for several seconds machine is > >> not a problem. But on the other hand, absence of false positives is a > >> must. And it would improve testing bandwidth, because we don't need > >> sleep and second scan. > > > > Scanning time is the main issue with it taking minutes on some slow ARM > > machines (my primary testing target). Such timing was significantly > > improved with commit 93ada579b0ee ("mm: kmemleak: optimise kmemleak_lock > > acquiring during kmemleak_scan") but even if it is few seconds, it is > > not suitable for a live, interactive system. > > > > What we could do though, since you already trigger the scanning > > manually, is to add a "stopscan" command that you echo into > > /sys/kernel/debug/kmemleak and performs a stop_machine() during memory > > scanning. If you have time, please feel free to give it a try ;). > > Stopscan would be useful for me, but I don't feel like I am ready to > tackle it. It's not that hard ;). Anyway, when I get a bit of time I'll try to look into it. > To be absolutely sure that we don't miss pointers we would also need > to scan all registers from stopped CPUs, and I don't know how to > obtain that. With stop_machine(), we probably wouldn't need to. This mechanism causes the other CPUs to go take an IPI and execute a certain function (or wait for the completion of a function call on another CPU). We can assume that the functions stop_machine() is calling wouldn't manipulate allocated objects/lists/etc., so the register file content is not relevant to kmemleak. The previous context interrupted by the IPI would be stored on the IRQ stack and that's one area the kmemleak does not scan (it's architecture specific). On the CPU issuing the stop_machine(), this would be done as a result of debugfs write and I don't think we have any object allocation/manipulation on this path (and it's only the callee-saved registers that we would miss when calling kmemleak's scan_object()). So yes, in addition to stop_machine(), we would have to scan the IRQ stack if the architecture uses a separate one (vs just the current thread stack). -- Catalin -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>