On Fri, 10 Jan 2025 at 10:23, Marco Elver <elver@xxxxxxxxxx> wrote: > > From: "Jiao, Joey" <quic_jiangenj@xxxxxxxxxxx> > > > > The current design of KCOV risks frequent buffer overflows. To mitigate > > this, new modes are introduced: KCOV_TRACE_UNIQ_PC, KCOV_TRACE_UNIQ_EDGE, > > and KCOV_TRACE_UNIQ_CMP. These modes allow for the recording of unique > > PCs, edges, and comparison operands (CMP). > > There ought to be a cover letter explaining the motivation for this, > and explaining why the new modes would help. Ultimately, what are you > using KCOV for where you encountered this problem? > > > Key changes include: > > - KCOV_TRACE_UNIQ_[PC|EDGE] can be used together to replace KCOV_TRACE_PC. > > - KCOV_TRACE_UNIQ_CMP can be used to replace KCOV_TRACE_CMP mode. > > - Introduction of hashmaps to store unique coverage data. > > - Pre-allocated entries in kcov_map_init during KCOV_INIT_TRACE to avoid > > performance issues with kmalloc. > > - New structs and functions for managing memory and unique coverage data. > > - Example program demonstrating the usage of the new modes. > > This should be a patch series, carefully splitting each change into a > separate patch. > https://docs.kernel.org/process/submitting-patches.html#split-changes > > > With the new hashmap and pre-alloced memory pool added, cover size can't > > be set to higher value like 1MB in KCOV_TRACE_PC or KCOV_TRACE_CMP modes > > in 2GB device with 8 procs, otherwise it causes frequent oom. > > > > For KCOV_TRACE_UNIQ_[PC|EDGE|CMP] modes, smaller cover size like 8KB can > > be used. > > > > Signed-off-by: Jiao, Joey <quic_jiangenj@xxxxxxxxxxx> > > As-is it's hard to review, and the motivation is unclear. A lot of > code was moved and changed, and reviewers need to understand why that > was done besides your brief explanation above. > > Generally, KCOV has very tricky constraints, due to being callable > from any context, including NMI. This means adding new dependencies > need to be carefully reviewed. For one, we can see this in genalloc's > header: > > > * The lockless operation only works if there is enough memory > > * available. If new memory is added to the pool a lock has to be > > * still taken. So any user relying on locklessness has to ensure > > * that sufficient memory is preallocated. > > * > > * The basic atomic operation of this allocator is cmpxchg on long. > > * On architectures that don't have NMI-safe cmpxchg implementation, > > * the allocator can NOT be used in NMI handler. So code uses the > > * allocator in NMI handler should depend on > > * CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG. > > And you are calling gen_pool_alloc() from __sanitizer_cov_trace_pc. > Which means this implementation is likely broken on > !CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG architectures (do we have > architectures like that, that support KCOV?). > > There are probably other sharp corners due to the contexts KCOV can > run in, but would simply ask you to carefully reason about why each > new dependency is safe. I am also concerned about the performance effect. Does it add a stack frame to __sanitizer_cov_trace_pc()? Please show disassm of the function before/after. Also, I have concerns about interrupts and reentrancy. We are still getting some reentrant calls from interrupts (not all of them are filtered by in_task() check). I am afraid these complex hashmaps will corrupt.