On Tue, Jan 14, 2025 at 2:00 PM Joey Jiao <quic_jiangenj@xxxxxxxxxxx> wrote:
>
> On Tue, Jan 14, 2025 at 11:43:08AM +0100, Marco Elver wrote:
> > On Tue, 14 Jan 2025 at 06:35, Jiao, Joey <quic_jiangenj@xxxxxxxxxxx> wrote:
> > >
> > > Hi,
> > >
> > > This patch series introduces new kcov unique modes:
> > > `KCOV_TRACE_UNIQ_[PC|EDGE|CMP]`, which are used to collect unique PC, EDGE,
> > > CMP information.
> > >
> > > Background
> > > ----------
> > >
> > > In the current kcov implementation, when `__sanitizer_cov_trace_pc` is hit,
> > > the instruction pointer (IP) is stored sequentially in an area. Userspace
> > > programs then read this area to record covered PCs and calculate covered
> > > edges. However, recent syzkaller runs show that many syscalls likely have
> > > `pos > t->kcov_size`, leading to kcov overflow. To address this issue, we
> > > introduce new kcov unique modes.

Hi Joey,

Sorry for not responding earlier, I thought I'd come with a working
proposal, but it is taking a while.

You are right that kcov is prone to overflows, and we might be missing
interesting coverage because of that.

Recently we've been discussing the applicability of
-fsanitize-coverage=trace-pc-guard to this problem, and it is almost
working already. The idea is as follows:
- -fsanitize-coverage=trace-pc-guard instruments basic blocks with
  calls to `__sanitizer_cov_trace_pc_guard(u32 *guard)`, each taking a
  unique 32-bit global in the __sancov_guards section;
- these globals are zero-initialized, but upon the first call to
  __sanitizer_cov_trace_pc_guard() from each callsite, the
  corresponding global will receive a unique consecutive number;
- now we have a mapping of PCs into indices, which we can use to
  deduplicate the coverage, either by:
  -- storing PCs by their index taken from *guard directly in the
     user-supplied buffer (whose size will not exceed several
     megabytes in practice);
  -- or using a per-task bitmap (at most hundreds of kilobytes) to
     mark visited basic blocks, and appending newly encountered PCs to
     the user-supplied buffer like it's done now.
(I've appended a rough userspace model of the bitmap variant at the
end of this mail.)

I think this approach is more promising than using hashmaps in kcov:
- direct mapping should be way faster than a hashmap (and the overhead
  of index allocation is amortized, because the indices persist
  between program runs);
- there cannot be collisions;
- there is no additional complexity from pool allocations or RCU
  synchronization.

The above approach will naturally break edge coverage, as there will
be no notion of a program trace anymore. But it is still an open
question whether edge coverage actually helps the fuzzer, and
deduplicating edges correctly may not be worth the effort.

If you don't object, I would like to finish prototyping coverage
guards for kcov before proceeding with this review.

Alex

> > > 2. [P 2-3] Introduce `KCOV_TRACE_UNIQ_EDGE` Mode:
> > >    - Save `prev_pc` to calculate edges with the current IP.
> > >    - Add unique edges to the hashmap.
> > >    - Use a lower 12-bit mask to make hash independent of module offsets.

Note that on ARM64 this will effectively use only bits 11:2:
instructions are 4-byte aligned, so bits 1:0 of any PC are always
zero. If I am understanding correctly, that maps more than a million
coverage callbacks into one of just 1024 buckets.
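As a quick sanity check of that bucket count, the standalone program
below walks a range of 4-byte-aligned PCs and counts how many distinct
values survive the mask. Where exactly the patch applies the mask is
my assumption here (per PC, before hashing), so treat this as an
illustration of the arithmetic, not of the patch itself:

#include <stdbool.h>
#include <stdio.h>

int main(void)
{
        bool seen[0x1000] = { false }; /* one slot per 12-bit value */
        unsigned int distinct = 0;

        /* Walk a megabyte of 4-byte-aligned arm64-style PCs. */
        for (unsigned long pc = 0; pc < (1UL << 20); pc += 4) {
                unsigned long bucket = pc & 0xfff; /* lower 12-bit mask */

                if (!seen[bucket]) {
                        seen[bucket] = true;
                        distinct++;
                }
        }
        printf("%u distinct buckets\n", distinct); /* prints 1024 */
        return 0;
}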
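And here is the promised userspace model of the guard-based
deduplication. This is only a sketch under several simplifications:
the real __sanitizer_cov_trace_pc_guard() receives just the guard
pointer and derives the PC from its return address, a real
implementation needs atomics around the index allocation, and all
names below (trace_pc_guard, guards[], cover[], visited[]) are
illustrative rather than taken from an actual kcov prototype:

#include <stdint.h>
#include <stdio.h>

#define NUM_GUARDS 8  /* stands in for the __sancov_guards section */
#define COVER_SIZE 64 /* stands in for the user-supplied kcov buffer */

static uint32_t guards[NUM_GUARDS]; /* zero-initialized 32-bit globals */
static uint32_t next_index;         /* global index allocator */

static unsigned long cover[COVER_SIZE];       /* cover[0] holds the PC count */
static uint8_t visited[(NUM_GUARDS + 7) / 8]; /* per-task bitmap */

/* Models what the compiler inserts at each basic block under
 * -fsanitize-coverage=trace-pc-guard (the explicit pc argument is an
 * artifact of this model). */
static void trace_pc_guard(uint32_t *guard, unsigned long pc)
{
        if (*guard == 0)               /* first hit ever: allocate the */
                *guard = ++next_index; /* next consecutive index */

        uint32_t idx = *guard - 1;

        if (visited[idx / 8] & (1u << (idx % 8)))
                return; /* this basic block was already recorded */
        visited[idx / 8] |= 1u << (idx % 8);

        /* Append the newly encountered PC, like kcov does now. */
        unsigned long n = cover[0];
        if (n + 1 < COVER_SIZE) {
                cover[n + 1] = pc;
                cover[0] = n + 1;
        }
}

int main(void)
{
        /* Hit three basic blocks, the first one twice. */
        trace_pc_guard(&guards[0], 0x100);
        trace_pc_guard(&guards[1], 0x200);
        trace_pc_guard(&guards[0], 0x100); /* deduplicated by the bitmap */

        for (unsigned long i = 1; i <= cover[0]; i++)
                printf("PC %#lx\n", cover[i]); /* prints only two PCs */
        return 0;
}

The property that makes this attractive is visible even in the toy:
the index in *guard is allocated once per basic block and then reused,
so the dedup check is a collision-free O(1) bitmap lookup rather than
a hash of the PC.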