Debugging kernel performance bugs is usually hard. In this talk, I would like to share my recent experience of using BPF to debug a performance regression caused by kernel lock contention. I will introduce the BPF programs that I used to root-cause a regression in kernel operation latency that occurs only in our production environment. Traditional profiling and monitoring tools failed to capture the information I needed to debug this issue, but BPF tracing programs provided a sufficient set of facilities that turned out to be very helpful. They are made possible by modern BPF features, including:

- Lock contention tracepoints [1]
- BPF timer [2]
- CO-RE kernel type matching APIs [3]
- BPF static linking [4]

Looking forward, as we see more heterogeneous platforms, profiling and better understanding kernel synchronization performance will become more important. Based on my experience, I would like to open a discussion on supporting better observability of kernel performance, including getting full lock owner information [5] and getting Build ID stack traces more reliably [6].

[1] https://lore.kernel.org/bpf/20220322185709.141236-1-namhyung@xxxxxxxxxx/
[2] https://lwn.net/Articles/862136/
[3] https://lwn.net/Articles/898839/
[4] https://lwn.net/Articles/848997/
[5] https://lore.kernel.org/lkml/20230207002403.63590-1-namhyung@xxxxxxxxxx/
[6] https://lore.kernel.org/bpf/Y9vX49CtDzyg3B%2F8@krava/T/