Can you run this under KVM?

My go-to is virtme-ng, which lets me run my hacks on my laptop in their own VM,
on a copy of my whole system, with the tools I'm familiar with.

Then you can attach gdb to the VM, and I'd try a watchpoint on the memory.

On Fri, Nov 15, 2024 at 11:19 AM Muni Sekhar <munisekharrms@xxxxxxxxx> wrote:
>
> Hi all,
>
> I am encountering a memory corruption issue in the function
> msm_set_laddr() from the Slimbus MSM Controller driver source code.
> https://android.googlesource.com/kernel/msm/+/refs/heads/android-msm-sunfish-4.14-android12/drivers/slimbus/slim-msm-ctrl.c
>
> In msm_set_laddr(), one of the arguments is ea (enumeration address),
> which is a pointer to constant data. While testing, I observed strange
> behavior:
>
> The contents of the ea buffer get corrupted during a timeout scenario
> in the call to:
>
> timeout = wait_for_completion_timeout(&done, HZ);
>
> Specifically, the ea buffer's contents differ before and after the
> wait_for_completion_timeout() call, even though it's declared as a
> pointer to constant data (const u8 *ea).
> To debug this issue, I enabled KASAN, but it didn't reveal any memory
> corruption. After the buffer corruption, random memory allocations in
> other parts of the kernel occasionally result in a GPF crash.
>
> Here is the relevant part of the code:
>
> static int msm_set_laddr(struct slim_controller *ctrl, const u8 *ea,
>                          u8 elen, u8 laddr)
> {
>         struct msm_slim_ctrl *dev = slim_get_ctrldata(ctrl);
>         struct completion done;
>         int timeout, ret, retries = 0;
>         u32 *buf;
> retry_laddr:
>         init_completion(&done);
>         mutex_lock(&dev->tx_lock);
>         buf = msm_get_msg_buf(dev, 9, &done);
>         if (buf == NULL)
>                 return -ENOMEM;
>         buf[0] = SLIM_MSG_ASM_FIRST_WORD(9, SLIM_MSG_MT_CORE,
>                                 SLIM_MSG_MC_ASSIGN_LOGICAL_ADDRESS,
>                                 SLIM_MSG_DEST_LOGICALADDR,
>                                 ea[5] | ea[4] << 8);
>         buf[1] = ea[3] | (ea[2] << 8) | (ea[1] << 16) | (ea[0] << 24);
>         buf[2] = laddr;
>         ret = msm_send_msg_buf(dev, buf, 9, MGR_TX_MSG);
>         timeout = wait_for_completion_timeout(&done, HZ);
>         if (!timeout)
>                 dev->err = -ETIMEDOUT;
>         if (dev->err) {
>                 ret = dev->err;
>                 dev->err = 0;
>         }
>         mutex_unlock(&dev->tx_lock);
>         if (ret) {
>                 pr_err("set LADDR:0x%x failed:ret:%d, retrying", laddr, ret);
>                 if (retries < INIT_MX_RETRIES) {
>                         msm_slim_wait_retry(dev);
>                         retries++;
>                         goto retry_laddr;
>                 } else {
>                         pr_err("set LADDR failed after retrying:ret:%d", ret);
>                 }
>         }
>         return ret;
> }
>
> What I've Tried:
> KASAN: Enabled it but couldn't identify the source of the corruption.
> Debugging Logs: Added logs to print the ea contents before and after
> the wait_for_completion_timeout() call. The logs show a mismatch in
> the data.
>
> Question:
> How can I efficiently trace the source of the memory corruption in
> this scenario?
> Could wait_for_completion_timeout() or a related function cause
> unintended side effects?
> Are there additional tools or techniques (e.g., dynamic debugging or
> specific kernel config options) that can help identify this
> corruption?
> Any insights or suggestions would be greatly appreciated!
>
>
>
> --
> Thanks,
> Sekhar
>
> _______________________________________________
> Kernelnewbies mailing list
> Kernelnewbies@xxxxxxxxxxxxxxxxx
> https://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
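
A rough sketch of the gdb watchpoint idea from the top of this reply, in case it helps. It only writes a gdb command file; it does not start a VM. The assumptions are mine, not from the thread: the guest runs under QEMU with its gdb stub listening on TCP port 1234 (QEMU's -s option), the guest kernel is booted with nokaslr, and a vmlinux with debug info sits in the current directory. The file name watch-ea.gdb is made up, and the 6-byte length is inferred from the code indexing ea[0]..ea[5].

```shell
#!/bin/sh
# Sketch only: generate a gdb command file that watches the ea buffer.
# Assumed setup (not from the thread): QEMU gdb stub on tcp::1234,
# nokaslr on the guest command line, ./vmlinux built with debug info.
# "watch-ea.gdb" is a hypothetical file name.
cat > watch-ea.gdb <<'EOF'
# Connect to the QEMU gdb stub.
target remote :1234
# One-shot hardware breakpoint at function entry, so retries and later
# calls do not pile up extra watchpoints.
thbreak msm_set_laddr
commands
  # Watch the 6 bytes ea points at, by address (-l), so the watchpoint
  # stays valid after this frame goes away.
  watch -l *(unsigned char (*)[6]) ea
  continue
end
continue
EOF
echo "now run: gdb ./vmlinux -x watch-ea.gdb"
```

When the watchpoint fires, gdb stops on the store and prints the old and new contents; bt at that point shows who wrote to the buffer. A CPU watchpoint only sees CPU stores, so a device writing the buffer by DMA would not trigger it.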