Hi,

CCed: Christoph and Robin, as the issue is partially dma-mapping related.

On 27.09.2022 13:21, Vincent Whitchurch wrote:
> The SPI core DMA mapping support performs cache management once for the
> entire message and not between transfers, and this leads to cache
> corruption if a message has two or more RX transfers with both
> transfers targeting the same cache line, and the controller driver
> decides to handle one using DMA and the other using PIO (for example,
> because one is much larger than the other).
>
> Fix it by syncing before/after the actual transfers. This also means
> that we can skip the sync during the map/unmap of the message.
>
> Fixes: 99adef310f68 ("spi: Provide core support for DMA mapping transfers")
> Signed-off-by: Vincent Whitchurch <vincent.whitchurch@xxxxxxxx>
> ---

This patch landed in linux next-20220929 as commit 0c17ba73c08f ("spi:
Fix cache corruption due to DMA/PIO overlap"). Unfortunately it causes
kernel oops on one of my test systems:

8<--- cut here ---
Unable to handle kernel NULL pointer dereference at virtual address 0000000c
[0000000c] *pgd=00000000
Internal error: Oops: 5 [#1] PREEMPT SMP ARM
Modules linked in: cmac bnep btsdio hci_uart btbcm s5p_mfc btintel
brcmfmac bluetooth videobuf2_dma_contig videobuf2_memops videobuf2_v4l2
videobuf2_common videodev cfg80211 mc ecdh_generic ecc brcmutil
CPU: 0 PID: 12 Comm: kworker/0:1 Not tainted 6.0.0-rc7-next-20220929-dirty #12903
Hardware name: Samsung Exynos (Flattened Device Tree)
Workqueue: events ax88796c_work
PC is at dma_direct_sync_sg_for_device+0x24/0xb8
LR is at spi_transfer_one_message+0x4c4/0xabc
pc : [<c01cbcf0>] lr : [<c0739fcc>] psr: 20000013
...
Process kworker/0:1 (pid: 12, stack limit = 0xca429928)
Stack: (0xe0071d38 to 0xe0072000)
...
 dma_direct_sync_sg_for_device from spi_transfer_one_message+0x4c4/0xabc
 spi_transfer_one_message from __spi_pump_transfer_message+0x300/0x770
 __spi_pump_transfer_message from __spi_sync+0x304/0x3f4
 __spi_sync from spi_sync+0x28/0x40
 spi_sync from axspi_read_rxq+0x98/0xc8
 axspi_read_rxq from ax88796c_work+0x7a8/0xf6c
 ax88796c_work from process_one_work+0x288/0x774
 process_one_work from worker_thread+0x44/0x504
 worker_thread from kthread+0xf0/0x124
 kthread from ret_from_fork+0x14/0x2c
Exception stack(0xe0071fb0 to 0xe0071ff8)
...
---[ end trace 0000000000000000 ]---

This happens because sg_free_table() doesn't clear table->orig_nents or
table->nents. If the given spi xfer object is later reused without a
dma-mapped buffer, the stale non-zero orig_nents makes
spi_dma_sync_for_device()/spi_dma_sync_for_cpu() treat the table as
still mapped, and a NULL pointer dereference happens on table->sgl.

A possible fix would be to zero table->orig_nents in
spi_unmap_buf_attrs(). I will send a patch for this soon.

However, I think that clearing table->orig_nents and table->nents should
be added to __sg_free_table() in lib/scatterlist.c to avoid this kind of
issue in the future. This, however, would be a significant change that
might break code somewhere if it relies on the nents/orig_nents values
after calling sg_free_table(). Christoph, Robin - what is your opinion?

> drivers/spi/spi.c | 109 +++++++++++++++++++++++++++++++++++++---------
> 1 file changed, 88 insertions(+), 21 deletions(-)
>
> ...

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland