[cc:Martin] Den 08.05.2019 19.07, skrev Nicolas Saenz Julienne: > Small follow-up on this, and CCing Noralf as I forgot to add him on the last > e-mail. > > On Wed, 2019-05-08 at 17:01 +0200, Nicolas Saenz Julienne wrote: >> Hi, This has been seen on a Raspberry Pi 3B+, using a w5500 Ethernet SPI >> interface on the main SPI controller. The commit that introduced the issue >> seems to be: >> >> c9ba7a16d0f1d2b1e70d47296eca5f612c753825 "spi: Release spi_res after >> finalizing message" >> >> Here's the kernel log: >> >> [ 170.085704] Unable to handle kernel paging request at virtual address >> 0000000000100000 >> [ 170.097773] Mem abort info: >> [ 170.101873] ESR = 0x86000004 >> [ 170.105222] Exception class = IABT (current EL), IL = 32 bits >> [ 170.111599] SET = 0, FnV = 0 >> [ 170.115040] EA = 0, S1PTW = 0 >> [ 170.118484] user pgtable: 4k pages, 48-bit VAs, pgdp = 00000000ed70a017 >> [ 170.125511] [0000000000100000] pgd=0000000000000000 >> [ 170.130872] Internal error: Oops: 86000004 [#1] PREEMPT SMP >> [ 170.136532] Modules linked in: fuse vc4 cec drm_kms_helper cfbfillrect >> brcmfmac cfbimgblt panel_raspberrypi_touchscreen cfbcopyarea drm hci_uart >> btbcm brcmutil bluetooth cfg80211 w5100_spi bcm2835_v4l2(C) w5100 v4l2_common >> videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common videodev >> microchip lan78xx crct10dif_ce drm_panel_orientation_quirks i2c_gpio >> ecdh_generic edt_ft5x06 raspberrypi_hwmon ecc media rfkill snd_bcm2835(C) >> input_polldev bcm2835_thermal spi_bcm2835 bcm2835_dma pwm_bcm2835 bcm2835_rng >> rng_core virt_dma ip_tables x_tables ipv6 >> [ 170.186506] Process spi0 (pid: 1335, stack limit = 0x00000000fb9981f5) >> [ 170.193133] CPU: 0 PID: 1335 Comm: spi0 Tainted: G WC 5.1.0- >> nico+ #116 >> [ 170.200985] Hardware name: Raspberry Pi 3 Model B Plus Rev 1.3 (DT) >> [ 170.207343] pstate: 60000005 (nZCv daif -PAN -UAO) >> [ 170.212205] pc : 0x100000 >> [ 170.214864] lr : spi_res_release.part.9+0x7c/0xb0 >> [ 170.219631] sp : ffff000011303ce0 >> [ 170.222987] x29: ffff000011303ce0 x28: ffff00001122bc30 >> [ 170.228373] x27: 00000000cccccccd x26: ffff000010ee96c8 >> [ 170.233759] x25: dead000000000100 x24: dead000000000200 >> [ 170.239143] x23: ffff000010d28b78 x22: ffff80002fa60e80 >> [ 170.244528] x21: ffff80002567b000 x20: ffff00001122bc88 >> [ 170.249914] x19: ffff00001122bc30 x18: 0000000000000020 >> [ 170.255297] x17: 0000000000000000 x16: 0000000000000000 >> [ 170.260682] x15: ffffffffffffffff x14: ffff000010ee96c8 >> [ 170.266068] x13: ffff000010fd3388 x12: ffff000010fd2fb7 >> [ 170.271453] x11: 0000000000000000 x10: ffff000011303ce0 >> [ 170.276840] x9 : ffff000011303ce0 x8 : 000000000000000f >> [ 170.282226] x7 : 0000000000000005 x6 : ffff000010fd264d >> [ 170.287612] x5 : 0000000000000000 x4 : 0000000000000000 >> [ 170.292998] x3 : 0000000000100000 x2 : ffff80002fa60e98 >> [ 170.298384] x1 : ffff00001122bc30 x0 : ffff80002567b000 >> [ 170.303769] Call trace: >> [ 170.306247] 0x100000 >> [ 170.308550] spi_transfer_one_message+0x268/0x388 >> [ 170.313321] __spi_pump_messages+0x200/0x508 >> [ 170.317651] spi_pump_messages+0x14/0x20 >> [ 170.321630] kthread_worker_fn+0xa8/0x188 >> [ 170.325694] kthread+0x120/0x128 >> [ 170.328966] ret_from_fork+0x10/0x18 >> [ 170.332595] Code: bad PC value >> [ 170.335694] ---[ end trace d054c016d94fa755 ]--- > > It seems the SPI controller thread is racing with the device's thread. > Something like this: > > SPI Controller Thread SPI Device Thread > > -> spi_sync_transfer() creates > spi_message on stack then > sleeps until finished > > [SPI transfer happens...] > > -> spi_finalize_current_message() > which wakes up SPI Device Thread > > -> spi_sync_transfer() returns, the > message disapears from the stack > > -> spi_res_release() there is no more > spi_message and the memory is > potentialy used for something else > > I've been looking at the spi_split_transfers_maxsize() code and can't think of > a reason why spi_res_release() couldn't be placed before > spi_finalize_current_message(). Which would solve the issue, but I guess Noralf > has a better perspective on the topic. > The problem was that spi_res_release() restored the original transfers before spi_unmap_msg() is called in spi_finalize_current_message() thus dma unmapping the original transfers, not the split ones that was mapped. This is the accompanying change to the driver that hasn't been applied: [v5,3/4] spi/spi-bcm2835: Split transfers that exceed DLEN https://patchwork.kernel.org/patch/10899587/ Unless Martin Sperl, who wrote spi_split_transfers_maxsize(), has other suggestions, I think we should just revert this patch. Apparently I need to go back to the drawing board with this. Noralf. > Regards, > Nicolas >