On Wed, 2019-05-08 at 19:33 +0200, Noralf Trønnes wrote: > [cc:Martin] > > Den 08.05.2019 19.07, skrev Nicolas Saenz Julienne: > > Small follow-up on this, and CCing Noralf as I forgot to add him on the last > > e-mail. > > > > On Wed, 2019-05-08 at 17:01 +0200, Nicolas Saenz Julienne wrote: > > > Hi, This has been seen on a Raspberry Pi 3B+, using a w5500 Ethernet SPI > > > interface on the main SPI controller. The commit that introduced the issue > > > seems to be: > > > > > > c9ba7a16d0f1d2b1e70d47296eca5f612c753825 "spi: Release spi_res after > > > finalizing message" > > > > > > Here's the kernel log: > > > > > > [ 170.085704] Unable to handle kernel paging request at virtual address > > > 0000000000100000 > > > [ 170.097773] Mem abort info: > > > [ 170.101873] ESR = 0x86000004 > > > [ 170.105222] Exception class = IABT (current EL), IL = 32 bits > > > [ 170.111599] SET = 0, FnV = 0 > > > [ 170.115040] EA = 0, S1PTW = 0 > > > [ 170.118484] user pgtable: 4k pages, 48-bit VAs, pgdp = 00000000ed70a017 > > > [ 170.125511] [0000000000100000] pgd=0000000000000000 > > > [ 170.130872] Internal error: Oops: 86000004 [#1] PREEMPT SMP > > > [ 170.136532] Modules linked in: fuse vc4 cec drm_kms_helper cfbfillrect > > > brcmfmac cfbimgblt panel_raspberrypi_touchscreen cfbcopyarea drm hci_uart > > > btbcm brcmutil bluetooth cfg80211 w5100_spi bcm2835_v4l2(C) w5100 > > > v4l2_common > > > videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common > > > videodev > > > microchip lan78xx crct10dif_ce drm_panel_orientation_quirks i2c_gpio > > > ecdh_generic edt_ft5x06 raspberrypi_hwmon ecc media rfkill snd_bcm2835(C) > > > input_polldev bcm2835_thermal spi_bcm2835 bcm2835_dma pwm_bcm2835 > > > bcm2835_rng > > > rng_core virt_dma ip_tables x_tables ipv6 > > > [ 170.186506] Process spi0 (pid: 1335, stack limit = 0x00000000fb9981f5) > > > [ 170.193133] CPU: 0 PID: 1335 Comm: spi0 Tainted: > > > G WC 5.1.0- > > > nico+ #116 > > > [ 170.200985] Hardware name: Raspberry Pi 3 Model B Plus Rev 1.3 (DT) > > > [ 170.207343] pstate: 60000005 (nZCv daif -PAN -UAO) > > > [ 170.212205] pc : 0x100000 > > > [ 170.214864] lr : spi_res_release.part.9+0x7c/0xb0 > > > [ 170.219631] sp : ffff000011303ce0 > > > [ 170.222987] x29: ffff000011303ce0 x28: ffff00001122bc30 > > > [ 170.228373] x27: 00000000cccccccd x26: ffff000010ee96c8 > > > [ 170.233759] x25: dead000000000100 x24: dead000000000200 > > > [ 170.239143] x23: ffff000010d28b78 x22: ffff80002fa60e80 > > > [ 170.244528] x21: ffff80002567b000 x20: ffff00001122bc88 > > > [ 170.249914] x19: ffff00001122bc30 x18: 0000000000000020 > > > [ 170.255297] x17: 0000000000000000 x16: 0000000000000000 > > > [ 170.260682] x15: ffffffffffffffff x14: ffff000010ee96c8 > > > [ 170.266068] x13: ffff000010fd3388 x12: ffff000010fd2fb7 > > > [ 170.271453] x11: 0000000000000000 x10: ffff000011303ce0 > > > [ 170.276840] x9 : ffff000011303ce0 x8 : 000000000000000f > > > [ 170.282226] x7 : 0000000000000005 x6 : ffff000010fd264d > > > [ 170.287612] x5 : 0000000000000000 x4 : 0000000000000000 > > > [ 170.292998] x3 : 0000000000100000 x2 : ffff80002fa60e98 > > > [ 170.298384] x1 : ffff00001122bc30 x0 : ffff80002567b000 > > > [ 170.303769] Call trace: > > > [ 170.306247] 0x100000 > > > [ 170.308550] spi_transfer_one_message+0x268/0x388 > > > [ 170.313321] __spi_pump_messages+0x200/0x508 > > > [ 170.317651] spi_pump_messages+0x14/0x20 > > > [ 170.321630] kthread_worker_fn+0xa8/0x188 > > > [ 170.325694] kthread+0x120/0x128 > > > [ 170.328966] ret_from_fork+0x10/0x18 > > > [ 170.332595] Code: bad PC value > > > [ 170.335694] ---[ end trace d054c016d94fa755 ]--- > > > > It seems the SPI controller thread is racing with the device's thread. > > Something like this: > > > > SPI Controller Thread SPI Device Thread > > > > -> spi_sync_transfer() creates > > spi_message on stack then > > sleeps until finished > > > > [SPI transfer happens...] > > > > -> spi_finalize_current_message() > > which wakes up SPI Device Thread > > > > -> spi_sync_transfer() returns, > > the > > message disapears from the stack > > > > -> spi_res_release() there is no more > > spi_message and the memory is > > potentialy used for something else > > > > I've been looking at the spi_split_transfers_maxsize() code and can't think > > of > > a reason why spi_res_release() couldn't be placed before > > spi_finalize_current_message(). Which would solve the issue, but I guess > > Noralf > > has a better perspective on the topic. > > > > The problem was that spi_res_release() restored the original transfers > before spi_unmap_msg() is called in spi_finalize_current_message() thus > dma unmapping the original transfers, not the split ones that was mapped. Ok, now I get it. > This is the accompanying change to the driver that hasn't been applied: > [v5,3/4] spi/spi-bcm2835: Split transfers that exceed DLEN > https://patchwork.kernel.org/patch/10899587/ It's applied :). I'm actually preparing a patch on top of that so it only splits the messages if DMA is really available. > Unless Martin Sperl, who wrote spi_split_transfers_maxsize(), has other > suggestions, I think we should just revert this patch. > > Apparently I need to go back to the drawing board with this. > > Noralf. Regards, Nicolas
Attachment:
signature.asc
Description: This is a digitally signed message part