Small follow-up on this, and CCing Noralf as I forgot to add him on the last e-mail. On Wed, 2019-05-08 at 17:01 +0200, Nicolas Saenz Julienne wrote: > Hi, This has been seen on a Raspberry Pi 3B+, using a w5500 Ethernet SPI > interface on the main SPI controller. The commit that introduced the issue > seems to be: > > c9ba7a16d0f1d2b1e70d47296eca5f612c753825 "spi: Release spi_res after > finalizing message" > > Here's the kernel log: > > [ 170.085704] Unable to handle kernel paging request at virtual address > 0000000000100000 > [ 170.097773] Mem abort info: > [ 170.101873] ESR = 0x86000004 > [ 170.105222] Exception class = IABT (current EL), IL = 32 bits > [ 170.111599] SET = 0, FnV = 0 > [ 170.115040] EA = 0, S1PTW = 0 > [ 170.118484] user pgtable: 4k pages, 48-bit VAs, pgdp = 00000000ed70a017 > [ 170.125511] [0000000000100000] pgd=0000000000000000 > [ 170.130872] Internal error: Oops: 86000004 [#1] PREEMPT SMP > [ 170.136532] Modules linked in: fuse vc4 cec drm_kms_helper cfbfillrect > brcmfmac cfbimgblt panel_raspberrypi_touchscreen cfbcopyarea drm hci_uart > btbcm brcmutil bluetooth cfg80211 w5100_spi bcm2835_v4l2(C) w5100 v4l2_common > videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common videodev > microchip lan78xx crct10dif_ce drm_panel_orientation_quirks i2c_gpio > ecdh_generic edt_ft5x06 raspberrypi_hwmon ecc media rfkill snd_bcm2835(C) > input_polldev bcm2835_thermal spi_bcm2835 bcm2835_dma pwm_bcm2835 bcm2835_rng > rng_core virt_dma ip_tables x_tables ipv6 > [ 170.186506] Process spi0 (pid: 1335, stack limit = 0x00000000fb9981f5) > [ 170.193133] CPU: 0 PID: 1335 Comm: spi0 Tainted: G WC 5.1.0- > nico+ #116 > [ 170.200985] Hardware name: Raspberry Pi 3 Model B Plus Rev 1.3 (DT) > [ 170.207343] pstate: 60000005 (nZCv daif -PAN -UAO) > [ 170.212205] pc : 0x100000 > [ 170.214864] lr : spi_res_release.part.9+0x7c/0xb0 > [ 170.219631] sp : ffff000011303ce0 > [ 170.222987] x29: ffff000011303ce0 x28: ffff00001122bc30 > [ 170.228373] x27: 00000000cccccccd x26: ffff000010ee96c8 > [ 170.233759] x25: dead000000000100 x24: dead000000000200 > [ 170.239143] x23: ffff000010d28b78 x22: ffff80002fa60e80 > [ 170.244528] x21: ffff80002567b000 x20: ffff00001122bc88 > [ 170.249914] x19: ffff00001122bc30 x18: 0000000000000020 > [ 170.255297] x17: 0000000000000000 x16: 0000000000000000 > [ 170.260682] x15: ffffffffffffffff x14: ffff000010ee96c8 > [ 170.266068] x13: ffff000010fd3388 x12: ffff000010fd2fb7 > [ 170.271453] x11: 0000000000000000 x10: ffff000011303ce0 > [ 170.276840] x9 : ffff000011303ce0 x8 : 000000000000000f > [ 170.282226] x7 : 0000000000000005 x6 : ffff000010fd264d > [ 170.287612] x5 : 0000000000000000 x4 : 0000000000000000 > [ 170.292998] x3 : 0000000000100000 x2 : ffff80002fa60e98 > [ 170.298384] x1 : ffff00001122bc30 x0 : ffff80002567b000 > [ 170.303769] Call trace: > [ 170.306247] 0x100000 > [ 170.308550] spi_transfer_one_message+0x268/0x388 > [ 170.313321] __spi_pump_messages+0x200/0x508 > [ 170.317651] spi_pump_messages+0x14/0x20 > [ 170.321630] kthread_worker_fn+0xa8/0x188 > [ 170.325694] kthread+0x120/0x128 > [ 170.328966] ret_from_fork+0x10/0x18 > [ 170.332595] Code: bad PC value > [ 170.335694] ---[ end trace d054c016d94fa755 ]--- It seems the SPI controller thread is racing with the device's thread. Something like this: SPI Controller Thread SPI Device Thread -> spi_sync_transfer() creates spi_message on stack then sleeps until finished [SPI transfer happens...] -> spi_finalize_current_message() which wakes up SPI Device Thread -> spi_sync_transfer() returns, the message disapears from the stack -> spi_res_release() there is no more spi_message and the memory is potentialy used for something else I've been looking at the spi_split_transfers_maxsize() code and can't think of a reason why spi_res_release() couldn't be placed before spi_finalize_current_message(). Which would solve the issue, but I guess Noralf has a better perspective on the topic. Regards, Nicolas
Attachment:
signature.asc
Description: This is a digitally signed message part