On Sun, Jul 9, 2017 at 3:49 PM, Stanimir Varbanov <stanimir.varbanov@xxxxxxxxxx> wrote: > Hi Rob, > > On 07/09/2017 04:19 PM, Rob Clark wrote: >> Not entirely sure what triggers it, but with venus build as kernel >> module and in initrd, we hit this crash: > > Is it happens occasionally or everytime in the initrd? And also with > your patch it will bail out on venus_probe, does it crash again on next > venus_probe? seems to happen every time.. (module is ending up in initrd, but not sure if fw is.. which might be triggering this?) I could not boot successfully without this patch, but otoh I haven't yet tried if venus actually works after boot. BR, -R >> >> Unable to handle kernel paging request at virtual address ffff80003c039000 >> pgd = ffff00000a14f000 >> [ffff80003c039000] *pgd=00000000bd9f7003, *pud=00000000bd9f6003, *pmd=00000000bd9f0003, *pte=0000000000000000 >> Internal error: Oops: 96000007 [#1] SMP >> Modules linked in: qcom_wcnss_pil(E+) crc32_ce(E) qcom_common(E) venus_core(E+) remoteproc(E) snd_soc_msm8916_digital(E) virtio_ring(E) cdc_ether(E) snd_soc_lpass_apq8016(E) snd_soc_lpass_cpu(E) snd_soc_apq8016_sbc(E) snd_soc_lpass_platform(E) v4l2_mem2mem(E) virtio(E) snd_soc_core(E) ac97_bus(E) snd_pcm_dmaengine(E) snd_seq(E) leds_gpio(E) videobuf2_v4l2(E) videobuf2_core(E) snd_seq_device(E) sndi_pcm(E) videodev(E) media(E) nvmem_qfprom(E) msm(E) snd_timer(E) snd(E) soundcore(E) spi_qup(E) mdt_loader(E) qcom_tsens(E) qcom_spmi_temp_alarm(E) nvmem_core(E) msm_rng(E) uas(E) usb_storage(E) dm9601(E) usbnet(E) mii(E) mmc_block(E) adv7511(E) drm_kms_helper(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) fb_sys_fops(E) qcom_spmi_vadc(E) qcom_vadc_common(PE) industrialio(E) pinctrl_spmi_mpp(E) >> pinctrl_spmi_gpio(E) rtc_pm8xxx(E) clk_smd_rpm(E) sdhci_msm(E) sdhci_pltfm(E) qcom_smd_regulator(E) drm(E) smd_rpm(E) qcom_spmi_pmic(E) regmap_spmi(E) ci_hdrc_msm(E) ci_hdrc(E) usb3503(E) extcon_usb_gpio(E) phy_msm_usb(E) udc_core(E) qcom_hwspinlock(E) extcon_core(E) ehci_msm(E) i2c_qup(E) sdhci(E) mmc_core(E) spmi_pmic_arb(E) spmi(E) qcom_smd(E) smsm(E) rpmsg_core(E) smp2p(E) smem(E) hwspinlock_core(E) gpio_keys(E) >> CPU: 2 PID: 551 Comm: irq/150-venus Tainted: P E 4.12.0+ #1625 >> Hardware name: qualcomm dragonboard410c/dragonboard410c, BIOS 2017.07-rc2-00144-ga97bdbdf72-dirty 07/08/2017 >> task: ffff800037338000 task.stack: ffff800038e00000 >> PC is at hfi_sys_init_done+0x64/0x140 [venus_core] >> LR is at hfi_process_msg_packet+0xcc/0x1e8 [venus_core] >> pc : [<ffff00000118b384>] lr : [<ffff00000118c11c>] pstate: 20400145 >> sp : ffff800038e03c60 >> x29: ffff800038e03c60 x28: 0000000000000000 >> x27: 00000000000df018 x26: ffff00000118f4d0 >> x25: 0000000000020003 x24: ffff80003a8d3010 >> x23: ffff00000118f760 x22: ffff800037b40028 >> x21: ffff8000382981f0 x20: ffff800037b40028 >> x19: ffff80003c039000 x18: 0000000000000020 >> x17: 0000000000000000 x16: ffff800037338000 >> x15: ffffffffffffffff x14: 0000001000000014 >> x13: 0000000100001007 x12: 0000000100000020 >> x11: 0000100e00000000 x10: 0000000000000001 >> x9 : 0000000200000000 x8 : 0000001400000001 >> x7 : 0000000000001010 x6 : 0000000000000148 >> x5 : 0000000000001009 x4 : ffff80003c039000 >> x3 : 00000000cd770abb x2 : 0000000000000042 >> x1 : 0000000000000788 x0 : 0000000000000002 >> Process irq/150-venus (pid: 551, stack limit = 0xffff800038e00000) >> Call trace: >> [<ffff00000118b384>] hfi_sys_init_done+0x64/0x140 [venus_core] >> [<ffff00000118c11c>] hfi_process_msg_packet+0xcc/0x1e8 [venus_core] >> [<ffff00000118a2b4>] venus_isr_thread+0x1b4/0x208 [venus_core] >> [<ffff00000118e750>] hfi_isr_thread+0x28/0x38 [venus_core] >> [<ffff000008161550>] irq_thread_fn+0x30/0x70 >> [<ffff0000081617fc>] irq_thread+0x14c/0x1c8 >> [<ffff000008105e68>] kthread+0x138/0x140 >> [<ffff000008083590>] ret_from_fork+0x10/0x40 >> Code: 52820125 52820207 7a431820 54000249 (b9400263) >> ---[ end trace c963460f20a984b6 ]--- >> >> The problem is that in the error case, we've incremented the data ptr >> but not decremented rem_bytes, and keep reading (presumably garbage) >> until eventually we go beyond the end of the buffer. >> >> Instead, on first error, we should probably just bail out. Other >> option is to increment read_bytes by sizeof(u32) before the switch, >> rather than only accounting for the ptype header in the non-error >> case. Note that in this case it is HFI_ERR_SYS_INVALID_PARAMETER, >> ie. an unrecognized/unsupported parameter, so interpreting the next >> word as a property type would be bogus. The other error cases are >> due to truncated buffer, so there isn't likely to be anything valid >> to interpret in the remainder of the buffer. So just bailing seems >> like a reasonable solution. > > I have a WIP patch which rewrite the message parsing, it would be nice > if I can reproduce this crash. > >> >> Signed-off-by: Rob Clark <robdclark@xxxxxxxxx> >> --- >> drivers/media/platform/qcom/venus/hfi_msgs.c | 11 ++++++----- >> 1 file changed, 6 insertions(+), 5 deletions(-) >> >> diff --git a/drivers/media/platform/qcom/venus/hfi_msgs.c b/drivers/media/platform/qcom/venus/hfi_msgs.c >> index debf80a92797..4190825b20a1 100644 >> --- a/drivers/media/platform/qcom/venus/hfi_msgs.c >> +++ b/drivers/media/platform/qcom/venus/hfi_msgs.c >> @@ -239,11 +239,12 @@ static void hfi_sys_init_done(struct venus_core *core, struct venus_inst *inst, >> break; >> } >> >> - if (!error) { >> - rem_bytes -= read_bytes; >> - data += read_bytes; >> - num_properties--; >> - } >> + if (error) >> + break; >> + >> + rem_bytes -= read_bytes; >> + data += read_bytes; >> + num_properties--; >> } >> >> err_no_prop: >> > > -- > regards, > Stan