On 23/07/2024 15:43, Konrad Dybcio wrote:
> On 23.07.2024 3:38 PM, Marc Gonzalez wrote:
>
>> On 23/07/2024 15:08, Konrad Dybcio wrote:
>>
>>> On 23.07.2024 2:57 PM, Marc Gonzalez wrote:
>>>
>>>> On 23/07/2024 13:45, Konrad Dybcio wrote:
>>>>
>>>>> On 23.07.2024 11:59 AM, Dmitry Baryshkov wrote:
>>>>>
>>>>>> On Tue, 23 Jul 2024 at 12:48, Marc Gonzalez wrote:
>>>>>>
>>>>>>> On 16/07/2024 18:37, Dmitry Baryshkov wrote:
>>>>>>>
>>>>>>>> No, that's fine. It is the SMMU issue that Konrad has been asking you
>>>>>>>> to take a look at.
>>>>>>>
>>>>>>> Context:
>>>>>>>
>>>>>>> [ 4.911422] arm-smmu cd00000.iommu: FSR = 00000402 [Format=2 TF], SID=0x0
>>>>>>> [ 4.923353] arm-smmu cd00000.iommu: FSYNR0 = 00000021 [S1CBNDX=0 PNU PLVL=1]
>>>>>>> [ 4.927893] arm-smmu cd00000.iommu: FSR = 00000402 [Format=2 TF], SID=0x0
>>>>>>> [ 4.941928] arm-smmu cd00000.iommu: FSYNR0 = 00000021 [S1CBNDX=0 PNU PLVL=1]
>>>>>>> [ 4.944438] arm-smmu cd00000.iommu: FSR = 00000402 [Format=2 TF], SID=0x0
>>>>>>> [ 4.956013] arm-smmu cd00000.iommu: FSYNR0 = 00000021 [S1CBNDX=0 PNU PLVL=1]
>>>>>>> [ 4.961055] arm-smmu cd00000.iommu: FSR = 00000402 [Format=2 TF], SID=0x0
>>>>>>> [ 4.974565] arm-smmu cd00000.iommu: FSYNR0 = 00000021 [S1CBNDX=0 PNU PLVL=1]
>>>>>>> [ 4.977628] arm-smmu cd00000.iommu: FSR = 00000402 [Format=2 TF], SID=0x0
>>>>>>> [ 4.989670] arm-smmu cd00000.iommu: FSYNR0 = 00000021 [S1CBNDX=0 PNU PLVL=1]
>>>>>>>
>>>>>>>
>>>>>>> As I mentioned, I don't think I've ever seen issues from cd00000.iommu
>>>>>>> on my board.
>>>>>>
>>>>>> Interestingly enough, I can also see iommu errors during WiFi startup
>>>>>> / shutdown on msm8998 / miix630. This leads me to thinking that it
>>>>>> well might be that there is a missing quirk in the iommu driver.
>>>>>>
>>>>>>> I can test a reboot loop for a few hours, to see if anything shows up.
>>>>>>
>>>>>> Yes, please.
>>>>>
>>>>> Yeah I do trust you Marc that it actually works for you and I'm not
>>>>> gonna delay this series because of that, but please go ahead and
>>>>> reboot-loop your board
>>>>>
>>>>> 8998/660 is """famous""" for its iommu problems
>>>>
>>>> [ 20.501062] arm-smmu 16c0000.iommu: Unhandled context fault: fsr=0x402, iova=0x00000000, fsynr=0x1, cbfrsynra=0x1900, cb=0
>>>>
>>>> I get the above warning pretty reliably.
>>>> I don't think it's related to the issue(s) you mentioned.
>>>> System just keeps plodding along.
>>>
>>> Yeah that one's "fine"
>>
>> I booted 40 times in a loop.
>>
>> `grep -a -i FSYNR console.logs` just returns the same 16c0000.iommu
>> "Unhandled context fault" message 76 times (as above).
>>
>> NB: I have maxcpus=1 set in bootargs.
>>
>> Could the iommu issue be a race condition, NOT triggered when code
>> runs with less parallelism?
>
> No clue, can you try without maxcpus=1?

Same behavior without maxcpus=1:
40 boots, no panics, no FSYNR other than 16c0000.iommu.

> The thing will likely run slower (because reasons), but shouldn't
> explode

That makes sense!

- Hey, boot is slow. What can we do to make it slower?
- Well, just add a bunch of cores running in parallel, that will get the job done!

As a matter of fact, trying to boot to the command line with maxcpus=1
causes the system to lock up & reboot. I had to add a systemd script
to enable some cores at init. Some qcom daemon must be locking a core
and expecting progress from another process.

Regards
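A possible refinement of the `grep -a -i FSYNR console.logs` check, sketched in Python under the assumption that console.logs is a single plain-text capture of all boots: instead of one flat match count, it tallies FSYNR-bearing arm-smmu messages per SMMU instance, so the known 16c0000.iommu fault and any cd00000.iommu faults show up as separate totals. The regex and function name are illustrative choices, not part of any existing tool.

```python
import re
import sys
from collections import Counter

# Matches the same lines `grep -i FSYNR` would, restricted to arm-smmu
# messages, and captures the instance name (e.g. "cd00000.iommu").
FAULT_RE = re.compile(r"arm-smmu (\S+): .*fsynr", re.IGNORECASE)


def count_smmu_faults(lines):
    """Count FSYNR-bearing arm-smmu messages per IOMMU instance."""
    counts = Counter()
    for line in lines:
        m = FAULT_RE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts


def main(path):
    # errors="replace" plays the role of `grep -a`: serial console
    # captures often contain stray non-UTF-8 bytes.
    with open(path, encoding="utf-8", errors="replace") as f:
        counts = count_smmu_faults(f)
    for dev, n in sorted(counts.items()):
        print(f"{dev}: {n}")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Run as `python3 count_faults.py console.logs` (script name hypothetical). Note that FSR lines are not counted, only the FSYNR/fsynr ones, to stay comparable with the grep result.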
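The systemd script that enables cores at init was not posted. Below is a hypothetical Python sketch of what such a helper might do, assuming the standard Linux CPU-hotplug sysfs interface (/sys/devices/system/cpu/cpuN/online), which can bring up CPUs that maxcpus= left offline at boot. The function name and which CPUs to bring up are assumptions.

```python
import os
import sys

# Standard Linux CPU-hotplug sysfs location: each hot-pluggable CPU has an
# "online" file that accepts "1" (bring up) or "0" (take down).
CPU_SYSFS = "/sys/devices/system/cpu"


def online_cpus(cpus, root=CPU_SYSFS):
    """Bring the given CPU numbers online; return the ones actually onlined."""
    onlined = []
    for n in cpus:
        path = os.path.join(root, f"cpu{n}", "online")
        if not os.path.exists(path):
            # No "online" file: this CPU cannot be hot-plugged
            # (e.g. the boot CPU on some platforms), so skip it.
            continue
        with open(path) as f:
            if f.read().strip() == "1":
                continue  # already online
        with open(path, "w") as f:
            f.write("1")
        onlined.append(n)
    return onlined


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical invocation: `online_cpus.py 1 2 3` (needs root).
    print(online_cpus(int(a) for a in sys.argv[1:]))
```

Writing to sysfs requires root, so this would typically be started from a `Type=oneshot` systemd service early in boot. How many cores are needed to avoid the lockup is not stated in the thread.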