On 21-Apr-20 3:19 PM, Jon Hunter wrote: > Hi Dmitry, > > On 21/04/2020 01:32, Dmitry Osipenko wrote: >> 21.04.2020 01:11, Dmitry Osipenko пишет: >>> Hello Jon, >>> >>> 20.04.2020 22:53, Jon Hunter пишет: >>>> Hi Dmitry, >>>> >>>> On 24/03/2020 19:12, Dmitry Osipenko wrote: >>>>> Boot CPU0 always handle I2C interrupt and under some rare circumstances >>>>> (like running KASAN + NFS root) it may stuck in uninterruptible state for >>>>> a significant time. In this case we will get timeout if I2C transfer is >>>>> running on a sibling CPU, despite of IRQ being raised. In order to handle >>>>> this rare condition, the IRQ status needs to be checked after completion >>>>> timeout. >>>>> >>>>> Signed-off-by: Dmitry Osipenko <digetx@xxxxxxxxx> >>>>> --- >>>>> drivers/i2c/busses/i2c-tegra.c | 27 +++++++++++++++------------ >>>>> 1 file changed, 15 insertions(+), 12 deletions(-) >>>> >>>> I have noticed a regression on tegra30-cardhu-a04 when testing system >>>> suspend. Git bisect is pointing to this commit and reverting it fixes >>>> the problem. In the below console log I2C fails to resume ... >>>> >>> ... >>>> [ 60.690035] PM: Device 3000.pcie failed to resume noirq: error -16 >>> ... >>>> [ 60.696859] tegra-mc 7000f000.memory-controller: fdcdwr2: write @0x877e8400: EMEM address decode error (SMMU translation error [--S]) >>>> >>>> [ 60.708876] tegra-mc 7000f000.memory-controller: fdcdwr2: write @0x877e8400: Page fault (SMMU translation error [--S]) >>> This looks very wrong, the error tells that 3d hardware is active and >>> doing something odd. Are you running some 3d tests? > I am not running any GFX tests. However, I am not sure if the above is > unrelated. > >>>> Have you seen this? >>> No, I haven't seen that. I'm not using PCIE and it looks like it's the >>> problem. >>> >>> Looking at the PCIE driver code, seems it's not syncing the RPM state on >>> suspend/resume. >>> >>> Please try this change: >>> >>> --- >8 --- >>> diff --git a/drivers/pci/controller/pci-tegra.c >>> b/drivers/pci/controller/pci-tegra.c >>> index 3e64ba6a36a8..b1fcbae4109c 100644 >>> --- a/drivers/pci/controller/pci-tegra.c >>> +++ b/drivers/pci/controller/pci-tegra.c >>> @@ -2870,8 +2870,8 @@ static int __maybe_unused >>> tegra_pcie_pm_resume(struct device *dev) >>> >>> static const struct dev_pm_ops tegra_pcie_pm_ops = { >>> SET_RUNTIME_PM_OPS(tegra_pcie_pm_suspend, tegra_pcie_pm_resume, NULL) >>> - SET_NOIRQ_SYSTEM_SLEEP_PM_OPS(tegra_pcie_pm_suspend, >>> - tegra_pcie_pm_resume) >>> + SET_NOIRQ_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend, >>> + pm_runtime_force_resume) >>> }; >>> >>> >>> static struct platform_driver tegra_pcie_driver = { >>> --- >8 --- >>> >>> Secondly, I2C driver suspends on NOIRQ level, while APBDMA driver >>> suspends on default level. This is also wrong, please try to apply this >>> hunk as well: >>> >>> --- >8 --- >>> diff --git a/drivers/dma/tegra20-apb-dma.c b/drivers/dma/tegra20-apb-dma.c >>> index f6a2f42ffc51..e682ac86bd27 100644 >>> --- a/drivers/dma/tegra20-apb-dma.c >>> +++ b/drivers/dma/tegra20-apb-dma.c >>> @@ -1653,7 +1653,7 @@ static int __maybe_unused >>> tegra_dma_dev_resume(struct device *dev) >>> static const struct dev_pm_ops tegra_dma_dev_pm_ops = { >>> SET_RUNTIME_PM_OPS(tegra_dma_runtime_suspend, tegra_dma_runtime_resume, >>> NULL) >>> - SET_SYSTEM_SLEEP_PM_OPS(tegra_dma_dev_suspend, tegra_dma_dev_resume) >>> + SET_NOIRQ_SYSTEM_SLEEP_PM_OPS(tegra_dma_dev_suspend, tegra_dma_dev_resume) >>> }; >>> >>> static const struct of_device_id tegra_dma_of_match[] = { >>> --- >8 --- >>> >> Although, I'm now having a second though about the APBDMA change... I'm >> recalling that there are some complications in regards to PCIE driver >> suspending, requiring it to be at NOIRQ level, but this should be wrong >> because PCIE driver uses voltage regulator driver at NOIRQ level, while >> regulator drivers suspend on default level. The current behavior of the >> PCIE driver should be wrong, I think it needs to be moved to the default >> suspend-resume level somehow. > I can try the above, but I agree it would be best to avoid messing with > the suspend levels if possible. > > I am adding Manikanta to get some feedback on why we moved the PCI > suspend to the NOIRQ phase because it is not clear to me if we need to > do this here. > > Manikanta, can you comment on whether we really need to suspend Tegra > PCI during the noirq phase? PCIe subsystem driver implemented noirq PM callbacks, it will save & restore endpoint config space in these PM callbacks. PCIe controller should be available during this time, so noirq PM callbacks are implemented in Tegra PCIe driver. file: drivers/pci/pci-driver.c static const struct dev_pm_ops pci_dev_pm_ops = { ... .suspend_noirq = pci_pm_suspend_noirq, .resume_noirq = pci_pm_resume_noirq, ... }; Thanks, Manikanta > > Cheers > Jon >