On 18.01.2023 14:32, Peter Suti wrote: > On Thu, Jan 12, 2023 at 10:27 PM Heiner Kallweit <hkallweit1@xxxxxxxxx> wrote: >> >> On 09.01.2023 12:52, Peter Suti wrote: >>> On Wed, Dec 14, 2022 at 10:33 PM Heiner Kallweit <hkallweit1@xxxxxxxxx> wrote: >>>> >>>> On 14.12.2022 14:46, Peter Suti wrote: >>>>> >>>>> diff --git a/drivers/mmc/host/meson-gx-mmc.c b/drivers/mmc/host/meson-gx-mmc.c >>>>> index 6e5ea0213b47..7d3ee2f9a7f6 100644 >>>>> --- a/drivers/mmc/host/meson-gx-mmc.c >>>>> +++ b/drivers/mmc/host/meson-gx-mmc.c >>>>> @@ -1023,6 +1023,22 @@ static irqreturn_t meson_mmc_irq(int irq, void *dev_id) >>>>> if (ret == IRQ_HANDLED) >>>>> meson_mmc_request_done(host->mmc, cmd->mrq); >>>>> >>>>> + /* >>>>> + * Sometimes after we ack all raised interrupts, >>>>> + * an IRQ_SDIO can still be pending, which can get lost. >>>>> + * >>>> >>>> A reader may scratch his head here and wonder how the interrupt can get lost, >>>> and why adding a workaround instead of eliminating the root cause for losing >>>> the interrupt. If you can't provide an explanation why the root cause for >>>> losing the interrupt can't be fixed, presumably you would have to say that >>>> you're adding a workaround for a suspected silicon bug. >>> After talking to the manufacturer, we got the following explanation >>> for this situation: >> >> To which manufacturer did you talk, Marvell or Amlogic? > > Amlogic > >> >>> "wifi may have dat1 interrupt coming in, without this the dat1 >>> interrupt would be missed" >> >> I don't understand this sentence. W/o the interrupt coming in >> the interrupt would be missed? Can you explain it? > > So the "without this" part of that sentence referred to the patch. > Which means that without the patch, the interrupt can be missed. > >> >>> Supposedly this is fixed in their codebase. >> >> Which codebase do you mean? A specific vendor driver? Or firmware? > > The vendor driver from openlinux2.amlogic.com handles SDIO interrupts > a bit differently. It uses mmc_signal_sdio_irq() > >> >>> Unfortunately we were not able to find out more and can't prepare a >>> patch with a proper explanation. >> >> The workaround is rather simple, so we should add it. >> It's just unfortunate that we have no idea about the root cause of the issue. > > After doing a more long term stress test, it was revealed that this > patch is still not sufficient, but only masks the underlying problem > better. Reverting 066ecde fixes the issue fully for us (verified > overnight with iperf). > v1 and v2 also fix the issue, but v3 does not correct the bug when > WiFi is stressed for a longer time. And therefore it should not be > used. > Could you please give us some advice on how to investigate this further? One more thought: When checking device tree I found that my system uses IRQ_TYPE_LEVEL_HIGH for the SDIO interrupt. meson-g12-common.dtsi uses IRQ_TYPE_EDGE_RISING in mainline Linux, however vendor kernel uses IRQ_TYPE_LEVEL_HIGH in meson-g12-common.dtsi. A wrong interrupt trigger type may result in lost interrupts. So you could check which trigger type your system uses for the SDIO interrupt. If it's IRQ_TYPE_EDGE_RISING, re-test with IRQ_TYPE_LEVEL_HIGH.