On Wed, 1 Nov 2023 16:56:55 +0100, Stanislaw Gruszka wrote: >On Sat, Oct 28, 2023 at 08:15:30PM +0800, Shiji Yang wrote: >> When I tried to fix the watchdog of rt2800, I found that sometimes >> the watchdog could not reset the hung device. This is because the >> queue did not completely stop, it just became very slow. The Mediatek >> vendor driver for the new chips (MT7603/MT7612) has a DMA busy >> watchdog to detect device hangs by checking DMA busy status. This >> implementation is something similar to it. To reduce unnecessary >> watchdog reset, we can check the INT_STATUS register together as I >> found that when the radio hung, the RX/TX coherent interrupt will >> always stuck at triggered state. >> >> This patch also changes the watchdog module parameters to the new >> 'hang_watchdog' and 'dma_busy_watchdog' so that we can control them >> separately. That's because they may have different behavior on >> specific chip. >> >> This watchdog function is a slight schedule and it won't affect the >> WiFi transmission speed. Watchdog can help the driver automatically >> recover from the abnormal state. So I think it should be default on. >> Anyway it can be disabled by module parameter 'dma_busy_watchdog=0'. >> >> Tested on MT7620 and RT5350. > >I think this will not work on USB as INT_SOURCE_CSR is mmio/pci >specific. Did you tested on USB ? Or this is disabled for USB by >default? Hi! Thanks for your information. I didn't realize that they have such difference. I don't have Ralink USB NIC so I only test it on RT5350 and MT7620. I'll disable it for USB devices in v2. > >> Signed-off-by: Shiji Yang <yangshiji66@xxxxxxxxxxx> >> --- >> .../net/wireless/ralink/rt2x00/rt2800lib.c | 81 ++++++++++++++++--- >> drivers/net/wireless/ralink/rt2x00/rt2x00.h | 3 + >> 2 files changed, 72 insertions(+), 12 deletions(-) >> >> diff --git a/drivers/net/wireless/ralink/rt2x00/rt2800lib.c b/drivers/net/wireless/ralink/rt2x00/rt2800lib.c >> index 594dd3d9f..6ca2f2c23 100644 >> --- a/drivers/net/wireless/ralink/rt2x00/rt2800lib.c >> +++ b/drivers/net/wireless/ralink/rt2x00/rt2800lib.c >> @@ -30,9 +30,15 @@ >> #include "rt2800lib.h" >> #include "rt2800.h" >> >> -static bool modparam_watchdog; >> -module_param_named(watchdog, modparam_watchdog, bool, S_IRUGO); >> -MODULE_PARM_DESC(watchdog, "Enable watchdog to detect tx/rx hangs and reset hardware if detected"); >> +static bool modparam_dma_wdt = true; >> +module_param_named(dma_busy_watchdog, modparam_dma_wdt, bool, 0444); >> +MODULE_PARM_DESC(dma_busy_watchdog, "Enable watchdog to detect tx/rx" >> + " DMA busy and reset hardware if detected"); >> + >> +static bool modparam_hang_wdt; >> +module_param_named(hang_watchdog, modparam_hang_wdt, bool, 0444); >> +MODULE_PARM_DESC(hang_watchdog, "Enable watchdog to detect tx/rx hangs" >> + " and reset hardware if detected"); > >Do not have strong opinion here. But please consider to keep old module >parameter name and make it bitmask, 1 - hang_wdt, 2 dma_wdt, 3 - both. >Such way, it will keep old meaning if someone is using the parameter in >script/config. I'll update them in v2. Thanks. > >I also wandering if we need two implementations. If dma >hang detection is superior, it should replace the old one IMHO. >Or queue hang should be used for USB and dma hang for pci/mmio ? Since the DMA busy watchdog will be disabled for USB NICs in v2, so I think we still need both of them. As for queue hang issue, To be honest, I never really encounter it on MT7620 in past half year. Enabling queue hang watchdog just report some wrong resets for me even when I increased the threshold from 16 to 64. And the large threshold will make it unable to catch the DMA hung issue. Anyway, it should be still valuable for USB device. > >Otherwise code looks fine/correct to me. > >Regards >Stanislaw