Hi, On Thu, Dec 5, 2024 at 5:46 AM Pin-yen Lin <treapking@xxxxxxxxxxxx> wrote: > > Hi Doug, > > On Thu, Dec 5, 2024 at 2:11 AM Doug Anderson <dianders@xxxxxxxxxxxx> wrote: > > > > Hi, > > > > On Wed, Dec 4, 2024 at 5:45 AM Pin-yen Lin <treapking@xxxxxxxxxxxx> wrote: > > > > > > > > > Can you try testing (and gather timing numbers) when > > > > > > suspending soon after initiating scans? It's hard to judge what the > > > > > > lower limit of this timeout should really be without any numbers, just > > > > > > like it's hard to judge whether your 10 second watchdog is reasonable. > > > > > > > > > > Pin-yen: is this something you could gather? > > > > > > I tried entering suspend right after wifi scans, and the time spent in > > > mwifiex_enable_hs() is always around 100ms. It seems initiating > > > suspend does not increase the execution time for mwifiex_enable_hs(), > > > so I think the driver is capable of interrupting a scan. > > > > > > > > > > > > > > > > Also, for the record, since we might have to field regression reports > > > > > > for other systems: what hardware (chip variant, FW version) are you > > > > > > seeing problems on? > > > > > > > > > > Pin-yen: I'm assuming you'll provide this. > > > > > > From the debugfs entry: > > > > > > driver_name = "mwifiex" > > > driver_version = mwifiex 1.0 (15.68.19.p54) > > > verext = w8897o-B0, RF87XX, FP68, 15.68.19.p54 > > > > > > The compatible string of the DT is "marvell,sd8897". > > > > > > > > I'll leave it up to y'all (Doug and Pin-Yen) whether you want to provide > > > > the above to provide a little more confidence, or if you want to > > > > reconsider your use of CONFIG_DPM_WATCHDOG_TIMEOUT. > > > > > > It looks okay to me to decrease the timeout here given that scanning > > > doesn't seem to affect the suspend time. What's your thought, Doug? > > > > I think Brian is right and that we should change how we're using the > > DPM watchdog, but IMO that doesn't mean that we couldn't also change > > this timeout. > > > > If nothing else, you'd want to post a v2 of your patch containing a > > summary of the info you've gathered so it's recorded with the patch. > > > > Maybe what makes the most sense, though, is to put our money where our > > mouth is and land some sort of patch in the ChromeOS tree and then > > report back to upstream in a month when we have data about it. If > > things look good in the field then presumably that would give some > > confidence for upstream to be willing to land the patch? > > > > Probably about the best data we could gather would be to break the > > existing timeout into two halves. You could wait half the time, then > > print a warning, and then wait the other half the time. We could even > > land that change _without_ changing the timeout to 5 seconds. Then if > > we ever see the warning print but then the suspend succeeds we know > > that there are cases where waiting longer would have helped. If we > > made that a WARN_ON() our existing infrastructure would help us gather > > that info... > > I just realized that mwifiex_wait_queue_complete() has another 12s > timeout[1], though they are not directly involved in suspend/resume. > > Should we also add a warning to that and see if we can make it half? > This starts to make me think how many timeouts we want to consider. > Does it make sense to only focus on the one in mwifiex_enable_hs()? Or > should we check other timeouts in mwifiex or even other drivers? > > [1]: https://elixir.bootlin.com/linux/v6.12.1/source/drivers/net/wireless/marvell/mwifiex/sta_ioctl.c#L51 IMO any of these "arbitrary and really long timeout to make sure we don't hang forever" type things probably should have a warning so we know if we're getting close to the timeout. It wouldn't be very hard to make a wrapper around wait_event_interruptible_timeout() to handle this. I suppose that wrapper could just be local to mwifiex, though if other drivers found it useful it would make sense to put it somewhere more common. That being said, if we aren't actually hitting these other timeouts I don't know that we need to do an extensive audit right now to find every one of them. Of course, Brian also said he'd be OK with just setting the timeout to 5 seconds based on the research you've already done. In that case I don't know if we'd want a WARNing after 2.5 seconds. Maybe? 2.5 seconds is still pretty long, but I don't know if it's WARN-worthy. It could at least be dev_warn() worthy... -Doug