Hi Doug, On Fri, Dec 6, 2024 at 1:13 AM Doug Anderson <dianders@xxxxxxxxxxxx> wrote: > > Hi, > > On Thu, Dec 5, 2024 at 5:46 AM Pin-yen Lin <treapking@xxxxxxxxxxxx> wrote: > > > > Hi Doug, > > > > On Thu, Dec 5, 2024 at 2:11 AM Doug Anderson <dianders@xxxxxxxxxxxx> wrote: > > > > > > Hi, > > > > > > On Wed, Dec 4, 2024 at 5:45 AM Pin-yen Lin <treapking@xxxxxxxxxxxx> wrote: > > > > > > > > > > > Can you try testing (and gather timing numbers) when > > > > > > > suspending soon after initiating scans? It's hard to judge what the > > > > > > > lower limit of this timeout should really be without any numbers, just > > > > > > > like it's hard to judge whether your 10 second watchdog is reasonable. > > > > > > > > > > > > Pin-yen: is this something you could gather? > > > > > > > > I tried entering suspend right after wifi scans, and the time spent in > > > > mwifiex_enable_hs() is always around 100ms. It seems initiating > > > > suspend does not increase the execution time for mwifiex_enable_hs(), > > > > so I think the driver is capable of interrupting a scan. > > > > > > > > > > > > > > > > > > > Also, for the record, since we might have to field regression reports > > > > > > > for other systems: what hardware (chip variant, FW version) are you > > > > > > > seeing problems on? > > > > > > > > > > > > Pin-yen: I'm assuming you'll provide this. > > > > > > > > From the debugfs entry: > > > > > > > > driver_name = "mwifiex" > > > > driver_version = mwifiex 1.0 (15.68.19.p54) > > > > verext = w8897o-B0, RF87XX, FP68, 15.68.19.p54 > > > > > > > > The compatible string of the DT is "marvell,sd8897". > > > > > > > > > > I'll leave it up to y'all (Doug and Pin-Yen) whether you want to provide > > > > > the above to provide a little more confidence, or if you want to > > > > > reconsider your use of CONFIG_DPM_WATCHDOG_TIMEOUT. > > > > > > > > It looks okay to me to decrease the timeout here given that scanning > > > > doesn't seem to affect the suspend time. What's your thought, Doug? > > > > > > I think Brian is right and that we should change how we're using the > > > DPM watchdog, but IMO that doesn't mean that we couldn't also change > > > this timeout. > > > > > > If nothing else, you'd want to post a v2 of your patch containing a > > > summary of the info you've gathered so it's recorded with the patch. > > > > > > Maybe what makes the most sense, though, is to put our money where our > > > mouth is and land some sort of patch in the ChromeOS tree and then > > > report back to upstream in a month when we have data about it. If > > > things look good in the field then presumably that would give some > > > confidence for upstream to be willing to land the patch? > > > > > > Probably about the best data we could gather would be to break the > > > existing timeout into two halves. You could wait half the time, then > > > print a warning, and then wait the other half the time. We could even > > > land that change _without_ changing the timeout to 5 seconds. Then if > > > we ever see the warning print but then the suspend succeeds we know > > > that there are cases where waiting longer would have helped. If we > > > made that a WARN_ON() our existing infrastructure would help us gather > > > that info... > > > > I just realized that mwifiex_wait_queue_complete() has another 12s > > timeout[1], though they are not directly involved in suspend/resume. > > > > Should we also add a warning to that and see if we can make it half? > > This starts to make me think how many timeouts we want to consider. > > Does it make sense to only focus on the one in mwifiex_enable_hs()? Or > > should we check other timeouts in mwifiex or even other drivers? > > > > [1]: https://elixir.bootlin.com/linux/v6.12.1/source/drivers/net/wireless/marvell/mwifiex/sta_ioctl.c#L51 > > IMO any of these "arbitrary and really long timeout to make sure we > don't hang forever" type things probably should have a warning so we > know if we're getting close to the timeout. It wouldn't be very hard > to make a wrapper around wait_event_interruptible_timeout() to handle > this. I suppose that wrapper could just be local to mwifiex, though if > other drivers found it useful it would make sense to put it somewhere > more common. > > That being said, if we aren't actually hitting these other timeouts I > don't know that we need to do an extensive audit right now to find > every one of them. > > Of course, Brian also said he'd be OK with just setting the timeout to > 5 seconds based on the research you've already done. In that case I > don't know if we'd want a WARNing after 2.5 seconds. Maybe? 2.5 > seconds is still pretty long, but I don't know if it's WARN-worthy. It > could at least be dev_warn() worthy... > Sounds like we will start to binary search the timeout... Then, I would like to just send out a v2 for this with an updated commit message. It sounds like it will be endless binary searches on all those timeouts in the kernel if we really want to create a wrapper around wait_event_interruptible_timeout(), and all the efforts only make us a few more seconds faster on the system. > > > -Doug Regards, Pin-yen