On Fri, Sep 16, 2022 at 10:51 AM Jason A. Donenfeld <Jason@xxxxxxxxx> wrote:
>
> The other thing that occurred to me when reading this patch in context
> of the other one is that this sleep you're removing here is not the
> only sleep in the call chain. Each hwrng driver can also sleep, and
> many do, sometimes for a long time, blocking until there's data
> available, which might happen after minutes in some cases. So maybe
> that's something to think about in context of this patchset -- that
> just moving this to a delayed worker might not actually fix the issue
> you're having with sleeps.
>

This is an excellent point. A look at tpm2_calc_ordinal_duration()
reveals that tpm_transmit() may block for 300s at a time. So when
we are using a WQ_FREEZABLE delayed_work, the PM may have to wait
for up to 300s when draining the wq on suspend. That will introduce
a lot of breakage in suspend/resume.

Dominik: in light of this, please proceed with your patch, without
rebasing it onto mine.

+ tpm maintainers Peter Huewe and Jarkko Sakkinen, a quick recap of
the problem:

- on ChromeOS we are seeing intermittent suspend/resume errors+warnings
  related to activity of the core's hwrng_fillfn. This kthread keeps
  running during suspend/resume. If it happens to kick off a bus (i2c)
  transaction while the bus driver is in suspend, this triggers
  a "Transfer while suspended" warning from the i2c core, followed by
  an error return:

i2c_designware i2c_designware.1: Transfer while suspended
tpm tpm0: i2c transfer failed (attempt 1/3): -108
[ snip 10s of transfer failed attempts]

- in 2019, Stephen Boyd made an attempt at fixing this by making the
  hwrng_fillfn kthread freezable. But a freezable thread requires
  different API calls for scheduling, waiting, and timeout. This
  generated regressions, so the solution had to be reverted.
  https://patchwork.kernel.org/project/linux-crypto/patch/20190805233241.220521-1-swboyd@xxxxxxxxxxxx/

- the current patch attempts to halt hwrng_fillfn during suspend by
  converting it to a self-rearming delayed_work. The PM drains all
  work before going into suspend. But the potential minutes-long
  blocking delays in tpm make this solution infeasible.

Peter and Jarkko, can you think of a possible way forward to eliminate
the warnings+errors?

-Sven