On 29.10.2019 15:44, Stephen Boyd wrote:
> Quoting Maciej S. Szmigiero (2019-10-28 16:45:31)
>> Hi Stephen,
>>
>> On 06.08.2019 01:32, Stephen Boyd wrote:
>>> The hwrng_fill() function can run while devices are suspending and
>>> resuming. If the hwrng is behind a bus such as i2c or SPI and that bus
>>> is suspended, the hwrng may hang the bus while attempting to add some
>>> randomness. It's been observed on ChromeOS devices with suspend-to-idle
>>> (s2idle) and an i2c based hwrng that this kthread may run and ask the
>>> hwrng device for randomness before the i2c bus has been resumed.
>>>
>>> Let's make this kthread freezable so that we don't try to touch the
>>> hwrng during suspend/resume. This ensures that we can't cause the hwrng
>>> backing driver to get into a bad state because the device is guaranteed
>>> to be resumed before the hwrng kthread is thawed.
>>
>> This patch broke suspend with virtio-rng loaded (it hangs).
>>
>> The problematic call chain is:
>> virtrng_freeze() -> remove_common() -> hwrng_unregister() ->
>> kthread_stop().
>>
>> It looks like kthread_stop() can't finish on a frozen khwrng thread.
>
> Can you provide the suspend/resume logs?

There isn't much in the kernel log, the closest thing I can get is with
dyndbg="file drivers/base/power/main.c +p":
[ 58.441073][ T3511] virtio-pci 0000:00:06.0: bus freeze
[ 58.448744][ T3511] virtio-pci 0000:00:05.0: bus freeze
[ 58.454500][ T3511] virtio-pci 0000:00:04.0: bus freeze
[ 58.456873][ T3511] virtio-pci 0000:00:03.0: bus freeze

And then the VM hangs.
The 0000:00:03.0 pci device is virtio-rng.

If I add printks around that kthread_stop() in hwrng_unregister() only the
first one gets printed.

>>
>> Reverting this commit makes a VM with virtio-rng driver loaded
>> suspend and resume correctly again.
>
> Which kernel are you testing on? There was a fix to this commit, i.e.
> ff296293b353 ("random: Support freezable kthreads in
> add_hwgenerator_randomness()"), which was fixed again by 59b569480dc8
> ("random: Use wait_event_freezable() in add_hwgenerator_randomness()").
> There was a problem with suspend/resume that I tried to fix with the
> first patch and then the second patch fixed the first one. See this
> thread[1] for some more background. You'll want all three.
>
> [1] https://lkml.kernel.org/r/49fc7c64-88c0-74d0-2cb3-07986490941d@xxxxxx

The kernel under test is current torvalds/master, I can see that it contains
both commit ff296293b353 and commit 59b569480dc8.

I assume that the third commit you mention is the original one that this
e-mail message Subject line refers to (03a3bb7ae63150).

Thanks,
Maciej