On 2020-03-27 3:09 pm, Sai Prakash Ranjan wrote:
Hi Robin,
Thanks for taking a look at this.
On 2020-03-27 19:42, Robin Murphy wrote:
On 2020-03-27 1:28 pm, Sai Prakash Ranjan wrote:
Currently on reboot/shutdown, the following messages are
displayed on the console as error messages before the
system reboots or shuts down.
On SC7180:
arm-smmu 15000000.iommu: removing device with active domains!
arm-smmu 5040000.iommu: removing device with active domains!
Demote the log level to debug, since the message offers little
help in identifying/fixing any issue when the system is going
down anyway, and demoting it reduces kernel log spam.
I've gone back and forth on this pretty much ever since we added the
shutdown hook - on the other hand, if any devices *are* still running
in those domains at this point, then once we turn off the SMMU and let
those IOVAs go out on the bus as physical addresses, all manner of
weirdness may ensue. Thus there is an argument for *some* indication
that this may happen, although IMO it could be downgraded to at least
dev_warn().
Any pointers to the weirdness here after the SMMU is turned off?
Because if we look at the call sites, device_shutdown is called
from kernel_restart_prepare or kernel_shutdown_prepare, which would
mean the system is going down anyway, so do we really care about these
error messages or warnings from the SMMU?
arm_smmu_device_shutdown
  platform_drv_shutdown
    device_shutdown
      kernel_restart_prepare
        kernel_restart
Imagine your network driver doesn't implement a .shutdown method (so the
hardware is still active regardless of device links), happens to have an
Rx buffer or descriptor ring DMA-mapped at an IOVA that looks like the
physical address of the memory containing some part of the kernel text
lower down that call stack, and the MAC receives a broadcast IP packet
at about the point arm_smmu_device_shutdown() is returning. Enjoy
debugging that ;)
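
The scenario above can be sketched as a toy userspace model. This is only an illustration under stated assumptions, not kernel code: "physical memory" is a 64-byte array, and every toy_* name, the page size, and the 0xAA "kernel text" pattern are hypothetical. It shows the same device write landing in the mapped buffer while translation is on, and on top of the "kernel text" once the IOVA goes out untranslated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Toy model only: "physical memory" is a small byte array, not real RAM. */
#define TOY_MEM_SIZE  64u
#define TOY_PAGE_SIZE 16u
#define TOY_NUM_PAGES (TOY_MEM_SIZE / TOY_PAGE_SIZE)

struct toy_smmu {
	bool enabled;                     /* false models the SMMU being turned off */
	uint32_t page_map[TOY_NUM_PAGES]; /* IOVA page -> physical page */
};

struct toy_system {
	uint8_t mem[TOY_MEM_SIZE];
	struct toy_smmu smmu;
};

/* With the SMMU off, the IOVA goes out on the bus as a physical address. */
static uint32_t toy_translate(const struct toy_smmu *smmu, uint32_t iova)
{
	assert(iova < TOY_MEM_SIZE);
	if (!smmu->enabled)
		return iova;
	return smmu->page_map[iova / TOY_PAGE_SIZE] * TOY_PAGE_SIZE +
	       iova % TOY_PAGE_SIZE;
}

/* A device that was never quiesced stores one received byte at its Rx IOVA. */
static void toy_device_rx(struct toy_system *sys, uint32_t rx_iova, uint8_t byte)
{
	sys->mem[toy_translate(&sys->smmu, rx_iova)] = byte;
}

/* Physical page 0 stands in for kernel text; IOVA page 0 maps to page 3. */
static void toy_system_init(struct toy_system *sys)
{
	memset(sys->mem, 0, sizeof(sys->mem));
	memset(sys->mem, 0xAA, TOY_PAGE_SIZE);
	sys->smmu.enabled = true;
	for (uint32_t i = 0; i < TOY_NUM_PAGES; i++)
		sys->smmu.page_map[i] = i;
	sys->smmu.page_map[0] = 3;
}
```

Calling toy_system_init(), then toy_device_rx() at IOVA 4, leaves mem[4] untouched and writes to mem[52]. Clearing smmu.enabled and repeating the same call overwrites mem[4], i.e. the stand-in for kernel text.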
And if coincidental memory corruption seems too far-fetched for your
liking, other fun alternatives might include "display tries to scan out
from powered-off device, deadlocks interconnect and prevents anything
else making progress", or "access to TZC-protected physical address
triggers interrupt and over-eager Secure firmware resets system before
orderly poweroff has a chance to finish".
Of course the fact that in practice we'll *always* see the warning
because there's no way to tear down the default DMA domains, and even if
all devices *have* been nicely quiesced there's no way to tell, is
certainly less than ideal. Like I say, it's not entirely clear-cut
either way...
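
A minimal sketch of why the message cannot tell the two cases apart, assuming the check is based only on whether any translation context is still allocated. That is an assumption for illustration, and the toy_* names and the bitmap layout are hypothetical, not the driver's actual structures. Whether a client device has stopped its DMA is state the check never sees.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the SMMU's view: which contexts are allocated. */
struct toy_smmu_state {
	uint32_t context_map; /* bit n set = context n belongs to some domain */
};

/* Hypothetical client device; its quiesced flag is invisible to the SMMU. */
struct toy_master {
	bool quiesced;
};

/* Default DMA domains are set up at attach time and never torn down. */
static void toy_attach_default_domain(struct toy_smmu_state *s, unsigned int cb)
{
	s->context_map |= UINT32_C(1) << cb;
}

/* The only question the SMMU side can answer at shutdown. */
static bool toy_has_active_domains(const struct toy_smmu_state *s)
{
	return s->context_map != 0;
}
```

In this model toy_has_active_domains() returns true as soon as one default domain exists, whatever value toy_master.quiesced holds, which mirrors the "always see the warning" problem described above.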
Robin.