On 1/22/19 6:59 PM, Marc Gonzalez wrote:
> On 22/01/2019 04:12, Jianchao Wang wrote:
>
>> On 1/21/19 11:22 PM, Marc Gonzalez wrote:
>>
>>> Well, now we know for sure that the clk_scaling_lock is a red herring.
>>> I applied the patch below, and still the system locked up:
>>>
>>> # dd if=/dev/sde of=/dev/null bs=1M status=progress
>>> 3892314112 bytes (3.9 GB, 3.6 GiB) copied, 50.0042 s, 77.8 MB/s
>>>
>>> I can't seem to get the RCU stall warning anymore. How can I get
>>> to the bottom of this issue?
>> Can you detail the system 'locked up' ?
>> dd hangs there ? any hung task warning log ?
>> hang forever or just hang for a relatively long time.
> The system is an arm64 dev board (APQ8098 MEDIABOX) with 4GB RAM and 64 GB UFS.
> USB, SDHC, PCIe, SATA, Ethernet are not functional yet (so much work ahead).
> All I have is a single serial console.
> When the shell hangs, I lose access to the system altogether.
> SysRq is not implemented either. I am blind once the shell locks up.
> The system has been frozen for 15 hours, I think that qualifies as 'forever' ;-)
>
>> And what is the status of the dd when it hangs ?
>> Can you take some samples of the /proc/<dd's pid>/status and /proc/<dd's pid>/stack during the hang ?
> Sadly, I cannot access this information once the shell locks up.
>
> However, the kernel did print many warnings overnight (see below).
>
>> And also would you please share the dmesg log and config ?
> See below.
>
>> Since always fails with buffered read with fixed bytes,
>> what is the capacity of your system memory ?
> 4GB RAM. And the system hangs after reading 3.8GB
> I think this is not a coincidence.
> NB: swap is disabled (this might be relevant)

Looking through the log
https://pastebin.ubuntu.com/p/YSm82GxhNW/

rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
rcu: 6-...0: (13995 ticks this GP) idle=e16/1/0x4000000000000000 softirq=155/155 fqs=655
rcu: (detected by 4, t=576151 jiffies, g=-391, q=18)
Task dump for CPU 6:
dd R running task 0 677 671 0x00000002
Call trace:
 __switch_to+0x174/0x1e0
 ufshcd_queuecommand+0x84c/0x9a8

The task was in the RUNNING state when it was scheduled out, so it
should have been preempted (this path runs under preemptible RCU).
I wonder why it was not scheduled back in for so long that the RCU
stall was triggered, and what was occupying the CPU all that time.

Would you please try to show the running tasks on all CPUs?

echo l > /proc/sysrq-trigger

In addition, since the RCU grace period did not complete, a lot of
things could not make forward progress.

Thanks
Jianchao
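
Since the shell is lost once the hang happens, the /proc/<dd's pid>/status
and /proc/<dd's pid>/stack samples asked for earlier would have to be taken
by something started before dd. Below is a minimal sketch of that idea, not
a tested procedure for this board. The helper name sample_task and its
arguments (pid, number of samples, interval in seconds) are made up for
illustration, and reading /proc/<pid>/stack is assumed to need root and a
kernel built with CONFIG_STACKTRACE.

```shell
#!/bin/sh
# Hypothetical helper: print a task's state and kernel stack at a fixed
# interval. Usage: sample_task <pid> [count] [interval-seconds]
sample_task() {
    pid=$1
    count=${2:-5}
    interval=${3:-1}
    i=0
    while [ "$i" -lt "$count" ]; do
        echo "=== sample $i pid $pid ==="
        # State line, e.g. R (running) or D (uninterruptible sleep).
        grep '^State:' "/proc/$pid/status" 2>/dev/null \
            || echo "status: unavailable"
        # Kernel stack; may be unreadable without root or CONFIG_STACKTRACE.
        cat "/proc/$pid/stack" 2>/dev/null \
            || echo "stack: unavailable"
        i=$((i + 1))
        # Skip the sleep after the final sample.
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}
```

One possible way to use it is to start dd in the background and then run
"sample_task $! 600 1" from the same shell, so the output lands on the
serial console. If every CPU stops making progress the sampler stops as
well, so at best this records the last samples taken before the hang.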