Re: dd hangs when reading large partitions

"jianchao.wang" <jianchao.w.wang@xxxxxxxxxx> · Wed, 23 Jan 2019 11:10:09 +0800

On 1/22/19 6:59 PM, Marc Gonzalez wrote:
> On 22/01/2019 04:12, Jianchao Wang wrote:
> 
>> On 1/21/19 11:22 PM, Marc Gonzalez wrote:
>>
>>> Well, now we know for sure that the clk_scaling_lock is a red herring.
>>> I applied the patch below, and still the system locked up:
>>>
>>> # dd if=/dev/sde of=/dev/null bs=1M status=progress
>>> 3892314112 bytes (3.9 GB, 3.6 GiB) copied, 50.0042 s, 77.8 MB/s
>>>
>>> I can't seem to get the RCU stall warning anymore. How can I get
>>> to the bottom of this issue?
>> Can you detail the system 'locked up' ?
>> dd hangs there ? any hung task warning log ?
>> hang forever or just hang for a relatively long time.
> The system is an arm64 dev board (APQ8098 MEDIABOX) with 4GB RAM and 64 GB UFS.
> USB, SDHC, PCIe, SATA, Ethernet are not functional yet (so much work ahead).
> All I have is a single serial console.
> When the shell hangs, I lose access to the system altogether.
> SysRq is not implemented either. I am blind once the shell locks up.
> The system has been frozen for 15 hours, I think that qualifies as 'forever' ;-)
> 
>> And what is the status of the dd when it hangs ?
>> Can you take some samples of the /proc/<dd's pid>/status and /proc/<dd's pid>/stack during the hang ?
> Sadly, I cannot access this information once the shell locks up.
> 
> However, the kernel did print many warnings overnight (see below).
> 
>> And also would you please share the dmesg log and config ?
> See below.
> 
>> Since always fails with buffered read with fixed bytes,
>> what is the capacity of your system memory ? 
> 4GB RAM. And the system hangs after reading 3.8GB
> I think this is not a coincidence.
> NB: swap is disabled (this might be relevant)

Looking through the log at
https://pastebin.ubuntu.com/p/YSm82GxhNW/

rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
rcu:    6-...0: (13995 ticks this GP) idle=e16/1/0x4000000000000000 softirq=155/155 fqs=655 
rcu:    (detected by 4, t=576151 jiffies, g=-391, q=18)
Task dump for CPU 6:
dd              R  running task        0   677    671 0x00000002
Call trace:
 __switch_to+0x174/0x1e0
 ufshcd_queuecommand+0x84c/0x9a8

The task was in the RUNNING state when it was scheduled out,
so it was most likely preempted (this path runs under preemptible RCU).

I wonder why it was not scheduled back in for so long that the RCU stall was triggered,
and what was occupying the CPU all that time.

Could you please try to show the running tasks on all CPUs?

echo l > /proc/sysrq-trigger
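
Since you lose the shell once the hang starts, one option might be to start a
background loop before running dd, so that the dumps keep going to the serial
console. This is only a rough sketch: it assumes the kernel was built with
CONFIG_MAGIC_SYSRQ, and that the loop itself still gets scheduled during the
hang, which may not be the case. The names TRIGGER, INTERVAL, dump_cpus and
watch_cpus are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: periodically ask the kernel for a backtrace of all
# active CPUs. TRIGGER and INTERVAL are illustrative and can be overridden.
TRIGGER="${TRIGGER:-/proc/sysrq-trigger}"
INTERVAL="${INTERVAL:-10}"

dump_cpus() {
    # SysRq 'l' prints a stack backtrace for all active CPUs to the kernel log
    echo l > "$TRIGGER"
}

watch_cpus() {
    # $1 = number of dumps to take, one every $INTERVAL seconds
    n=$1
    while [ "$n" -gt 0 ]; do
        dump_cpus
        sleep "$INTERVAL"
        n=$((n - 1))
    done
}
```

For example, after sourcing the above:

    watch_cpus 60 &
    dd if=/dev/sde of=/dev/null bs=1M status=progress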

In addition, since the RCU grace period never completed, anything waiting on it
could not make progress.

Thanks
Jianchao