On 2019-01-18 7:10 a.m., Marc Gonzalez wrote:
> Hello,
>
> I'm running into an issue which I don't know how to debug, so I'm open
> to ideas and suggestions :-)
>
> On my arm64 board, I have enabled Universal Flash Storage support.
> I wanted to benchmark read performance, and noticed that the system
> locks up when I read partitions larger than 3.5 GB, unless I tell dd
> to use direct IO:
Marc,

If you want to benchmark (or torture) UFS read performance, I have many
dd variants. The sg3_utils package contains sgp_dd (which uses POSIX
threads for a multi-threaded copy or read) and sgm_dd (which uses
mmap-ed IO). There is also the multi-platform ddpt, in a package of the
same name.

One major difference between my dd variants and the "standard" dd is
that I split the bs=BS option in two: bs=BS, where BS is the logical
block size of the given device, and bpt=BPT (blocks per transfer),
the number of logical blocks in each copy (or read) segment. So with
your example below, bs=1M would become, for a logical block size of
4096 bytes, 'bs=4096 bpt=256'.

Also, sgp_dd and sgm_dd don't support status=progress (ddpt does), but
you can always send 'kill -s USR1 <pid_of_dd>' from another (virtual)
console that has root permissions. All my dd variants, and dd itself,
accept that signal gracefully, print a progress report and continue.

Doug Gilbert
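For instance, assuming the device's logical block size really is 4096
bytes, the 'bs=1M' run quoted below would translate to roughly:

  # sgp_dd if=/dev/sda of=/dev/null bs=4096 bpt=256
  # ddpt if=/dev/sda of=/dev/null bs=4096 bpt=256 status=progress

and a progress report can be requested from another root console with
something like:

  # kill -s USR1 $(pidof sgp_dd)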
> *** WITH O_DIRECT ***
>
> # dd if=/dev/sda of=/dev/null bs=1M iflag=direct status=progress
> 57892929536 bytes (58 GB, 54 GiB) copied, 697.006 s, 83.1 MB/s
> 55256+0 records in
> 55256+0 records out
> 57940115456 bytes (58 GB, 54 GiB) copied, 697.575 s, 83.1 MB/s
>
> *** WITHOUT O_DIRECT ***
>
> # dd if=/dev/sda of=/dev/null bs=1M status=progress
> 3853516800 bytes (3.9 GB, 3.6 GiB) copied, 49.0002 s, 78.6 MB/s
>
> rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> rcu:     1-...0: (8242 ticks this GP) idle=106/1/0x4000000000000000 softirq=168/171 fqs=2626
> rcu:     6-...0: (99 GPs behind) idle=ec2/1/0x4000000000000000 softirq=71/71 fqs=2626
> rcu:     (detected by 7, t=5254 jiffies, g=-275, q=2)
> Task dump for CPU 1:
> kworker/1:1H    R  running task        0   675      2 0x0000002a
> Workqueue: kblockd blk_mq_run_work_fn
> Call trace:
>  __switch_to+0x168/0x1d0
>  0xffffffc0f6efbbc8
>  blk_mq_run_work_fn+0x28/0x40
>  process_one_work+0x208/0x470
>  worker_thread+0x48/0x460
>  kthread+0x128/0x130
>  ret_from_fork+0x10/0x1c
> Task dump for CPU 6:
> kthreadd        R  running task        0     2      0 0x0000002a
> Call trace:
>  __switch_to+0x168/0x1d0
>  0x5b36396f4e7d4000
> rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> rcu:     1-...0: (8242 ticks this GP) idle=106/1/0x4000000000000000 softirq=168/171 fqs=10500
> rcu:     6-...0: (99 GPs behind) idle=ec2/1/0x4000000000000000 softirq=71/71 fqs=10500
> rcu:     (detected by 7, t=21009 jiffies, g=-275, q=2)
> Task dump for CPU 1:
> kworker/1:1H    R  running task        0   675      2 0x0000002a
> Workqueue: kblockd blk_mq_run_work_fn
> Call trace:
>  __switch_to+0x168/0x1d0
>  0xffffffc0f6efbbc8
>  blk_mq_run_work_fn+0x28/0x40
>  process_one_work+0x208/0x470
>  worker_thread+0x48/0x460
>  kthread+0x128/0x130
>  ret_from_fork+0x10/0x1c
> Task dump for CPU 6:
> kthreadd        R  running task        0     2      0 0x0000002a
> Call trace:
>  __switch_to+0x168/0x1d0
>  0x5b36396f4e7d4000
>
> The system always hangs around the 3.6 GiB mark, wherever I start from.
>
> How can I debug this issue?
>
> Regards.