On 07/02/2019 11:44, Marc Gonzalez wrote:

> + linux-mm
>
> Summarizing the issue for linux-mm readers:
>
> If I read data from a storage device larger than my system's RAM, the
> system freezes once dd has read more data than there is available RAM.
>
> # dd if=/dev/sde of=/dev/null bs=1M & while true; do echo m > /proc/sysrq-trigger; echo; echo; sleep 1; done
>
> https://pastebin.ubuntu.com/p/HXzdqDZH4W/
>
> A few seconds before the system hangs, Mem-Info shows:
>
> [   90.986784] Node 0 active_anon:7060kB inactive_anon:13644kB active_file:0kB inactive_file:3797500kB [...]
>
> => 3797500kB is basically all of RAM.
>
> I tried to locate where "inactive_file" was being incremented, and saw
> two call-stack signatures:
>
> [  255.606019] __mod_node_page_state | __pagevec_lru_add_fn | pagevec_lru_move_fn | __lru_cache_add | lru_cache_add | add_to_page_cache_lru | mpage_readpages | blkdev_readpages | read_pages | __do_page_cache_readahead | ondemand_readahead | page_cache_sync_readahead
>
> [  255.637238] __mod_node_page_state | __pagevec_lru_add_fn | pagevec_lru_move_fn | __lru_cache_add | lru_cache_add | lru_cache_add_active_or_unevictable | __handle_mm_fault | handle_mm_fault | do_page_fault | do_translation_fault | do_mem_abort | el1_da
>
> Are these expected?
>
> NB: the system does not hang if I pass 'iflag=direct' to dd.
>
> According to the RCU watchdog:
>
> [  108.466240] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> [  108.466420] rcu:     1-...0: (130 ticks this GP) idle=79e/1/0x4000000000000000 softirq=2393/2523 fqs=2626
> [  108.471436] rcu:     (detected by 4, t=5252 jiffies, g=133, q=85)
> [  108.480605] Task dump for CPU 1:
> [  108.486483] kworker/1:1H    R  running task        0   680      2 0x0000002a
> [  108.489977] Workqueue: kblockd blk_mq_run_work_fn
> [  108.496908] Call trace:
> [  108.501513]  __switch_to+0x174/0x1e0
> [  108.503757]  blk_mq_run_work_fn+0x28/0x40
> [  108.507589]  process_one_work+0x208/0x480
> [  108.511486]  worker_thread+0x48/0x460
> [  108.515480]  kthread+0x124/0x130
> [  108.519123]  ret_from_fork+0x10/0x1c
>
> Can anyone shed some light on what's going on?

I saw a slightly different report from another test run:
https://pastebin.ubuntu.com/p/jCywbKgRCq/

[  340.689764] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[  340.689992] rcu:     1-...0: (8548 ticks this GP) idle=c6e/1/0x4000000000000000 softirq=82/82 fqs=6
[  340.694977] rcu:     (detected by 5, t=5430 jiffies, g=-719, q=16)
[  340.703803] Task dump for CPU 1:
[  340.709507] dd              R  running task        0   675    673 0x00000002
[  340.713018] Call trace:
[  340.720059]  __switch_to+0x174/0x1e0
[  340.722192]  0xffffffc0f6dc9600

[  352.689742] BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 33s!
[  352.689910] Showing busy workqueues and worker pools:
[  352.696743] workqueue mm_percpu_wq: flags=0x8
[  352.701753]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
[  352.706099]     pending: vmstat_update

[  384.693730] BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 65s!
[  384.693815] Showing busy workqueues and worker pools:
[  384.700577] workqueue events: flags=0x0
[  384.705699]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256
[  384.709351]     pending: vmstat_shepherd
[  384.715587] workqueue mm_percpu_wq: flags=0x8
[  384.719495]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
[  384.723754]     pending: vmstat_update
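
A few notes that may help anyone trying to reproduce or narrow this down.

One way to capture stacks like the two signatures quoted above, without
patching the kernel, is a kprobe event with a stacktrace trigger. A minimal
sketch (run as root), assuming a kernel with CONFIG_KPROBE_EVENTS and tracefs
mounted at /sys/kernel/tracing; the probe name 'nodestat' is arbitrary:

  cd /sys/kernel/tracing
  # create a probe firing on every call to __mod_node_page_state
  echo 'p:nodestat __mod_node_page_state' > kprobe_events
  # dump a stack into the trace buffer each time the probe fires
  echo stacktrace > events/kprobes/nodestat/trigger
  echo 1 > events/kprobes/nodestat/enable
  cat trace_pipe

Beware that __mod_node_page_state fires constantly, so trace_pipe floods
quickly; in practice you would also fetch the 'item' argument in the probe
definition and filter on the running kernel's NR_INACTIVE_FILE value to
isolate the inactive_file updates.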
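
Rather than scraping sysrq-m output, the counters in question can also be
polled from /proc/vmstat (values are in pages, not kB). With a buffered read,
nr_inactive_file should climb steadily while nr_free_pages drops; repeating
the read with iflag=direct should leave both roughly flat, since O_DIRECT
bypasses the page cache. The count=2048 below is just an arbitrary 2 GiB
sample so each run terminates before the hang:

  grep -E '^(nr_free_pages|nr_inactive_file) ' /proc/vmstat
  dd if=/dev/sde of=/dev/null bs=1M count=2048               # buffered: cache grows
  grep -E '^(nr_free_pages|nr_inactive_file) ' /proc/vmstat
  dd if=/dev/sde of=/dev/null bs=1M count=2048 iflag=direct  # O_DIRECT: counters flat
  grep -E '^(nr_free_pages|nr_inactive_file) ' /proc/vmstat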
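
Finally, a diagnostic (not a fix) that might separate the mm side from the
block side: if the lockup really is driven by page-cache growth, periodically
dropping the cache during the buffered read (the pages are clean, read-only
data, so they can be discarded) should delay or avoid the hang:

  dd if=/dev/sde of=/dev/null bs=1M &
  # drop clean page-cache pages every 5 seconds while dd runs
  while true; do echo 1 > /proc/sys/vm/drop_caches; sleep 5; done

If the system still locks up with the cache being dropped, the page-cache
angle is probably a red herring.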