On 07/02/2019 17:56, Marc Gonzalez wrote:

> Saw a slightly different report from another test run:
> https://pastebin.ubuntu.com/p/jCywbKgRCq/
>
> [ 340.689764] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> [ 340.689992] rcu: 1-...0: (8548 ticks this GP) idle=c6e/1/0x4000000000000000 softirq=82/82 fqs=6
> [ 340.694977] rcu: (detected by 5, t=5430 jiffies, g=-719, q=16)
> [ 340.703803] Task dump for CPU 1:
> [ 340.709507] dd R running task 0 675 673 0x00000002
> [ 340.713018] Call trace:
> [ 340.720059] __switch_to+0x174/0x1e0
> [ 340.722192] 0xffffffc0f6dc9600
>
> [ 352.689742] BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 33s!
> [ 352.689910] Showing busy workqueues and worker pools:
> [ 352.696743] workqueue mm_percpu_wq: flags=0x8
> [ 352.701753] pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
> [ 352.706099] pending: vmstat_update
>
> [ 384.693730] BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 65s!
> [ 384.693815] Showing busy workqueues and worker pools:
> [ 384.700577] workqueue events: flags=0x0
> [ 384.705699] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256
> [ 384.709351] pending: vmstat_shepherd
> [ 384.715587] workqueue mm_percpu_wq: flags=0x8
> [ 384.719495] pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
> [ 384.723754] pending: vmstat_update

Running 'dd if=/dev/sda of=/dev/null bs=40M status=progress'
I got a slightly different splat:

[ 171.513944] INFO: task dd:674 blocked for more than 23 seconds.
[ 171.514131] Tainted: G S 5.0.0-rc5-next-20190206 #23
[ 171.518784] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 171.525728] dd D 0 674 672 0x00000000
[ 171.533525] Call trace:
[ 171.538926] __switch_to+0x174/0x1e0
[ 171.541237] __schedule+0x1e4/0x630
[ 171.545041] schedule+0x34/0x90
[ 171.548261] io_schedule+0x20/0x40
[ 171.551401] blk_mq_get_tag+0x178/0x320
[ 171.554852] blk_mq_get_request+0x13c/0x3e0
[ 171.558587] blk_mq_make_request+0xcc/0x640
[ 171.562763] generic_make_request+0x1d4/0x390
[ 171.566924] submit_bio+0x5c/0x1c0
[ 171.571447] mpage_readpages+0x178/0x1d0
[ 171.574730] blkdev_readpages+0x3c/0x50
[ 171.578831] read_pages+0x70/0x180
[ 171.582364] __do_page_cache_readahead+0x1cc/0x200
[ 171.585843] ondemand_readahead+0x148/0x310
[ 171.590613] page_cache_async_readahead+0xc0/0x100
[ 171.594719] generic_file_read_iter+0x54c/0x860
[ 171.599565] blkdev_read_iter+0x50/0x80
[ 171.603998] __vfs_read+0x134/0x190
[ 171.607800] vfs_read+0x94/0x130
[ 171.611273] ksys_read+0x6c/0xe0
[ 171.614745] __arm64_sys_read+0x24/0x30
[ 171.617974] el0_svc_handler+0xb8/0x140
[ 171.621509] el0_svc+0x8/0xc

For the record, I'll restate the problem:
dd hangs when reading a partition larger than RAM, except when using
iflag=direct or iflag=nocache

# dd if=/dev/sde of=/dev/null bs=64M iflag=direct
64+0 records in
64+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 51.1532 s, 84.0 MB/s

# dd if=/dev/sde of=/dev/null bs=64M iflag=nocache
64+0 records in
64+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 60.6478 s, 70.8 MB/s

# dd if=/dev/sde of=/dev/null bs=64M count=56
56+0 records in
56+0 records out
3758096384 bytes (3.8 GB, 3.5 GiB) copied, 50.5897 s, 74.3 MB/s

# dd if=/dev/sde of=/dev/null bs=64M
/*** CONSOLE LOCKS UP ***/

I've been looking at the differences between iflag=direct and no-flag.
Using the following script to enable relevant(?)
logs:

mount -t debugfs nodev /sys/kernel/debug/
cd /sys/kernel/debug/tracing/events
echo 1 > filemap/enable
echo 1 > pagemap/enable
echo 1 > vmscan/enable
echo 1 > kmem/mm_page_free/enable
echo 1 > kmem/mm_page_free_batched/enable
echo 1 > kmem/mm_page_alloc/enable
echo 1 > kmem/mm_page_alloc_zone_locked/enable
echo 1 > kmem/mm_page_pcpu_drain/enable
echo 1 > kmem/mm_page_alloc_extfrag/enable
echo 1 > kmem/kmalloc_node/enable
echo 1 > kmem/kmem_cache_alloc_node/enable
echo 1 > kmem/kmem_cache_alloc/enable
echo 1 > kmem/kmem_cache_free/enable

# dd if=/dev/sde of=/dev/null bs=64M count=1 iflag=direct
https://pastebin.ubuntu.com/p/YWp4pydM6V/
(114942 lines)

# dd if=/dev/sde of=/dev/null bs=64M count=1
https://pastebin.ubuntu.com/p/xpzgN5H3Hp/
(247439 lines)

Does anyone see what's going sideways in the no-flag case?

Regards.