On Wed, Dec 05, 2018 at 10:34:02AM -0500, Josef Bacik wrote:
> v1->v2:
> - dropped my python library, TIL about jq.
> - fixed the spelling mistakes in the test.
>
> -- Original message --
>
> This patchset is to add a test to verify io.latency is working properly, and to
> add all the supporting code to run that test.
>
> First is the cgroup2 infrastructure which is fairly straightforward. Just
> verifies we have cgroup2, and gives us the helpers to check and make sure we
> have the right controllers in place for the test.
>
> The second patch brings over some python scripts I put in xfstests for parsing
> the fio json output. I looked at the existing fio performance stuff in
> blktests, but we only capture bw stuff, which is wonky with this particular test
> because once the fast group is finished the slow group is allowed to go as fast
> as it wants. So I needed this to pull out actual jobtime spent. This will give
> us flexibility to pull out other fio performance data in the future.
>
> The final patch is the test itself. It simply runs a job by itself to get a
> baseline view of the disk performance. Then it creates 2 cgroups, one fast and
> one slow, and runs the same job simultaneously in both groups. The result
> should be that the fast group takes just slightly longer time than the baseline
> (I use a 15% threshold to be safe), and that the slow one takes considerably
> longer.

Thanks, I cleaned up a ton of shellcheck warnings (from `make check`) and
pushed to https://github.com/osandov/blktests/tree/josef.

I tested with QEMU on Jens' for-next branch. With an emulated NVMe device, it
failed with "Too much of a performance drop for the protected workload". On
virtio-blk, I hit this:

[ 1843.056452] INFO: task fio:20750 blocked for more than 120 seconds.
[ 1843.057495] Not tainted 4.20.0-rc5-00251-g90efb26fa9a4 #19
[ 1843.058487] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1843.059769] fio D 0 20750 20747 0x00000080
[ 1843.060688] Call Trace:
[ 1843.061123] ? __schedule+0x286/0x870
[ 1843.061735] ? blkcg_iolatency_done_bio+0x680/0x680
[ 1843.062574] ? blkcg_iolatency_cleanup+0x60/0x60
[ 1843.063347] schedule+0x32/0x80
[ 1843.063874] io_schedule+0x12/0x40
[ 1843.064449] rq_qos_wait+0x9a/0x120
[ 1843.065007] ? karma_partition+0x210/0x210
[ 1843.065661] ? blkcg_iolatency_done_bio+0x680/0x680
[ 1843.066435] blkcg_iolatency_throttle+0x185/0x360
[ 1843.067196] __rq_qos_throttle+0x23/0x30
[ 1843.067958] blk_mq_make_request+0x101/0x5c0
[ 1843.068637] generic_make_request+0x1b3/0x3c0
[ 1843.069329] submit_bio+0x45/0x140
[ 1843.069876] blkdev_direct_IO+0x3db/0x440
[ 1843.070527] ? aio_complete+0x2f0/0x2f0
[ 1843.071146] generic_file_direct_write+0x96/0x160
[ 1843.071880] __generic_file_write_iter+0xb3/0x1c0
[ 1843.072599] ? blk_mq_dispatch_rq_list+0x3aa/0x550
[ 1843.073340] blkdev_write_iter+0xa0/0x120
[ 1843.073960] ? __fget+0x6e/0xa0
[ 1843.074452] aio_write+0x11f/0x1d0
[ 1843.074979] ? __blk_mq_run_hw_queue+0x6f/0xe0
[ 1843.075658] ? __check_object_size+0xa0/0x189
[ 1843.076345] ? preempt_count_add+0x5a/0xb0
[ 1843.077086] ? aio_read_events+0x259/0x380
[ 1843.077819] ? kmem_cache_alloc+0x16e/0x1c0
[ 1843.078427] io_submit_one+0x4a8/0x790
[ 1843.078975] ? read_events+0x76/0x150
[ 1843.079510] __se_sys_io_submit+0x98/0x1a0
[ 1843.080116] ? syscall_trace_enter+0x1d3/0x2d0
[ 1843.080785] do_syscall_64+0x55/0x160
[ 1843.081404] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1843.082210] RIP: 0033:0x7f6e571fc4ed
[ 1843.082763] Code: Bad RIP value.
[ 1843.083268] RSP: 002b:00007ffc212b76f8 EFLAGS: 00000246 ORIG_RAX: 00000000000000d1
[ 1843.084445] RAX: ffffffffffffffda RBX: 00007f6e4c876870 RCX: 00007f6e571fc4ed
[ 1843.085545] RDX: 0000557c4bc11208 RSI: 0000000000000001 RDI: 00007f6e4c85e000
[ 1843.086251] RBP: 00007f6e4c85e000 R08: 0000557c4bc2b130 R09: 00000000000002f8
[ 1843.087308] R10: 0000557c4bbf4470 R11: 0000000000000246 R12: 0000000000000001
[ 1843.088310] R13: 0000000000000000 R14: 0000557c4bc11208 R15: 00007f6e2b17f070
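
For reference, the pass/fail criterion described in the cover letter (run a
baseline job, then require the protected group to finish within 15% of it)
can be sketched roughly as below. This is only an illustration, not the code
from the patchset. It assumes fio's JSON output carries a per-job
`job_runtime` field in milliseconds; the helper names, the job names "fast"
and "slow", and the sample numbers are all hypothetical.

```python
import json


def job_runtime_ms(fio_json: str, jobname: str) -> int:
    """Sum job_runtime over every fio job entry with the given name.

    Assumes fio's JSON layout: {"jobs": [{"jobname": ..., "job_runtime": ...}]},
    with job_runtime reported in milliseconds.
    """
    data = json.loads(fio_json)
    return sum(j["job_runtime"] for j in data["jobs"] if j["jobname"] == jobname)


def within_threshold(baseline_ms: float, protected_ms: float,
                     threshold: float = 0.15) -> bool:
    """True if the protected job took at most (1 + threshold) x the baseline."""
    return protected_ms <= baseline_ms * (1 + threshold)


# Made-up sample data standing in for real fio output.
sample = json.dumps({
    "jobs": [
        {"jobname": "fast", "job_runtime": 11000},
        {"jobname": "slow", "job_runtime": 30000},
    ]
})

baseline_ms = 10000  # hypothetical runtime of the job run on its own
fast_ms = job_runtime_ms(sample, "fast")
slow_ms = job_runtime_ms(sample, "slow")

if not within_threshold(baseline_ms, fast_ms):
    print("Too much of a performance drop for the protected workload")
```

Comparing runtime rather than bandwidth matches the reasoning in the cover
letter: once the fast group finishes, the slow group can run unthrottled, so
its average bandwidth says little about how long it was held back.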