Re: [blktests] zbd/012: Test requeuing of zoned writes and queue freezing

On 11/26/24 06:10, Bart Van Assche wrote:
> Test concurrent requeuing of zoned writes and request queue freezing. While
> this test passes with kernel 6.9, it triggers a hang with kernels 6.10 and
> later. This shows that this hang is a regression introduced by the zone
> write plugging code.
> 
> sysrq: Show Blocked State
> task:(udev-worker)   state:D stack:0     pid:75392 tgid:75392 ppid:2178   flags:0x00000006
> Call Trace:
>  <TASK>
>  __schedule+0x3e8/0x1410
>  schedule+0x27/0xf0
>  blk_mq_freeze_queue_wait+0x6f/0xa0
>  queue_attr_store+0x60/0xc0
>  kernfs_fop_write_iter+0x13e/0x1f0
>  vfs_write+0x25b/0x420
>  ksys_write+0x65/0xe0
>  do_syscall_64+0x82/0x160
>  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> 
> Signed-off-by: Bart Van Assche <bvanassche@xxxxxxx>
> ---
>  tests/zbd/012     | 70 +++++++++++++++++++++++++++++++++++++++++++++++
>  tests/zbd/012.out |  2 ++
>  2 files changed, 72 insertions(+)
>  create mode 100644 tests/zbd/012
>  create mode 100644 tests/zbd/012.out
> 
> diff --git a/tests/zbd/012 b/tests/zbd/012
> new file mode 100644
> index 000000000000..0551d01011af
> --- /dev/null
> +++ b/tests/zbd/012
> @@ -0,0 +1,70 @@
> +#!/bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright (C) 2024 Google LLC
> +
> +. tests/scsi/rc
> +. common/scsi_debug
> +
> +DESCRIPTION="test requeuing of zoned writes and queue freezing"
> +QUICK=1
> +
> +requires() {
> +	_have_fio_zbd_zonemode
> +}
> +
> +toggle_iosched() {
> +	while true; do
> +		for iosched in none mq-deadline; do
> +			echo "${iosched}" > "/sys/class/block/$(basename "$zdev")/queue/scheduler"
> +			sleep .1
> +		done
> +	done
> +}
> +
> +test() {
> +	echo "Running ${TEST_NAME}"
> +
> +	local qd=1
> +	local scsi_debug_params=(
> +		delay=0
> +		dev_size_mb=1024
> +		every_nth=$((2 * qd))
> +		max_queue="${qd}"
> +		opts=0x8000          # SDEBUG_OPT_HOST_BUSY
> +		sector_size=4096
> +		statistics=1
> +		zbc=host-managed
> +		zone_nr_conv=0
> +		zone_size_mb=4
> +	)
> +	_init_scsi_debug "${scsi_debug_params[@]}" &&
> +	local zdev="/dev/${SCSI_DEBUG_DEVICES[0]}" fail &&
> +	ls -ld "${zdev}" >>"${FULL}" &&
> +	{ toggle_iosched & } &&
> +	toggle_iosched_pid=$! &&
> +	local fio_args=(
> +		--direct=1
> +		--filename="${zdev}"
> +		--iodepth="${qd}"
> +		--ioengine=io_uring

Something very odd is going on here: this fio run is supposed to be qd=1, but
when I get the hang, I see plugged BIOs (always at least 1, and very often more
than 1, in different zones). With a QD=1 workload, this should *NEVER* happen:
no write command should ever enter a zone write plug (as long as the write BIO
does not get split). So it looks to me like fio, or the io_uring submission
context, is sending weird writes...

If I change --ioengine=io_uring to --ioengine=libaio, then the test passes, always.

I am not sure what is going on.

But I do think there is a potential deadlock anyway: if a write command fails,
zone write plug error recovery will be triggered. If that zone write plug has
BIOs plugged and the queue freeze done for the scheduler change starts before
the report zones command is issued for the zone write plug error recovery, then
we will deadlock on the queue freeze.
So this is a definite bug, even though this test does not create this
situation. As already discussed, given the need to properly recover errors for
emulated zone append commands, I am not sure how to fix this yet...
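
To make the ordering problem above easier to follow, here is a toy wait-for
graph in bash. It is only a sketch of the reasoning, not kernel code: the node
names are illustrative labels (not kernel symbols), and each "waits for" edge
is an assumption that restates the scenario described above.

```bash
#!/bin/bash
# Toy wait-for graph for the suspected deadlock. Node names are illustrative
# labels, not kernel symbols; each edge is an assumption taken from the
# scenario described in the text.
declare -A waits_for=(
	[queue_freeze]=plugged_bios      # freeze cannot finish while BIOs are plugged
	[plugged_bios]=error_recovery    # plugged BIOs stay plugged until recovery ends
	[error_recovery]=report_zones    # recovery needs a report zones command
	[report_zones]=queue_freeze      # the command cannot enter a freezing queue
)

# Follow the wait-for edges from $1 and report whether they loop back.
find_cycle() {
	local node="$1" path="$1"
	local -A seen=()
	while [ -n "${waits_for[$node]:-}" ]; do
		seen[$node]=1
		node="${waits_for[$node]}"
		path+=" -> ${node}"
		if [ -n "${seen[$node]:-}" ]; then
			echo "deadlock: ${path}"
			return 0
		fi
	done
	echo "no deadlock: ${path}"
	return 1
}

find_cycle queue_freeze
```

If the report zones command is issued before the freeze starts, the last edge
does not exist (`unset 'waits_for[report_zones]'`) and the chain ends without
a cycle, which matches the observation that only one ordering deadlocks.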

If you can also look into why the io_uring IO engine does not respect the
iodepth=1 argument, that would be great.

Note that I tried qd > 1 with libaio and everything works just fine: the test
passes. Things are strange only with io_uring.
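
For reference, the engine and queue depth combinations compared above can be
enumerated with a small helper like the one below. This is a minimal sketch:
`fio_cmd` is a hypothetical helper, `/dev/sdX` is a placeholder device, 4 is an
arbitrary stand-in for "qd > 1", and the runtime is fixed at the patch's
default of 30. It only prints the fio command lines (the options are the ones
from the patch) and does not touch the scsi_debug parameters, which the test
also derives from qd.

```bash
#!/bin/bash
# Print (do not run) the fio command line for one engine and queue depth.
# fio_cmd is a hypothetical helper; /dev/sdX is a placeholder device.
fio_cmd() {
	local engine="$1" qd="$2" zdev="${3:-/dev/sdX}"
	local fio_args=(
		--direct=1
		--filename="${zdev}"
		--iodepth="${qd}"
		--ioengine="${engine}"
		--ioscheduler=none
		--name=pipeline-zoned-writes
		--runtime=30
		--rw=randwrite
		--time_based
		--zonemode=zbd
	)
	echo "fio ${fio_args[*]}"
}

for engine in io_uring libaio; do
	for qd in 1 4; do	# 4 is an arbitrary stand-in for "qd > 1"
		fio_cmd "${engine}" "${qd}"
	done
done
```

Inside blktests, the printed arguments would be passed to `_run_fio` as the
patch does, once per combination.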

> +		--ioscheduler=none
> +		--name=pipeline-zoned-writes
> +		--output="${RESULTS_DIR}/fio-output-zbd-092.txt"
> +		--runtime="${TIMEOUT:-30}"
> +		--rw=randwrite
> +		--time_based
> +		--zonemode=zbd
> +	) &&
> +	_run_fio "${fio_args[@]}" >>"${FULL}" 2>&1 ||
> +	fail=true
> +
> +	kill "${toggle_iosched_pid}" 2>&1
> +	_exit_scsi_debug
> +
> +	if [ -z "$fail" ]; then
> +		echo "Test complete"
> +	else
> +		echo "Test failed"
> +		return 1
> +	fi
> +}
> diff --git a/tests/zbd/012.out b/tests/zbd/012.out
> new file mode 100644
> index 000000000000..8ff654950c5f
> --- /dev/null
> +++ b/tests/zbd/012.out
> @@ -0,0 +1,2 @@
> +Running zbd/012
> +Test complete


-- 
Damien Le Moal
Western Digital Research



