Re: blktests with zbd/006 ZNS triggers a possible false positive RCU stall

On Apr 14, 2022 / 15:02, Luis Chamberlain wrote:
> Hey folks,
> 
> While enhancing kdevops [0] to embrace automation of testing with
> blktests for ZNS I ended up spotting a possible false positive RCU stall
> when running zbd/006 after zbd/005. The curious thing though is that
> this possible RCU stall is only possible when using the qemu
> ZNS drive, not when using nbd. In so far as kdevops is concerned
> it creates ZNS drives for you when you enable the config option
> CONFIG_QEMU_ENABLE_NVME_ZNS=y. So picking any of the ZNS drives
> suffices. When configuring blktests you can just enable the zbd
> guest, so only a pair of guests are created: the zbd guest and the
> respective development guest, zbd-dev guest. When using
> CONFIG_KDEVOPS_HOSTS_PREFIX="linux517" this means you end up with
> just two guests:
> 
>   * linux517-blktests-zbd
>   * linux517-blktests-zbd-dev
> 
> The RCU stall can be triggered easily as follows:
> 
> make menuconfig # make sure to enable CONFIG_QEMU_ENABLE_NVME_ZNS=y and blktests
> make
> make bringup # bring up guests
> make linux # build and boot into v5.17-rc7
> make blktests # build and install blktests
> 
> Now let's ssh to the guest while leaving a console attached
> with `sudo virsh console vagrant_linux517-blktests-zbd` in a window:
> 
> ssh linux517-blktests-zbd
> sudo su -
> cd /usr/local/blktests
> export TEST_DEVS=/dev/nvme9n1
> i=0; while true; do ./check zbd/005 zbd/006; if [[ $? -ne 0 ]]; then echo "BAD at $i"; break; else echo GOOOD $i ; fi; let i=$i+1; done;
> 
> The above should never fail, but you should eventually see an RCU
> stall candidate on the console. The full details can be observed on the
> gist [1] but for completeness I list some of it below. It may be a false
> positive at this point, not sure.
> 
> [493272.711271] run blktests zbd/005 at 2022-04-14 20:03:22
> [493305.769531] run blktests zbd/006 at 2022-04-14 20:03:55
> [493336.979482] nvme nvme9: I/O 192 QID 5 timeout, aborting
> [493336.981666] nvme nvme9: Abort status: 0x0
> [493367.699440] nvme nvme9: I/O 192 QID 5 timeout, reset controller
> [493388.819341] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:

Hello Luis,

I run the blktests zbd group on several QEMU ZNS emulation devices for every rcX
kernel release, but I have never observed the symptom above. I am now repeating
zbd/005 and zbd/006 using v5.18-rc3 and a QEMU ZNS device, and have not observed
the symptom so far after 400 repetitions.

I would like to run the test using the same ZNS setup as yours. Can you share how
your ZNS device is set up? I would like to know the device size and the QEMU
-device options, such as zoned.zone_size or zoned.max_active.
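
In case it helps, here is a minimal sketch of how the guest-visible zone
parameters could be collected from sysfs. The helper name zns_info and its
optional second argument are hypothetical, added only for illustration; the
attribute files are the standard block-layer ones, and any that the running
kernel does not provide are reported as unavailable. These values only show
what the guest sees, so the actual QEMU -device command line would still be
useful.

```shell
#!/bin/sh
# zns_info is a hypothetical helper, not part of blktests or kdevops.
# Usage: zns_info <device name> [sysfs block root]
zns_info() {
    dev=$1
    root=${2:-/sys/block}   # overridable so the helper can be tried on a fake tree
    base="$root/$dev"
    # "size" and "queue/chunk_sectors" (the zone size) are in 512-byte sectors.
    for attr in size queue/zoned queue/chunk_sectors queue/nr_zones \
                queue/max_open_zones queue/max_active_zones; do
        if [ -r "$base/$attr" ]; then
            printf '%s=%s\n' "$attr" "$(cat "$base/$attr")"
        else
            printf '%s=unavailable\n' "$attr"
        fi
    done
}
```

Running `zns_info nvme9n1` on the guest prints one attribute=value line per
attribute, which can be pasted into a reply as-is.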

-- 
Best Regards,
Shin'ichiro Kawasaki


