Re: blktests with zbd/006 ZNS triggers a possible false positive RCU stall

On Wed, Apr 20, 2022 at 05:54:29AM +0000, Shinichiro Kawasaki wrote:
> On Apr 14, 2022 / 15:02, Luis Chamberlain wrote:
> > Hey folks,
> > 
> > While enhancing kdevops [0] to embrace automation of testing with
> > blktests for ZNS I ended up spotting a possible false positive RCU stall
> > when running zbd/006 after zbd/005. The curious thing though is that
> > this possible RCU stall only reproduces when using the qemu
> > ZNS drive, not when using nbd. In so far as kdevops is concerned
> > it creates ZNS drives for you when you enable the config option
> > CONFIG_QEMU_ENABLE_NVME_ZNS=y. So picking any of the ZNS drives
> > suffices. When configuring blktests you can just enable the zbd
> > guest, so only a pair of guests is created: the zbd guest and the
> > respective development guest, zbd-dev guest. When using
> > CONFIG_KDEVOPS_HOSTS_PREFIX="linux517" this means you end up with
> > just two guests:
> > 
> >   * linux517-blktests-zbd
> >   * linux517-blktests-zbd-dev
> > 
> > The RCU stall can be triggered easily as follows:
> > 
> > make menuconfig # make sure to enable CONFIG_QEMU_ENABLE_NVME_ZNS=y and blktests
> > make
> > make bringup # bring up guests
> > make linux # build and boot into v5.17-rc7
> > make blktests # build and install blktests
> > 
> > Now let's ssh to the guest while leaving a console attached
> > with `sudo virsh console vagrant_linux517-blktests-zbd` in a window:
> > 
> > ssh linux517-blktests-zbd
> > sudo su -
> > cd /usr/local/blktests
> > export TEST_DEVS=/dev/nvme9n1
> > i=0; while true; do ./check zbd/005 zbd/006; if [[ $? -ne 0 ]]; then echo "BAD at $i"; break; else echo GOOOD $i ; fi; let i=$i+1; done;
> > 
> > The above should never fail, but you should eventually see an RCU
> > stall candidate on the console. The full details can be observed on the
> > gist [1] but for completeness I list some of it below. It may be a false
> > positive at this point, not sure.
> > 
> > [493272.711271] run blktests zbd/005 at 2022-04-14 20:03:22
> > [493305.769531] run blktests zbd/006 at 2022-04-14 20:03:55
> > [493336.979482] nvme nvme9: I/O 192 QID 5 timeout, aborting
> > [493336.981666] nvme nvme9: Abort status: 0x0
> > [493367.699440] nvme nvme9: I/O 192 QID 5 timeout, reset controller
> > [493388.819341] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> 
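The one-line loop above is easier to follow when written out. Below is a
minimal sketch of the same logic. The function name repeat_until_fail and
the iteration cap are illustrative additions and are not part of the
original one-liner; a cap of 0 loops forever, as the original does.

```shell
#!/bin/bash
# Run a command repeatedly until it fails, printing the iteration count.
# $1 is a hypothetical safety cap on iterations (0 = no cap); the rest is
# the command to run.
repeat_until_fail() {
    local max_iter=$1; shift
    local i=0
    while [ "$max_iter" -eq 0 ] || [ "$i" -lt "$max_iter" ]; do
        if ! "$@"; then
            echo "BAD at $i"
            return 1
        fi
        echo "GOOD $i"
        i=$((i + 1))
    done
    return 0
}
```

With TEST_DEVS exported as above, it would be run from /usr/local/blktests
as: repeat_until_fail 0 ./check zbd/005 zbd/006
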
> Hello Luis,
> 
> I run the blktests zbd group on several QEMU ZNS emulation devices for every rcX
> kernel release, but I have never observed the symptom above. I am now
> repeating zbd/005 and zbd/006 using v5.18-rc3 and a QEMU ZNS device, and have
> not observed the symptom so far, after 400 repetitions.

Did you try v5.17-rc7?

> I would like to run the test using the same ZNS setup as yours. Can you share how
> your ZNS device is set up? I would like to know the device size and QEMU -device
> options, such as zoned.zone_size or zoned.max_active.

It is as easy as the above make commands and the follow-up login commands.
I'll be bumping the kernel to test for fstests and blktests on kdevops
soon, but the baseline is sadly not yet done for all filesystems and
blktests. Once the baseline is completed, though, it should be easy to
bump the kernel and confirm whether old failures no longer fail and find
new issues.
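
On the device size and zone parameters asked about above: the geometry as
seen by the guest can be read from sysfs once the guest is up. This is a
rough sketch, not the kdevops configuration itself. The function name
show_zone_setup and the optional sysfs-root argument are illustrative, and
how these values map back to the QEMU -device options is an assumption to
verify against the generated guest definition.

```shell
#!/bin/bash
# Print the zoned-block attributes the kernel exposes for a device.
# $1: block device name (e.g. nvme9n1)
# $2: sysfs block root, defaults to /sys/block (overridable for testing)
# size and chunk_sectors are reported in 512-byte sectors.
show_zone_setup() {
    local dev=$1 root=${2:-/sys/block}
    local path attr
    for attr in size queue/zoned queue/chunk_sectors queue/nr_zones \
                queue/max_open_zones queue/max_active_zones; do
        path="$root/$dev/$attr"
        if [ -r "$path" ]; then
            echo "${attr#queue/}=$(cat "$path")"
        else
            # Older kernels may not expose every attribute.
            echo "${attr#queue/}=unavailable"
        fi
    done
}
```

On the zbd guest this would be run as: show_zone_setup nvme9n1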

  Luis


