Re: [PATCH blktests v2 0/3] Test different queue counts

Shinichiro Kawasaki <shinichiro.kawasaki@xxxxxxx> · Wed, 29 Mar 2023 00:57:55 +0000

On Mar 28, 2023 / 18:20, Chaitanya Kulkarni wrote:
> On 3/28/23 01:45, Shinichiro Kawasaki wrote:
> > On Mar 27, 2023 / 17:41, Daniel Wagner wrote:
> >> On Thu, Mar 23, 2023 at 11:06:53AM +0000, Shinichiro Kawasaki wrote:
> >>> On Mar 22, 2023 / 11:16, Daniel Wagner wrote:
> >>>> Setup different queues, e.g. read and poll queues.
> >>>>
> >>>> There is still the problem that _require_nvme_trtype_is_fabrics also includes
> >>>> the loop transport which has no support for different queue types.
> >>>>
> >>>> See also https://lore.kernel.org/linux-nvme/20230322002350.4038048-1-kbusch@xxxxxxxx/
> >>> Hi Daniel, thanks for the patches. The new test case catches some bugs. Looks
> >>> valuable.
> >>>
> >>> I ran the test case using various nvme_trtype on kernel v6.2 and v6.3-rc3, and
> >>> observed hangs. I applied the 3rd patch in the link above on top of v6.3-rc3 and
> >>> confirmed the hang disappears. I would like to wait for the kernel fix patch
> >>> delivered to upstream, before adding this test case to blktests master.
> >> Okay makes sense.
> >>
> >>> When I ran the test case without setting nvme_trtype, kernel reported messages
> >>> below:
> >>>
> >>> [  199.621431][ T1001] nvme_fabrics: invalid parameter 'nr_write_queues=%d'
> >>> [  201.271200][ T1030] nvme_fabrics: invalid parameter 'nr_write_queues=%d'
> >>> [  201.272155][ T1030] nvme_fabrics: invalid parameter 'nr_poll_queues=%d'
> >> BTW, I've added a '|| echo FAIL' to catch those.
> >>
> >>> Is it useful to run the test case with default nvme_trtype=loop?
> >> No, we should run this test only for those transport which actually support the
> >> different queue types. Christoph suggest to figure out before running the test
> >> if it is actually supported. So my first idea was to check what options are
> >> supported by reading /dev/nvme-fabrics. But this will return all options we are
> >> parsed by fabrics.c but not the subset which each transport might only support.
> >>
> >> So to figure this out we would need to do a full setup just to figure out if it
> >> is supported. I think the currently best approach would just to limit this test
> >> to tcp and rdma. Maybe we could add something like
> >>
> >> rc:
> >> _require_nvme_trtype() {
> >> 	local trtype
> >> 	for trtype in "$@"; do
> >> 		if [[ "${nvme_trtype}" == "$trtype" ]]; then
> >> 			return 0
> >> 		fi
> >> 	done
> >> 	SKIP_REASONS+=("nvme_trtype=${nvme_trtype} is not supported in this test")
> >> 	return 1
> >> }
> >>
> >> 047:
> >> requires() {
> >> 	_nvme_requires
> >> 	_have_xfs
> >> 	_have_fio
> >> 	_require_nvme_trtype tcp rdma
> >> 	_have_kver 4 21
> >> }
> >>
> >> What do you think?
> > Thanks for the clarifications about the requirements. I think your approach will
> > work. Having said that, we may have another potentially simpler solution:
> >
> > - Do not check _require_nvme_trtype and _have_kver in requires().
> > - After setting nr_write_queues in test(), check if dmesg contains the error
> >    "invalid parameter 'nr_write_queues" using the helper function
> >    _dmesg_since_test_start().
> > - If the error is reported, set SKIP_REASONS and return from test().
> >    Blktests will report the test case as "not run".
> >
> > This approach assumes that the "invalid parameter" is printed when the test case
> > should be skipped (loop transport, older kernel version).
> 
> 
> Is it possible to not rely on dmesg unless it is absolutely required ?
> 
> > As a generic guide, SKIP_REASONS should be set in requires() before test().
> > However, if the SKIP_REASONS can not be checked before test(), blktests allows
> > to set it in test(). The test case block/030 is such an exception. I think your
> > new test case can be another exception. With this, we do not need to repeat the
> > full setup. And it might be more robust against future changes such as new
> > transport types.
> 
> Ummm should we avoid creating exceptions ? unless it is absolutely 
> necessary ?
> The problem with exception is it becomes problematic for long term 
> maintenance.
> 
> I believe currently focusing on tcp/rdma only is sufficient ...

I see, then let's go with Daniel's approach. My idea could be tricky too much.

-- 
Shin'ichiro Kawasaki