Re: [PATCH blktests v3 2/2] md: add regression test for "md/md-bitmap: fix writing non bitmap pages"

Shinichiro Kawasaki <shinichiro.kawasaki@xxxxxxx> · Thu, 18 Jul 2024 12:36:35 +0000

On Jul 18, 2024 / 11:26, Ofir Gal wrote:
> On 7/17/24 10:41, Shinichiro Kawasaki wrote:
> > Hi Ofir, thank you for this v3 series. The two patches look good to me, except
> > one unclear point below.
> >
> > On Jul 16, 2024 / 14:50, Ofir Gal wrote:
> > [...]
> >> diff --git a/tests/md/001 b/tests/md/001
> >> new file mode 100755
> >> index 0000000..e9578e8
> >> --- /dev/null
> >> +++ b/tests/md/001
> >> @@ -0,0 +1,85 @@
> >> +#!/bin/bash
> >> +# SPDX-License-Identifier: GPL-3.0+
> >> +# Copyright (C) 2024 Ofir Gal
> >> +#
> >> +# The bug is "visible" only when the underlying device of the raid is a network
> >> +# block device that utilize MSG_SPLICE_PAGES. nvme-tcp is used as the network device.
> >> +#
> >> +# Regression test for patch "md/md-bitmap: fix writing non bitmap pages" and
> >> +# for patch "nvme-tcp: use sendpages_ok() instead of sendpage_ok()"
> >
> > The cover letter of the series says that the new test case is the regression
> > test for the patch "md/md-bitmap: fix writing non bitmap pages". On the other
> > hand, the comment above says that this test case is for the two patches. Which
> > is correct? (Or which is more accurate?)
> >
> > When I ran this test case, it failed on the kernel v6.10, which is expected.
> > Then I applied the 1st patch "md/md-bitmap: fix writing non bitmap pages" only
> > to the v6.10 kernel, and the test case passed. It passed without the 2nd patch
> > "nvme-tcp: use sendpages_ok() instead of sendpage_ok()". So, I'm not sure if
> > this test case is the regression test for the 2nd patch.
> Sorry for the confusion, either one of the patches solves the issue.
> 
> The "md/md-bitmap: fix writing non bitmap pages" patch fix the root
> cause issue. md-bitmap sent contiguous pages that it didn't allocate,
> that happened to be a page list that consists of non-slub and slub
> pages.
> 
> The nvme-tcp assumed there won't be a mixed IO of slub and
> non-slub in order to turn on MSG_SPLICE_PAGES, "nvme-tcp: use
> sendpages_ok() instead of sendpage_ok()" patch address this assumption
> by checking each page instead of the first page.
> 
> I think it's more accurate to say this regression test for the 1st
> patch, we can probably make a separate regression test for the 2nd
> patch.
> 
> I can change the comment to prevent confusion.

Ofir, thank you for the clarification. I tried to apply only the 2nd patch
"nvme-tcp: use sendpages_ok() instead of sendpage_ok()" and its depenednet
patch to the kernel. When I ran the added test case on this kernel, I
observed the failure symptom changed (it looks improving the kernel
behavior), but still I observed KASAN slab-use-after-free and the test case
failed. I think this explains your statement: "it's more accurate to say this
regression test for the 1st patch".

If you are okay just to drop the 2nd patch part in the comment, I can do that
edit when I apply this blktests patch. Or if you want to improve the comment
by yourself, please repost the series. Please let me know your preference.