On Jul 18, 2024 / 11:26, Ofir Gal wrote: > On 7/17/24 10:41, Shinichiro Kawasaki wrote: > > Hi Ofir, thank you for this v3 series. The two patches look good to me, except > > one unclear point below. > > > > On Jul 16, 2024 / 14:50, Ofir Gal wrote: > > [...] > >> diff --git a/tests/md/001 b/tests/md/001 > >> new file mode 100755 > >> index 0000000..e9578e8 > >> --- /dev/null > >> +++ b/tests/md/001 > >> @@ -0,0 +1,85 @@ > >> +#!/bin/bash > >> +# SPDX-License-Identifier: GPL-3.0+ > >> +# Copyright (C) 2024 Ofir Gal > >> +# > >> +# The bug is "visible" only when the underlying device of the raid is a network > >> +# block device that utilize MSG_SPLICE_PAGES. nvme-tcp is used as the network device. > >> +# > >> +# Regression test for patch "md/md-bitmap: fix writing non bitmap pages" and > >> +# for patch "nvme-tcp: use sendpages_ok() instead of sendpage_ok()" > > > > The cover letter of the series says that the new test case is the regression > > test for the patch "md/md-bitmap: fix writing non bitmap pages". On the other > > hand, the comment above says that this test case is for the two patches. Which > > is correct? (Or which is more accurate?) > > > > When I ran this test case, it failed on the kernel v6.10, which is expected. > > Then I applied the 1st patch "md/md-bitmap: fix writing non bitmap pages" only > > to the v6.10 kernel, and the test case passed. It passed without the 2nd patch > > "nvme-tcp: use sendpages_ok() instead of sendpage_ok()". So, I'm not sure if > > this test case is the regression test for the 2nd patch. > Sorry for the confusion, either one of the patches solves the issue. > > The "md/md-bitmap: fix writing non bitmap pages" patch fix the root > cause issue. md-bitmap sent contiguous pages that it didn't allocate, > that happened to be a page list that consists of non-slub and slub > pages. > > The nvme-tcp assumed there won't be a mixed IO of slub and > non-slub in order to turn on MSG_SPLICE_PAGES, "nvme-tcp: use > sendpages_ok() instead of sendpage_ok()" patch address this assumption > by checking each page instead of the first page. > > I think it's more accurate to say this regression test for the 1st > patch, we can probably make a separate regression test for the 2nd > patch. > > I can change the comment to prevent confusion. Ofir, thank you for the clarification. I tried to apply only the 2nd patch "nvme-tcp: use sendpages_ok() instead of sendpage_ok()" and its depenednet patch to the kernel. When I ran the added test case on this kernel, I observed the failure symptom changed (it looks improving the kernel behavior), but still I observed KASAN slab-use-after-free and the test case failed. I think this explains your statement: "it's more accurate to say this regression test for the 1st patch". If you are okay just to drop the 2nd patch part in the comment, I can do that edit when I apply this blktests patch. Or if you want to improve the comment by yourself, please repost the series. Please let me know your preference.