On 7/18/24 15:36, Shinichiro Kawasaki wrote: > On Jul 18, 2024 / 11:26, Ofir Gal wrote: >> On 7/17/24 10:41, Shinichiro Kawasaki wrote: >>> Hi Ofir, thank you for this v3 series. The two patches look good to me, except >>> one unclear point below. >>> >>> On Jul 16, 2024 / 14:50, Ofir Gal wrote: >>> [...] >>>> diff --git a/tests/md/001 b/tests/md/001 >>>> new file mode 100755 >>>> index 0000000..e9578e8 >>>> --- /dev/null >>>> +++ b/tests/md/001 >>>> @@ -0,0 +1,85 @@ >>>> +#!/bin/bash >>>> +# SPDX-License-Identifier: GPL-3.0+ >>>> +# Copyright (C) 2024 Ofir Gal >>>> +# >>>> +# The bug is "visible" only when the underlying device of the raid is a network >>>> +# block device that utilize MSG_SPLICE_PAGES. nvme-tcp is used as the network device. >>>> +# >>>> +# Regression test for patch "md/md-bitmap: fix writing non bitmap pages" and >>>> +# for patch "nvme-tcp: use sendpages_ok() instead of sendpage_ok()" >>> >>> The cover letter of the series says that the new test case is the regression >>> test for the patch "md/md-bitmap: fix writing non bitmap pages". On the other >>> hand, the comment above says that this test case is for the two patches. Which >>> is correct? (Or which is more accurate?) >>> >>> When I ran this test case, it failed on the kernel v6.10, which is expected. >>> Then I applied the 1st patch "md/md-bitmap: fix writing non bitmap pages" only >>> to the v6.10 kernel, and the test case passed. It passed without the 2nd patch >>> "nvme-tcp: use sendpages_ok() instead of sendpage_ok()". So, I'm not sure if >>> this test case is the regression test for the 2nd patch. >> Sorry for the confusion, either one of the patches solves the issue. >> >> The "md/md-bitmap: fix writing non bitmap pages" patch fix the root >> cause issue. md-bitmap sent contiguous pages that it didn't allocate, >> that happened to be a page list that consists of non-slub and slub >> pages. >> >> The nvme-tcp assumed there won't be a mixed IO of slub and >> non-slub in order to turn on MSG_SPLICE_PAGES, "nvme-tcp: use >> sendpages_ok() instead of sendpage_ok()" patch address this assumption >> by checking each page instead of the first page. >> >> I think it's more accurate to say this regression test for the 1st >> patch, we can probably make a separate regression test for the 2nd >> patch. >> >> I can change the comment to prevent confusion. > > Ofir, thank you for the clarification. I tried to apply only the 2nd patch > "nvme-tcp: use sendpages_ok() instead of sendpage_ok()" and its depenednet > patch to the kernel. When I ran the added test case on this kernel, I > observed the failure symptom changed (it looks improving the kernel > behavior), but still I observed KASAN slab-use-after-free and the test case > failed. I think this explains your statement: "it's more accurate to say this > regression test for the 1st patch". > > If you are okay just to drop the 2nd patch part in the comment, I can do that > edit when I apply this blktests patch. Or if you want to improve the comment > by yourself, please repost the series. Please let me know your preference. You can drop the 2nd part, Thanks!