Re: [PATCH blktests v3 2/2] md: add regression test for "md/md-bitmap: fix writing non bitmap pages"

Ofir Gal <ofir.gal@xxxxxxxxxxx> · Thu, 18 Jul 2024 15:59:18 +0300



On 7/18/24 15:36, Shinichiro Kawasaki wrote:
> On Jul 18, 2024 / 11:26, Ofir Gal wrote:
>> On 7/17/24 10:41, Shinichiro Kawasaki wrote:
>>> Hi Ofir, thank you for this v3 series. The two patches look good to me, except
>>> one unclear point below.
>>>
>>> On Jul 16, 2024 / 14:50, Ofir Gal wrote:
>>> [...]
>>>> diff --git a/tests/md/001 b/tests/md/001
>>>> new file mode 100755
>>>> index 0000000..e9578e8
>>>> --- /dev/null
>>>> +++ b/tests/md/001
>>>> @@ -0,0 +1,85 @@
>>>> +#!/bin/bash
>>>> +# SPDX-License-Identifier: GPL-3.0+
>>>> +# Copyright (C) 2024 Ofir Gal
>>>> +#
>>>> +# The bug is "visible" only when the underlying device of the raid is a network
>>>> +# block device that utilize MSG_SPLICE_PAGES. nvme-tcp is used as the network device.
>>>> +#
>>>> +# Regression test for patch "md/md-bitmap: fix writing non bitmap pages" and
>>>> +# for patch "nvme-tcp: use sendpages_ok() instead of sendpage_ok()"
>>>
>>> The cover letter of the series says that the new test case is the regression
>>> test for the patch "md/md-bitmap: fix writing non bitmap pages". On the other
>>> hand, the comment above says that this test case is for the two patches. Which
>>> is correct? (Or which is more accurate?)
>>>
>>> When I ran this test case, it failed on the kernel v6.10, which is expected.
>>> Then I applied the 1st patch "md/md-bitmap: fix writing non bitmap pages" only
>>> to the v6.10 kernel, and the test case passed. It passed without the 2nd patch
>>> "nvme-tcp: use sendpages_ok() instead of sendpage_ok()". So, I'm not sure if
>>> this test case is the regression test for the 2nd patch.
>> Sorry for the confusion, either one of the patches solves the issue.
>>
>> The "md/md-bitmap: fix writing non bitmap pages" patch fix the root
>> cause issue. md-bitmap sent contiguous pages that it didn't allocate,
>> that happened to be a page list that consists of non-slub and slub
>> pages.
>>
>> The nvme-tcp assumed there won't be a mixed IO of slub and
>> non-slub in order to turn on MSG_SPLICE_PAGES, "nvme-tcp: use
>> sendpages_ok() instead of sendpage_ok()" patch address this assumption
>> by checking each page instead of the first page.
>>
>> I think it's more accurate to say this regression test for the 1st
>> patch, we can probably make a separate regression test for the 2nd
>> patch.
>>
>> I can change the comment to prevent confusion.
>
> Ofir, thank you for the clarification. I tried to apply only the 2nd patch
> "nvme-tcp: use sendpages_ok() instead of sendpage_ok()" and its depenednet
> patch to the kernel. When I ran the added test case on this kernel, I
> observed the failure symptom changed (it looks improving the kernel
> behavior), but still I observed KASAN slab-use-after-free and the test case
> failed. I think this explains your statement: "it's more accurate to say this
> regression test for the 1st patch".
>
> If you are okay just to drop the 2nd patch part in the comment, I can do that
> edit when I apply this blktests patch. Or if you want to improve the comment
> by yourself, please repost the series. Please let me know your preference.
You can drop the 2nd part, Thanks!