Re: [PATCH] md/bitmap: avoid read out of the disk

NeilBrown <neilb@xxxxxxxx> · Fri, 13 Oct 2017 08:46:35 +1100

On Thu, Oct 12 2017, Song Liu wrote:

>> On Oct 12, 2017, at 10:30 AM, Shaohua Li <shli@xxxxxxxxxx> wrote:
>> 
>> On Thu, Oct 12, 2017 at 02:09:21PM +1100, Neil Brown wrote:
>>> On Tue, Oct 10 2017, Shaohua Li wrote:
>>> 
>>>> From: Shaohua Li <shli@xxxxxx>
>>>> 
>>>> If PAGE_SIZE is bigger than 4k, we could read out of the disk boundary. Limit
>>>> the read size to the end of disk. Write path already has similar limitation.
>>>> 
>>>> Fix: 8031c3ddc70a(md/bitmap: copy correct data for bitmap super)
>>>> Reported-by: Joshua Kinard <kumba@xxxxxxxxxx>
>>>> Tested-by: Joshua Kinard <kumba@xxxxxxxxxx>
>>>> Cc: Song Liu <songliubraving@xxxxxx>
>>>> Signed-off-by: Shaohua Li <shli@xxxxxx>
>>> 
>>> Given that this bug was introduced by
>>> Commit: 8031c3ddc70a ("md/bitmap: copy correct data for bitmap super")
>>> 
>>> and that patch is markted:
>>> 
>>>    Cc: stable@xxxxxxxxxxxxxxx (4.10+)
>>> 
>>> I think this patch should be tagged "CC: stable" too.
>> 
>> I thought the Fix tag is enough, but I'll add the stable 
>>> However ... that earlier patch looks strange to me.
>>> Why is it that "raid5 cache could write bitmap superblock before bitmap superblock is
>>> initialized."  Can we just get raid5 cache *not* to write the bitmap
>>> superblock too early?
>>> I think that would better than breaking code that previously worked.
>> 
>> That's the log reply code, which must update superblock and hence bitmap
>> superblock, because reply happens very earlier. I agree the reply might still
>> have problem with bitmap. We'd better defer reply after the raid is fully
>> initialized. Song, any idea?
>> 
>
> With write back cache, there are two different types of stripes in recovery:
> data-parity, and data-only. For data-parity stripes, we can simply replay data
> from the journal. But for data-only stripes, we need to do rcw or rmw to update
> parities. Currently, the writes are handled with raid5 state machine. Therefore,
> we wake up mddev->thread in r5l_recovery_log(). It is necessary to finish these 
> stripes before we fully initialize the array, because these stripes need to be 
> handled with write back state machine; while we we always start the array with 
> write through journal_mode. 
>
> Maybe we can fix this by change the order of initialization in md_run(), 
> specifically, moving bitmap_create() before pers->run(). 

I was thinking exactly this as I was looking at the code.  ->run can
start recovery happening, and having that happen before the bitmap is
ready is wrong.
Maybe there is enough locking that things won't happen in the wrong
order, but it does look a bit odd.
Thanks for the explanation about the interaction between the journal and
the bitmaps.  I'll dig into the code and see what I find.  Then maybe we
can compare notes.

Thanks,
NeilBrown
Attachment:
signature.asc

Description: PGP signature