Re: RAID1 repair GPF crash w/3.10-rc7

Joe Lawrence <joe.lawrence@xxxxxxxxxxx> · Wed, 3 Jul 2013 17:49:51 -0400 (EDT)

On Mon, 1 Jul 2013, Joe Lawrence wrote:

> Hi Kent & Neil,
> 
> I've hit a crash in MD during RAID1 repair while running 3.10-rc7:
>
> [ ... snip ... ]

Hi Neil,

Looking through the MD source, I'm trying to understand part of the
RAID1 repair path.  I came up with a few questions:

1 - During user-initiated RAID1 repair, is the loop at the bottom of
sync_request(), under the bio_full label, responsible for submitting all
of the initial read bios?

2 - Does process_checks() later find the first uptodate read bio and
copy its data into the other r1_bio->bios[] for write repair to the
other disks?

If both are true, then perhaps the following applies to this crash...

The commit message of f79ea416 ("block: Refactor blk_update_request()")
includes:

    Note that req_bio_endio() now always calls bio_advance() - which
    means it always loops over the biovec, not just on partial
    completions.  Don't expect it to affect performance, but worth
    noting.
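
A simplified userspace model of what that note implies, assuming bio_advance() walks the biovec roughly like this: whole segments step bi_idx forward, and a partially consumed segment is trimmed in place. The struct and function names (model_vec, model_bio, model_bio_advance) are hypothetical stand-ins, not kernel code. In this model, once a read completes in full, idx ends up equal to vcnt.

```c
#include <assert.h>

/* Hypothetical userspace stand-ins for struct bio_vec and struct bio.
 * Field comments name the kernel fields they loosely mirror; this is a
 * model for illustration, not kernel code. */
struct model_vec {
	unsigned char *page;	/* bv_page + bv_offset */
	unsigned int len;	/* bv_len */
};

struct model_bio {
	struct model_vec *io_vec;	/* bi_io_vec */
	unsigned short vcnt;		/* bi_vcnt */
	unsigned short idx;		/* bi_idx */
	unsigned int size;		/* bi_size: bytes not yet completed */
};

/* Assumed behaviour: consume 'bytes' from the front of the bio, stepping
 * idx past whole segments and trimming a partially consumed segment in
 * place. After a full completion, idx == vcnt and size == 0. */
static void model_bio_advance(struct model_bio *bio, unsigned int bytes)
{
	assert(bytes <= bio->size);
	bio->size -= bytes;

	while (bytes) {
		struct model_vec *bv;

		assert(bio->idx < bio->vcnt);	/* segments must cover size */
		bv = &bio->io_vec[bio->idx];

		if (bytes >= bv->len) {
			bytes -= bv->len;	/* whole segment consumed */
			bio->idx++;
		} else {
			bv->page += bytes;	/* like bv_offset += bytes */
			bv->len -= bytes;
			bytes = 0;
		}
	}
}
```

In this model a partial completion also rewrites the segment entry itself, not just idx.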

Now that process_checks() has been further modified in preparation for
immutable bios (commit d3b45c2 "raid1: use bio_copy_data()"), it calls
bio_copy_data() to fill in the write-repair bios, and bio_copy_data()
starts indexing the bi_io_vec[] from wherever bi_idx happens to be.
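
The concern can be illustrated with the same kind of userspace model. Assuming the copy starts at each bio's current index, a source read bio whose idx has already reached vcnt has nothing left to hand over, and the write-repair buffer is left untouched. All names here (model_vec, model_bio, model_copy_from_idx) are hypothetical; the model simply stops at the end of the vector, and what the real bio_copy_data() does in that case is not modelled.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for struct bio_vec / struct bio (not kernel code). */
struct model_vec {
	unsigned char *page;	/* bv_page + bv_offset */
	unsigned int len;	/* bv_len */
};

struct model_bio {
	struct model_vec *io_vec;	/* bi_io_vec */
	unsigned short vcnt;		/* bi_vcnt */
	unsigned short idx;		/* bi_idx */
	unsigned int size;		/* bi_size */
};

/* Copy from src into dst, starting at each bio's current idx, until either
 * runs out of segments. Returns the number of bytes copied. */
static size_t model_copy_from_idx(struct model_bio *dst,
				  const struct model_bio *src)
{
	unsigned short si = src->idx, di = dst->idx;
	unsigned int soff = 0, doff = 0;
	size_t copied = 0;

	while (si < src->vcnt && di < dst->vcnt) {
		const struct model_vec *sv = &src->io_vec[si];
		struct model_vec *dv = &dst->io_vec[di];
		unsigned int n = sv->len - soff;

		if (dv->len - doff < n)
			n = dv->len - doff;
		memcpy(dv->page + doff, sv->page + soff, n);
		copied += n;
		soff += n;
		doff += n;

		if (soff == sv->len) {	/* source segment exhausted */
			si++;
			soff = 0;
		}
		if (doff == dv->len) {	/* destination segment full */
			di++;
			doff = 0;
		}
	}
	return copied;
}
```

Setting the source idx back to 0 makes the model copy succeed, but that is exactly the "re-wind" of bi_idx that immutable bios are meant to rule out.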

If this is indeed the case, I'm having trouble coming up with a good
solution:

  - Immutable bios means drivers don't touch bi_idx.  So MD shouldn't
    "re-wind" the source bi_idx before calling bio_copy_data().

  - bio_copy_data() could copy the entire source bi_io_vec[], as MD had
    done in the past, but is that safe?  (i.e., can we map bio
    vectors once they have been iterated over?)
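
For comparison, a sketch of the second option in the same hypothetical model: ignore idx and copy every segment starting from index 0. It assumes the source and destination share the same segment layout (same vcnt and per-segment lengths), and that the source entries still describe the original buffers after completion. That second assumption is precisely the open question above, and the sketch does not settle it. The names (model_vec, model_bio, model_copy_whole_vec) are illustrative only, not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for struct bio_vec / struct bio (not kernel code). */
struct model_vec {
	unsigned char *page;	/* bv_page + bv_offset */
	unsigned int len;	/* bv_len */
};

struct model_bio {
	struct model_vec *io_vec;	/* bi_io_vec */
	unsigned short vcnt;		/* bi_vcnt */
	unsigned short idx;		/* bi_idx (ignored below) */
	unsigned int size;		/* bi_size */
};

/* Copy every segment from index 0, ignoring idx. Assumes identical segment
 * layout in src and dst, and that src's entries were not trimmed by a
 * partial completion. Returns the number of bytes copied. */
static size_t model_copy_whole_vec(struct model_bio *dst,
				   const struct model_bio *src)
{
	size_t copied = 0;
	unsigned short i;

	assert(dst->vcnt == src->vcnt);
	for (i = 0; i < src->vcnt; i++) {
		assert(dst->io_vec[i].len == src->io_vec[i].len);
		memcpy(dst->io_vec[i].page, src->io_vec[i].page,
		       src->io_vec[i].len);
		copied += src->io_vec[i].len;
	}
	return copied;
}
```

In a model where a partial completion trims a segment entry in place, that entry no longer covers its whole original buffer, and this copy would silently skip the trimmed bytes.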

Thanks,

-- Joe
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html