On Mon, 02 Jul 2012 03:58:57 +0100 Kerin Millar <kerframil@xxxxxxxxx> wrote:

> Hi Neil,
>
> On 02/07/2012 03:52, NeilBrown wrote:
> > On Mon, 02 Jul 2012 03:34:16 +0100 Kerin Millar <kerframil@xxxxxxxxx> wrote:
> >
> >> Hello,
> >>
> >> I'm running a 4-way RAID-10 array with the f2 layout scheme on a 3.5-rc5
> >
> > I thought I fixed this in 3.5-rc2.
> > Maybe there is another bug....
> >
> > Could you please double check that you are running a kernel with
> >
> > commit aba336bd1d46d6b0404b06f6915ed76150739057
> > Author: NeilBrown <neilb@xxxxxxx>
> > Date:   Thu May 31 15:39:11 2012 +1000
> >
> >     md: raid1/raid10: fix problem with merge_bvec_fn
> >
> > in it?
>
> I am indeed. I searched the list beforehand and noticed the patch in
> question. Not sure which -rc it landed in but I checked my source tree
> and it's definitely in there.
>
> Cheers,
>
> --Kerin

Thanks.
Looking at it again I see that it is definitely a different bug; that patch
wouldn't affect it.

But I cannot see what could possibly be causing the problem.
You have a 256K chunk size, so requests should be limited to 512 sectors,
aligned at a 512-sector boundary.
However, all the requests that are causing errors are 512 sectors long, but
aligned on a 256-sector boundary (which is not also a 512-sector boundary).
This is wrong.

It could be that btrfs is submitting bad requests, but I think it always uses
bio_add_page, and bio_add_page appears to do the right thing.
It could be that dm-linear is causing the problem, but it seems to correctly
ask the underlying device for alignment, and reports that alignment to
bio_add_page.
It could be that md/raid10 is the problem, but I cannot find any fault in
raid10_mergeable_bvec; it performs much the same tests that the raid10
make_request function does.

So it is a mystery.

Is this failure repeatable?

If so, could you please insert

	WARN_ON_ONCE(1);

in drivers/md/raid10.c where it prints out the message, just after the
"bad_map:" label.
Also, in raid10_mergeable_bvec, insert

	WARN_ON_ONCE(max < 0);

just before

	if (max < 0)
		/* bio_add cannot handle a negative return */
		max = 0;

and then see if either of those generates a warning, and post the full stack
trace if it does.

Thanks,
NeilBrown
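
For reference, the alignment arithmetic described above (a 256K chunk is 512 sectors of 512 bytes, so a 512-sector request fits only if it starts on a 512-sector boundary) can be illustrated with a small userspace sketch. This is not kernel code and is not taken from raid10.c; the helper name crosses_chunk and the CHUNK_SECTORS constant are hypothetical, chosen only for illustration, and the sketch assumes the chunk size is a power of two.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 256K chunk in 512-byte sectors (256 * 1024 / 512 = 512). */
#define CHUNK_SECTORS 512u

/*
 * Hypothetical helper, for illustration only: returns true if the request
 * covering sectors [start, start + len) extends past the end of the chunk
 * that contains 'start'. Assumes CHUNK_SECTORS is a power of two and len > 0.
 */
static bool crosses_chunk(uint64_t start, uint32_t len)
{
	/* Offset of the first sector within its chunk. */
	uint64_t offset = start & (CHUNK_SECTORS - 1);

	return offset + len > CHUNK_SECTORS;
}
```

With these assumptions, crosses_chunk(1024, 512) is false (the request starts on a 512-sector boundary and fills exactly one chunk), while crosses_chunk(1280, 512) is true (the request starts 256 sectors into a chunk and so spills into the next one), which is the shape of the failing requests.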