On 03/07/2012 03:47, NeilBrown wrote: [snip]
> Thanks. Looks like it is a btrfs bug - so a big "hello" to linux-btrfs :-)
>
> The symptom is that iozone on btrfs on md/raid10 can result in
>
> [ 919.893454] md/raid10:md0: make_request bug: can't convert block across chunks or bigger than 256k 6653500160 256
> [ 919.893465] btrfs: bdev /dev/mapper/vg0-test errs: wr 1, rd 0, flush 0, corrupt 0, gen 0
>
> i.e. RAID10 has a 256K chunk size, but is getting 256K requests which overlap two chunks - the last half of one chunk and the first half of the next. That isn't allowed, and raid10_mergeable_bvec, called by bio_add_page, should prevent it.
>
> However, btrfs_map_bio() sets ->bi_sector to a new value without verifying that the resulting bio is still acceptable - which it isn't.
>
> The core problem is that you cannot build a bio for one location, then use it freely at another location. md/raid1 handles this by checking each addition to a bio against all the possible locations that it might read from or write to. Maybe btrfs could do the same.
>
> Alternatively, we could work with Kent Overstreet (of bcache fame) to remove the restriction that the fs must make the bio compatible with the device - instead requiring the device to split bios when needed, and making it easy to do that (currently it is not easy).
>
> And there are probably other alternatives.
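For anyone reading this in the archive, the overlap described above can be checked by hand from the numbers in the log. This is only a rough sketch in Python, not kernel code. It assumes the first number in the md/raid10 message (6653500160) is the starting sector in 512-byte units and the second (256) is the request size in KiB. The helper names crosses_chunk and split_at_chunks are made up for illustration; the second one only hints at what "the device splits the bio" would mean arithmetically.

```python
SECTOR_BYTES = 512  # assumption: sectors are 512 bytes


def crosses_chunk(start_sector, num_sectors, chunk_sectors):
    """True if [start_sector, start_sector + num_sectors) spans more than one chunk."""
    if num_sectors <= 0:
        return False
    first_chunk = start_sector // chunk_sectors
    last_chunk = (start_sector + num_sectors - 1) // chunk_sectors
    return first_chunk != last_chunk


def split_at_chunks(start_sector, num_sectors, chunk_sectors):
    """Split a request into (start, length) pieces that each stay inside one chunk."""
    pieces = []
    while num_sectors > 0:
        # Sectors left before the next chunk boundary.
        room = chunk_sectors - (start_sector % chunk_sectors)
        length = min(room, num_sectors)
        pieces.append((start_sector, length))
        start_sector += length
        num_sectors -= length
    return pieces


# Numbers taken from the log line, interpreted as described in the lead-in.
chunk_sectors = 256 * 1024 // SECTOR_BYTES    # 256K chunk as a sector count
request_sectors = 256 * 1024 // SECTOR_BYTES  # 256K request as a sector count
start = 6653500160

offset_in_chunk = start % chunk_sectors
overlaps = crosses_chunk(start, request_sectors, chunk_sectors)
pieces = split_at_chunks(start, request_sectors, chunk_sectors)
print(offset_in_chunk, overlaps, pieces)
```

Under those assumptions the request starts exactly halfway into a chunk, which matches the "last half of one chunk and the first half of the next" description.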
Thanks very much for identifying the bug. I'm glad to find that the raid subsystem is not at fault. I'll give btrfs a spin at some point in the future and see whether anything has changed by then.
Cheers,
--Kerin