Hi folks, First line: This bug only influences md raid0 device which applies all the following conditions, 1) Assembled by component disks with different sizes. 2) Created and used under Linux kernel before (including) Linux v3.12, then upgrade to Linux kernel after (including) Linux v3.13. 3) New data are written to md raid0 in new kernel >= Linux v3.13. Then the md raid0 may have inconsistent sector mapping and experience data corruption. Recently I receive a bug report that customer encounter file system corruption after upgrading their kernel from Linux 3.12 to 4.4. It turns out to be the underlying md raid0 corruption after the kernel upgrade. I find it is because a sector map bug in md raid0 code include and before Linux v3.12. Here is the buggy code piece I copied from stable Linux v3.12.74 drivers/md/raid0.c:raid0_make_request(), 547 sector_offset = bio->bi_sector; 548 zone = find_zone(mddev->private, §or_offset); 549 tmp_dev = map_sector(mddev, zone, bio->bi_sector, 550 §or_offset); At line 548 after find_zone() returns, sector_offset is updated to be an offset inside current zone. Then at line 549 the third parameter of calling map_sector() should be the updated sector_offset, but bio->bi_sector (original LBA or md raid0 device) is used. If the raid0 device has *multiple zones*, except the first zone, the mapping <dev, sector> pair returned by map_sector() for all rested zones are unexpected and wrong. The buggy code was introduced since Linux v2.6.31 in commit fbb704efb784 ("md: raid0 :Enables chunk size other than powers of 2."), unfortunate the mistaken mapping calculation has stable and unique result too, so it works without obvious problem until commit 20d0189b1012 ("block: Introduce new bio_split()") merged into Linux v3.13. This patch fixed the mistaken mapping in the following lines of change, 654 - sector_offset = bio->bi_iter.bi_sector; 655 - zone = find_zone(mddev->private, §or_offset); 656 - tmp_dev = map_sector(mddev, zone, bio->bi_iter.bi_sector, 657 - §or_offset); 694 + zone = find_zone(mddev->private, §or); 695 + tmp_dev = map_sector(mddev, zone, sector, §or); At line 695 of this patch, the third parameter of calling map_sector() is fixed to 'sector', this is the correct value which contains the sector offset inside the corresponding zone. The this patch implicitly *changes* md raid0 on-disk layout. If a md raid0 has component disks with *different* sizes, then it will contain multiple zones. If such multiple zones raid0 device is created before Linux v3.13, all data chunks after first zone will be mapped to different location in kernel after (including) Linux v3.13. The result is, data written in the LBA after first zone will be treated as corruption. A worse case is, if the md raid0 has data chunks filled in first md raid0 zone in Linux v3.12 (or earlier kernels), then update to Linux v3.13 (or later kernels) and fill more data chunks in second and rested zone. Then in neither Linux v3.12 no Linux v3.13, there is always partial data corrupted. Currently there is no way to tell whether a md raid0 device is mapped in wrong calculation in kernel before (including) Linux v3.12 or in correct calculation in kernels after (including) Linux v3.13. If a md raid0 device (contains multiple zones) created and used crossing these kernel version, there is possibility and different mapping calculation generation different/inconsistent on-disk layout in different md raid0 zones, and results data corruption. For our enterprise Linux products we can handle it properly for a few product number of kernels. But for upstream and stable kernels, I don't have idea how to fix this ugly problem in a generic way. Neil Brown discussed with me offline, he proposed a temporary workaround that only permit to assemble md raid0 device with identical component disk size, and reject to assemble md raid0 device with component disks with different sizes. We can stop this workaround when there is a proper way to fix the problem. I suggest our developer community to work together for a solution, this is the motivation I post this email for your comments. Thanks in advance. -- Coly Li