On Fri, Nov 04, 2016 at 11:59:17PM +0500, Roman Mamedov wrote:
> On Fri, 4 Nov 2016 11:50:40 -0700
> Marc MERLIN <marc@xxxxxxxxxxx> wrote:
>
> > I can switch to GiB if you'd like, same thing:
> > myth:/dev# dd if=/dev/md5 of=/dev/null bs=1GiB skip=8190
> > dd: reading `/dev/md5': Invalid argument
> > 2+0 records in
> > 2+0 records out
> > 2147483648 bytes (2.1 GB) copied, 21.9751 s, 97.7 MB/s
>
> But now you can see the cutoff point is exactly at 8192 -- a strangely
> familiar number, much more so than "8.8 TB", right? :D

Yes, that's a valid point :)

> Could you recheck (and post) your mdadm --detail /dev/md5, if the whole array
> didn't get cut to a half of its size in "Array Size".

I just posted it in my previous email:
myth:~# mdadm --query --detail /dev/md5
/dev/md5:
        Version : 1.2
  Creation Time : Tue Jan 21 10:35:52 2014
     Raid Level : raid5
     Array Size : 15627542528 (14903.59 GiB 16002.60 GB)
  Used Dev Size : 3906885632 (3725.90 GiB 4000.65 GB)
   Raid Devices : 5
  Total Devices : 5
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Mon Oct 31 07:56:07 2016
          State : clean
(more in the previous email)

> Or maybe the remove-bad-block-list code has some overflow bug which cuts each
> device size to 2048 GiB, without the array size reflecting that. You run RAID5
> of five members, (5-1)*2048 would give you exactly 8192 GiB.

That's very possible too.

So even though the array is marked clean, and I don't care if some md blocks
return data that is actually corrupt as long as the read succeeds (my
filesystem will sort that out), I figured I could try a repair.
What's interesting is that it started exactly at 50%, which is also likely
where my reads were failing.

myth:/sys/block/md5/md# echo repair > sync_action

md5 : active raid5 sdg1[0] sdd1[5] sde1[3] sdf1[2] sdh1[6]
      15627542528 blocks super 1.2 level 5, 512k chunk, algorithm 2 [5/5] [UUUUU]
      [==========>..........]  resync = 50.0% (1953925916/3906885632) finish=1899.1min speed=17138K/sec
      bitmap: 0/30 pages [0KB], 65536KB chunk

That said, as this resync progresses, I'd think/hope it would move the error
forward, but it does not seem to:
myth:/sys/block/md5/md# dd if=/dev/md5 of=/dev/null bs=1GiB skip=8190
dd: reading `/dev/md5': Invalid argument
2+0 records in
2+0 records out
2147483648 bytes (2.1 GB) copied, 27.8491 s, 77.1 MB/s

So basically I'm stuck in the same place, and it seems that I've found an
actual swraid bug in the kernel. I'm not hopeful that the problem will be
fixed after the resync completes.
If someone wants me to try stuff before I wipe it all and restart, let me
know, but otherwise I've been in this broken state for 3 weeks now and I need
to fix it so that I can restart my backups again.

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
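
P.S. If it helps confirm the "(5-1)*2048 GiB" theory, a quick way to pin down
exactly where reads start failing is to binary-search the offset with small dd
reads. This is only a sketch, assuming /dev/md5 as above and that every read
past the bad offset keeps failing hard (EINVAL), so dd exits non-zero there:

#!/bin/bash
# Sketch: binary-search the first byte offset on /dev/md5 where reads fail.
dev=/dev/md5
lo=0                                  # highest byte offset known to read OK
hi=$(blockdev --getsize64 "$dev")     # device size; reads just below it fail today
while [ $((hi - lo)) -gt 8192 ]; do
    mid=$(( ((lo + hi) / 2) / 4096 * 4096 ))      # midpoint, 4 KiB aligned
    if dd if="$dev" of=/dev/null bs=4096 skip=$((mid / 4096)) count=1 >/dev/null 2>&1; then
        lo=$mid                       # read worked: boundary is above mid
    else
        hi=$mid                       # read failed: boundary is at or below mid
    fi
done
echo "reads start failing between byte $lo and $hi (~$((hi / 1024**3)) GiB)"

If that lands on exactly 8192 GiB (2^43 bytes) rather than some disk-related
boundary, that would point even more strongly at an overflow in md rather than
at the drives.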
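
And to keep an eye on the repair while it runs, the standard md sysfs
attributes can be polled (again just a sketch, with md5 as in this thread):

# Sketch: poll resync progress and the mismatch counter once a minute.
watch -n 60 'cat /proc/mdstat; grep . /sys/block/md5/md/sync_action /sys/block/md5/md/mismatch_cnt'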