Re: clearing blocks wrongfully marked as bad if --update=no-bbl can't be used?

NeilBrown <neilb@xxxxxxxx> · Mon, 07 Nov 2016 11:16:56 +1100

On Sat, Nov 05 2016, Marc MERLIN wrote:
>
> What's interesting is that it started exactly at 50%, which is also
> likely where my reads were failing.
>
> myth:/sys/block/md5/md# echo repair > sync_action 
>
> md5 : active raid5 sdg1[0] sdd1[5] sde1[3] sdf1[2] sdh1[6]
>       15627542528 blocks super 1.2 level 5, 512k chunk, algorithm 2 [5/5] [UUUUU]
>       [==========>..........]  resync = 50.0% (1953925916/3906885632) finish=1899.1min speed=17138K/sec
>       bitmap: 0/30 pages [0KB], 65536KB chunk

Yep, that is weird.

You can cause that to happen by e.g
   echo 7813771264 > /sys/block/md5/md/sync_min

but you are unlikely to have done that deliberately.

>
> That said, as this resync is processing, I'd think/hope it would move
> the error forward, but it does not seem to:
> myth:/sys/block/md5/md# dd if=/dev/md5 of=/dev/null bs=1GiB skip=8190
> dd: reading `/dev/md5': Invalid argument
> 2+0 records in
> 2+0 records out
> 2147483648 bytes (2.1 GB) copied, 27.8491 s, 77.1 MB/s

EINVAL from a read() system call is surprising in this context.....

do_generic_file_read can return it:
	if (unlikely(*ppos >= inode->i_sb->s_maxbytes))
		return -EINVAL;

s_maxbytes will be MAX_LFS_FILESIZE which, on a 32bit system, is

#define MAX_LFS_FILESIZE        (((loff_t)PAGE_SIZE << (BITS_PER_LONG-1))-1)

That is 2^(12+31) or 2^43 or 8TB.

Is this a 32bit system you are using?  Such systems can only support
buffered IO up to 8TB.  If you use iflags=direct to avoid buffering, you
should get access to the whole device.

If this is a 64bit system, then the problem must be elsewhere.

NeilBrown
Attachment:
signature.asc

Description: PGP signature