Re: DM Regression in 4.16-rc1 - read() returns data when it shouldn't

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Feb 15 2018, Milan Broz wrote:

> On 02/15/2018 01:07 AM, NeilBrown wrote:
>> 
>> And looking again at the code, it doesn't seem so bad.  I was worried
>> about reassigning ci.io->orig_bio after calling
>> __split_and_process_non_flush(), but the important thing is to set it
>> before the last dec_pending() call, and the code gets that right.
>> 
>> I think I've found the problem though:
>> 
>> 			/* done with normal IO or empty flush */
>> 			bio->bi_status = io_error;
>> 
>> When we have chained bios, there are multiple sources for error which
>> need to be merged into bi_status:  all bios that were chained to this
>> one, and this bio itself.
>> 
>> __bio_chain_endio uses:
>> 
>> 	if (!parent->bi_status)
>> 		parent->bi_status = bio->bi_status;
>> 
>> so the new status won't over-ride the old status (unless there are
>> races....), but dm doesn't do that - I guess it didn't have to worry
>> about chains before.
>> 
>> So this:
>> 
>> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
>> index d6de00f367ef..68136806d365 100644
>> --- a/drivers/md/dm.c
>> +++ b/drivers/md/dm.c
>> @@ -903,7 +903,8 @@ static void dec_pending(struct dm_io *io, blk_status_t error)
>>  			queue_io(md, bio);
>>  		} else {
>>  			/* done with normal IO or empty flush */
>> -			bio->bi_status = io_error;
>> +			if (io_error)
>> +				bio->bi_status = io_error;
>>  			bio_endio(bio);
>>  		}
>>  	}
>> 
>
> Hi,
>
> well, it indeed fixes the reported problem, thanks.
> But I just tested the first (dm.c) change above.
>
> The important part is also that if the error is hits bio later, it will produce
> short read (and not fail of the whole operation), this can be easily tested if we switch
> the error and the zero target in that reproducer
>   //system("echo -e \'0 10000000 error\'\\\\n\'10000000 1000000 zero\' | dmsetup create test");
>   system("echo -e \'0 10000000 zero\'\\\\n\'10000000 1000000 error\' | dmsetup create test");
>
> So it seems your patch works.

Excellent - thanks.

>
> I am not sure about this part though:
>
>> should fix it.  And __bio_chain_endio should probably be
>> 
>> diff --git a/block/bio.c b/block/bio.c
>> index e1708db48258..ad77140edc6f 100644
>> --- a/block/bio.c
>> +++ b/block/bio.c
>> @@ -312,7 +312,7 @@ static struct bio *__bio_chain_endio(struct bio *bio)
>>  {
>>  	struct bio *parent = bio->bi_private;
>>  
>> -	if (!parent->bi_status)
>> +	if (bio->bi_status)
>>  		parent->bi_status = bio->bi_status;
>
> Shouldn't it be
> 	if (!parent->bi_status && bio->bi_status)
>   		parent->bi_status = bio->bi_status;
> ?

That wouldn't hurt, but I don't see how it would help.
It would still be racy as two chained bios could complete at
the same time, so ->bi_status much change between test and set.

>
> Maybe one rephrased question: what about priorities for errors (bio statuses)?
>
> For example, part of a bio fails with EILSEQ (data integrity check error, BLK_STS_PROTECTION),
> part with EIO (media error, BLK_STS_IOERR). What should be reported for parent bio?
> (I would expect I always get EIO here because this is hard, uncorrectable error.)

If there was a well defined ordering, then we could easily write to code
ensure the highest priority status wins, but I don't think there is any
such ordering.
Certainly dec_pending() in dm.c already faces the possibility that it will
be called multiple times with different errors, but doesn't consider any
ordering (except relating to BLK_STS_DM_REQUEUE), when assigning a
status code to io->status.

>
> Anyway, thanks for the fix!

And thanks for the report.

NeilBrown

>
> Milan
>
>
>>  	bio_put(bio);
>>  	return parent;
>> 
>> to remove the chance that a race will hide a real error.
>> 
>> Milan: could you please check that the patch fixes the dm-crypt bug?
>> I've confirmed that it fixes your test case.
>> When you confirm, I'll send a proper patch.
>> 
>> I guess I should audit all code that assigns to ->bi_status and make
>> sure nothing ever assigns 0 to it.
>> 
>> Thanks,
>> NeilBrown
>> 
>
> --
> dm-devel mailing list
> dm-devel@xxxxxxxxxx
> https://www.redhat.com/mailman/listinfo/dm-devel

Attachment: signature.asc
Description: PGP signature

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel

[Index of Archives]     [DM Crypt]     [Fedora Desktop]     [ATA RAID]     [Fedora Marketing]     [Fedora Packaging]     [Fedora SELinux]     [Yosemite Discussion]     [KDE Users]     [Fedora Docs]

  Powered by Linux