On Thu, Feb 15 2018, Milan Broz wrote: > On 02/15/2018 01:07 AM, NeilBrown wrote: >> >> And looking again at the code, it doesn't seem so bad. I was worried >> about reassigning ci.io->orig_bio after calling >> __split_and_process_non_flush(), but the important thing is to set it >> before the last dec_pending() call, and the code gets that right. >> >> I think I've found the problem though: >> >> /* done with normal IO or empty flush */ >> bio->bi_status = io_error; >> >> When we have chained bios, there are multiple sources for error which >> need to be merged into bi_status: all bios that were chained to this >> one, and this bio itself. >> >> __bio_chain_endio uses: >> >> if (!parent->bi_status) >> parent->bi_status = bio->bi_status; >> >> so the new status won't over-ride the old status (unless there are >> races....), but dm doesn't do that - I guess it didn't have to worry >> about chains before. >> >> So this: >> >> diff --git a/drivers/md/dm.c b/drivers/md/dm.c >> index d6de00f367ef..68136806d365 100644 >> --- a/drivers/md/dm.c >> +++ b/drivers/md/dm.c >> @@ -903,7 +903,8 @@ static void dec_pending(struct dm_io *io, blk_status_t error) >> queue_io(md, bio); >> } else { >> /* done with normal IO or empty flush */ >> - bio->bi_status = io_error; >> + if (io_error) >> + bio->bi_status = io_error; >> bio_endio(bio); >> } >> } >> > > Hi, > > well, it indeed fixes the reported problem, thanks. > But I just tested the first (dm.c) change above. > > The important part is also that if the error is hits bio later, it will produce > short read (and not fail of the whole operation), this can be easily tested if we switch > the error and the zero target in that reproducer > //system("echo -e \'0 10000000 error\'\\\\n\'10000000 1000000 zero\' | dmsetup create test"); > system("echo -e \'0 10000000 zero\'\\\\n\'10000000 1000000 error\' | dmsetup create test"); > > So it seems your patch works. Excellent - thanks. > > I am not sure about this part though: > >> should fix it. And __bio_chain_endio should probably be >> >> diff --git a/block/bio.c b/block/bio.c >> index e1708db48258..ad77140edc6f 100644 >> --- a/block/bio.c >> +++ b/block/bio.c >> @@ -312,7 +312,7 @@ static struct bio *__bio_chain_endio(struct bio *bio) >> { >> struct bio *parent = bio->bi_private; >> >> - if (!parent->bi_status) >> + if (bio->bi_status) >> parent->bi_status = bio->bi_status; > > Shouldn't it be > if (!parent->bi_status && bio->bi_status) > parent->bi_status = bio->bi_status; > ? That wouldn't hurt, but I don't see how it would help. It would still be racy as two chained bios could complete at the same time, so ->bi_status much change between test and set. > > Maybe one rephrased question: what about priorities for errors (bio statuses)? > > For example, part of a bio fails with EILSEQ (data integrity check error, BLK_STS_PROTECTION), > part with EIO (media error, BLK_STS_IOERR). What should be reported for parent bio? > (I would expect I always get EIO here because this is hard, uncorrectable error.) If there was a well defined ordering, then we could easily write to code ensure the highest priority status wins, but I don't think there is any such ordering. Certainly dec_pending() in dm.c already faces the possibility that it will be called multiple times with different errors, but doesn't consider any ordering (except relating to BLK_STS_DM_REQUEUE), when assigning a status code to io->status. > > Anyway, thanks for the fix! And thanks for the report. NeilBrown > > Milan > > >> bio_put(bio); >> return parent; >> >> to remove the chance that a race will hide a real error. >> >> Milan: could you please check that the patch fixes the dm-crypt bug? >> I've confirmed that it fixes your test case. >> When you confirm, I'll send a proper patch. >> >> I guess I should audit all code that assigns to ->bi_status and make >> sure nothing ever assigns 0 to it. >> >> Thanks, >> NeilBrown >> > > -- > dm-devel mailing list > dm-devel@xxxxxxxxxx > https://www.redhat.com/mailman/listinfo/dm-devel
Attachment:
signature.asc
Description: PGP signature
-- dm-devel mailing list dm-devel@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/dm-devel