Re: DM Regression in 4.16-rc1 - read() returns data when it shouldn't

NeilBrown <neilb@xxxxxxxx> · Thu, 15 Feb 2018 11:07:11 +1100

On Wed, Feb 14 2018, Mike Snitzer wrote:

> On Wed, Feb 14 2018 at  3:39pm -0500,
> NeilBrown <neilb@xxxxxxxx> wrote:
>
>> On Wed, Feb 14 2018, Milan Broz wrote:
>> 
>> > Hi,
>> >
>> > the commit (found by bisect)
>> >
>> >   commit 18a25da84354c6bb655320de6072c00eda6eb602
>> >   Author: NeilBrown <neilb@xxxxxxxx>
>> >   Date:   Wed Sep 6 09:43:28 2017 +1000
>> >
>> >     dm: ensure bio submission follows a depth-first tree walk
>> >
>> > cause serious regression while reading from DM device.
>> >
>> > The reproducer is below, basically it tries to simulate failure we see in cryptsetup
>> > regression test: we have DM device with error and zero target and try to read
>> > "passphrase" from it (it is test for 64 bit offset error path):
>> >
>> > Test device:
>> > # dmsetup table test
>> > 0 10000000 error 
>> > 10000000 1000000 zero 
>> >
>> > We try to run this operation:
>> >   lseek64(fd, 5119999988, SEEK_CUR); // this should seek to error target sector
>> >   read(fd, buf, 13); // this should fail, if we seek to error part of the device
>> >
>> > While on 4.15 the read properly fails:
>> >   Seek returned 5119999988.
>> >   Read returned -1.
>> >
>> > for 4.16 it actually succeeds returning some random data
>> > (perhaps kernel memory, so this bug is even more dangerous):
>> >   Seek returned 5119999988.
>> >   Read returned 13.
>> >
>> > Full reproducer below:
>> >
>> > #define _GNU_SOURCE
>> > #define _LARGEFILE64_SOURCE
>> > #include <stdio.h>
>> > #include <stddef.h>
>> > #include <stdint.h>
>> > #include <stdlib.h>
>> > #include <unistd.h>
>> > #include <fcntl.h>
>> > #include <inttypes.h>
>> >
>> > int main (int argc, char *argv[])
>> > {
>> >         char buf[13];
>> >         int fd;
>> >         //uint64_t offset64 = 5119999999;
>> >         uint64_t offset64 =   5119999988;
>> >         off64_t r;
>> >         ssize_t bytes;
>> >
>> >         system("echo -e \'0 10000000 error\'\\\\n\'10000000 1000000 zero\' | dmsetup create test");
>> >
>> >         fd = open("/dev/mapper/test", O_RDONLY);
>> >         if (fd == -1) {
>> >                 printf("open fail\n");
>> >                 return 1;
>> >         }
>> >
>> >         r = lseek64(fd, offset64, SEEK_CUR);
>> >         printf("Seek returned %" PRIu64 ".\n", r);
>> >         if (r < 0) {
>> >                 printf("seek fail\n");
>> >                 close(fd);
>> >                 return 2;
>> >         }
>> >
>> >         bytes = read(fd, buf, 13);
>> >         printf("Read returned %d.\n", (int)bytes);
>> >
>> >         close(fd);
>> >         return 0;
>> > }
>> >
>> >
>> > Please let me know if you need more info to reproduce it.
>> 
>> Thanks for the detailed report.  I haven't tried to reproduce, but the
>> code looks very weird.
>> The patch I posted "Date: Wed, 06 Sep 2017 09:43:28 +1000" had:
>>       +				struct bio *b = bio_clone_bioset(bio, GFP_NOIO,
>>       +								 md->queue->bio_split);
>>       +				bio_advance(b, (bio_sectors(b) - ci.sector_count) << 9);
>>       +				bio_chain(b, bio);
>>       +				generic_make_request(b);
>>       +				break;
>> 
>> The code in Linux has:
>> 
>> 				struct bio *b = bio_clone_bioset(bio, GFP_NOIO,
>> 								 md->queue->bio_split);
>> 				ci.io->orig_bio = b;
>> 				bio_advance(bio, (bio_sectors(bio) - ci.sector_count) << 9);
>> 				bio_chain(b, bio);
>> 				ret = generic_make_request(bio);
>> 				break;
>> 
>> So the wrong bio is sent to generic_make_request().
>> Mike: do you remember how that change happened?  I think there were
>> discussions following the patch, but I cannot find anything about making
>> that change.
>
> Mikulas had this feedback:
> https://www.redhat.com/archives/dm-devel/2017-November/msg00159.html
>
> You replied with:
> https://www.redhat.com/archives/dm-devel/2017-November/msg00165.html
>
> Where you said "Yes, you are right something like that would be better."
>
> And you then provided a follow-up patch:
> https://www.redhat.com/archives/dm-devel/2017-November/msg00175.html
>
> That we discussed and I said I'd just fold it into the original, and you
> agreed and thanked me for checking with you ;)
> https://www.redhat.com/archives/dm-devel/2017-November/msg00208.html
>
> Anyway, I'm just about to switch to Daddy Daycare mode (need to get my
> daughter up from her nap, feed her dinner, etc) so: I'll circle back to
> this tomorrow morning.
>

Thanks for the reminder - it's coming back to me now.

And looking again at the code, it doesn't seem so bad.  I was worried
about reassigning ci.io->orig_bio after calling
__split_and_process_non_flush(), but the important thing is to set it
before the last dec_pending() call, and the code gets that right.

I think I've found the problem though:

			/* done with normal IO or empty flush */
			bio->bi_status = io_error;

When we have chained bios, there are multiple sources for error which
need to be merged into bi_status:  all bios that were chained to this
one, and this bio itself.

__bio_chain_endio uses:

	if (!parent->bi_status)
		parent->bi_status = bio->bi_status;

so the new status won't over-ride the old status (unless there are
races....), but dm doesn't do that - I guess it didn't have to worry
about chains before.

So this:

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index d6de00f367ef..68136806d365 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -903,7 +903,8 @@ static void dec_pending(struct dm_io *io, blk_status_t error)
 			queue_io(md, bio);
 		} else {
 			/* done with normal IO or empty flush */
-			bio->bi_status = io_error;
+			if (io_error)
+				bio->bi_status = io_error;
 			bio_endio(bio);
 		}
 	}

should fix it.  And __bio_chain_endio should probably be

diff --git a/block/bio.c b/block/bio.c
index e1708db48258..ad77140edc6f 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -312,7 +312,7 @@ static struct bio *__bio_chain_endio(struct bio *bio)
 {
 	struct bio *parent = bio->bi_private;
 
-	if (!parent->bi_status)
+	if (bio->bi_status)
 		parent->bi_status = bio->bi_status;
 	bio_put(bio);
 	return parent;

to remove the chance that a race will hide a real error.

Milan: could you please check that the patch fixes the dm-crypt bug?
I've confirmed that it fixes your test case.
When you confirm, I'll send a proper patch.

I guess I should audit all code that assigns to ->bi_status and make
sure nothing ever assigns 0 to it.

Thanks,
NeilBrown
Attachment:
signature.asc

Description: PGP signature
--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel