On Tue, Apr 04 2017, Michael Wang wrote:

> On 04/04/2017 11:37 AM, NeilBrown wrote:
>> On Tue, Apr 04 2017, Michael Wang wrote:
> [snip]
>>>>
>>>> If sync_request_write() is using a bio that has already been used, it
>>>> should call bio_reset() and fill in the details again.
>>>> However I don't see how that would happen.
>>>> Can you give specific details on the situation that triggers the bug?
>>>
>>> We have storage side mapping lv through scst to server, on server side
>>> we assemble them into multipath device, and then assemble these dm into
>>> two raid1.
>>>
>>> The test is firstly do mkfs.ext4 on raid1 then start fio on it, on storage
>>> side we unmap all the lv (could during mkfs or fio), then on server side
>>> we hit the BUG (reproducible).
>>
>> So I assume the initial resync is still happening at this point?
>> And you unmap *all* the lv's so you expect IO to fail?
>> I can see that the code would behave strangely if you have a
>> bad-block-list configured (which is the default).
>> Do you have a bbl?  If you create the array without the bbl, does it
>> still crash?
>
> The resync is at least happen concurrently in this case, we try
> to simulate the case that all the connections dropped, the IO do
> failed, also bunch of kernel log like:
>
> md: super_written gets error=-5
> blk_update_request: I/O error, dev dm-3, sector 64184
> md/raid1:md1: dm-2: unrecoverable I/O read error for block 46848
>
> we expect that to happen, but server should not crash on BUG.
>
> And we haven't done any thing special regarding bbl, the bad_blocks
> in sysfs are all empty.
>
>>
>>>
>>> The path of bio was confirmed by add tracing, it is reused in sync_request_write()
>>> with 'bi_next' once chained inside blk_attempt_plug_merge().
>>
>> I still don't see why it is re-used.
>> I assume you didn't explicitly ask for a check/repair (i.e. didn't write
>> to .../md/sync_action at all?).  In that case MD_RECOVERY_REQUESTED is
>> not set.
>
> Just unmap lv on storage side, no operation on server side.
>
>> So sync_request() sends only one bio to generic_make_request():
>>     r1_bio->bios[r1_bio->read_disk];
>>
>> then sync_request_write() *doesn't* send that bio again, but does send
>> all the others.
>>
>> So where does it reuse a bio?
>
> If that's the design then it would be strange... the log do showing the path
> of that bio go through sync_request(), will do more investigation.
>
>>
>>>
>>> We also tried to reset the bi_next inside sync_request_write() before
>>> generic_make_request() which also works.
>>>
>>> The testing was done with 4.4, but we found upstream also left bi_next
>>> chained after done in request, thus we post this RFC.
>>>
>>> Regarding raid1, we haven't found the place on path where the bio was
>>> reset... where does it supposed to be?
>>
>> I'm not sure what you mean.
>> We only reset bios when they are being reused.
>> One place is in process_checks() where bio_reset() is called before
>> filling in all the details.
>>
>>
>> Maybe, in sync_request_write(), before
>>
>>     wbio->bi_rw = WRITE;
>>
>> add something like
>>     if (wbio->bi_next)
>>         printk("bi_next!= NULL i=%d read_disk=%d bi_end_io=%pf\n",
>>                i, r1_bio->read_disk, wbio->bi_end_io);
>>
>> that might help narrow down what is happening.
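
(To make the placement concrete for anyone following the thread: in the
4.4-era sync_request_write() the write-out loop looks roughly like the
sketch below.  It is paraphrased from memory rather than copied, so the
skip condition and field names such as bi_rw are approximate; the point is
only where the bi_next check would sit relative to the existing
assignments and generic_make_request().)

	for (i = 0; i < disks; i++) {
		wbio = r1_bio->bios[i];

		/* bios that were never set up, or that carried the sync
		 * read, are skipped (condition condensed from 4.4 raid1.c) */
		if (wbio->bi_end_io == NULL ||
		    (wbio->bi_end_io == end_sync_read &&
		     i == r1_bio->read_disk))
			continue;

		/* suggested instrumentation: a bio about to be submitted
		 * should not still be chained through bi_next */
		if (wbio->bi_next)
			printk("bi_next!= NULL i=%d read_disk=%d bi_end_io=%pf\n",
			       i, r1_bio->read_disk, wbio->bi_end_io);

		wbio->bi_rw = WRITE;
		wbio->bi_end_io = end_sync_write;
		atomic_inc(&r1_bio->remaining);

		generic_make_request(wbio);
	}
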
>
> Just triggered again in 4.4, dmesg like:
>
> [ 399.240230] md: super_written gets error=-5
> [ 399.240286] md: super_written gets error=-5
> [ 399.240286] md/raid1:md0: dm-0: unrecoverable I/O read error for block 204160
> [ 399.240300] md/raid1:md0: dm-0: unrecoverable I/O read error for block 204160
> [ 399.240312] md/raid1:md0: dm-0: unrecoverable I/O read error for block 204160
> [ 399.240323] md/raid1:md0: dm-0: unrecoverable I/O read error for block 204160
> [ 399.240334] md/raid1:md0: dm-0: unrecoverable I/O read error for block 204160
> [ 399.240341] md/raid1:md0: dm-0: unrecoverable I/O read error for block 204160
> [ 399.240349] md/raid1:md0: dm-0: unrecoverable I/O read error for block 204160
> [ 399.240352] bi_next!= NULL i=0 read_disk=0 bi_end_io=end_sync_write [raid1]

It is very peculiar for r1_bio->bios[r1_bio->read_disk].bi_end_io to be
end_sync_write.  I can only see this happening when sync_request_write()
assigns that, and this printk is before there.

That seems to suggest that sync_request_write() is being performed on the
same r1_bio twice, which is also very peculiar.

I might have to try to reproduce this myself and see what is happening.

Thanks,
NeilBrown
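
P.S.  For readers skimming the archive: the workaround mentioned above
("reset the bi_next inside sync_request_write() before
generic_make_request()") would amount to something like the following in
that same loop.  This is only to make the report concrete; it papers over
the stale chaining rather than explaining it, and the surrounding lines
are again paraphrased from 4.4 rather than quoted.

	wbio->bi_rw = WRITE;
	wbio->bi_end_io = end_sync_write;

	/* workaround under discussion: drop any stale chaining left over
	 * from an earlier submission before re-issuing the bio */
	wbio->bi_next = NULL;

	atomic_inc(&r1_bio->remaining);
	generic_make_request(wbio);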