On Wed, 8 Jun 2022 10:48:09 +0700 Pavel <pavel2000@xxxxxx> wrote:

> Hi, linux-raid community.
>
> Today we found strange, even scary, behavior of an md-raid array built
> on NVMe devices.
>
> We ordered a new server and started a data transfer (using dd; the
> filesystems were unmounted on the source, etc. - no errors there).
>
> While the data was in transfer, the kernel started reporting IO errors
> on one of the NVMe devices (dmesg output below), but md-raid did not
> react to them in any way: the array never went into any failed state
> and reported "clean" the whole time.
>
> Based on earlier experience we trusted md-raid and assumed things were
> going fine. After the transfer finished, the server was powered off
> and the cables were replaced on suspicion.
>
> After the OS came up on the new server, we found MySQL crashing.
> A thorough checksum check showed mismatches in file contents.
> (Of course, we checksummed untouched files, not the MySQL database
> files.)
>
> So data loss is possible when an NVMe device misbehaves. We think
> md-raid has to remove the failed device from the array in such a case;
> that this did not happen is wrong behaviour, so we want to inform the
> community about our finding.
>
> We hope this helps make the kernel even better.
> Thanks for your work.

Hi Pavel,

IMO this is not a RAID problem. In this case some of the requests hang
inside the nvme driver, and raid1d hangs with them. It is an nvme
problem rather than a RAID one: RAID should handle this well when IO
errors are actually reported back to it, as opposed to requests that
simply never complete. A minimal sketch of that distinction follows the
trace below.

Thanks,
Mariusz

> ---
> [Tue Jun 7 09:58:45 2022] Call Trace:
> [Tue Jun 7 09:58:45 2022] <IRQ>
> [Tue Jun 7 09:58:45 2022] nvme_pci_complete_rq+0x5b/0x67 [nvme]
> [Tue Jun 7 09:58:45 2022] nvme_poll_cq+0x1e4/0x265 [nvme]
> [Tue Jun 7 09:58:45 2022] nvme_irq+0x36/0x6e [nvme]
> [Tue Jun 7 09:58:45 2022] __handle_irq_event_percpu+0x73/0x13e
> [Tue Jun 7 09:58:45 2022] handle_irq_event_percpu+0x31/0x77
> [Tue Jun 7 09:58:45 2022] handle_irq_event+0x2e/0x51
> [Tue Jun 7 09:58:45 2022] handle_edge_irq+0xc9/0xee
> [Tue Jun 7 09:58:45 2022] __common_interrupt+0x41/0x9e
> [Tue Jun 7 09:58:45 2022] common_interrupt+0x6e/0x8b
> [Tue Jun 7 09:58:45 2022] </IRQ>
> [Tue Jun 7 09:58:45 2022] <TASK>
> [Tue Jun 7 09:58:45 2022] asm_common_interrupt+0x1e/0x40
> [Tue Jun 7 09:58:45 2022] RIP: 0010:__blk_mq_try_issue_directly+0x12/0x136
> [Tue Jun 7 09:58:45 2022] Code: fe ff ff 48 8b 97 78 01 00 00 48 8b 92 80 00 00 00 48 89 34 c2 b0 01 c3 0f 1f 44 00 00 41 57 41 56 41 55 41 54 55 48 89 f5 53 <89> d3 48 83 ec 18 65 48 8b 04 25 28 00 00 00 48 89 44 24 10 31 c0
> [Tue Jun 7 09:58:45 2022] RSP: 0018:ffff88810bbf7ad8 EFLAGS: 00000246
> [Tue Jun 7 09:58:45 2022] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000001
> [Tue Jun 7 09:58:45 2022] RDX: 0000000000000001 RSI: ffff8881137e6e40 RDI: ffff88810dfdf400
> [Tue Jun 7 09:58:45 2022] RBP: ffff8881137e6e40 R08: 0000000000000001 R09: 0000000000000a20
> [Tue Jun 7 09:58:45 2022] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> [Tue Jun 7 09:58:45 2022] R13: ffff88810dfdf400 R14: ffff8881137e6e40 R15: ffff88810bbf7df0
> [Tue Jun 7 09:58:45 2022] blk_mq_request_issue_directly+0x46/0x78
> [Tue Jun 7 09:58:45 2022] blk_mq_try_issue_list_directly+0x41/0xba
> [Tue Jun 7 09:58:45 2022] blk_mq_sched_insert_requests+0x86/0xd0
> [Tue Jun 7 09:58:45 2022] blk_mq_flush_plug_list+0x1b5/0x214
> [Tue Jun 7 09:58:45 2022] ? __blk_mq_alloc_requests+0x1c7/0x21d
> [Tue Jun 7 09:58:45 2022] blk_mq_submit_bio+0x437/0x518
> [Tue Jun 7 09:58:45 2022] submit_bio_noacct+0x93/0x1e6
> [Tue Jun 7 09:58:45 2022] ? bio_associate_blkg_from_css+0x137/0x15c
> [Tue Jun 7 09:58:45 2022] flush_bio_list+0x96/0xa5
> [Tue Jun 7 09:58:45 2022] flush_pending_writes+0x7a/0xbf
> [Tue Jun 7 09:58:45 2022] ? md_check_recovery+0x8a/0x4bd
> [Tue Jun 7 09:58:45 2022] raid1d+0x194/0x10e8
> [Tue Jun 7 09:58:45 2022] ? common_interrupt+0xf/0x8b
> [Tue Jun 7 09:58:45 2022] md_thread+0x12c/0x155
> [Tue Jun 7 09:58:45 2022] ? init_wait_entry+0x29/0x29
> [Tue Jun 7 09:58:45 2022] ? signal_pending+0x19/0x19
> [Tue Jun 7 09:58:45 2022] kthread+0x104/0x10c
> [Tue Jun 7 09:58:45 2022] ? set_kthread_struct+0x32/0x32
> [Tue Jun 7 09:58:45 2022] ret_from_fork+0x22/0x30
> [Tue Jun 7 09:58:45 2022] </TASK>
> [Tue Jun 7 09:58:45 2022] ---[ end trace 15dc74ae2e04f737 ]---
> [Tue Jun 7 09:58:45 2022] ------------[ cut here ]------------
> [Tue Jun 7 09:58:45 2022] refcount_t: underflow; use-after-free.
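
To make the distinction concrete, here is a minimal userspace sketch
(plain C, compilable on its own; it is NOT kernel code, and names such
as fail_member() and write_completion() are hypothetical stand-ins for
md's md_error() and raid1's write-completion path). It shows why a
completion-driven error handler can never fire for a request that hangs
inside the driver:

  #include <stdbool.h>
  #include <stdio.h>

  struct member {
      const char *name;
      bool faulty;
  };

  /* Stand-in for md_error(): the only place a member gets marked
   * Faulty in this sketch. */
  static void fail_member(struct member *m)
  {
      m->faulty = true;
      printf("%s: marked Faulty\n", m->name);
  }

  /* Stand-in for raid1's write-completion callback: the error status
   * is examined only when the lower layer completes the request. */
  static void write_completion(struct member *m, int error)
  {
      if (error)
          fail_member(m);
  }

  int main(void)
  {
      struct member m = { "nvme0n1", false };

      /* Scenario A: the request completes with an error status. The
       * completion callback runs and the member is failed. */
      write_completion(&m, -5 /* -EIO */);

      /* Scenario B: the request hangs inside the driver. The
       * completion callback is never invoked, so nothing can mark
       * the member Faulty, and the array keeps reporting "clean"
       * even though the write never reached the media. */
      m.faulty = false;   /* reset for the second scenario */
      printf("after hung request: %s\n", m.faulty ? "Faulty" : "clean");

      return 0;
  }

In other words, md's failure handling is driven by completed bios: a
request that never completes gives md nothing to act on, which is why
the array stayed "clean" while data was being lost.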