Hi Song, Thank you for your response. On Fri, Nov 24, 2023 at 1:58 AM Song Liu <song@xxxxxxxxxx> wrote: > > Hi Alexander, > > Thanks for the report. > > On Wed, Nov 22, 2023 at 1:42 AM Alexander Lyakas <alex.bolshoy@xxxxxxxxx> wrote: > > > > Hello Song Liu, > > > > We had a raid6 with 6 drives, all drives marked as In_sync. At some > > point drive in slot 5 (last drive) was marked as Faulty, due to > > timeout IO error. Superblocks of all other drives got updated with > > event count higher by 1. However, the Faulty drive was still not > > ejected from the array by remove_and_add_spares(), probably because it > > still had nr_pending. This situation was going on for 20 minutes, and > > the Faulty drive was still not being removed from the array. But array > > continued serving writes, skipping the Faulty drive as designed. > > > > After about 20 minutes, the machine got rebooted due to some other reason. > > > > After reboot, the array got assembled, and the event counter > > difference was 1 between the problematic drive and all other drives. > > Even count on all drives was 2834681, but on the problematic drive it > > was 2834680. As a result, mdadm considered the problematic drive as > > up-to-date, due to this code in mdadm[1]. Kernel also accepted such > > difference of 1, as can be seen in super_1_validate() [2]. > > Did super_1_validate() set the bad drive as Faulty? > > switch(role) { > case MD_DISK_ROLE_SPARE: /* spare */ > break; > case MD_DISK_ROLE_FAULTY: /* faulty */ > set_bit(Faulty, &rdev->flags); > break; > > AFAICT, this should stop the bad drive from going bad to the array. > If this doesn't work, there might be some bug somewhere. This can never happen. The reason is that a superblock of a particular device will never record about *itself* that it is Faulty. When a device fails, we mark it as Faulty in memory, and then md_update_sb() will skip updating the superblock of this particular device. I have discussed this also many years ago with Neil[1]. Can you please comment whether you believe that treating event counter difference 1 as "still up to date" is safe? Thanks, Alex. [1] Quoting from https://www.spinics.net/lists/raid/msg37003.html Question: When a drive fails, the kernel skips updating its superblock, and updates all other superblocks that this drive is Faulty. How can it happen that a drive can mark itself as Faulty in its own superblock? I saw code in mdadm checking for this. Answer: It cannot, as you say. I don't remember why mdadm checks for that. Maybe a very old version of the kernel code could do that.