I would run repeated extended offline tests. You should be able to run several tests a day, and if it is a normal failing sector, a number of the tests should fail on the same few sectors (or close to them). Mine always seem to fail in 64-sector groups.

I have also had luck with a lot of extended tests seeming to catch errors and correct them before the sectors go bad. I have a troublesome disk that I am testing aggressively, and it seems to make the disk behave better (fewer read failures reported to the kernel).

On Wed, Oct 30, 2019 at 5:15 AM Andreas Klauer <Andreas.Klauer@xxxxxxxxxxxxxx> wrote:
>
> On Tue, Oct 29, 2019 at 07:53:46PM -0700, Marc MERLIN wrote:
> > I can wipe the whole drive, but this puts me in degraded mode for a
> > while without actually needing to be from what I can tell, so it's not
> > my first choice.
>
> Use mdadm --replace to get it out of your RAID without degrading it.
> Then you can safely use secure erase and other forms of scrubbing to
> see if it changes anything.
>
> > But wouldn't that show real errors when I'm reading the whole drive?
> > SMART Self-test log structure revision number 1
> > Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
> > # 1  Extended offline    Completed without error       00%     21804         -
> > # 5  Extended offline    Completed: read failure       10%     21731         3457756336
> > #13  Extended offline    Completed: read failure       10%     21562         2905616752
> > 2 of 2 failed self-tests are outdated by newer successful extended offline self-test # 1
>
> "Outdated" (few hours apart) is a very optimistic way of looking
> at these test results. At least it shows the drive didn't just
> invent those pending sectors.
>
> Regards
> Andreas Klauer
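To check whether repeated extended tests keep failing in the same area, one rough approach is to parse the self-test log and bucket the failing LBAs. The following is a minimal Python sketch, assuming the smartctl self-test log layout quoted above. The function names and the 64-sector group size are illustrative choices of mine (the 64 comes from the pattern I described), not anything smartctl itself provides.

```python
import re
from collections import Counter

# Matches self-test log rows such as:
# "# 5  Extended offline    Completed: read failure       10%     21731         3457756336"
# Assumption: the column layout matches the smartctl output quoted in this thread,
# with at least two spaces between the description and status columns.
ROW = re.compile(r"^#\s*(\d+)\s+(.+?)\s{2,}(.+?)\s+(\d+)%\s+(\d+)\s+(\S+)\s*$")

# Hypothetical grouping size, based on the 64-sector failure pattern seen on my disks.
GROUP_SECTORS = 64


def failing_lbas(log_text):
    """Return (test_number, lba) for every self-test row that reports a read failure."""
    failures = []
    for line in log_text.splitlines():
        m = ROW.match(line.strip())
        if not m:
            continue  # header lines, summary lines, etc.
        num, _desc, status, _remaining, _hours, lba = m.groups()
        # Passing tests show "-" in the LBA column, so require a numeric LBA.
        if "read failure" in status and lba.isdigit():
            failures.append((int(num), int(lba)))
    return failures


def group_failures(failures, group=GROUP_SECTORS):
    """Count failures per aligned block of `group` sectors, keyed by the block's first LBA."""
    return Counter((lba // group) * group for _, lba in failures)
```

You would feed it the text from `smartctl -l selftest /dev/sdX` after each test run. If the same group start keeps showing up across several runs, that points to a genuinely bad area rather than a one-off read error.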