On Fri, Feb 09, 2018 at 10:07:57PM +0000, Wol's lists wrote:
> On 09/02/18 21:22, Marc MERLIN wrote:
> >Interesting. I figured once a sector went pending once, it would not
> >actually be re-used and
> >be remapped on the next write. Seems like it didn't happen here.
>
> Because there's all sorts of reasons a sector can go pending.
>
> My favourite example is to compare it to DRAM. DRAM needs refreshing
> every couple of seconds, otherwise it loses its contents and cannot be
> read, but it's perfectly okay to rewrite and re-use it.

You're correct. The density of drives is so high now that writing a block
affects the ones around it.

> Likewise, the magnetism in a drive can decay such that the data is
> unreadable, but there's nothing actually wrong with the drive. (If the
> data next door is repeatedly rewritten, the rewrite can "leak" and trash
> nearby data ...) The decay time for that should be years.

Right. That's why I'm unhappy that it happened within a week of unpacking
the drives and 2 out of 5 had problems already.

> The problem of course is when the problem has a decay time measured in
> minutes or hours. The rewrite succeeds, so the sector doesn't get
> remapped, but when you next read it it has died :-(

Speaking of this, I still haven't gotten the drive to actually remap
anything yet.
On that 2nd drive, I'm seeing 7 pending sectors, and can't trigger any error
or remapping on them:

196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       7

# 1  Short offline       Completed: read failure       90%       519         569442000
# 2  Short offline       Completed: read failure       90%       519         569442000
# 3  Extended offline    Completed: read failure       90%       518         569442000
# 4  Short offline       Completed without error       00%       508         -
# 5  Short offline       Completed without error       00%       484         -
# 6  Short offline       Completed without error       00%       460         -
# 7  Short offline       Completed without error       00%       436         -
# 8  Short offline       Completed: read failure       90%       413         569441985
# 9  Extended offline    Completed: read failure       90%       409         569441990
#10  Extended offline    Completed: read failure       90%       409         569441985
#11  Extended offline    Completed: read failure       90%       409         569441991
#12  Extended offline    Completed: read failure       90%       409         569441985

So, running badblocks over that range should help, right? But no, I get nothing:

myth:~# badblocks -fsvn -b512 /dev/sdf 569942000 569001000
/dev/sdf is apparently in use by the system; badblocks forced anyway.
Checking for bad blocks in non-destructive read-write mode
From block 569001000 to 569942000
Checking for bad blocks (non-destructive read-write test)
Testing with random pattern: done
Pass completed, 0 bad blocks found. (0/0/0 errors)

In some way, unless I'm reading the wrong blocks, that would mean the blocks
are good again?
But smart still shows
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       7
and a short offline test immediately shows
# 1  Short offline       Completed: read failure       90%       519         569442000

Clearly, I still have some things to learn.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems .... ....
what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
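On the "unless I'm reading the wrong blocks" question: one way to double-check is to derive the badblocks range directly from the LBAs in the self-test log rather than typing the numbers by hand. The following is only a minimal sketch in Python, not anything smartctl or badblocks provides. The helper names failing_lbas and badblocks_range are made up for illustration, the 1000-sector margin is arbitrary, /dev/sdX is a placeholder, and it assumes the LBAs reported by SMART are 512-byte logical sectors (the same unit as -b512).

```python
import re

# A few self-test log lines copied from the smartctl output in this message.
SELFTEST_LOG = """\
# 1  Short offline       Completed: read failure       90%       519         569442000
# 4  Short offline       Completed without error       00%       508         -
# 8  Short offline       Completed: read failure       90%       413         569441985
# 9  Extended offline    Completed: read failure       90%       409         569441990
#11  Extended offline    Completed: read failure       90%       409         569441991
"""


def failing_lbas(log_text):
    """Hypothetical helper: sorted, de-duplicated LBA_of_first_error values."""
    lbas = set()
    for line in log_text.splitlines():
        # Only lines reporting a read failure carry an LBA in the last column.
        if "read failure" not in line:
            continue
        match = re.search(r"(\d+)\s*$", line)
        if match:
            lbas.add(int(match.group(1)))
    return sorted(lbas)


def badblocks_range(lbas, margin=1000, sector_size=512, block_size=512):
    """Hypothetical helper: (first_block, last_block) in badblocks -b units.

    Assumes SMART LBAs count sector_size-byte sectors and that block_size
    is a multiple of sector_size. The margin is an arbitrary safety pad.
    """
    first_sector = max(min(lbas) - margin, 0)
    last_sector = max(lbas) + margin
    first_block = first_sector * sector_size // block_size
    last_block = last_sector * sector_size // block_size
    return first_block, last_block


lbas = failing_lbas(SELFTEST_LOG)
first, last = badblocks_range(lbas)
print("failing LBAs:", lbas)
# badblocks takes the last block before the first block on the command line.
# This prints a read-only invocation; /dev/sdX is a placeholder device name.
print(f"badblocks -sv -b512 /dev/sdX {last} {first}")
```

With the numbers from this thread, every reported LBA falls inside the 569001000 to 569942000 range that badblocks was given, so the range itself does not look like the problem, provided the 512-byte assumption holds for this drive.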