On Thu, Mar 12, 2015 at 7:45 PM, Roger Heflin <rogerheflin@xxxxxxxxx> wrote:
> Unless you have the drive under raid, that means that 15 sectors cannot
> be read and you have lost at least some data.
>
> The drive normally will not move sectors that it cannot successfully read.
>
> You may be able to copy the data off the disk, but when trying this you
> may find a lot more bad sectors than the 15 that are currently pending,
> so you may find you lost more data than 15 bad sectors would indicate.

Yes, could be true. For a drive with 15 sectors pending reallocation at
only 3 months old, back it up and get it replaced under warranty. This
fits the profile of a drive in early failure. Most drives don't do this,
but the drives in a batch that do fail early tend to do so right around
the 3-month mark. (There are example smartctl commands for checking these
counters at the end of this message.)

Depending on what parts of the filesystem metadata are affected, 15
sectors could make a good portion of the volume unrecoverable.

> For the future my only suggestion is either to use raid and/or force
> the reading of the disk at least weekly so that the disk will detect
> "weak" sectors and either rewrite them or move them as needed. On
> all of my critical disks I read and/or smartctl -t long all disks at
> least weekly. From playing with a disk that is going bad it appears
> that doing the -t long daily might keep ahead of sectors going bad,
> but that means that the test runs the disk (though it can be accessed
> for normal usage) for several hours a day each day.

I'd say weekly is aggressive, but reasonable for an important array.
The scrubs are probably more valuable, because any explicit read errors
get fixed up, whereas that's not necessarily the case for smartctl -t
long. (Example scrub and self-test commands are also at the end of this
message.) I have several drives in enclosures with crap bridge chipsets,
so smartctl doesn't work on them; they've never had SMART testing, and
none have had read or write errors. I'd say that if it takes weekly
extended offline SMART tests just to avoid problems, the drive is bad.

Understood that this thread's use case isn't raid, but it bears
repeating: by default, consumer drives like this one will often attempt
recoveries much longer than the SCSI command timer value of 30 seconds;
such recoveries get thwarted by the ensuing link reset, so the problem
sector(s) aren't revealed and don't get fixed. This is a problem for md,
LVM, ZFS, and Btrfs raid, so the configuration (the drive's error
recovery timeout and the kernel's command timer) has to be correct; I've
put example settings at the end of this message. This comes up at least
monthly on linux-raid@ (sometimes several times per week), and a high
percentage of the time all data on the raid is lost in the ensuing
recovery attempt.

Backups!

--
Chris Murphy
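
Since I pointed at the sector counters above: a minimal sketch of
checking them with smartctl, assuming a SATA drive that reports SMART
attributes, with /dev/sdX as a placeholder for the real device:

    # Overall health plus the attribute table; look at
    # 5 Reallocated_Sector_Ct, 197 Current_Pending_Sector, and
    # 198 Offline_Uncorrectable in the RAW_VALUE column.
    smartctl -H -A /dev/sdX

    # Or the full device report, including the error and
    # self-test logs.
    smartctl -x /dev/sdX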
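
For the weekly routine, a rough sketch, assuming an md array at
/dev/md0, a Btrfs filesystem mounted at /mnt/data, and a drive at
/dev/sdX (all placeholders); something like this would go in a weekly
cron job or systemd timer, run as root:

    # md raid: read every member and fix up unreadable sectors from
    # redundancy; progress shows up in /proc/mdstat.
    echo check > /sys/block/md0/md/sync_action

    # Btrfs: verify checksums and repair bad copies when the profile
    # has redundancy.
    btrfs scrub start /mnt/data

    # Per-drive extended offline self-test, as in Roger's routine;
    # results land in the self-test log.
    smartctl -t long /dev/sdX
    smartctl -l selftest /dev/sdX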
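
And for the raid configuration I mentioned, a sketch assuming a drive
that supports SCT ERC, again with /dev/sdX as a placeholder. Neither
setting survives a power cycle or reboot, so it has to be reapplied at
boot:

    # See whether the drive supports SCT ERC and what it's set to.
    smartctl -l scterc /dev/sdX

    # If supported: cap read/write error recovery at 7 seconds
    # (units are 100 ms), well inside the 30 second command timer.
    smartctl -l scterc,70,70 /dev/sdX

    # If the drive doesn't support SCT ERC, go the other way and
    # raise the kernel's command timer instead (default is 30,
    # in seconds; needs root).
    cat /sys/block/sdX/device/timeout
    echo 180 > /sys/block/sdX/device/timeout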