On Wed, Jul 10, 2019 at 11:16 AM Andrey Zhunev <a-j@xxxxxx> wrote:
>
> Wednesday, July 10, 2019, 7:47:55 PM, you wrote:
>
> > On Wed, Jul 10, 2019 at 10:46 AM Chris Murphy <lists@xxxxxxxxxxxxxxxxx> wrote:
> >>
> >> # smartctl -l scterc,900,100
> >> # echo 180 > /sys/block/sda/device/timeout
> >
> > smartctl command above does need a drive specified...
>
> Indeed! :)
>
> With the commands above, you are increasing the timeout and then fsck
> will try to re-read the sectors, right?

More correctly, the drive firmware won't time out, and will try longer
to recover the data *if* the sectors are marginally bad. If the
sectors are flat out bad, then the firmware will still (almost)
immediately give up, and at that point nothing else can be done except
zero the bad sectors and hope fsck can reconstruct what's missing.

Thing is, 68 sectors have a low likelihood of hitting fs metadata,
because metadata is a smaller target than your actual data, or than
free space if there's a lot of it.

> As for the SMART status, the number of pending sectors was 0 before.
> It started to grow after the PSU incident yesterday. Now, since I'm
> doing a ddrescue, all the sectors will be read (or attempted to be
> read). So the pending sectors counter may increase further.

It's a good and valid tactic to just use ddrescue with the previously
mentioned modifications for SCT ERC and kernel timeouts, rather than
directly use fsck on a drive that's clearly dying.

> As I understand, when a drive cannot READ a sector, the sector is
> reported as pending. And it will stay like that until either the
> sector is finally read or until it is overwritten. When either of
> these happens, the Pending Sector Counter should decrease.

Sounds about right.

> In theory, it can go back to 0 (although I didn't follow this closely
> enough, so I never saw a drive like that).

It can and should go to zero once all the pending sectors are
overwritten with either good data or zeros.
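
If you want to watch the pending and reallocated counters while
ddrescue runs, something like the sketch below might do. It assumes
the usual `smartctl -A` attribute table, where attribute 197 is the
pending sector count, attribute 5 is the reallocated sector count, and
the raw value is the tenth column. The function name smart_counts is
made up for illustration, and /dev/sda is only an example device.

```shell
#!/bin/sh
# Sketch only: read `smartctl -A` text on stdin and print the raw values
# of attribute 5 (reallocated) and attribute 197 (pending).
# Assumption: the raw value is the 10th whitespace-separated field.
smart_counts() {
    awk '
        BEGIN { realloc = "unknown"; pending = "unknown" }
        $1 == 5   { realloc = $10 }   # match by ID, names vary by vendor
        $1 == 197 { pending = $10 }
        END { printf "reallocated=%s pending=%s\n", realloc, pending }
    '
}

# Typical use (needs root and a real drive; the device is an assumption):
#   smartctl -A /dev/sda | smart_counts
```

Matching on the attribute ID rather than the name is deliberate, since
the names shown can differ between drive models. If the pending number
keeps rising between runs, that is the "getting worse" case.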
It's possible the write succeeds to the same sector, in which case
it's no longer pending and not remapped. It's possible internally the
write fails and the drive firmware does a remap to make the write
succeed, in which case it's no longer pending. If a write fails
(externally reported write failure to the kernel), then pending
sectors will get pinned at that point and only ever go up as the drive
continues to get worse.

> If a drive can't WRITE to a sector, it tries to reallocate it. If it
> succeeds, Reallocated Sectors Counter is increased. If it fails to
> reallocate - I guess there should be another kind of error or a
> counter, but I'm not sure which one.

You get essentially the same UNC type of error you've seen, except
it's a write error instead of a read error. That's widely considered
fatal, because a drive that can't write is just not usable for
anything (well, read only).

>
> When reallocated sectors appear - it's clearly a bad sign. If the
> number of reallocated sectors grow - the drive should not be used.
> But it's not that obvious for the pending sectors...

They're both bad news. It's just a matter of degree. Yes, a
manufacturer probably takes the position that pending sectors and even
remapping are normal drive behavior. But realistically it's not
something anyone wants to have to deal with. The drive is still useful
for curiosity. Use it for Btrfs testing :-D

--
Chris Murphy