Re: Is SMART really that dumb?


 



On Sat, Mar 14, 2015 at 4:09 PM, Tom Horsley <horsley1953@xxxxxxxxx> wrote:
> On Sat, 14 Mar 2015 16:53:15 -0500
> Roger Heflin wrote:
>
>> Also usually the errors are found by linux doing a read against it, so
>> there should be error messages on the reads in the messages file when
>> it happened, that is usually what I use to determine what sectors are
>> getting the error.
>
> Yea, I poked around in the logs and the very first thing
> that looks like any kind of error is the smart message
> showing up for the first time (and repeating every
> 30 minutes since then in an attempt to fill up the logs :-).

I'd say the first step is to confirm this is due to a media error
rather than something else, otherwise you end up down a rat hole.

The top post here is a good example of a URE (unrecoverable read
error) due to a media error.
http://ubuntuforums.org/archive/index.php/t-1034762.html

If the drive spends longer than 30 seconds attempting a recovery (the
kernel's default command timeout), you'll get errors along these lines
(this is a write example, which is really bad; the read version is
more common).

[ 2161.457698] ata8.00: exception Emask 0x0 SAct 0x7ff SErr 0x0 action 0x6 frozen
[ 2161.457709] ata8.00: failed command: WRITE FPDMA QUEUED
[ 2161.457718] ata8.00: cmd 61/00:00:80:c4:2c/02:00:1e:00:00/40 tag 0 ncq 262144 out
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 2161.457723] ata8.00: status: { DRDY }
...
[ 5628.308982] ata8.00: failed command: WRITE FPDMA QUEUED
[ 5628.308990] ata8.00: cmd 61/80:50:80:34:44/01:00:50:00:00/40 tag 10 ncq 196608 out
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 5628.308993] ata8.00: status: { DRDY }
[ 5628.309000] ata8: hard resetting link
[ 5638.311674] ata8: softreset failed (1st FIS failed)
[ 5638.311686] ata8: hard resetting link
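
To check whether your own logs contain this kind of libata timeout or
link reset, a filter along these lines might help. It's only a sketch:
the function name ata_errors is made up for illustration, and the
patterns are taken from the messages quoted above, so other error
forms will slip through.

```shell
# Hypothetical helper: keep only libata lines that look like the
# messages quoted above (exceptions, failed commands, timeouts, resets).
# Reads log text on stdin; the patterns come from this one example.
ata_errors() {
    grep -E 'ata[0-9]+(\.[0-9]+)?: (exception Emask|failed command:|hard resetting link|softreset failed)|Emask 0x[0-9a-f]+ \(timeout\)'
}
```

Typical use would be 'dmesg | ata_errors' or 'journalctl -k | ata_errors'.
No output means none of these particular patterns were found, not that
the drive is healthy.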


This is a how-to on what to do about bad sectors, including partial recovery.
http://www.smartmontools.org/browser/trunk/www/badblockhowto.xml

But the tl;dr for all of that, in my opinion, is to update your
backups, and then obliterate the drive with writes. Only on a write
does the firmware determine if sector problems are transient or
persistent. If it's a persistent problem, then the LBA is reassigned
to a reserve sector. Once this is all done, then you can restore from
backups.
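
One way to see whether the writes actually caused reallocations is to
compare the SMART attribute table before and after. The following is a
rough sketch, assuming the usual 'smartctl -A' table layout (attribute
name in column 2, raw value in column 10) and the common attribute
names Reallocated_Sector_Ct and Current_Pending_Sector; some vendors
name or report these differently. The function name is hypothetical.

```shell
# Hypothetical helper: read `smartctl -A` output on stdin and print the
# raw values of the two attributes most relevant to reallocation.
# Assumes the usual ATA attribute table layout, where column 2 is the
# attribute name and column 10 is the raw value; names vary by vendor.
smart_sector_counts() {
    awk '$2 == "Reallocated_Sector_Ct" || $2 == "Current_Pending_Sector" { print $2, $10 }'
}
```

Run it as 'smartctl -A /dev/sdX | smart_sector_counts' before and
after the overwrite. If the problem was persistent, you'd expect the
pending count to drop and the reallocated count to rise.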

To do the write correctly, first you have to know if you have a 512n
or 512e drive. Most drives these days are 512e: 512 byte logical,
4096 byte physical sectors. The LBA in the error is for the first
logical sector in the bad physical sector. So writing over just that
512 byte sector will not work (it'll fail as a read error even though
you're writing, due to a read-modify-write attempt by the drive
firmware). 'parted -l' will tell you which type of drive you have.
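
As an alternative to reading the parted output, the kernel exposes the
same two sizes in sysfs. Here is a minimal sketch that classifies a
disk from them. The function name and the optional second argument (a
replacement sysfs root, handy for trying it out on a test directory)
are made up for illustration.

```shell
# Hypothetical helper: classify a disk as 512n, 512e or 4Kn from the
# kernel's sysfs queue attributes. The second argument (sysfs root) is
# only there so the function can be pointed at a test directory.
sector_type() {
    dev=$1
    root=${2:-/sys/block}
    logical=$(cat "$root/$dev/queue/logical_block_size") || return 1
    physical=$(cat "$root/$dev/queue/physical_block_size") || return 1
    case "$logical/$physical" in
        512/512)   echo "512n" ;;
        512/4096)  echo "512e" ;;
        4096/4096) echo "4Kn" ;;
        *)         echo "unknown ($logical/$physical)" ;;
    esac
}
```

Call it with the bare device name, e.g. 'sector_type sda'. Drives
behind some USB bridges or RAID controllers may not report their real
physical sector size, so treat the result as a hint.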

What I suggest is this:

# badblocks -b 4096 -svw /dev/sdX

This is destructive! Note that any block numbers reported by
badblocks are predicated on the -b value, so a reported value isn't a
sector LBA. With -b 4096 on a drive with 512 byte logical sectors, you
have to multiply by 8 to get the LBA. After this cycles through even
once, the problem should be resolved. You could let it run through
all four test patterns (0xaa, 0x55, 0xff, 0x00). What ought to be
true is you either get no errors (meaning none of the read errors
were media errors, they were just bad data, like from torn writes or
something) or you get some write errors with reallocations on the
first pass, and no errors on subsequent passes. If any subsequent
passes have errors, especially corruption errors, then get rid of the
drive or turn it into a plaything or send it to me :-D
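
The block-to-LBA conversion is just a multiplication by the ratio of
the two sizes. A small sketch, with a made-up function name and
defaults that assume the -b 4096 command above on a drive with 512
byte logical sectors:

```shell
# Convert a badblocks block number to the first logical sector (LBA)
# it covers. Defaults match the command above: -b 4096 on a drive with
# 512 byte logical sectors, which gives the factor of 8.
block_to_lba() {
    block=$1
    block_size=${2:-4096}
    sector_size=${3:-512}
    echo $(( block * (block_size / sector_size) ))
}
```

For example, 'block_to_lba 1000' prints 8000. On a 4Kn drive you would
pass 4096 as the third argument, and the block number and LBA come out
the same.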


-- 
Chris Murphy
-- 
users mailing list
users@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe or change subscription options:
https://admin.fedoraproject.org/mailman/listinfo/users
Fedora Code of Conduct: http://fedoraproject.org/code-of-conduct
Guidelines: http://fedoraproject.org/wiki/Mailing_list_guidelines
Have a question? Ask away: http://ask.fedoraproject.org



