Re: Persistent failures with simple md setup

Hans-Peter Jansen <hpj@xxxxxxxxx> · Mon, 04 Feb 2013 21:43:29 +0100

Am Mittwoch, 30. Januar 2013, 18:12:46 schrieb Hans-Peter Jansen:
> 
> Hmm, according to mdadm from openSUSE:12.1:Update, the relevant fixes should
> be in place. It might be an unfortunate combination of this issue and the
> asynchronously applied updates, interfered by the *switching* behavior.
> 
> I started with regenerating the initrds now, and a first reboot succeeded so
> far. Good.
> 
> Will ask my friend to reboot the system a dozen times tonight.

After a few reboots, the issue reappeared. I really believe now, that by
driving the md in degraded mode for some time and with the switching behavior, 
just re-adding the devices resulted in unsynced raid1 devices.

Next, my friend managed to create a nearby data disaster: I've explained him,
how he would be able to re-add a device himself. He did so on sunday with his
home partition, and since there appeared no progress bar in /proc/mdstat, he 
immediately repeated the command. 

Neil, is it conceivable (due to a race or the like), that repeating to add 
(re-add) a device potentially creates data salad, since that home-fs (xfs) 
gone mad a few minutes later (firefox crashed, and couldn't be started, kmail 
crashed, and so on (all those processes, that write to ~). He decided to 
reboot, and that jailed him in the emergency recovery console, because /home 
couldn't be mounted anymore.

Both parts of the mirror were affected, the "old" part was ~200kb 
undestructive xfs_repair log, the other ~900kb, hence I decided to use the 
smaller one. First I failed and removed the other part, and then attempted to 
repair it. Unfortunately, the real repair run bailed out with:

disconnected inode 2161430687, moving to lost+found
corrupt dinode 2161430687, extent total = 1, nblocks = 0.  This is a bug.
Please capture the filesystem metadata with xfs_metadump and
report it to xfs@xxxxxxxxxxx.
cache_node_purge: refcount was 1, not zero (node=0xf867208)

fatal error -- 117 - couldn't iget disconnected inode

although I already used the (current) xfsprogs-3.1.6 version. :-(

After fixing that issue manually with xfs_db (== great fun), I was able to
recover the filesystem. It lost(+found) just a few new items, nobody cares 
about... So far, so good..

Now, the unsynced state disturbed me. Just re-adding the bad device might
result in an invalid mirror again. A "repair" run cannot be controlled. Hence
I zeroed the superblock of that partition, and added it. Et voila, it 
completely synced that mirror. Good.

Today, I hammered the raid1 partitions with "check". During one run, this 
appeared in syslog:

Feb  4 11:18:26 zaphkiel kernel: [11165.652478] ata2.00: exception Emask 0x0 SAct 0x7fffffff SErr 0x0 action 0x0
Feb  4 11:18:26 zaphkiel kernel: [11165.652486] ata2.00: irq_stat 0x40000008
Feb  4 11:18:26 zaphkiel kernel: [11165.652495] ata2.00: failed command: READ FPDMA QUEUED
Feb  4 11:18:26 zaphkiel kernel: [11165.652510] ata2.00: cmd 60/80:e0:12:ef:c2/00:00:0c:00:00/40 tag 28 ncq 65536 in
Feb  4 11:18:26 zaphkiel kernel: [11165.652513]          res 41/40:53:3f:ef:c2/00:00:0c:00:00/40 Emask 0x409 (media error) <F>
Feb  4 11:18:26 zaphkiel kernel: [11165.652520] ata2.00: status: { DRDY ERR }
Feb  4 11:18:26 zaphkiel kernel: [11165.652524] ata2.00: error: { UNC }
Feb  4 11:18:26 zaphkiel kernel: [11165.652876] ata2.00: failed to IDENTIFY (I/O error, err_mask=0x100)
Feb  4 11:18:26 zaphkiel kernel: [11165.652882] ata2.00: revalidation failed (errno=-5)
Feb  4 11:18:26 zaphkiel kernel: [11165.652890] ata2: hard resetting link
Feb  4 11:18:26 zaphkiel kernel: [11165.957043] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Feb  4 11:18:26 zaphkiel kernel: [11165.969910] ata2.00: configured for UDMA/133
Feb  4 11:18:26 zaphkiel kernel: [11165.970048] ata2: EH complete
Feb  4 11:18:28 zaphkiel kernel: [11167.949241] ata2.00: exception Emask 0x0 SAct 0x7fffffff SErr 0x0 action 0x0
Feb  4 11:18:28 zaphkiel kernel: [11167.949249] ata2.00: irq_stat 0x40000008
Feb  4 11:18:28 zaphkiel kernel: [11167.949257] ata2.00: failed command: READ FPDMA QUEUED
Feb  4 11:18:28 zaphkiel kernel: [11167.949272] ata2.00: cmd 60/80:10:12:ef:c2/00:00:0c:00:00/40 tag 2 ncq 65536 in
Feb  4 11:18:28 zaphkiel kernel: [11167.949275]          res 41/40:53:3f:ef:c2/00:00:0c:00:00/40 Emask 0x409 (media error) <F>
Feb  4 11:18:28 zaphkiel kernel: [11167.949282] ata2.00: status: { DRDY ERR }
Feb  4 11:18:28 zaphkiel kernel: [11167.949287] ata2.00: error: { UNC }
Feb  4 11:18:28 zaphkiel kernel: [11167.962146] ata2.00: configured for UDMA/133
Feb  4 11:18:28 zaphkiel kernel: [11167.962206] ata2: EH complete
Feb  4 11:18:30 zaphkiel kernel: [11169.898187] ata2.00: exception Emask 0x0 SAct 0x7fffffff SErr 0x0 action 0x0
Feb  4 11:18:30 zaphkiel kernel: [11169.898195] ata2.00: irq_stat 0x40000008
Feb  4 11:18:30 zaphkiel kernel: [11169.898204] ata2.00: failed command: READ FPDMA QUEUED
Feb  4 11:18:30 zaphkiel kernel: [11169.898219] ata2.00: cmd 60/80:e0:12:ef:c2/00:00:0c:00:00/40 tag 28 ncq 65536 in
Feb  4 11:18:30 zaphkiel kernel: [11169.898222]          res 41/40:53:3f:ef:c2/00:00:0c:00:00/40 Emask 0x409 (media error) <F>
Feb  4 11:18:30 zaphkiel kernel: [11169.898229] ata2.00: status: { DRDY ERR }
Feb  4 11:18:30 zaphkiel kernel: [11169.898234] ata2.00: error: { UNC }
Feb  4 11:18:30 zaphkiel kernel: [11169.912066] ata2.00: configured for UDMA/133
Feb  4 11:18:30 zaphkiel kernel: [11169.912117] ata2: EH complete
Feb  4 11:18:32 zaphkiel kernel: [11171.905192] ata2.00: exception Emask 0x0 SAct 0x7fffffff SErr 0x0 action 0x0
Feb  4 11:18:32 zaphkiel kernel: [11171.905200] ata2.00: irq_stat 0x40000008
Feb  4 11:18:32 zaphkiel kernel: [11171.905208] ata2.00: failed command: READ FPDMA QUEUED
Feb  4 11:18:32 zaphkiel kernel: [11171.905223] ata2.00: cmd 60/80:10:12:ef:c2/00:00:0c:00:00/40 tag 2 ncq 65536 in
Feb  4 11:18:32 zaphkiel kernel: [11171.905226]          res 41/40:53:3f:ef:c2/00:00:0c:00:00/40 Emask 0x409 (media error) <F>
Feb  4 11:18:32 zaphkiel kernel: [11171.905233] ata2.00: status: { DRDY ERR }
Feb  4 11:18:32 zaphkiel kernel: [11171.905238] ata2.00: error: { UNC }
Feb  4 11:18:32 zaphkiel kernel: [11171.919099] ata2.00: configured for UDMA/133
Feb  4 11:18:32 zaphkiel kernel: [11171.919152] ata2: EH complete
Feb  4 11:18:34 zaphkiel kernel: [11173.912191] ata2.00: exception Emask 0x0 SAct 0x7fffffff SErr 0x0 action 0x0
Feb  4 11:18:34 zaphkiel kernel: [11173.912199] ata2.00: irq_stat 0x40000008
Feb  4 11:18:34 zaphkiel kernel: [11173.912208] ata2.00: failed command: READ FPDMA QUEUED
Feb  4 11:18:34 zaphkiel kernel: [11173.912223] ata2.00: cmd 60/80:e0:12:ef:c2/00:00:0c:00:00/40 tag 28 ncq 65536 in
Feb  4 11:18:34 zaphkiel kernel: [11173.912226]          res 41/40:53:3f:ef:c2/00:00:0c:00:00/40 Emask 0x409 (media error) <F>
Feb  4 11:18:34 zaphkiel kernel: [11173.912233] ata2.00: status: { DRDY ERR }
Feb  4 11:18:34 zaphkiel kernel: [11173.912238] ata2.00: error: { UNC }
Feb  4 11:18:34 zaphkiel kernel: [11173.925101] ata2.00: configured for UDMA/133
Feb  4 11:18:34 zaphkiel kernel: [11173.925159] ata2: EH complete
Feb  4 11:18:36 zaphkiel kernel: [11175.861152] ata2.00: exception Emask 0x0 SAct 0x7fffffff SErr 0x0 action 0x0
Feb  4 11:18:36 zaphkiel kernel: [11175.861160] ata2.00: irq_stat 0x40000008
Feb  4 11:18:36 zaphkiel kernel: [11175.861168] ata2.00: failed command: READ FPDMA QUEUED
Feb  4 11:18:36 zaphkiel kernel: [11175.861183] ata2.00: cmd 60/80:10:12:ef:c2/00:00:0c:00:00/40 tag 2 ncq 65536 in
Feb  4 11:18:36 zaphkiel kernel: [11175.861186]          res 41/40:53:3f:ef:c2/00:00:0c:00:00/40 Emask 0x409 (media error) <F>
Feb  4 11:18:36 zaphkiel kernel: [11175.861193] ata2.00: status: { DRDY ERR }
Feb  4 11:18:36 zaphkiel kernel: [11175.861198] ata2.00: error: { UNC }
Feb  4 11:18:36 zaphkiel kernel: [11175.874052] ata2.00: configured for UDMA/133
Feb  4 11:18:36 zaphkiel kernel: [11175.874103] sd 1:0:0:0: [sdb] Unhandled sense code
Feb  4 11:18:36 zaphkiel kernel: [11175.874109] sd 1:0:0:0: [sdb]  Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Feb  4 11:18:36 zaphkiel kernel: [11175.874117] sd 1:0:0:0: [sdb]  Sense Key : Medium Error [current] [descriptor]
Feb  4 11:18:36 zaphkiel kernel: [11175.874125] Descriptor sense data with sense descriptors (in hex):
Feb  4 11:18:36 zaphkiel kernel: [11175.874130]         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00 
Feb  4 11:18:36 zaphkiel kernel: [11175.874145]         0c c2 ef 3f 
Feb  4 11:18:36 zaphkiel kernel: [11175.874153] sd 1:0:0:0: [sdb]  Add. Sense: Unrecovered read error - auto reallocate failed
Feb  4 11:18:36 zaphkiel kernel: [11175.874163] sd 1:0:0:0: [sdb] CDB: Read(10): 28 00 0c c2 ef 12 00 00 80 00
Feb  4 11:18:36 zaphkiel kernel: [11175.874180] end_request: I/O error, dev sdb, sector 214101823
Feb  4 11:18:36 zaphkiel kernel: [11175.874234] ata2: EH complete
Feb  4 11:18:38 zaphkiel kernel: [11177.954091] md: md124: data-check done.

This is a classical URE, isn't it? Interestingly, nonetheless, the raid1 check 
run succeeded! (Not so good, is it?)

Before you ask, both drives have a sane timeout already:

smartctl -l scterc /dev/sda
smartctl 6.0 2012-10-10 r3643 [i686-linux-3.1.10-1.9-desktop] (SUSE RPM)
Copyright (C) 2002-12, Bruce Allen, Christian Franke, www.smartmontools.org

SCT Error Recovery Control:
           Read:     70 (7.0 seconds)
          Write:     70 (7.0 seconds)

smartctl -l scterc /dev/sdb
smartctl 6.0 2012-10-10 r3643 [i686-linux-3.1.10-1.9-desktop] (SUSE RPM)
Copyright (C) 2002-12, Bruce Allen, Christian Franke, www.smartmontools.org

SCT Error Recovery Control:
           Read:     70 (7.0 seconds)
          Write:     70 (7.0 seconds)

Attached is the result of smartctl -x of sda (good), and sdb (bad).

Could somebody from the audience have a look into it, and give me an 
assessment, how dangerous the state of this drive really is.

Last question: since I had to massage the system anyway, I've updated mdadm 
from 3.2.2 to 3.2.6. I red, that it can be dangerous to do so, what do I risk
here?

Thanks in advance,
Pete
smartctl 6.0 2012-10-10 r3643 [i686-linux-3.1.10-1.9-desktop] (SUSE RPM)
Copyright (C) 2002-12, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     SAMSUNG SpinPoint F1 RE
Device Model:     SAMSUNG HE103UJ
Serial Number:    S13VJDWS900483
LU WWN Device Id: 5 0024e9 002167ef6
Firmware Version: 1AA01118
User Capacity:    1.000.204.886.016 bytes [1,00 TB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA/ATAPI-7, ATA8-ACS T13/1699-D revision 3b
Local Time is:    Mon Feb  4 14:04:00 2013 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Disabled
APM feature is:   Disabled
Rd look-ahead is: Enabled
Write cache is:   Enabled
ATA Security is:  Disabled, NOT FROZEN [SEC1]

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)	Offline data collection activity
					was never started.
					Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		(12124) seconds.
Offline data collection
capabilities: 			 (0x7b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 ( 203) minutes.
Conveyance self-test routine
recommended polling time: 	 (  22) minutes.
SCT capabilities: 	       (0x003f)	SCT Status supported.
					SCT Error Recovery Control supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR--   099   099   051    -    531
  3 Spin_Up_Time            POS---   073   073   011    -    9000
  4 Start_Stop_Count        -O--CK   099   099   000    -    1278
  5 Reallocated_Sector_Ct   PO--CK   100   100   010    -    0
  7 Seek_Error_Rate         POSR--   100   100   051    -    0
  8 Seek_Time_Performance   P-S--K   100   100   015    -    0
  9 Power_On_Hours          -O--CK   097   097   000    -    16416
 10 Spin_Retry_Count        PO--CK   100   100   051    -    0
 11 Calibration_Retry_Count -O--C-   100   100   000    -    0
 12 Power_Cycle_Count       -O--CK   099   099   000    -    1277
 13 Read_Soft_Error_Rate    -OSR--   099   099   000    -    528
183 Runtime_Bad_Block       -O--CK   100   100   000    -    0
184 End-to-End_Error        PO--CK   100   100   000    -    0
187 Reported_Uncorrect      -O--CK   100   100   000    -    1052
188 Command_Timeout         -O--CK   100   100   000    -    0
190 Airflow_Temperature_Cel -O---K   076   059   000    -    24 (Min/Max 12/24)
194 Temperature_Celsius     -O---K   077   058   000    -    23 (Min/Max 12/26)
195 Hardware_ECC_Recovered  -O-RC-   100   100   000    -    19500704
196 Reallocated_Event_Count -O--CK   100   100   000    -    0
197 Current_Pending_Sector  -O--C-   100   100   000    -    0
198 Offline_Uncorrectable   ----CK   100   100   000    -    0
199 UDMA_CRC_Error_Count    -OSRCK   253   253   000    -    0
200 Multi_Zone_Error_Rate   -O-R--   100   100   000    -    0
201 Soft_Read_Error_Rate    -O-R--   099   099   000    -    12
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
GP/S  Log at address 0x00 has    1 sectors [Log Directory]
SMART Log at address 0x01 has    1 sectors [Summary SMART error log]
SMART Log at address 0x02 has    2 sectors [Comprehensive SMART error log]
GP    Log at address 0x03 has    2 sectors [Ext. Comprehensive SMART error log]
GP    Log at address 0x04 has    2 sectors [Device Statistics log]
SMART Log at address 0x06 has    1 sectors [SMART self-test log]
GP    Log at address 0x07 has    2 sectors [Extended self-test log]
SMART Log at address 0x09 has    1 sectors [Selective self-test log]
GP    Log at address 0x10 has    1 sectors [NCQ Command Error log]
GP    Log at address 0x11 has    1 sectors [SATA Phy Event Counters]
GP    Log at address 0x20 has    2 sectors [Streaming performance log]
GP    Log at address 0x21 has    1 sectors [Write stream error log]
GP    Log at address 0x22 has    1 sectors [Read stream error log]
GP/S  Log at address 0x80 has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x81 has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x82 has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x83 has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x84 has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x85 has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x86 has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x87 has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x88 has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x89 has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x8a has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x8b has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x8c has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x8d has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x8e has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x8f has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x90 has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x91 has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x92 has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x93 has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x94 has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x95 has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x96 has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x97 has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x98 has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x99 has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x9a has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x9b has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x9c has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x9d has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x9e has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x9f has   16 sectors [Host vendor specific log]
GP/S  Log at address 0xe0 has    1 sectors [SCT Command/Status]
GP/S  Log at address 0xe1 has    1 sectors [SCT Data Transfer]

SMART Extended Comprehensive Error Log Version: 1 (2 sectors)
Device Error Count: 6
	CR     = Command Register
	FEATR  = Features Register
	COUNT  = Count (was: Sector Count) Register
	LBA_48 = Upper bytes of LBA High/Mid/Low Registers ]  ATA-8
	LH     = LBA High (was: Cylinder High) Register    ]   LBA
	LM     = LBA Mid (was: Cylinder Low) Register      ] Register
	LL     = LBA Low (was: Sector Number) Register     ]
	DV     = Device (was: Device/Head) Register
	DC     = Device Control Register
	ER     = Error register
	ST     = Status register
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 6 [5] occurred at disk power-on lifetime: 16413 hours (683 days + 21 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  00 -- 42 00 00 00 00 0c c2 ef 3e 40 00   at LBA = 0x0cc2ef3e = 214101822

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 00 00 80 00 00 00 c2 f2 92 40 00     03:06:50.480  READ FPDMA QUEUED
  60 00 00 00 80 00 00 00 c2 f3 12 40 00     03:06:50.480  READ FPDMA QUEUED
  60 00 00 00 80 00 00 00 c2 f3 92 40 00     03:06:50.480  READ FPDMA QUEUED
  60 00 00 00 80 00 00 00 c2 f6 92 40 00     03:06:50.480  READ FPDMA QUEUED
  60 00 00 00 80 00 00 00 c2 f7 12 40 00     03:06:50.480  READ FPDMA QUEUED

Error 5 [4] occurred at disk power-on lifetime: 16413 hours (683 days + 21 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  00 -- 42 00 00 00 00 0c c2 ef 3e 40 00   at LBA = 0x0cc2ef3e = 214101822

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 00 00 80 00 00 00 c2 f7 92 40 00     03:06:48.480  READ FPDMA QUEUED
  60 00 00 00 80 00 00 00 c2 f5 92 40 00     03:06:48.480  READ FPDMA QUEUED
  60 00 00 00 80 00 00 00 c2 ef 12 40 00     03:06:48.480  READ FPDMA QUEUED
  60 00 00 00 80 00 00 00 c2 fe 12 40 00     03:06:48.480  READ FPDMA QUEUED
  60 00 00 00 80 00 00 00 c2 fb 12 40 00     03:06:48.480  READ FPDMA QUEUED

Error 4 [3] occurred at disk power-on lifetime: 16413 hours (683 days + 21 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  00 -- 42 00 00 00 00 0c c2 ef 3e 40 00   at LBA = 0x0cc2ef3e = 214101822

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 00 00 80 00 00 00 c2 f2 92 40 00     03:06:46.470  READ FPDMA QUEUED
  60 00 00 00 80 00 00 00 c2 f3 12 40 00     03:06:46.470  READ FPDMA QUEUED
  60 00 00 00 80 00 00 00 c2 f3 92 40 00     03:06:46.470  READ FPDMA QUEUED
  60 00 00 00 80 00 00 00 c2 f6 92 40 00     03:06:46.470  READ FPDMA QUEUED
  60 00 00 00 80 00 00 00 c2 f7 12 40 00     03:06:46.470  READ FPDMA QUEUED

Error 3 [2] occurred at disk power-on lifetime: 16413 hours (683 days + 21 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  00 -- 42 00 00 00 00 0c c2 ef 3e 40 00   at LBA = 0x0cc2ef3e = 214101822

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 00 00 80 00 00 00 c2 f7 92 40 00     03:06:44.520  READ FPDMA QUEUED
  60 00 00 00 80 00 00 00 c2 f5 92 40 00     03:06:44.520  READ FPDMA QUEUED
  60 00 00 00 80 00 00 00 c2 ef 12 40 00     03:06:44.520  READ FPDMA QUEUED
  60 00 00 00 80 00 00 00 c2 fe 12 40 00     03:06:44.520  READ FPDMA QUEUED
  60 00 00 00 80 00 00 00 c2 fb 12 40 00     03:06:44.520  READ FPDMA QUEUED

Error 2 [1] occurred at disk power-on lifetime: 16413 hours (683 days + 21 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  00 -- 42 00 00 00 00 0c c2 ef 3e 40 00   at LBA = 0x0cc2ef3e = 214101822

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 00 00 80 00 00 00 c2 f2 92 40 00     03:06:42.530  READ FPDMA QUEUED
  60 00 00 00 80 00 00 00 c2 f3 12 40 00     03:06:42.530  READ FPDMA QUEUED
  60 00 00 00 80 00 00 00 c2 f3 92 40 00     03:06:42.530  READ FPDMA QUEUED
  60 00 00 00 80 00 00 00 c2 f6 92 40 00     03:06:42.530  READ FPDMA QUEUED
  60 00 00 00 80 00 00 00 c2 f7 12 40 00     03:06:42.530  READ FPDMA QUEUED

Error 1 [0] occurred at disk power-on lifetime: 16413 hours (683 days + 21 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  00 -- 42 00 00 00 00 0c c2 ef 3e 40 00   at LBA = 0x0cc2ef3e = 214101822

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 00 00 80 00 00 00 c2 fb 92 40 00     03:06:40.150  READ FPDMA QUEUED
  60 00 00 00 80 00 00 00 c2 fb 12 40 00     03:06:40.150  READ FPDMA QUEUED
  60 00 00 00 80 00 00 00 c2 fa 92 40 00     03:06:40.150  READ FPDMA QUEUED
  60 00 00 00 80 00 00 00 c2 fa 12 40 00     03:06:40.150  READ FPDMA QUEUED
  60 00 00 00 80 00 00 00 c2 f9 92 40 00     03:06:40.150  READ FPDMA QUEUED

SMART Extended Self-test Log Version: 1 (2 sectors)
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Status Version:                  2
SCT Version (vendor specific):       256 (0x0100)
SCT Support Level:                   1
Device State:                        Active (0)
Current Temperature:                 23 Celsius
Power Cycle Max Temperature:         26 Celsius
Lifetime    Max Temperature:         63 Celsius
SCT Temperature History Version:     2
Temperature Sampling Period:         1 minute
Temperature Logging Interval:        1 minute
Min/Max recommended Temperature:     -4/72 Celsius
Min/Max Temperature Limit:           -9/77 Celsius
Temperature History Size (Index):    128 (4)

Index    Estimated Time   Temperature Celsius
   5    2013-02-04 11:57    24  *****
 ...    ..( 32 skipped).    ..  *****
  38    2013-02-04 12:30    24  *****
  39    2013-02-04 12:31    25  ******
  40    2013-02-04 12:32    24  *****
 ...    ..( 52 skipped).    ..  *****
  93    2013-02-04 13:25    24  *****
  94    2013-02-04 13:26    23  ****
 ...    ..( 37 skipped).    ..  ****
   4    2013-02-04 14:04    23  ****

SCT Error Recovery Control:
           Read:     70 (7,0 seconds)
          Write:     70 (7,0 seconds)

ATA_READ_LOG_EXT (addr=0x04:0x00, page=0, n=1) failed: scsi error aborted command
SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x000a  2            5  Device-to-host register FISes sent due to a COMRESET
0x0001  2            0  Command failed due to ICRC error
0x0002  2            0  R_ERR response for data FIS
0x0003  2            0  R_ERR response for device-to-host data FIS
0x0004  2            0  R_ERR response for host-to-device data FIS
0x0005  2            0  R_ERR response for non-data FIS
0x0006  2            0  R_ERR response for device-to-host non-data FIS
0x0007  2            0  R_ERR response for host-to-device non-data FIS
0x0008  2            0  Device-to-host non-data FIS retries
0x0009  2            5  Transition from drive PhyRdy to drive PhyNRdy
0x000b  2            0  CRC errors within host-to-device FIS
0x000d  2            0  Non-CRC errors within host-to-device FIS
0x000f  2            0  R_ERR response for host-to-device data FIS, CRC
0x0010  2            0  R_ERR response for host-to-device data FIS, non-CRC
0x0012  2            0  R_ERR response for host-to-device non-data FIS, CRC
0x0013  2            0  R_ERR response for host-to-device non-data FIS, non-CRC

smartctl 6.0 2012-10-10 r3643 [i686-linux-3.1.10-1.9-desktop] (SUSE RPM)
Copyright (C) 2002-12, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     SAMSUNG SpinPoint F1 RE
Device Model:     SAMSUNG HE103UJ
Serial Number:    S13VJDWS900475
LU WWN Device Id: 5 0024e9 002167cfe
Firmware Version: 1AA01118
User Capacity:    1.000.204.886.016 bytes [1,00 TB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA/ATAPI-7, ATA8-ACS T13/1699-D revision 3b
Local Time is:    Mon Feb  4 14:04:00 2013 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Disabled
APM feature is:   Disabled
Rd look-ahead is: Enabled
Write cache is:   Enabled
ATA Security is:  Disabled, NOT FROZEN [SEC1]

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)	Offline data collection activity
					was never started.
					Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		(11933) seconds.
Offline data collection
capabilities: 			 (0x7b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 ( 200) minutes.
Conveyance self-test routine
recommended polling time: 	 (  21) minutes.
SCT capabilities: 	       (0x003f)	SCT Status supported.
					SCT Error Recovery Control supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR--   100   100   051    -    0
  3 Spin_Up_Time            POS---   072   072   011    -    9340
  4 Start_Stop_Count        -O--CK   099   099   000    -    1278
  5 Reallocated_Sector_Ct   PO--CK   100   100   010    -    0
  7 Seek_Error_Rate         POSR--   100   100   051    -    0
  8 Seek_Time_Performance   P-S--K   100   100   015    -    0
  9 Power_On_Hours          -O--CK   097   097   000    -    16415
 10 Spin_Retry_Count        PO--CK   100   100   051    -    0
 11 Calibration_Retry_Count -O--C-   100   100   000    -    0
 12 Power_Cycle_Count       -O--CK   099   099   000    -    1277
 13 Read_Soft_Error_Rate    -OSR--   100   100   000    -    0
183 Runtime_Bad_Block       -O--CK   100   100   000    -    0
184 End-to-End_Error        PO--CK   100   100   000    -    0
187 Reported_Uncorrect      -O--CK   100   100   000    -    0
188 Command_Timeout         -O--CK   100   100   000    -    0
190 Airflow_Temperature_Cel -O---K   077   001   000    -    23 (Min/Max 11/23)
194 Temperature_Celsius     -O---K   078   058   000    -    22 (Min/Max 11/25)
195 Hardware_ECC_Recovered  -O-RC-   100   100   000    -    3573535
196 Reallocated_Event_Count -O--CK   100   100   000    -    0
197 Current_Pending_Sector  -O--C-   100   100   000    -    0
198 Offline_Uncorrectable   ----CK   100   100   000    -    0
199 UDMA_CRC_Error_Count    -OSRCK   100   100   000    -    0
200 Multi_Zone_Error_Rate   -O-R--   100   100   000    -    0
201 Soft_Read_Error_Rate    -O-R--   100   100   000    -    0
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
GP/S  Log at address 0x00 has    1 sectors [Log Directory]
SMART Log at address 0x01 has    1 sectors [Summary SMART error log]
SMART Log at address 0x02 has    2 sectors [Comprehensive SMART error log]
GP    Log at address 0x03 has    2 sectors [Ext. Comprehensive SMART error log]
GP    Log at address 0x04 has    2 sectors [Device Statistics log]
SMART Log at address 0x06 has    1 sectors [SMART self-test log]
GP    Log at address 0x07 has    2 sectors [Extended self-test log]
SMART Log at address 0x09 has    1 sectors [Selective self-test log]
GP    Log at address 0x10 has    1 sectors [NCQ Command Error log]
GP    Log at address 0x11 has    1 sectors [SATA Phy Event Counters]
GP    Log at address 0x20 has    2 sectors [Streaming performance log]
GP    Log at address 0x21 has    1 sectors [Write stream error log]
GP    Log at address 0x22 has    1 sectors [Read stream error log]
GP/S  Log at address 0x80 has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x81 has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x82 has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x83 has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x84 has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x85 has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x86 has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x87 has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x88 has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x89 has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x8a has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x8b has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x8c has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x8d has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x8e has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x8f has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x90 has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x91 has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x92 has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x93 has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x94 has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x95 has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x96 has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x97 has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x98 has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x99 has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x9a has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x9b has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x9c has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x9d has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x9e has   16 sectors [Host vendor specific log]
GP/S  Log at address 0x9f has   16 sectors [Host vendor specific log]
GP/S  Log at address 0xe0 has    1 sectors [SCT Command/Status]
GP/S  Log at address 0xe1 has    1 sectors [SCT Data Transfer]

SMART Extended Comprehensive Error Log Version: 1 (2 sectors)
No Errors Logged

SMART Extended Self-test Log Version: 1 (2 sectors)
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Status Version:                  2
SCT Version (vendor specific):       256 (0x0100)
SCT Support Level:                   1
Device State:                        Active (0)
Current Temperature:                 22 Celsius
Power Cycle Max Temperature:         26 Celsius
Lifetime    Max Temperature:         55 Celsius
SCT Temperature History Version:     2
Temperature Sampling Period:         1 minute
Temperature Logging Interval:        1 minute
Min/Max recommended Temperature:     -4/72 Celsius
Min/Max Temperature Limit:           -9/77 Celsius
Temperature History Size (Index):    128 (10)

Index    Estimated Time   Temperature Celsius
  11    2013-02-04 11:57    23  ****
 ...    ..( 31 skipped).    ..  ****
  43    2013-02-04 12:29    23  ****
  44    2013-02-04 12:30    24  *****
 ...    ..(  2 skipped).    ..  *****
  47    2013-02-04 12:33    24  *****
  48    2013-02-04 12:34    23  ****
 ...    ..( 79 skipped).    ..  ****
   0    2013-02-04 13:54    23  ****
   1    2013-02-04 13:55    22  ***
 ...    ..(  8 skipped).    ..  ***
  10    2013-02-04 14:04    22  ***

SCT Error Recovery Control:
           Read:     70 (7,0 seconds)
          Write:     70 (7,0 seconds)

ATA_READ_LOG_EXT (addr=0x04:0x00, page=0, n=1) failed: scsi error aborted command
SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x000a  2            4  Device-to-host register FISes sent due to a COMRESET
0x0001  2            0  Command failed due to ICRC error
0x0002  2            0  R_ERR response for data FIS
0x0003  2            0  R_ERR response for device-to-host data FIS
0x0004  2            0  R_ERR response for host-to-device data FIS
0x0005  2            0  R_ERR response for non-data FIS
0x0006  2            0  R_ERR response for device-to-host non-data FIS
0x0007  2            0  R_ERR response for host-to-device non-data FIS
0x0008  2            0  Device-to-host non-data FIS retries
0x0009  2            4  Transition from drive PhyRdy to drive PhyNRdy
0x000b  2            0  CRC errors within host-to-device FIS
0x000d  2            0  Non-CRC errors within host-to-device FIS
0x000f  2            0  R_ERR response for host-to-device data FIS, CRC
0x0010  2            0  R_ERR response for host-to-device data FIS, non-CRC
0x0012  2            0  R_ERR response for host-to-device non-data FIS, CRC
0x0013  2            0  R_ERR response for host-to-device non-data FIS, non-CRC