Re: Power outages!!! help!


 



I ran a short SMART self-test, and it definitely fails on a read.

# smartctl -t short /dev/sdb
smartctl 6.4 2014-10-07 r4002 [x86_64-linux-4.6.0-1-amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Short self-test routine immediately in off-line mode".
Drive command "Execute SMART Short self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 2 minutes for test to complete.
Test will complete after Mon Aug 28 15:41:20 2017

Use smartctl -X to abort test.

# smartctl -l selftest /dev/sdb
smartctl 6.4 2014-10-07 r4002 [x86_64-linux-4.6.0-1-amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure       90%     28400         2946871664
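
To get a feel for where LBA_of_first_error sits, it may help to turn it into a byte offset and a rough position on the disk. This is only a sketch: it assumes 512-byte logical sectors and the 3907029168-sector total that the kernel log in this thread reports for the drive, and the function names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helpers; not part of smartmontools.
# Assumption: 512-byte logical sectors, as smartctl reports for this drive.
SECTOR_SIZE=512

# Byte offset of an LBA, counted from the start of the whole disk.
lba_to_bytes() {
    echo $(( $1 * SECTOR_SIZE ))
}

# Position of an LBA on the disk in whole percent (integer division).
# $1 = LBA, $2 = total number of sectors on the disk
lba_percent() {
    echo $(( $1 * 100 / $2 ))
}

lba_to_bytes 2946871664
lba_percent 2946871664 3907029168
```

The LBA is counted from the start of /dev/sdb, not from the start of the partition, so the partition's start sector (not shown in this thread) would have to be subtracted before mapping it to a filesystem block.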

I found the site below, but it lacks information on how to handle this in the XFS case.  One site discusses it and links to Nabble, but that link gives me a 404.  Any recommendations on how to go about this?


Thanks!

Regards,
Hong


On Monday, August 28, 2017 3:45 PM, hjcho616 <hjcho616@xxxxxxxxx> wrote:


So.. could doing something like this potentially bring it back to life? =)





On Monday, August 28, 2017 3:24 PM, Tomasz Kusmierz <tom.kusmierz@xxxxxxxxx> wrote:


I think you’ve got your answer:

197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       1
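
For anyone skimming a full attribute table for the same hint, a rough filter can help. This is only a sketch: the function name sector_warnings is made up, the choice of attribute IDs (5, 196, 197, 198) is an assumption about which counters relate to bad sectors, and it assumes the ten-column table layout that smartctl prints in this thread.

```shell
#!/bin/sh
# Hypothetical helper, not part of smartmontools.
# Reads a smartctl attribute table on stdin and prints the name and raw
# value of sector-related attributes whose RAW_VALUE (column 10) is non-zero.
# Assumption: IDs 5, 196, 197 and 198 are the ones worth watching.
sector_warnings() {
    awk '$1 ~ /^(5|196|197|198)$/ && $10 + 0 != 0 { print $2, $10 }'
}
```

It could be fed with something like `smartctl -A /dev/sdb | sector_warnings`. With the table quoted further down in this thread, only Current_Pending_Sector has a non-zero raw value among those four IDs.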

On 28 Aug 2017, at 21:22, hjcho616 <hjcho616@xxxxxxxxx> wrote:

Steve,

I thought that was odd too.. 

Below is from the log. It captures the transition from good to bad. It looks like there is a "Device: /dev/sdb [SAT], 1 Currently unreadable (pending) sectors" message.  And it looks like I did do the repair on /dev/sdb1... =P

# grep sdb syslog.1
Aug 27 06:27:22 OSD1 smartd[1031]: Device: /dev/sdb [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 44 to 43
Aug 27 06:57:22 OSD1 smartd[1031]: Device: /dev/sdb [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 43 to 45
Aug 27 07:27:21 OSD1 smartd[1031]: Device: /dev/sdb [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 45 to 44
Aug 27 07:57:21 OSD1 smartd[1031]: Device: /dev/sdb [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 44 to 45
Aug 27 10:57:22 OSD1 smartd[1031]: Device: /dev/sdb [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 45 to 44
Aug 27 13:27:21 OSD1 smartd[1031]: Device: /dev/sdb [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 44 to 45
Aug 27 13:53:34 OSD1 kernel: [    1.454082] sd 1:0:0:0: [sdb] 3907029168 512-byte logical blocks: (2.00 TB/1.82 TiB)
Aug 27 13:53:34 OSD1 kernel: [    1.454447] sd 1:0:0:0: [sdb] Write Protect is off
Aug 27 13:53:34 OSD1 kernel: [    1.454448] sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
Aug 27 13:53:34 OSD1 kernel: [    1.454488] sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Aug 27 13:53:34 OSD1 kernel: [    1.501349]  sdb: sdb1
Aug 27 13:53:34 OSD1 kernel: [    1.501796] sd 1:0:0:0: [sdb] Attached SCSI disk
Aug 27 13:53:34 OSD1 kernel: [    4.033081] XFS (sdb1): Mounting V4 Filesystem
Aug 27 13:53:34 OSD1 kernel: [    4.207191] XFS (sdb1): Starting recovery (logdev: internal)
Aug 27 13:53:34 OSD1 kernel: [    5.656298] XFS (sdb1): Ending recovery (logdev: internal)
Aug 27 13:53:34 OSD1 smartd[1028]: Device: /dev/sdb, type changed from 'scsi' to 'sat'
Aug 27 13:53:34 OSD1 smartd[1028]: Device: /dev/sdb [SAT], opened
Aug 27 13:53:34 OSD1 smartd[1028]: Device: /dev/sdb [SAT], SAMSUNG HD204UI, S/N:S2H7JD1B306112, WWN:5-0024e9-004c7c449, FW:1AQ10001, 2.00 TB
Aug 27 13:53:34 OSD1 smartd[1028]: Device: /dev/sdb [SAT], found in smartd database: SAMSUNG SpinPoint F4 EG (AF)
Aug 27 13:53:34 OSD1 smartd[1028]: Device: /dev/sdb [SAT], WARNING: Using smartmontools or hdparm with this
Aug 27 13:53:36 OSD1 smartd[1028]: Device: /dev/sdb [SAT], is SMART capable. Adding to "monitor" list.
Aug 27 13:53:36 OSD1 smartd[1028]: Device: /dev/sdb [SAT], state read from /var/lib/smartmontools/smartd.SAMSUNG_HD204UI-S2H7JD1B306112.ata.state
Aug 27 13:53:45 OSD1 smartd[1028]: Device: /dev/sdb [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 45 to 44
Aug 27 13:53:49 OSD1 smartd[1028]: Device: /dev/sdb [SAT], state written to /var/lib/smartmontools/smartd.SAMSUNG_HD204UI-S2H7JD1B306112.ata.state
Aug 27 15:52:36 OSD1 os-prober: debug: running /usr/lib/os-probes/mounted/05efi on mounted /dev/sdb1
Aug 27 15:52:36 OSD1 os-prober: debug: running /usr/lib/os-probes/mounted/10freedos on mounted /dev/sdb1
Aug 27 15:52:36 OSD1 10freedos: debug: /dev/sdb1 is not a FAT partition: exiting
Aug 27 15:52:36 OSD1 os-prober: debug: running /usr/lib/os-probes/mounted/10qnx on mounted /dev/sdb1
Aug 27 15:52:36 OSD1 10qnx: debug: /dev/sdb1 is not a QNX4 partition: exiting
Aug 27 15:52:36 OSD1 os-prober: debug: running /usr/lib/os-probes/mounted/20macosx on mounted /dev/sdb1
Aug 27 15:52:36 OSD1 macosx-prober: debug: /dev/sdb1 is not an HFS+ partition: exiting
Aug 27 15:52:36 OSD1 os-prober: debug: running /usr/lib/os-probes/mounted/20microsoft on mounted /dev/sdb1
Aug 27 15:52:36 OSD1 20microsoft: debug: /dev/sdb1 is not a MS partition: exiting
Aug 27 15:52:36 OSD1 os-prober: debug: running /usr/lib/os-probes/mounted/30utility on mounted /dev/sdb1
Aug 27 15:52:36 OSD1 30utility: debug: /dev/sdb1 is not a FAT partition: exiting
Aug 27 15:52:36 OSD1 os-prober: debug: running /usr/lib/os-probes/mounted/40lsb on mounted /dev/sdb1
Aug 27 15:52:36 OSD1 os-prober: debug: running /usr/lib/os-probes/mounted/70hurd on mounted /dev/sdb1
Aug 27 15:52:36 OSD1 os-prober: debug: running /usr/lib/os-probes/mounted/80minix on mounted /dev/sdb1
Aug 27 15:52:36 OSD1 os-prober: debug: running /usr/lib/os-probes/mounted/83haiku on mounted /dev/sdb1
Aug 27 15:52:36 OSD1 83haiku: debug: /dev/sdb1 is not a BeFS partition: exiting
Aug 27 15:52:36 OSD1 os-prober: debug: running /usr/lib/os-probes/mounted/90linux-distro on mounted /dev/sdb1
Aug 27 15:52:36 OSD1 os-prober: debug: running /usr/lib/os-probes/mounted/90solaris on mounted /dev/sdb1
Aug 27 15:53:06 OSD1 os-prober: debug: running /usr/lib/os-probes/mounted/05efi on mounted /dev/sdb1
Aug 27 15:53:06 OSD1 os-prober: debug: running /usr/lib/os-probes/mounted/10freedos on mounted /dev/sdb1
Aug 27 15:53:06 OSD1 10freedos: debug: /dev/sdb1 is not a FAT partition: exiting
Aug 27 15:53:06 OSD1 os-prober: debug: running /usr/lib/os-probes/mounted/10qnx on mounted /dev/sdb1
Aug 27 15:53:06 OSD1 10qnx: debug: /dev/sdb1 is not a QNX4 partition: exiting
Aug 27 15:53:06 OSD1 os-prober: debug: running /usr/lib/os-probes/mounted/20macosx on mounted /dev/sdb1
Aug 27 15:53:06 OSD1 macosx-prober: debug: /dev/sdb1 is not an HFS+ partition: exiting
Aug 27 15:53:06 OSD1 os-prober: debug: running /usr/lib/os-probes/mounted/20microsoft on mounted /dev/sdb1
Aug 27 15:53:06 OSD1 20microsoft: debug: /dev/sdb1 is not a MS partition: exiting
Aug 27 15:53:06 OSD1 os-prober: debug: running /usr/lib/os-probes/mounted/30utility on mounted /dev/sdb1
Aug 27 15:53:06 OSD1 30utility: debug: /dev/sdb1 is not a FAT partition: exiting
Aug 27 15:53:06 OSD1 os-prober: debug: running /usr/lib/os-probes/mounted/40lsb on mounted /dev/sdb1
Aug 27 15:53:06 OSD1 os-prober: debug: running /usr/lib/os-probes/mounted/70hurd on mounted /dev/sdb1
Aug 27 15:53:06 OSD1 os-prober: debug: running /usr/lib/os-probes/mounted/80minix on mounted /dev/sdb1
Aug 27 15:53:06 OSD1 os-prober: debug: running /usr/lib/os-probes/mounted/83haiku on mounted /dev/sdb1
Aug 27 15:53:06 OSD1 83haiku: debug: /dev/sdb1 is not a BeFS partition: exiting
Aug 27 15:53:06 OSD1 os-prober: debug: running /usr/lib/os-probes/mounted/90linux-distro on mounted /dev/sdb1
Aug 27 15:53:06 OSD1 os-prober: debug: running /usr/lib/os-probes/mounted/90solaris on mounted /dev/sdb1
Aug 27 15:53:14 OSD1 os-prober: debug: running /usr/lib/os-probes/mounted/05efi on mounted /dev/sdb1
Aug 27 15:53:14 OSD1 os-prober: debug: running /usr/lib/os-probes/mounted/10freedos on mounted /dev/sdb1
Aug 27 15:53:14 OSD1 10freedos: debug: /dev/sdb1 is not a FAT partition: exiting
Aug 27 15:53:14 OSD1 os-prober: debug: running /usr/lib/os-probes/mounted/10qnx on mounted /dev/sdb1
Aug 27 15:53:14 OSD1 10qnx: debug: /dev/sdb1 is not a QNX4 partition: exiting
Aug 27 15:53:14 OSD1 os-prober: debug: running /usr/lib/os-probes/mounted/20macosx on mounted /dev/sdb1
Aug 27 15:53:14 OSD1 macosx-prober: debug: /dev/sdb1 is not an HFS+ partition: exiting
Aug 27 15:53:14 OSD1 os-prober: debug: running /usr/lib/os-probes/mounted/20microsoft on mounted /dev/sdb1
Aug 27 15:53:14 OSD1 20microsoft: debug: /dev/sdb1 is not a MS partition: exiting
Aug 27 15:53:14 OSD1 os-prober: debug: running /usr/lib/os-probes/mounted/30utility on mounted /dev/sdb1
Aug 27 15:53:14 OSD1 30utility: debug: /dev/sdb1 is not a FAT partition: exiting
Aug 27 15:53:14 OSD1 os-prober: debug: running /usr/lib/os-probes/mounted/40lsb on mounted /dev/sdb1
Aug 27 15:53:14 OSD1 os-prober: debug: running /usr/lib/os-probes/mounted/70hurd on mounted /dev/sdb1
Aug 27 15:53:14 OSD1 os-prober: debug: running /usr/lib/os-probes/mounted/80minix on mounted /dev/sdb1
Aug 27 15:53:14 OSD1 os-prober: debug: running /usr/lib/os-probes/mounted/83haiku on mounted /dev/sdb1
Aug 27 15:53:14 OSD1 83haiku: debug: /dev/sdb1 is not a BeFS partition: exiting
Aug 27 15:53:14 OSD1 os-prober: debug: running /usr/lib/os-probes/mounted/90linux-distro on mounted /dev/sdb1
Aug 27 15:53:14 OSD1 os-prober: debug: running /usr/lib/os-probes/mounted/90solaris on mounted /dev/sdb1
Aug 27 15:53:50 OSD1 smartd[1028]: Device: /dev/sdb [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 44 to 43
Aug 27 15:53:57 OSD1 os-prober: debug: running /usr/lib/os-probes/mounted/05efi on mounted /dev/sdb1
Aug 27 15:53:57 OSD1 os-prober: debug: running /usr/lib/os-probes/mounted/10freedos on mounted /dev/sdb1
Aug 27 15:53:57 OSD1 10freedos: debug: /dev/sdb1 is not a FAT partition: exiting
Aug 27 15:53:57 OSD1 os-prober: debug: running /usr/lib/os-probes/mounted/10qnx on mounted /dev/sdb1
Aug 27 15:53:57 OSD1 10qnx: debug: /dev/sdb1 is not a QNX4 partition: exiting
Aug 27 15:53:57 OSD1 os-prober: debug: running /usr/lib/os-probes/mounted/20macosx on mounted /dev/sdb1
Aug 27 15:53:57 OSD1 macosx-prober: debug: /dev/sdb1 is not an HFS+ partition: exiting
Aug 27 15:53:57 OSD1 os-prober: debug: running /usr/lib/os-probes/mounted/20microsoft on mounted /dev/sdb1
Aug 27 15:53:57 OSD1 20microsoft: debug: /dev/sdb1 is not a MS partition: exiting
Aug 27 15:53:57 OSD1 os-prober: debug: running /usr/lib/os-probes/mounted/30utility on mounted /dev/sdb1
Aug 27 15:53:57 OSD1 30utility: debug: /dev/sdb1 is not a FAT partition: exiting
Aug 27 15:53:57 OSD1 os-prober: debug: running /usr/lib/os-probes/mounted/40lsb on mounted /dev/sdb1
Aug 27 15:53:57 OSD1 os-prober: debug: running /usr/lib/os-probes/mounted/70hurd on mounted /dev/sdb1
Aug 27 15:53:57 OSD1 os-prober: debug: running /usr/lib/os-probes/mounted/80minix on mounted /dev/sdb1
Aug 27 15:53:57 OSD1 os-prober: debug: running /usr/lib/os-probes/mounted/83haiku on mounted /dev/sdb1
Aug 27 15:53:57 OSD1 83haiku: debug: /dev/sdb1 is not a BeFS partition: exiting
Aug 27 15:53:57 OSD1 os-prober: debug: running /usr/lib/os-probes/mounted/90linux-distro on mounted /dev/sdb1
Aug 27 15:53:57 OSD1 os-prober: debug: running /usr/lib/os-probes/mounted/90solaris on mounted /dev/sdb1
Aug 27 15:54:10 OSD1 os-prober: debug: running /usr/lib/os-probes/mounted/05efi on mounted /dev/sdb1
Aug 27 15:54:10 OSD1 os-prober: debug: running /usr/lib/os-probes/mounted/10freedos on mounted /dev/sdb1
Aug 27 15:54:10 OSD1 10freedos: debug: /dev/sdb1 is not a FAT partition: exiting
Aug 27 15:54:10 OSD1 os-prober: debug: running /usr/lib/os-probes/mounted/10qnx on mounted /dev/sdb1
Aug 27 15:54:10 OSD1 10qnx: debug: /dev/sdb1 is not a QNX4 partition: exiting
Aug 27 15:54:10 OSD1 os-prober: debug: running /usr/lib/os-probes/mounted/20macosx on mounted /dev/sdb1
Aug 27 15:54:10 OSD1 macosx-prober: debug: /dev/sdb1 is not an HFS+ partition: exiting
Aug 27 15:54:10 OSD1 os-prober: debug: running /usr/lib/os-probes/mounted/20microsoft on mounted /dev/sdb1
Aug 27 15:54:10 OSD1 20microsoft: debug: /dev/sdb1 is not a MS partition: exiting
Aug 27 15:54:10 OSD1 os-prober: debug: running /usr/lib/os-probes/mounted/30utility on mounted /dev/sdb1
Aug 27 15:54:10 OSD1 30utility: debug: /dev/sdb1 is not a FAT partition: exiting
Aug 27 15:54:10 OSD1 os-prober: debug: running /usr/lib/os-probes/mounted/40lsb on mounted /dev/sdb1
Aug 27 15:54:10 OSD1 os-prober: debug: running /usr/lib/os-probes/mounted/70hurd on mounted /dev/sdb1
Aug 27 15:54:10 OSD1 os-prober: debug: running /usr/lib/os-probes/mounted/80minix on mounted /dev/sdb1
Aug 27 15:54:10 OSD1 os-prober: debug: running /usr/lib/os-probes/mounted/83haiku on mounted /dev/sdb1
Aug 27 15:54:10 OSD1 83haiku: debug: /dev/sdb1 is not a BeFS partition: exiting
Aug 27 15:54:10 OSD1 os-prober: debug: running /usr/lib/os-probes/mounted/90linux-distro on mounted /dev/sdb1
Aug 27 15:54:10 OSD1 os-prober: debug: running /usr/lib/os-probes/mounted/90solaris on mounted /dev/sdb1
Aug 27 15:54:14 OSD1 os-prober: debug: running /usr/lib/os-probes/mounted/05efi on mounted /dev/sdb1
Aug 27 15:54:14 OSD1 os-prober: debug: running /usr/lib/os-probes/mounted/10freedos on mounted /dev/sdb1
Aug 27 15:54:14 OSD1 10freedos: debug: /dev/sdb1 is not a FAT partition: exiting
Aug 27 15:54:14 OSD1 os-prober: debug: running /usr/lib/os-probes/mounted/10qnx on mounted /dev/sdb1
Aug 27 15:54:14 OSD1 10qnx: debug: /dev/sdb1 is not a QNX4 partition: exiting
Aug 27 15:54:14 OSD1 os-prober: debug: running /usr/lib/os-probes/mounted/20macosx on mounted /dev/sdb1
Aug 27 15:54:14 OSD1 macosx-prober: debug: /dev/sdb1 is not an HFS+ partition: exiting
Aug 27 15:54:14 OSD1 os-prober: debug: running /usr/lib/os-probes/mounted/20microsoft on mounted /dev/sdb1
Aug 27 15:54:14 OSD1 20microsoft: debug: /dev/sdb1 is not a MS partition: exiting
Aug 27 15:54:14 OSD1 os-prober: debug: running /usr/lib/os-probes/mounted/30utility on mounted /dev/sdb1
Aug 27 15:54:14 OSD1 30utility: debug: /dev/sdb1 is not a FAT partition: exiting
Aug 27 15:54:14 OSD1 os-prober: debug: running /usr/lib/os-probes/mounted/40lsb on mounted /dev/sdb1
Aug 27 15:54:14 OSD1 os-prober: debug: running /usr/lib/os-probes/mounted/70hurd on mounted /dev/sdb1
Aug 27 15:54:14 OSD1 os-prober: debug: running /usr/lib/os-probes/mounted/80minix on mounted /dev/sdb1
Aug 27 15:54:14 OSD1 os-prober: debug: running /usr/lib/os-probes/mounted/83haiku on mounted /dev/sdb1
Aug 27 15:54:14 OSD1 83haiku: debug: /dev/sdb1 is not a BeFS partition: exiting
Aug 27 15:54:14 OSD1 os-prober: debug: running /usr/lib/os-probes/mounted/90linux-distro on mounted /dev/sdb1
Aug 27 15:54:14 OSD1 os-prober: debug: running /usr/lib/os-probes/mounted/90solaris on mounted /dev/sdb1
Aug 27 16:11:04 OSD1 kernel: [    1.459684] sd 1:0:0:0: [sdb] 3907029168 512-byte logical blocks: (2.00 TB/1.82 TiB)
Aug 27 16:11:04 OSD1 kernel: [    1.459740] sd 1:0:0:0: [sdb] Write Protect is off
Aug 27 16:11:04 OSD1 kernel: [    1.459742] sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
Aug 27 16:11:04 OSD1 kernel: [    1.459777] sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Aug 27 16:11:04 OSD1 kernel: [    1.505022]  sdb: sdb1
Aug 27 16:11:04 OSD1 kernel: [    1.505554] sd 1:0:0:0: [sdb] Attached SCSI disk
Aug 27 16:11:04 OSD1 kernel: [    4.036822] XFS (sdb1): Mounting V4 Filesystem
Aug 27 16:11:04 OSD1 kernel: [    4.194733] XFS (sdb1): Ending clean mount
Aug 27 16:11:04 OSD1 smartd[864]: Device: /dev/sdb, type changed from 'scsi' to 'sat'
Aug 27 16:11:04 OSD1 smartd[864]: Device: /dev/sdb [SAT], opened
Aug 27 16:11:04 OSD1 smartd[864]: Device: /dev/sdb [SAT], SAMSUNG HD204UI, S/N:S2H7JD1B306112, WWN:5-0024e9-004c7c449, FW:1AQ10001, 2.00 TB
Aug 27 16:11:04 OSD1 smartd[864]: Device: /dev/sdb [SAT], found in smartd database: SAMSUNG SpinPoint F4 EG (AF)
Aug 27 16:11:04 OSD1 smartd[864]: Device: /dev/sdb [SAT], WARNING: Using smartmontools or hdparm with this
Aug 27 16:11:06 OSD1 smartd[864]: Device: /dev/sdb [SAT], is SMART capable. Adding to "monitor" list.
Aug 27 16:11:06 OSD1 smartd[864]: Device: /dev/sdb [SAT], state read from /var/lib/smartmontools/smartd.SAMSUNG_HD204UI-S2H7JD1B306112.ata.state
Aug 27 16:11:13 OSD1 smartd[864]: Device: /dev/sdb [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 43 to 44
Aug 27 16:11:16 OSD1 smartd[864]: Device: /dev/sdb [SAT], state written to /var/lib/smartmontools/smartd.SAMSUNG_HD204UI-S2H7JD1B306112.ata.state
Aug 27 17:12:20 OSD1 kernel: [    1.454227] sd 1:0:0:0: [sdb] 3907029168 512-byte logical blocks: (2.00 TB/1.82 TiB)
Aug 27 17:12:20 OSD1 kernel: [    1.454309] sd 1:0:0:0: [sdb] Write Protect is off
Aug 27 17:12:20 OSD1 kernel: [    1.454310] sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
Aug 27 17:12:20 OSD1 kernel: [    1.454346] sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Aug 27 17:12:20 OSD1 kernel: [    1.467745] sd 1:0:0:0: [sdb] Attached SCSI disk
Aug 27 17:12:20 OSD1 smartd[852]: Device: /dev/sdb, type changed from 'scsi' to 'sat'
Aug 27 17:12:20 OSD1 smartd[852]: Device: /dev/sdb [SAT], opened
Aug 27 17:12:20 OSD1 smartd[852]: Device: /dev/sdb [SAT], SAMSUNG HD204UI, S/N:S2H7JD1B306112, WWN:5-0024e9-004c7c449, FW:1AQ10001, 2.00 TB
Aug 27 17:12:20 OSD1 smartd[852]: Device: /dev/sdb [SAT], found in smartd database: SAMSUNG SpinPoint F4 EG (AF)
Aug 27 17:12:20 OSD1 smartd[852]: Device: /dev/sdb [SAT], WARNING: Using smartmontools or hdparm with this
Aug 27 17:12:22 OSD1 smartd[852]: Device: /dev/sdb [SAT], is SMART capable. Adding to "monitor" list.
Aug 27 17:12:22 OSD1 smartd[852]: Device: /dev/sdb [SAT], state read from /var/lib/smartmontools/smartd.SAMSUNG_HD204UI-S2H7JD1B306112.ata.state
Aug 27 17:12:28 OSD1 smartd[852]: Device: /dev/sdb [SAT], 1 Currently unreadable (pending) sectors
Aug 27 17:12:28 OSD1 smartd[852]: Device: /dev/sdb [SAT], SMART Prefailure Attribute: 3 Spin_Up_Time changed from 78 to 67
Aug 27 17:12:28 OSD1 smartd[852]: Device: /dev/sdb [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 44 to 48
Aug 27 17:12:28 OSD1 smartd[852]: Device: /dev/sdb [SAT], SMART Usage Attribute: 197 Current_Pending_Sector changed from 252 to 100
Aug 27 17:12:32 OSD1 smartd[852]: Device: /dev/sdb [SAT], state written to /var/lib/smartmontools/smartd.SAMSUNG_HD204UI-S2H7JD1B306112.ata.state
Aug 27 17:42:33 OSD1 smartd[852]: Device: /dev/sdb [SAT], 1 Currently unreadable (pending) sectors
Aug 27 17:42:33 OSD1 smartd[852]: Device: /dev/sdb [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 48 to 47
Aug 27 18:00:10 OSD1 kernel: [ 2876.366644] XFS (sdb): Mounting V4 Filesystem
Aug 27 18:00:11 OSD1 kernel: [ 2876.492421] XFS (sdb): Ending clean mount
Aug 27 18:00:31 OSD1 kernel: [ 2897.154593] XFS (sdb): Unmounting Filesystem
Aug 27 18:12:33 OSD1 smartd[852]: Device: /dev/sdb [SAT], 1 Currently unreadable (pending) sectors
Aug 27 18:12:33 OSD1 smartd[852]: Device: /dev/sdb [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 47 to 48
Aug 27 18:42:33 OSD1 smartd[852]: Device: /dev/sdb [SAT], 1 Currently unreadable (pending) sectors
Aug 27 18:42:33 OSD1 smartd[852]: Device: /dev/sdb [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 48 to 47
Aug 27 19:12:33 OSD1 smartd[852]: Device: /dev/sdb [SAT], 1 Currently unreadable (pending) sectors
Aug 27 19:42:33 OSD1 smartd[852]: Device: /dev/sdb [SAT], 1 Currently unreadable (pending) sectors
Aug 27 20:12:33 OSD1 smartd[852]: Device: /dev/sdb [SAT], 1 Currently unreadable (pending) sectors
Aug 27 20:42:32 OSD1 smartd[852]: Device: /dev/sdb [SAT], 1 Currently unreadable (pending) sectors
Aug 27 21:12:33 OSD1 smartd[852]: Device: /dev/sdb [SAT], 1 Currently unreadable (pending) sectors
Aug 27 21:42:33 OSD1 smartd[852]: Device: /dev/sdb [SAT], 1 Currently unreadable (pending) sectors
Aug 27 22:12:33 OSD1 smartd[852]: Device: /dev/sdb [SAT], 1 Currently unreadable (pending) sectors
Aug 27 22:42:33 OSD1 smartd[852]: Device: /dev/sdb [SAT], 1 Currently unreadable (pending) sectors
Aug 27 23:12:33 OSD1 smartd[852]: Device: /dev/sdb [SAT], 1 Currently unreadable (pending) sectors
Aug 27 23:42:32 OSD1 smartd[852]: Device: /dev/sdb [SAT], 1 Currently unreadable (pending) sectors
Aug 28 00:12:32 OSD1 smartd[852]: Device: /dev/sdb [SAT], 1 Currently unreadable (pending) sectors
Aug 28 00:42:32 OSD1 smartd[852]: Device: /dev/sdb [SAT], 1 Currently unreadable (pending) sectors
Aug 28 01:12:32 OSD1 smartd[852]: Device: /dev/sdb [SAT], 1 Currently unreadable (pending) sectors
Aug 28 01:12:32 OSD1 smartd[852]: Device: /dev/sdb [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 47 to 48
Aug 28 01:42:32 OSD1 smartd[852]: Device: /dev/sdb [SAT], 1 Currently unreadable (pending) sectors
Aug 28 01:42:32 OSD1 smartd[852]: Device: /dev/sdb [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 48 to 47
Aug 28 02:12:33 OSD1 smartd[852]: Device: /dev/sdb [SAT], 1 Currently unreadable (pending) sectors
Aug 28 02:12:33 OSD1 smartd[852]: Device: /dev/sdb [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 47 to 46
Aug 28 02:42:32 OSD1 smartd[852]: Device: /dev/sdb [SAT], 1 Currently unreadable (pending) sectors
Aug 28 02:42:32 OSD1 smartd[852]: Device: /dev/sdb [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 46 to 47
Aug 28 03:12:33 OSD1 smartd[852]: Device: /dev/sdb [SAT], 1 Currently unreadable (pending) sectors
Aug 28 03:42:33 OSD1 smartd[852]: Device: /dev/sdb [SAT], 1 Currently unreadable (pending) sectors
Aug 28 04:12:33 OSD1 smartd[852]: Device: /dev/sdb [SAT], 1 Currently unreadable (pending) sectors
Aug 28 04:42:33 OSD1 smartd[852]: Device: /dev/sdb [SAT], 1 Currently unreadable (pending) sectors
Aug 28 05:12:33 OSD1 smartd[852]: Device: /dev/sdb [SAT], 1 Currently unreadable (pending) sectors
Aug 28 05:42:33 OSD1 smartd[852]: Device: /dev/sdb [SAT], 1 Currently unreadable (pending) sectors
Aug 28 06:12:32 OSD1 smartd[852]: Device: /dev/sdb [SAT], 1 Currently unreadable (pending) sectors
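
With this much log, the transition is easy to miss. A small filter that prints only the first pending-sector report for one device could look like the sketch below. The function name first_pending is made up, and the match strings are copied from the smartd lines above, so they are an assumption about how smartd words this message on other versions.

```shell
#!/bin/sh
# Hypothetical helper; reads syslog text on stdin.
# $1 = device path as smartd prints it, e.g. /dev/sdb
# Prints the first line where smartd reports pending sectors for that device.
first_pending() {
    awk -v dev="$1" '
        index($0, "Device: " dev " [SAT]") &&
        index($0, "Currently unreadable (pending) sectors") { print; exit }
    '
}
```

Usage would be along the lines of `first_pending /dev/sdb < syslog.1`. In the log above, the first such line is the one stamped Aug 27 17:12:28.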

SMART output:

# smartctl -a /dev/sdb
smartctl 6.4 2014-10-07 r4002 [x86_64-linux-4.6.0-1-amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     SAMSUNG SpinPoint F4 EG (AF)
Device Model:     SAMSUNG HD204UI
Serial Number:    S2H7JD1B306112
LU WWN Device Id: 5 0024e9 004c7c449
Firmware Version: 1AQ10001
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    5400 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 6
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:    Mon Aug 28 15:12:23 2017 CDT

==> WARNING: Using smartmontools or hdparm with this
drive may result in data loss due to a firmware bug.
****** THIS DRIVE MAY OR MAY NOT BE AFFECTED! ******
Buggy and fixed firmware report same version number!
See the following web pages for details:

SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (19920) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 332) minutes.
SCT capabilities:              (0x003f) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   051    Pre-fail  Always       -       37
  2 Throughput_Performance  0x0026   252   252   000    Old_age   Always       -       0
  3 Spin_Up_Time            0x0023   067   066   025    Pre-fail  Always       -       10197
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       146
  5 Reallocated_Sector_Ct   0x0033   252   252   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   252   252   051    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0024   252   252   015    Old_age   Offline      -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       28399
 10 Spin_Retry_Count        0x0032   252   252   051    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   252   252   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       217
181 Program_Fail_Cnt_Total  0x0022   099   099   000    Old_age   Always       -       35325174
191 G-Sense_Error_Rate      0x0022   100   100   000    Old_age   Always       -       2855
192 Power-Off_Retract_Count 0x0022   252   252   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0002   047   041   000    Old_age   Always       -       53 (Min/Max 15/59)
195 Hardware_ECC_Recovered  0x003a   100   100   000    Old_age   Always       -       0
196 Reallocated_Event_Count 0x0032   252   252   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0030   252   252   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0036   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x002a   100   100   000    Old_age   Always       -       4222
223 Load_Retry_Count        0x0032   252   252   000    Old_age   Always       -       0
225 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       217

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 0
Note: revision number not 1 implies that no selective self-test has ever been run
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Completed [00% left] (0-65535)
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


I haven't tried to repair XFS before... nor tried to do anything with SMART... Usually when I see something in SMART, I order a replacement HDD and swap it soon.. =)  Do you know of something I can do in this case with the least amount of damage, and hopefully recover them? =)

Regards,
Hong


On Monday, August 28, 2017 2:47 PM, Steve Taylor <steve.taylor@xxxxxxxxxxxxxxxx> wrote:


I'm jumping in a little late here, but running xfs_repair on your partition can't frag your partition table. The partition table lives outside the partition block device and xfs_repair doesn't have access to it when run against /dev/sdb1. I haven't actually tested it, but it seems unlikely that running xfs_repair on /dev/sdb would do it either. I would assume it would just give you an error about /dev/sdb not containing an XFS filesystem. That's a guess though. I haven't ever tried anything like that.

Are you sure there isn't physical damage to the disk? I wouldn't say it's common, but power outages can do that. You can run 'dmesg | grep sdb' and 'smartctl -a /dev/sdb' to see if there are kernel errors or SMART errors indicative of physical problems. If the disk is physically sound and the partition table really has been fragged, you may be able to restore it from the backup at the end of the disk, assuming it's GPT. If you can't find a partition or a filesystem somehow, then you're probably out of luck as far as retrieving any objects from that OSD. If the disk is physically damaged and your partition is gone, then it probably isn't worth wasting additional time on it.


Steve Taylor | Senior Software Engineer | StorageCraft Technology Corporation
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2799 |



On Mon, 2017-08-28 at 19:18 +0000, hjcho616 wrote:
Tomasz,

Looks like when I did xfs_repair -L /dev/sdb1 it did something to the partition table, and I don't see /dev/sdb1 anymore... or maybe I missed the 1 in /dev/sdb1? =(. Yes.. that extra power outage did pretty good damage... =P  I am hoping 0.007% is very small...=P  Any recommendations on fixing the XFS partition I am missing? =)

Ronny,

Thank you for that link!

No, I haven't done anything to the OSDs... I'm not touching them, hoping that I can revive some of them.. =)  The only thing I've done is try to start and stop them..

Below are the links to newer files with just one start attempt. =)









Regards,
Hong


On Monday, August 28, 2017 12:53 PM, Ronny Aasen <ronny+ceph-users@xxxxxxxx> wrote:


comments inline

On 28.08.2017 18:31, hjcho616 wrote:


I'll see what I can do on that... Looks like I may have to add another OSD host as I utilized all of the SATA ports on those boards. =P

Ronny,

I am running with size=2 min_size=1.  I created everything with ceph-deploy and didn't touch many of the pool settings...  I hope not, but it sounds like I may have lost some files!  I do want some of those OSDs to come back online somehow... to get that confidence level up. =P


This is a bad idea, as you have found out. Once your cluster is healthy you should look at improving this.

The dead osd.3 message is probably me trying to stop and start the osd.  There were some cases where stop didn't kill the ceph-osd process.  I just started or restarted osd to try and see if that worked..  After that, there were some reboots and I am not seeing those messages after it...


When providing logs, try to move the old one out of the way, do a single startup, and post that. It is easier to read when the file contains a single run.


This is something I am running at home.  I am the only user.  In a way it is a production environment, but just driven by me. =)

Do you have any suggestions to get any of those osd.3, osd.4, osd.5, and osd.8 to come back up without removing them?  I have a feeling I can get some data back with some of them intact.

Just in case you are not able to make them run again: that does not automatically mean the data is lost. I have successfully recovered lost objects using these instructions:  http://ceph.com/geen-categorie/incomplete-pgs-oh-my/

I would start by renaming the OSD's log file, doing a single try at starting the OSD, and posting that log. Have you done anything to the OSDs that could make them not run?
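
The "rename the log first" step could be sketched roughly as follows. The function name set_aside_log is made up, and the example path /var/log/ceph/ceph-osd.3.log is an assumption about where the OSD writes its log on this setup.

```shell
#!/bin/sh
# Hypothetical helper: move an existing log aside so that the next start
# of the daemon writes a fresh file containing a single run.
# $1 = path to the log file (assumed example: /var/log/ceph/ceph-osd.3.log)
set_aside_log() {
    log=$1
    # Nothing to do if there is no old log.
    [ -f "$log" ] || return 0
    # Keep the old content under a timestamped name instead of deleting it.
    mv -- "$log" "$log.$(date +%Y%m%d-%H%M%S).old"
}
```

The OSD should be stopped before the log is moved, since a running daemon keeps writing to the file it already has open. After that, one start attempt (for example `systemctl start ceph-osd@3`, assuming a systemd-managed OSD) leaves a log with a single run in it.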


kind regards
Ronny Aasen
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

