Re: 16.2.6 OSD down, out but container running....

Hi,

=== START OF READ SMART DATA SECTION ===
SMART Health Status: FIRMWARE IMPENDING FAILURE TOO MANY BLOCK REASSIGNS
[asc=5d, ascq=64]

This indicates a slowly failing drive. You should contact the vendor and replace the drive. This can happen on new drives, too.
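
For checking many drives at once, the health line can be pulled out of the `smartctl -H` output with a small filter. This is only a sketch: `smart_health` is a made-up helper name, and it assumes the health line looks like the SCSI form quoted above ("SMART Health Status: ...") or the ATA form ("SMART overall-health self-assessment test result: ...").

```shell
# Hypothetical helper (not part of smartmontools): reads `smartctl -H`
# output on stdin, prints the reported health status, and returns 0 only
# when the drive reports OK (SCSI) or PASSED (ATA).
smart_health() {
    status=$(grep -E 'SMART Health Status:|SMART overall-health self-assessment test result:' \
        | head -n 1 \
        | sed 's/^[^:]*:[[:space:]]*//')
    printf '%s\n' "${status:-unknown}"
    case "$status" in
        OK|PASSED) return 0 ;;
        *) return 1 ;;
    esac
}

# Example use against a real device (needs root and smartmontools):
#   smartctl -H /dev/sdag | smart_health || echo "check /dev/sdag"
```

A non-zero return here only means the drive did not report a clean status; the full `smartctl -a` output is still the thing to send to the vendor.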


Zitat von Marco Pizzolo <marcopizzolo@xxxxxxxxx>:

Thanks Hu Weiwen,

These hosts and drives are perhaps 2 months old or so, and this is the
first cluster we have built on them, so I was not anticipating a drive
issue already.

The smartmontools show:

root@<HOST>:~# smartctl -H /dev/sdag
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.11.0-38-generic] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Health Status: FIRMWARE IMPENDING FAILURE TOO MANY BLOCK REASSIGNS
[asc=5d, ascq=64]

Grown defects during certification <not available>
Total blocks reassigned during format <not available>
Total new blocks reassigned <not available>
Power on minutes since format <not available>
root@<HOST>:~# smartctl -H /dev/sdah
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.11.0-38-generic] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org



On Wed, Oct 27, 2021 at 1:26 PM 胡 玮文 <huww98@xxxxxxxxxxx> wrote:

Hi Marco, the log lines are truncated. I recommend writing the logs to
a file rather than copying them from the terminal:



cephadm logs --name osd.13 > osd.13.log



I see “read stalled” in the log. Just a guess, can you check the kernel
logs and the SMART info to see if there is something wrong with this disk?
Maybe also do a self-test.
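
A rough sketch of those checks, assuming the disk behind the OSD is /dev/sdag (substitute the real device). `kernel_disk_errors` is a hypothetical helper, and its keyword list is a guess at common kernel error wording rather than an exhaustive match.

```shell
# Hypothetical helper: reads kernel log text on stdin and prints the lines
# that mention the given device name together with a common error keyword.
# Returns non-zero when nothing matches.
kernel_disk_errors() {
    dev=$1
    grep -w -- "$dev" | grep -Ei 'i/o error|medium error|failed|timeout|reset'
}

# Example use on the OSD host (needs root):
#   dmesg -T | kernel_disk_errors sdag
#   smartctl -a /dev/sdag           # full SMART information
#   smartctl -t short /dev/sdag     # start a short self-test
#   smartctl -l selftest /dev/sdag  # read the self-test log afterwards
```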



Sent from Mail for Windows <https://go.microsoft.com/fwlink/?LinkId=550986>



*From: *Marco Pizzolo <marcopizzolo@xxxxxxxxx>
*Sent: *October 28, 2021 1:17
*To: *胡 玮文 <huww98@xxxxxxxxxxx>
*Cc: *ceph-users <ceph-users@xxxxxxx>
*Subject: *Re:  16.2.6 OSD down, out but container running....



Is there any command or log I can provide a sample from that would help to
pinpoint the issue?  The other 119 of 120 OSDs are working correctly by all
accounts, but I am just unable to bring the last one fully online.



Thank you,

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx






