Re: 16.2.6 OSD down, out but container running....

Eugen Block <eblock@xxxxxx> · Thu, 28 Oct 2021 10:37:18 +0000

Hi,

=== START OF READ SMART DATA SECTION ===
SMART Health Status: FIRMWARE IMPENDING FAILURE TOO MANY BLOCK REASSIGNS [asc=5d, ascq=64]

This indicates a slowly failing drive: the firmware reports that it has had to
reassign too many blocks. You should contact the vendor and replace the drive.
This can happen with new drives, too.
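Because new drives can fail this way too, it may be worth checking the other disks in these hosts the same way. This is a minimal sketch, assuming smartmontools is installed, it runs as root, and the data disks appear as /dev/sd* devices; the function names are hypothetical and not part of Ceph or smartmontools. It treats only the healthy verdicts smartctl prints ("SMART Health Status: OK" for SCSI/SAS drives, "...self-assessment test result: PASSED" for ATA drives) as good and reports everything else:

```shell
#!/bin/sh
# Sketch only: flag disks whose SMART health verdict is not healthy.
# Assumes root, smartmontools installed, disks named /dev/sd*.
# Function names are hypothetical, not part of Ceph or smartmontools.

# Read `smartctl -H` output on stdin; succeed only for a healthy verdict.
# SCSI/SAS drives print "SMART Health Status: OK";
# ATA/SATA drives print "SMART overall-health self-assessment test result: PASSED".
smart_health_ok() {
  grep -qE 'SMART Health Status: OK|self-assessment test result: PASSED'
}

# Check every whole-disk /dev/sd* device (sda, sdag, ...; partitions end in digits).
check_all_disks() {
  for dev in /dev/sd*[a-z]; do
    [ -b "$dev" ] || continue   # skip if the glob matched nothing
    out=$(smartctl -H "$dev")
    if ! printf '%s\n' "$out" | smart_health_ok; then
      # Print the device and whatever verdict line smartctl gave (may be empty).
      verdict=$(printf '%s\n' "$out" | grep -E 'SMART Health Status|test result')
      printf '%s: %s\n' "$dev" "${verdict:-no health verdict reported}"
    fi
  done
}
```

Running check_all_disks on each OSD host should list any disk without a healthy verdict, such as /dev/sdag above. Parsing the text keeps the check easy to read; smartctl's exit status also encodes a failing verdict (see its man page) if a stricter check is preferred.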

Quoting Marco Pizzolo <marcopizzolo@xxxxxxxxx>:

Thanks, Hu Weiwen,

These hosts and drives are only about two months old, and this is the first
cluster we have built on them, so I was not expecting a drive issue yet.

smartctl shows:

root@<HOST>:~# smartctl -H /dev/sdag
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.11.0-38-generic] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Health Status: FIRMWARE IMPENDING FAILURE TOO MANY BLOCK REASSIGNS [asc=5d, ascq=64]

Grown defects during certification <not available>
Total blocks reassigned during format <not available>
Total new blocks reassigned <not available>
Power on minutes since format <not available>
root@<HOST>:~# smartctl -H /dev/sdah
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.11.0-38-generic] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

On Wed, Oct 27, 2021 at 1:26 PM 胡 玮文 <huww98@xxxxxxxxxxx> wrote:

Hi Marco, the log lines are truncated. I recommend sending the logs to a file
rather than copying them from the terminal:

cephadm logs --name osd.13 > osd.13.log

I see “read stalled” in the log. This is just a guess, but can you check the
kernel logs and the SMART info to see whether something is wrong with this
disk? Maybe also run a SMART self-test.
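The checks suggested above could be bundled for one OSD and its disk roughly as follows. This is a sketch, assuming it runs as root on the OSD's host with cephadm and smartmontools available; the helper names, output file names, and the error patterns ("read stalled", "I/O error", "Medium Error") are illustrative assumptions, not an exhaustive list of what the kernel or Ceph may log:

```shell
#!/bin/sh
# Sketch only: gather the logs and SMART data suggested above for one OSD.
# Assumes root on the OSD's host, with cephadm and smartmontools installed.
# Helper names, file names and grep patterns are hypothetical/illustrative.

# Print lines hinting at disk I/O trouble from the given files (or stdin).
scan_io_errors() {
  grep -iE 'read stalled|I/O error|medium error' "$@"
}

# Usage: collect_disk_evidence <osd name> <device>
collect_disk_evidence() {
  osd="$1"
  dev="$2"
  name=$(basename "$dev")                      # e.g. sdag
  cephadm logs --name "$osd" > "$osd.log"      # full log, not truncated by the terminal
  dmesg -T > kernel.log                        # kernel messages with readable timestamps
  smartctl -a "$dev" > "smart-$name.txt"       # full SMART report for the device
  echo "== $osd.log =="
  scan_io_errors "$osd.log"
  echo "== kernel.log (lines mentioning $name) =="
  scan_io_errors kernel.log | grep -w "$name"
}
```

For example, collect_disk_evidence osd.13 /dev/sdag (assuming that is the disk behind osd.13) would leave osd.13.log, kernel.log and smart-sdag.txt in the current directory, which are easier to share than terminal copies. For the self-test itself, smartctl -t long /dev/sdag starts an extended test that runs on the drive in the background, and smartctl -l selftest /dev/sdag shows the result once it completes; on large drives this can take hours.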


*From: *Marco Pizzolo <marcopizzolo@xxxxxxxxx>
*Sent: *October 28, 2021 1:17
*To: *胡 玮文 <huww98@xxxxxxxxxxx>
*Cc: *ceph-users <ceph-users@xxxxxxx>
*Subject: *Re: 16.2.6 OSD down, out but container running....

Is there any command or log I can provide output from that would help pinpoint
the issue? By all accounts 119 of the 120 OSDs are working correctly, but I am
unable to bring the last one fully online.

Thank you,

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
