The disk is not OK. Look at the output below:

SMART Health Status: HARDWARE IMPENDING FAILURE GENERAL HARD DRIVE FAILURE

You should replace the disk.

On Wed, May 20, 2020 at 5:11 PM Thomas <74cmonty@xxxxxxxxx> wrote:
>
> Hello,
>
> I have a pool of 300+ OSDs that are all the same model (Seagate model:
> ST1800MM0129, size: 1.64 TiB).
> Only 1 OSD crashes regularly, but I cannot identify a root cause.
>
> Based on the output of smartctl, the disk is OK.
>
> # smartctl -a -d megaraid,1 /dev/sda
> smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.3.18-2-pve] (local build)
> Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org
>
> === START OF INFORMATION SECTION ===
> Vendor:               LENOVO-X
> Product:              ST1800MM0129
> Revision:             L2B6
> Compliance:           SPC-4
> User Capacity:        1,800,360,124,416 bytes [1.80 TB]
> Logical block size:   512 bytes
> Physical block size:  4096 bytes
> LU is fully provisioned
> Rotation Rate:        10500 rpm
> Form Factor:          2.5 inches
> Logical Unit id:      0x5000c500bb7822cf
> Serial number:        WBN0QHX80000E852944J
> Device type:          disk
> Transport protocol:   SAS (SPL-3)
> Local Time is:        Mon May 18 09:19:41 2020 CEST
> SMART support is:     Available - device has SMART capability.
> SMART support is:     Enabled
> Temperature Warning:  Enabled
>
> === START OF READ SMART DATA SECTION ===
> SMART Health Status: HARDWARE IMPENDING FAILURE GENERAL HARD DRIVE FAILURE [asc=5d, ascq=10]
>
> Grown defects during certification <not available>
> Total blocks reassigned during format <not available>
> Total new blocks reassigned = 68
> Power on minutes since format <not available>
> Current Drive Temperature:     33 C
> Drive Trip Temperature:        65 C
>
> Manufactured in week 31 of year 2018
> Specified cycle count over device lifetime:  10000
> Accumulated start-stop cycles:  21
> Specified load-unload count over device lifetime:  300000
> Accumulated load-unload cycles:  709
> Elements in grown defect list: 18
>
> Error counter log:
>            Errors Corrected by           Total   Correction     Gigabytes    Total
>                ECC          rereads/    errors   algorithm      processed    uncorrected
>            fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
> read:   3278853896        1         0  3278853897         32      83933.567          19
> write:           0        0         0           0          0      24093.894           0
> verify: 3080361880        0         0  3080361880          0      12630.494           0
>
> Non-medium error count:      244
>
> SMART Self-test log
> Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
>      Description                              number   (hours)
> # 1  Background short  Completed                   -    3761                 - [-   -    -]
> # 2  Background short  Completed                   -    3737                 - [-   -    -]
> # 3  Background short  Completed                   -    3713                 - [-   -    -]
> # 4  Background short  Completed                   -    3689                 - [-   -    -]
> # 5  Background short  Completed                   -    3665                 - [-   -    -]
> # 6  Background short  Completed                   -    3641                 - [-   -    -]
> # 7  Background short  Completed                   -    3617                 - [-   -    -]
> # 8  Background short  Completed                   -    3593                 - [-   -    -]
> # 9  Background long   Completed                   -    3569                 - [-   -    -]
> #10  Background short  Completed                   -    3545                 - [-   -    -]
> #11  Background short  Completed                   -    3521                 - [-   -    -]
> #12  Background short  Completed                   -    3497                 - [-   -    -]
> #13  Background short  Completed                   -    3473                 - [-   -    -]
> #14  Background short  Completed                   -    3449                 - [-   -    -]
> #15  Background short  Completed                   -    3425                 - [-   -    -]
> #16  Background short  Completed                   -    3401                 - [-   -    -]
> #17  Background short  Completed                   -    3377                 - [-   -    -]
> #18  Background short  Completed                   -    3353                 - [-   -    -]
> #19  Background short  Completed                   -    3329                 - [-   -    -]
> #20  Background short  Completed                   -    3305                 - [-   -    -]
>
> Long (extended) Self-test duration: 9459 seconds [157.7 minutes]
>
> I have attached the log of the affected OSD.
>
> THX
> Thomas
>
> I have uploaded 1 file belonging to this email:
> ceph-osd.92.log.1.gz <https://we.tl/t-7DzNCDP3iZ> (578 KB, via WeTransfer)
> Mozilla Thunderbird <https://www.thunderbird.net> makes it easy to
> share large files over email.
>
>
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
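
With 300+ identical disks, reading each smartctl dump by eye makes it easy to miss the one line that matters. The check described in the reply could be scripted roughly as follows. This is only a sketch, not an official tool: the function name summarize_smart and the SAMPLE text are made up for illustration, and it assumes the SCSI/SAS output format shown in the quoted message, where a healthy drive reports "SMART Health Status: OK".

```python
import re

# Field labels as they appear in the smartctl output quoted in this thread.
HEALTH_RE = re.compile(r"^SMART Health Status:[ \t]*(.+)$", re.MULTILINE)
DEFECTS_RE = re.compile(r"^Elements in grown defect list:[ \t]*(\d+)", re.MULTILINE)


def summarize_smart(text):
    """Return (healthy, status, grown_defects) parsed from `smartctl -a` text.

    healthy is True only when the status line is exactly "OK";
    status and grown_defects are None when the field is missing.
    """
    health = HEALTH_RE.search(text)
    status = health.group(1).strip() if health else None
    defects = DEFECTS_RE.search(text)
    grown = int(defects.group(1)) if defects else None
    return status == "OK", status, grown


# Hypothetical sample: two lines copied from the output in the quoted message.
SAMPLE = (
    "SMART Health Status: HARDWARE IMPENDING FAILURE GENERAL HARD DRIVE "
    "FAILURE [asc=5d, ascq=10]\n"
    "Elements in grown defect list: 18\n"
)
```

Calling summarize_smart() on the saved output of `smartctl -a` for each OSD disk, and flagging every disk where the first returned value is not True, would have singled out this drive. A missing status line also counts as not healthy here, which is a deliberately cautious choice.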