Re: Read errors on OSD

I've seen similar issues in the past with 4U Supermicro servers populated with spinning disks. In my case it turned out to be a specific firmware+BIOS combination on the disk controller card that was buggy. I fixed it by updating the firmware and BIOS on the card to the latest versions.

I saw this on several servers, and it took a while to track down as you can imagine. Same symptoms you're reporting.

There was a data corruption problem a while back with the Linux kernel and Samsung 850 Pro drives, but your problem doesn't sound like data corruption. Still, I'd check to make sure the kernel version you're running has the fix.
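To compare against known-bad combinations like those, it helps to first collect the versions in play. A minimal sketch (the device path is a placeholder, `smartctl` comes from smartmontools, and the vendor tools named in the comments are examples, not confirmed for your hardware):

```shell
# Sketch: gather the kernel, drive, and firmware versions so they can be
# checked against known-bad combinations. DEV is a placeholder - substitute
# the device backing the affected OSD.
DEV=/dev/sdX

# Running kernel version (compare against the changelog for the fix):
uname -r

# Drive model and firmware revision (requires smartmontools; commented out
# here because it needs a real device and root):
# smartctl -i "$DEV" | grep -Ei 'model|firmware'

# Controller firmware/BIOS versions come from the vendor's own tool, e.g.
# storcli or sas3flash for LSI/Broadcom HBAs (vendor-specific).
```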


Steve Taylor | Senior Software Engineer | StorageCraft Technology Corporation
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2799 |





On Thu, 2017-06-01 at 13:40 +0100, Oliver Humpage wrote:

> On 1 Jun 2017, at 11:55, Matthew Vernon <mv3@xxxxxxxxxxxx> wrote:
>
>> You don't say what's in kern.log - we've had (rotating) disks that were
>> throwing read errors but still saying they were OK on SMART.
>
> Fair point. There was nothing correlating to the time that ceph logged an
> error this morning, which is why I didn't mention it, but looking harder I
> see yesterday there was:
>
> May 31 07:20:13 osd1 kernel: sd 0:0:8:0: [sdi] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> May 31 07:20:13 osd1 kernel: sd 0:0:8:0: [sdi] tag#0 Sense Key : Hardware Error [current]
> May 31 07:20:13 osd1 kernel: sd 0:0:8:0: [sdi] tag#0 Add. Sense: Internal target failure
> May 31 07:20:13 osd1 kernel: sd 0:0:8:0: [sdi] tag#0 CDB: Read(10) 28 00 77 51 42 d8 00 02 00 00
> May 31 07:20:13 osd1 kernel: blk_update_request: critical target error, dev sdi, sector 2001814232
>
> sdi was the disk with the OSD affected today. Guess it's flaky SSDs then.
> Weird that just re-reading the file makes everything OK though - wondering
> how much it's worth worrying about that, or if there's a way of making
> ceph retry reads automatically?
>
> Oliver.
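The blk_update_request line reports the failing LBA in 512-byte sectors, so one way to test the "works on re-read" behaviour is to read just that sector back directly. A hedged sketch (device name and sector number are taken from the kernel log above; the dd and smartctl lines are commented out since they need the real disk and root):

```shell
# Hypothetical values, copied from the kernel log in this thread.
DEV=/dev/sdi
SECTOR=2001814232        # from "blk_update_request: ... sector 2001814232"
BS=512                   # the kernel reports sectors in 512-byte units

# Byte offset of the failing sector on the raw device.
OFFSET=$((SECTOR * BS))
echo "sector $SECTOR -> byte offset $OFFSET"
# prints: sector 2001814232 -> byte offset 1024928886784

# Re-read just that sector; a second read succeeding would match the
# "re-reading the file makes everything OK" behaviour described above.
# dd if="$DEV" bs="$BS" skip="$SECTOR" count=1 of=/dev/null

# Check SMART health and error counters on the drive (smartmontools):
# smartctl -a "$DEV"
```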
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
