Hello,

We've got some Intel DC S3610 800GB SSDs in operation on cache tiers. On the ones with G2010150 firmware we've seen _very_ infrequent SATA bus resets [1], on the order of once per year, and these are fairly busy critters with an average of 400 IOPS and peaks much higher than that.

Funnily enough, the older 8 SSDs with the G2010140 firmware have never shown this, and given that all the newer ones have at least once, that's somewhat conclusive.

What I'm wondering is whether there's a knob that allows an OSD to declare itself down (not out) when any (and all) I/O takes more than x amount of time. On the affected OSD we see the following, but from the perspective of the other OSDs and MONs the health of this OSD was never in question, of course:

---
2019-03-26 15:33:03.392644 7f09b0500700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f099747a700' had timed out after 15
---

Regards,

Christian

[1] These kinds of resets; the logging happens after the fact, it actually takes about 40 seconds:

---
[54954736.886707] ata5.00: exception Emask 0x0 SAct 0xc0 SErr 0x0 action 0x6 frozen
[54954736.887424] ata5.00: failed command: WRITE FPDMA QUEUED
[54954736.887856] ata5.00: cmd 61/20:30:70:a2:da/00:00:25:00:00/40 tag 6 ncq dma 16384 out
                           res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[54954736.888695] ata5.00: status: { DRDY }
[54954736.889112] ata5.00: failed command: WRITE FPDMA QUEUED
[54954736.889527] ata5.00: cmd 61/08:38:b0:8b:71/00:00:26:00:00/40 tag 7 ncq dma 4096 out
                           res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[54954736.890358] ata5.00: status: { DRDY }
[54954736.890781] ata5: hard resetting link
[54954737.205313] ata5: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[54954737.206133] ata5.00: configured for UDMA/133
[54954737.206140] ata5: EH complete
---

--
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Rakuten Communications

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
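[Editor's note] Pending a built-in knob, the behaviour asked for above could be approximated from outside the OSD: periodically time a small I/O probe against the device and, if a probe stalls past a threshold, mark the OSD down (not out) so peers stop sending it I/O until it recovers. The sketch below is illustrative only: `probe_latency`, `should_mark_down`, `watchdog`, and the 10-second `LATENCY_THRESHOLD` are invented names and values, not Ceph settings; the one real command used is `ceph osd down <id>`, which marks an OSD down without marking it out.

```python
import subprocess
import time

# Illustrative threshold, not a Ceph default: if a single probe takes
# longer than this many seconds, treat the device as stalled.
LATENCY_THRESHOLD = 10.0


def probe_latency(probe):
    """Time a single I/O probe callable and return elapsed seconds.

    In a real deployment `probe` would do a small synchronous write or
    fsync against the OSD's backing device; here it is any callable.
    """
    start = time.monotonic()
    probe()
    return time.monotonic() - start


def should_mark_down(latency, threshold=LATENCY_THRESHOLD):
    """Decide whether one slow probe should trigger marking the OSD down."""
    return latency > threshold


def mark_osd_down(osd_id):
    """Ask the cluster to mark the OSD down (not out); it rejoins on its own."""
    subprocess.check_call(["ceph", "osd", "down", str(osd_id)])


def watchdog(osd_id, probe, interval=5.0):
    """Probe every `interval` seconds; mark the OSD down once a probe stalls."""
    while True:
        if should_mark_down(probe_latency(probe)):
            mark_osd_down(osd_id)
            return
        time.sleep(interval)
```

Because the OSD comes back as soon as it heartbeats again, a false positive here only costs a brief remap, which may be acceptable given a ~40-second bus reset.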