Hello,

We've got some Intel DC S3610 800GB SSDs in operation on cache tiers. On the ones with G2010150 firmware we've seen _very_ infrequent SATA bus resets [1], on the order of once per year, and these are fairly busy critters with an average of 400 IOPS and peaks much higher than that.

Funnily enough, the older 8 SSDs with the G2010140 firmware have never shown this, and given that all the newer ones have at least once, that's somewhat conclusive.

What I'm wondering is whether there's a knob that allows an OSD to declare itself down (not out) when any (and all) I/O takes more than x amount of time. On the affected OSD we see the following, but from the perspective of the other OSDs and MONs the health of this OSD was never in question, of course:

---
2019-03-26 15:33:03.392644 7f09b0500700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f099747a700' had timed out after 15
---

Regards,

Christian

[1] These kinds of resets; the logging happens after the fact, it actually takes about 40 seconds:

---
[54954736.886707] ata5.00: exception Emask 0x0 SAct 0xc0 SErr 0x0 action 0x6 frozen
[54954736.887424] ata5.00: failed command: WRITE FPDMA QUEUED
[54954736.887856] ata5.00: cmd 61/20:30:70:a2:da/00:00:25:00:00/40 tag 6 ncq dma 16384 out
                           res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[54954736.888695] ata5.00: status: { DRDY }
[54954736.889112] ata5.00: failed command: WRITE FPDMA QUEUED
[54954736.889527] ata5.00: cmd 61/08:38:b0:8b:71/00:00:26:00:00/40 tag 7 ncq dma 4096 out
                           res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[54954736.890358] ata5.00: status: { DRDY }
[54954736.890781] ata5: hard resetting link
[54954737.205313] ata5: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[54954737.206133] ata5.00: configured for UDMA/133
[54954737.206140] ata5: EH complete
---

--
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Rakuten Communications

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
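[Editor's note] Pending a built-in knob, the behaviour asked for above could be approximated from outside the OSD: periodically time a small I/O probe against the device and, if a probe stalls past a threshold, mark the OSD down (not out) so peers stop sending it I/O until it recovers. The sketch below is illustrative only: `probe_latency`, `should_mark_down`, `watchdog`, and the 10-second `LATENCY_THRESHOLD` are invented names and values, not Ceph settings; the one real command used is `ceph osd down <id>`, which marks an OSD down without marking it out.

```python
import subprocess
import time

# Illustrative threshold, not a Ceph default: if a single probe takes
# longer than this many seconds, treat the device as stalled.
LATENCY_THRESHOLD = 10.0


def probe_latency(probe):
    """Time a single I/O probe callable and return elapsed seconds.

    In a real deployment `probe` would do a small synchronous write or
    fsync against the OSD's backing device; here it is any callable.
    """
    start = time.monotonic()
    probe()
    return time.monotonic() - start


def should_mark_down(latency, threshold=LATENCY_THRESHOLD):
    """Decide whether one slow probe should trigger marking the OSD down."""
    return latency > threshold


def mark_osd_down(osd_id):
    """Ask the cluster to mark the OSD down (not out); it rejoins on its own."""
    subprocess.check_call(["ceph", "osd", "down", str(osd_id)])


def watchdog(osd_id, probe, interval=5.0):
    """Probe every `interval` seconds; mark the OSD down once a probe stalls."""
    while True:
        if should_mark_down(probe_latency(probe)):
            mark_osd_down(osd_id)
            return
        time.sleep(interval)
```

Because the OSD comes back as soon as it heartbeats again, a false positive here only costs a brief remap, which may be acceptable given a ~40-second bus reset.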