On 04/09/15 20:55, Richard Bade wrote:
> We have a Ceph pool that is entirely made up of Intel S3700/S3710
> enterprise SSD's.
>
> We are seeing some significant I/O delays on the disks causing a “SCSI
> Task Abort” from the OS. This seems to be triggered by the drive
> receiving a “Synchronize cache command”.

I've heard from other sources that the new Intel S3610 and S3710 have
been afflicted by a bug, possibly now fixed with new firmware, that might
be the cause of your problem.

The person who first reported it said that they upgraded from 3600 units,
with which they never had a problem, and started seeing issues with the
S3610 model. When they looked at their logs they saw this:

Aug 9 21:50:39 cetacea kernel: [177609.957939] ata2.00: exception Emask 0x0 SAct 0x6000 SErr 0x0 action 0x6 frozen
Aug 9 21:50:39 cetacea kernel: [177609.958480] ata2.00: failed command: READ FPDMA QUEUED
Aug 9 21:50:39 cetacea kernel: [177609.958995] ata2.00: cmd 60/00:68:00:08:db/04:00:0a:00:00/40 tag 13 ncq 524288 in
Aug 9 21:50:39 cetacea kernel: [177609.958995] res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Aug 9 21:50:39 cetacea kernel: [177609.960074] ata2.00: status: { DRDY }
Aug 9 21:50:39 cetacea kernel: [177609.960628] ata2.00: failed command: READ FPDMA QUEUED
Aug 9 21:50:39 cetacea kernel: [177609.961198] ata2.00: cmd 60/f0:70:00:0c:db/00:00:0a:00:00/40 tag 14 ncq 122880 in
Aug 9 21:50:39 cetacea kernel: [177609.961198] res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Aug 9 21:50:39 cetacea kernel: [177609.962405] ata2.00: status: { DRDY }
Aug 9 21:50:39 cetacea kernel: [177609.963001] ata2: hard resetting link
Aug 9 21:50:40 cetacea kernel: [177610.281881] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Aug 9 21:50:40 cetacea kernel: [177610.282865] ata2.00: configured for UDMA/133
Aug 9 21:50:40 cetacea kernel: [177610.282887] ata2.00: device reported invalid CHS sector 0
Aug 9 21:50:40 cetacea kernel: [177610.282890] ata2.00: device reported invalid CHS sector 0
Aug 9 21:50:40 cetacea kernel: [177610.282896] ata2: EH complete

If you're running your SSDs behind a RAID controller you might not be able
to get these diagnostics; it may simply seem that the drive has glitched.

They later reported:

> So as far as I can determine, 100% of s3610 and s3710 SSDs were
> shipped broken and now Intel will fix it in firmware without
> acknowledging that there ever was a problem.
> https://communities.intel.com/thread/77801?start=0&tstart=0

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
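
If you want to check whether your own hosts show the same signature as the log excerpt above, something like the following might help. It is only a rough sketch: the two regular expressions are taken from the wording of the quoted log lines (other kernel versions may phrase them differently), and the function name summarize is made up for illustration. It counts "failed command" entries and "hard resetting link" events per ATA port.

```python
import re
from collections import Counter

# Patterns copied from the wording of the quoted kernel log; this is an
# assumption about your kernel's message format, not a guaranteed match.
FAILED_CMD = re.compile(r"(ata\d+)\.\d+: failed command: (.+)$")
HARD_RESET = re.compile(r"(ata\d+): hard resetting link")


def summarize(lines):
    """Return (failed, resets) counters built from kernel log lines.

    failed is keyed by (port, command), resets is keyed by port.
    """
    failed = Counter()
    resets = Counter()
    for line in lines:
        line = line.rstrip()
        match = FAILED_CMD.search(line)
        if match:
            failed[(match.group(1), match.group(2))] += 1
            continue
        match = HARD_RESET.search(line)
        if match:
            resets[match.group(1)] += 1
    return failed, resets
```

To use it, pass any iterable of log lines, for example an open kernel log file (the path depends on your distribution, so treat /var/log/kern.log as an assumption): failed, resets = summarize(open("/var/log/kern.log")). A port that shows repeated READ FPDMA QUEUED failures followed by hard resets would be worth comparing against the drive's firmware version.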