On Mon, Sep 7, 2015 at 12:54 PM, Paul Mansfield <paul.mansfield@xxxxxxxxxxxxxxxxxx> wrote:
>
> On 04/09/15 20:55, Richard Bade wrote:
>> We have a Ceph pool that is entirely made up of Intel S3700/S3710
>> enterprise SSDs.
>>
>> We are seeing some significant I/O delays on the disks causing a “SCSI
>> Task Abort” from the OS. This seems to be triggered by the drive
>> receiving a “Synchronize cache command”.
>
> I've heard from other sources that the new Intel 3610 and 3710 have been
> afflicted by a bug, possibly now fixed with new firmware, that might be
> the cause of your problem.
> The person who first reported it said that they upgraded from 3600 units
> and never had a problem, but started seeing issues with the 3610 model.
>
> When they look at their log, they see this:
>
> Aug 9 21:50:39 cetacea kernel: [177609.957939] ata2.00: exception Emask 0x0 SAct 0x6000 SErr 0x0 action 0x6 frozen
> Aug 9 21:50:39 cetacea kernel: [177609.958480] ata2.00: failed command: READ FPDMA QUEUED
> Aug 9 21:50:39 cetacea kernel: [177609.958995] ata2.00: cmd 60/00:68:00:08:db/04:00:0a:00:00/40 tag 13 ncq 524288 in
> Aug 9 21:50:39 cetacea kernel: [177609.958995] res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
> Aug 9 21:50:39 cetacea kernel: [177609.960074] ata2.00: status: { DRDY }
> Aug 9 21:50:39 cetacea kernel: [177609.960628] ata2.00: failed command: READ FPDMA QUEUED
> Aug 9 21:50:39 cetacea kernel: [177609.961198] ata2.00: cmd 60/f0:70:00:0c:db/00:00:0a:00:00/40 tag 14 ncq 122880 in
> Aug 9 21:50:39 cetacea kernel: [177609.961198] res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
> Aug 9 21:50:39 cetacea kernel: [177609.962405] ata2.00: status: { DRDY }
> Aug 9 21:50:39 cetacea kernel: [177609.963001] ata2: hard resetting link
> Aug 9 21:50:40 cetacea kernel: [177610.281881] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
> Aug 9 21:50:40 cetacea kernel: [177610.282865] ata2.00: configured for UDMA/133
> Aug 9 21:50:40 cetacea kernel: [177610.282887] ata2.00: device reported invalid CHS sector 0
> Aug 9 21:50:40 cetacea kernel: [177610.282890] ata2.00: device reported invalid CHS sector 0
> Aug 9 21:50:40 cetacea kernel: [177610.282896] ata2: EH complete
>

Intel had a problem with handling consecutive NCQ commands on the 3x10 series [1] and recently issued a firmware fix [2]. LSI controllers apparently have a different bug, as people are reporting bus resets that differ from the ones seen on the C602.

The firmware release fixed the problem for me, as did disabling NCQ completely, which is mentioned in the thread below.

1. https://communities.intel.com/thread/77801
2. https://downloadcenter.intel.com/download/23931/Intel-Solid-State-Drive-Data-Center-Tool

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
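To check whether other drives show the same symptom as the quoted log (timed-out "READ FPDMA QUEUED" commands followed by "hard resetting link"), the kernel log can be scanned per ATA port. The sketch below is a hypothetical helper, not a tool from this thread; the regular expressions are assumptions based only on the log excerpt above, and a match indicates the symptom, not the cause. Updating the firmware or disabling NCQ, as described in the thread, remains the actual fix.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable

# Patterns assumed from the quoted libata log lines, e.g.
#   "ata2.00: failed command: READ FPDMA QUEUED"
#   "ata2: hard resetting link"
FAILED_NCQ = re.compile(r"\b(ata\d+)\.\d+: failed command: (READ|WRITE) FPDMA QUEUED")
HARD_RESET = re.compile(r"\b(ata\d+): hard resetting link")


def summarize_ncq_errors(lines: Iterable[str]) -> Dict[str, Dict[str, int]]:
    """Count failed NCQ commands and hard link resets per ATA port (hypothetical helper)."""
    summary: Dict[str, Dict[str, int]] = defaultdict(
        lambda: {"failed_ncq": 0, "hard_resets": 0}
    )
    for line in lines:
        m = FAILED_NCQ.search(line)
        if m:
            summary[m.group(1)]["failed_ncq"] += 1
            continue
        m = HARD_RESET.search(line)
        if m:
            summary[m.group(1)]["hard_resets"] += 1
    # Convert the outer defaultdict to a plain dict for display.
    return dict(summary)


if __name__ == "__main__":
    sample = [
        "Aug 9 21:50:39 cetacea kernel: [177609.958480] ata2.00: failed command: READ FPDMA QUEUED",
        "Aug 9 21:50:39 cetacea kernel: [177609.960628] ata2.00: failed command: READ FPDMA QUEUED",
        "Aug 9 21:50:39 cetacea kernel: [177609.963001] ata2: hard resetting link",
        "Aug 9 21:50:40 cetacea kernel: [177610.282896] ata2: EH complete",
    ]
    print(summarize_ncq_errors(sample))
    # {'ata2': {'failed_ncq': 2, 'hard_resets': 1}}
```

In practice the input would be the lines of the kernel log (for example the output of `dmesg`); ports that show repeated failed NCQ commands together with hard resets are the ones worth checking against the drive model and firmware version.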