Re: XFS and nobarriers on Intel SSD

Paul Mansfield <paul.mansfield@xxxxxxxxxxxxxxxxxx> · Mon, 7 Sep 2015 10:54:55 +0100

On 04/09/15 20:55, Richard Bade wrote:
> We have a Ceph pool that is entirely made up of Intel S3700/S3710
> enterprise SSDs.
> 
> We are seeing some significant I/O delays on the disks causing a “SCSI
> Task Abort” from the OS. This seems to be triggered by the drive
> receiving a “Synchronize cache command”.

I've heard from other sources that the new Intel S3610 and S3710 have been
afflicted by a bug, possibly now fixed with new firmware, that might be
the cause of your problem.
The person who first reported it said they had upgraded from S3600 units,
which never had a problem, but started seeing issues with the S3610 model.

When they look at their logs, they see this:

Aug  9 21:50:39 cetacea kernel: [177609.957939] ata2.00: exception Emask 0x0 SAct 0x6000 SErr 0x0 action 0x6 frozen
Aug  9 21:50:39 cetacea kernel: [177609.958480] ata2.00: failed command: READ FPDMA QUEUED
Aug  9 21:50:39 cetacea kernel: [177609.958995] ata2.00: cmd 60/00:68:00:08:db/04:00:0a:00:00/40 tag 13 ncq 524288 in
Aug  9 21:50:39 cetacea kernel: [177609.958995]          res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Aug  9 21:50:39 cetacea kernel: [177609.960074] ata2.00: status: { DRDY }
Aug  9 21:50:39 cetacea kernel: [177609.960628] ata2.00: failed command: READ FPDMA QUEUED
Aug  9 21:50:39 cetacea kernel: [177609.961198] ata2.00: cmd 60/f0:70:00:0c:db/00:00:0a:00:00/40 tag 14 ncq 122880 in
Aug  9 21:50:39 cetacea kernel: [177609.961198]          res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Aug  9 21:50:39 cetacea kernel: [177609.962405] ata2.00: status: { DRDY }
Aug  9 21:50:39 cetacea kernel: [177609.963001] ata2: hard resetting link
Aug  9 21:50:40 cetacea kernel: [177610.281881] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Aug  9 21:50:40 cetacea kernel: [177610.282865] ata2.00: configured for UDMA/133
Aug  9 21:50:40 cetacea kernel: [177610.282887] ata2.00: device reported invalid CHS sector 0
Aug  9 21:50:40 cetacea kernel: [177610.282890] ata2.00: device reported invalid CHS sector 0
Aug  9 21:50:40 cetacea kernel: [177610.282896] ata2: EH complete
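The "cmd" lines in that excerpt are libata's dump of the ATA taskfile for each timed-out command. Below is a rough Python sketch that decodes them. It assumes the register order libata's error handler prints (command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device) and 512-byte sectors, and it follows the NCQ convention that FPDMA QUEUED commands carry the sector count in the feature registers and the tag in bits 7:3 of the sector count register. decode_fpdma and FpdmaCmd are hypothetical names used only for illustration.

```python
import re
from dataclasses import dataclass

# Assumed libata taskfile order:
# command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
_HEX = r"([0-9a-f]{2})"
CMD_RE = re.compile(
    r"cmd\s+" + _HEX + "/"
    + ":".join([_HEX] * 5) + "/"
    + ":".join([_HEX] * 5) + "/"
    + _HEX + r"\s+tag\s+(\d+)",
    re.IGNORECASE,
)

FPDMA_OPCODES = {0x60: "READ FPDMA QUEUED", 0x61: "WRITE FPDMA QUEUED"}
SECTOR_SIZE = 512  # assumption: 512-byte logical sectors


@dataclass
class FpdmaCmd:
    name: str
    ncq_tag: int     # tag decoded from bits 7:3 of the sector count register
    logged_tag: int  # tag as printed by libata, for cross-checking
    sectors: int     # transfer length in sectors
    lba: int         # 48-bit starting LBA


def decode_fpdma(line: str) -> FpdmaCmd:
    """Decode one libata 'cmd' dump for a READ/WRITE FPDMA QUEUED command."""
    m = CMD_RE.search(line)
    if m is None:
        raise ValueError("no libata 'cmd' taskfile dump in line")
    regs = [int(g, 16) for g in m.groups()[:12]]
    (command, feature, nsect, lbal, lbam, lbah,
     hob_feature, _hob_nsect, hob_lbal, hob_lbam, hob_lbah, _device) = regs
    if command not in FPDMA_OPCODES:
        raise ValueError(f"opcode {command:#04x} is not an FPDMA QUEUED command")
    # NCQ: the sector count lives in the feature registers.
    sectors = (hob_feature << 8) | feature
    if sectors == 0:
        sectors = 65536  # a count of 0 means 65536 sectors for 48-bit commands
    lba = (lbal | lbam << 8 | lbah << 16
           | hob_lbal << 24 | hob_lbam << 32 | hob_lbah << 40)
    return FpdmaCmd(FPDMA_OPCODES[command], nsect >> 3,
                    int(m.group(13)), sectors, lba)


if __name__ == "__main__":
    for text in (
        "ata2.00: cmd 60/00:68:00:08:db/04:00:0a:00:00/40 tag 13 ncq 524288 in",
        "ata2.00: cmd 60/f0:70:00:0c:db/00:00:0a:00:00/40 tag 14 ncq 122880 in",
    ):
        c = decode_fpdma(text)
        print(c.name, "tag", c.ncq_tag, c.sectors * SECTOR_SIZE,
              "bytes at LBA", hex(c.lba))
```

Under those assumptions, the two timed-out commands above decode as back-to-back reads (1024 and 240 sectors) on adjacent LBAs. That is consistent with the "ncq" byte counts libata printed, so the hang does not look tied to an unusual request.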

If you're running your SSDs behind a RAID controller, you might not be
able to get these diagnostics; it may simply look as though the drive
has glitched.
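Where the drives are attached directly and these messages do reach the kernel log, a rough tally like the one below can show how often each port hits the failed-command and link-reset pattern. This is a minimal sketch. It assumes the message wording shown in the excerpt above, which may differ between kernel versions, and it reads log text on standard input (e.g. dmesg | python3 ata_tally.py). The script name and the summarize function are hypothetical.

```python
import re
import sys
from collections import Counter, defaultdict

# Patterns based on the libata wording in the excerpt above; other kernel
# versions may phrase these messages differently.
FAILED_RE = re.compile(r"\b(ata\d+)\.\d+: failed command: (.+?)\s*$")
RESET_RE = re.compile(r"\b(ata\d+): hard resetting link")


def summarize(lines):
    """Count failed commands (by name) and hard link resets per ATA port."""
    failed = defaultdict(Counter)
    resets = Counter()
    for line in lines:
        m = FAILED_RE.search(line)
        if m:
            failed[m.group(1)][m.group(2)] += 1
            continue
        m = RESET_RE.search(line)
        if m:
            resets[m.group(1)] += 1
    return failed, resets


if __name__ == "__main__":
    failed, resets = summarize(sys.stdin)
    for port in sorted(set(failed) | set(resets)):
        cmds = ", ".join(f"{name} x{n}" for name, n in failed[port].most_common())
        print(f"{port}: {resets[port]} hard reset(s); failed: {cmds or 'none'}")
```

A port that repeatedly shows timed-out FPDMA QUEUED commands followed by a hard reset is the pattern described above. Counting it over time gives something concrete to compare before and after a firmware update.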

They later reported:

> So as far as I can determine, 100% of s3610 and s3710 SSDs were
> shipped broken and now Intel will fix it in firmware without
> acknowledging that there ever was a problem.
>  https://communities.intel.com/thread/77801?start=0&tstart=0

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com