Re: XFS and nobarriers on Intel SSD


 



Hi Christian,
Thanks for the info. I'm just wondering: have you updated your S3610s with the new firmware that was released on 21/08, as mentioned in the Intel thread?
At first we also thought we weren't seeing the issue on the Intel controller, but further investigation showed that we were; it was just reported as a different log item, such as this:
ata5.00: exception Emask 0x0 SAct 0x300000 SErr 0x0 action 0x6 frozen
ata5.00: failed command: READ FPDMA QUEUED
ata5.00: cmd 60/10:a0:18:ca:ca/00:00:32:00:00/40 tag 20 ncq 8192 in
          res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata5.00: status: { DRDY }
ata5.00: failed command: READ FPDMA QUEUED
ata5.00: cmd 60/40:a8:48:ca:ca/00:00:32:00:00/40 tag 21 ncq 32768 in
         res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata5.00: status: { DRDY }
ata5: hard resetting link
ata5: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
ata5.00: configured for UDMA/133
ata5.00: device reported invalid CHS sector 0
ata5.00: device reported invalid CHS sector 0
ata5: EH complete
ata5.00: Enabling discard_zeroes_data
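For reference, the "cmd" field in the libata messages above can be decoded to see what was in flight when the drive stopped responding. Below is a minimal Python sketch, assuming the twelve-byte taskfile layout libata prints (command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device) and 512-byte logical sectors; `decode_fpdma` is a hypothetical helper, not an existing tool. For NCQ (FPDMA QUEUED) commands the sector count sits in the feature registers and the queue tag in bits 7:3 of the sector-count register.

```python
import re

# libata's "cmd" field: command/feature:nsect:lbal:lbam:lbah/
# hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device (all hex bytes)
CMD_RE = re.compile(
    r"cmd ([0-9a-f]{2})/"
    r"([0-9a-f]{2}):([0-9a-f]{2}):([0-9a-f]{2}):([0-9a-f]{2}):([0-9a-f]{2})/"
    r"([0-9a-f]{2}):([0-9a-f]{2}):([0-9a-f]{2}):([0-9a-f]{2}):([0-9a-f]{2})/"
    r"([0-9a-f]{2})"
)

SECTOR_SIZE = 512  # assumption: 512-byte logical sectors


def decode_fpdma(line):
    """Decode the taskfile of a READ/WRITE FPDMA QUEUED command (hypothetical helper)."""
    m = CMD_RE.search(line)
    if m is None:
        raise ValueError("no libata cmd field found")
    (command, feature, nsect, lbal, lbam, lbah,
     hob_feature, _hob_nsect, hob_lbal, hob_lbam, hob_lbah, _device) = (
        int(g, 16) for g in m.groups())
    # NCQ: sector count is in feature/hob_feature; 0 means 65536 sectors.
    sectors = ((hob_feature << 8) | feature) or 65536
    lba = ((hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24)
           | (lbah << 16) | (lbam << 8) | lbal)
    return {
        "command": command,
        "tag": nsect >> 3,  # NCQ tag lives in bits 7:3 of nsect
        "lba": lba,
        "sectors": sectors,
        "bytes": sectors * SECTOR_SIZE,
    }
```

Applied to the first failed command above, this gives command 0x60 (READ FPDMA QUEUED), tag 20, 16 sectors (8192 bytes) at LBA 0x32caca18, matching the "tag 20 ncq 8192" in the log. Both timed-out commands are plain reads, which fits the idea that ordinary I/O is stalling behind something else.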

I believe this is the same issue as on the LSI3008, which logs these messages:
sd 0:0:6:0: attempting task abort! scmd(ffff8804cac00600)
sd 0:0:6:0: [sdg] CDB: 
Read(10): 28 00 1c e7 76 a0 00 01 30 00
scsi target0:0:6: handle(0x000f), sas_address(0x4433221106000000), phy(6)
scsi target0:0:6: enclosure_logical_id(0x5003048000000000), slot(6)
sd 0:0:6:0: task abort: SUCCESS scmd(ffff8804cac00600)
sd 0:0:6:0: attempting task abort! scmd(ffff8804cac03780)
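Likewise, the CDB printed for the aborted command is an ordinary READ(10) (opcode 0x28), not a SYNCHRONIZE CACHE(10) (opcode 0x35). A minimal sketch of decoding it, assuming the standard READ(10) layout (4-byte big-endian LBA in bytes 2 to 5, 2-byte transfer length in bytes 7 and 8); `decode_read10` is a hypothetical helper:

```python
def decode_read10(cdb_hex):
    """Return (lba, transfer_length_in_blocks) for a READ(10) CDB given as hex text."""
    cdb = bytes.fromhex(cdb_hex)  # whitespace between byte pairs is ignored
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # transfer length in blocks
    return lba, blocks
```

For the CDB above, `decode_read10("28 00 1c e7 76 a0 00 01 30 00")` yields LBA 0x1ce776a0 and a transfer length of 0x130 (304) blocks.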

I appreciate your info with regard to nobarrier. I assume that by "alleviate it, but didn't fix" you mean the number of occurrences was reduced?
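To put a number on "reduced", one option is to tally these messages per device from saved kernel logs covering the same period before and after a change (firmware update or nobarrier). A minimal sketch; the regular expressions assume the exact wording shown in the logs above (other kernel versions may differ), and `count_events` is a hypothetical helper:

```python
import re
from collections import Counter

# Patterns for the symptoms quoted in this thread; group(1) is the device.
EVENT_PATTERNS = {
    "ata_failed_command": re.compile(r"\b(ata\d+\.\d+): failed command:"),
    "ata_hard_reset": re.compile(r"\b(ata\d+): hard resetting link"),
    "scsi_task_abort": re.compile(r"\bsd (\d+:\d+:\d+:\d+): attempting task abort!"),
}


def count_events(lines):
    """Count each event type per device across an iterable of log lines."""
    counts = Counter()
    for line in lines:
        for event, pattern in EVENT_PATTERNS.items():
            m = pattern.search(line)
            if m:
                counts[(event, m.group(1))] += 1
    return counts
```

Usage would be along the lines of `with open("kern.log") as f: print(count_events(f))`, where "kern.log" stands in for wherever the kernel log is saved.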

Regards,
Richard


On 8 September 2015 at 11:43, Christian Balzer <chibi@xxxxxxx> wrote:

Hello,

Note that I see exactly the same errors as you (in a non-Ceph environment) with both
the Samsung 845DC EVO and the Intel DC S3610,
though I need to stress things quite a bit to make it happen.

Also, setting nobarrier did alleviate it, but didn't fix it 100%, so I
guess something still issues flushes at some point.
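One thing worth ruling out is nobarrier not actually being in effect on every filesystem. A minimal sketch that parses /proc/mounts, assuming (as I understand XFS kernels of this vintage behave) that `nobarrier` appears among an XFS mount's options when it is set; `xfs_barrier_status` is a hypothetical helper:

```python
def xfs_barrier_status(mounts_text):
    """Map each XFS mount point to True if 'nobarrier' is among its options."""
    status = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        _device, mount_point, fstype, options = fields[:4]
        if fstype != "xfs":
            continue
        # /proc/mounts escapes spaces in paths as \040
        mount_point = mount_point.replace("\\040", " ")
        status[mount_point] = "nobarrier" in options.split(",")
    return status
```

On a live system this would be fed the contents of /proc/mounts, e.g. `with open("/proc/mounts") as f: print(xfs_barrier_status(f.read()))`.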

From where I stand, LSI/Avago are full of it.
Not only does this problem NOT happen with any onboard SATA chipset I have
access to, but their task abort and reset is what actually impacts things
(several seconds to recover), not whatever insignificant delay the SSDs
cause.
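To measure how long each abort actually takes, one could pair each "attempting task abort!" line with the matching "task abort:" result by its scmd address. A minimal sketch, assuming the lines carry dmesg-style "[seconds.microseconds]" timestamps (the excerpts in this thread have them stripped); `abort_durations` is a hypothetical helper, and it covers only the abort itself, not any subsequent reset:

```python
import re

# Assumed format: "[  100.000000] sd 0:0:6:0: attempting task abort! scmd(ffff...)"
LINE_RE = re.compile(
    r"^\[\s*(\d+\.\d+)\]\s+sd \S+: "
    r"(attempting task abort!|task abort: \w+) scmd\((\w+)\)"
)


def abort_durations(lines):
    """Return (scmd, seconds) for each task abort whose start and result are both logged."""
    started = {}
    durations = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue
        ts, event, scmd = float(m.group(1)), m.group(2), m.group(3)
        if event.startswith("attempting"):
            started[scmd] = ts
        elif scmd in started:
            durations.append((scmd, ts - started.pop(scmd)))
    return durations
```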

Christian
On Tue, 8 Sep 2015 11:35:38 +1200 Richard Bade wrote:

> Thanks, guys, for the pointers to this Intel thread:
>
> https://communities.intel.com/thread/77801
>
> It looks promising. I intend to update the firmware on the disks in one
> node tonight and will report back on my findings after a few days to a
> week.
>
> I've also posted to that forum and will update there too.
>
> Regards,
>
> Richard
>
>
> On 5 September 2015 at 07:55, Richard Bade <hitrich@xxxxxxxxx> wrote:
>
> > Hi Everyone,
> >
> > We have a Ceph pool that is made up entirely of Intel S3700/S3710
> > enterprise SSDs.
> >
> > We are seeing significant I/O delays on the disks, causing a “SCSI
> > Task Abort” from the OS. This seems to be triggered by the drive
> > receiving a “Synchronize Cache” command.
> >
> > My current thinking is that setting nobarrier in XFS will stop the
> > drive from receiving sync commands and therefore eliminate the I/O
> > delay associated with them.
> >
> > The XFS FAQ appears to recommend that if you have a battery-backed
> > RAID controller, you should set nobarrier for performance
> > reasons.
> >
> > Our LSI card doesn’t have a battery-backed cache, as it’s configured in
> > HBA mode (IT) rather than RAID mode (IR). Our Intel S37xx SSDs do have a
> > capacitor-backed cache, though.
> >
> > So is it recommended to turn barriers off, given that the drive has a
> > safe cache (I am confident that the cache will be written out to disk on
> > power failure)?
> >
> > Has anyone else encountered this issue?
> >
> > Any info or suggestions about this would be appreciated.
> >
> > Regards,
> >
> > Richard
> >


--
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Fusion Communications
http://www.gol.com/

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
