Hello,

On Tue, 8 Sep 2015 13:40:36 +1200 Richard Bade wrote:

> Hi Christian,
> Thanks for the info. I'm just wondering, have you updated your S3610s
> with the new firmware that was released on 21/08, as referred to in the
> thread?

I did so earlier today, see below.

> We thought we weren't seeing the issue on the Intel controller
> to start with, but after further investigation it turned out we
> were; it was just reported as a different log item, such as this:
>
> ata5.00: exception Emask 0x0 SAct 0x300000 SErr 0x0 action 0x6 frozen
> ata5.00: failed command: READ FPDMA QUEUED
> ata5.00: cmd 60/10:a0:18:ca:ca/00:00:32:00:00/40 tag 20 ncq 8192 in
>          res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
> ata5.00: status: { DRDY }
> ata5.00: failed command: READ FPDMA QUEUED
> ata5.00: cmd 60/40:a8:48:ca:ca/00:00:32:00:00/40 tag 21 ncq 32768 in
>          res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
> ata5.00: status: { DRDY }
> ata5: hard resetting link
> ata5: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
> ata5.00: configured for UDMA/133
> ata5.00: device reported invalid CHS sector 0
> ata5.00: device reported invalid CHS sector 0
> ata5: EH complete
> ata5.00: Enabling discard_zeroes_data

I didn't see any of these, but admittedly I tested this with fewer SSDs
on the onboard controller and with fio/bonnie++, which do not trigger
that behavior as easily.

> I believe this to be the same thing as on the LSI 3008, which gives
> these log messages:
>
> sd 0:0:6:0: attempting task abort! scmd(ffff8804cac00600)
> sd 0:0:6:0: [sdg] CDB:
> Read(10): 28 00 1c e7 76 a0 00 01 30 00
> scsi target0:0:6: handle(0x000f), sas_address(0x4433221106000000), phy(6)
> scsi target0:0:6: enclosure_logical_id(0x5003048000000000), slot(6)
> sd 0:0:6:0: task abort: SUCCESS scmd(ffff8804cac00600)
> sd 0:0:6:0: attempting task abort! scmd(ffff8804cac03780)

Yup, I know that message all too well.

> I appreciate your info with regards to nobarriers.
> I assume by "alleviate it, but didn't fix" you mean the number of
> occurrences is reduced?

Indeed.

But first a word about the setup where I'm seeing this.
These are 2 mailbox server clusters (2 nodes each), replicating via DRBD
over Infiniband (IPoIB at this time), LSI 3008 controller.
One cluster has the Samsung DC SSDs, the other the Intel S3610s.
2 of these chassis, to be precise:
https://www.supermicro.com/products/system/2U/2028/SYS-2028TP-DC0FR.cfm

Of course latest firmware, and I tried this with every kernel from
Debian's 3.16 to stock 4.1.6.

With nobarrier I managed to trigger the error only once yesterday, on the
DRBD replication target, not the machine that actually has the FS mounted.
Usually I'd be able to trigger it quite a bit more often during these tests.

So this morning I updated the firmware of all S3610s on one node and
removed the nobarrier flag. It took a lot of punishment, but eventually
this happened:
---
Sep  8 10:43:47 mbx09 kernel: [ 1743.358329] sd 0:0:1:0: attempting task abort! scmd(ffff880fdc85b680)
Sep  8 10:43:47 mbx09 kernel: [ 1743.358339] sd 0:0:1:0: [sdb] CDB: Write(10) 2a 00 0e 9a fb b8 00 00 08 00
Sep  8 10:43:47 mbx09 kernel: [ 1743.358345] scsi target0:0:1: handle(0x000a), sas_address(0x4433221101000000), phy(1)
Sep  8 10:43:47 mbx09 kernel: [ 1743.358348] scsi target0:0:1: enclosure_logical_id(0x5003048019e98d00), slot(1)
Sep  8 10:43:47 mbx09 kernel: [ 1743.387951] sd 0:0:1:0: task abort: SUCCESS scmd(ffff880fdc85b680)
---

Note that on the un-patched node (the DRBD replication target) I managed
to trigger this bug 3 times in the same period.

So unless Intel has something to say (and given that this happens with
Samsungs as well), I'd still look beady-eyed at LSI/Avago...

Christian

> Regards,
> Richard
>
>
> On 8 September 2015 at 11:43, Christian Balzer <chibi@xxxxxxx> wrote:
>
> >
> > Hello,
> >
> > Note that I see exactly your errors (in a non-Ceph environment) with
> > both Samsung 845DC EVO and Intel DC S3610.
> > Though I need to stress things quite a bit to make it happen.
> >
> > Also, setting nobarrier did alleviate it, but didn't fix it 100%, so I
> > guess something still issues flushes at some point.
> >
> > From where I stand, LSI/Avago are full of it.
> > Not only does this problem NOT happen with any onboard SATA chipset I
> > have access to, their task abort and reset is what actually impacts
> > things (several seconds to recover), not whatever insignificant delay
> > is caused by the SSDs.
> >
> > Christian
> >
> > On Tue, 8 Sep 2015 11:35:38 +1200 Richard Bade wrote:
> >
> > > Thanks guys for the pointers to this Intel thread:
> > >
> > > https://communities.intel.com/thread/77801
> > >
> > > It looks promising. I intend to update the firmware on the disks in
> > > one node tonight and will report back after a few days to a week on
> > > my findings.
> > >
> > > I've also posted to that forum and will update there too.
> > >
> > > Regards,
> > >
> > > Richard
> > >
> > >
> > > On 5 September 2015 at 07:55, Richard Bade <hitrich@xxxxxxxxx> wrote:
> > >
> > > > Hi Everyone,
> > > >
> > > > We have a Ceph pool that is entirely made up of Intel S3700/S3710
> > > > enterprise SSDs.
> > > >
> > > > We are seeing some significant I/O delays on the disks, causing a
> > > > "SCSI task abort" from the OS. This seems to be triggered by the
> > > > drive receiving a "synchronize cache" command.
> > > >
> > > > My current thinking is that setting nobarrier in XFS will stop the
> > > > drive receiving a sync command and therefore stop the I/O delay
> > > > associated with it.
> > > >
> > > > The XFS FAQ recommends that if you have a battery-backed RAID
> > > > controller you should set nobarrier for performance reasons.
> > > >
> > > > Our LSI card doesn't have battery-backed cache, as it's configured
> > > > in HBA mode (IT) rather than RAID (IR). Our Intel S37xx SSDs do
> > > > have a capacitor-backed cache, though.
> > > >
> > > > So is it recommended that barriers are turned off, as the drive has
> > > > a safe cache (I am confident that the cache will write out to disk
> > > > on power failure)?
> > > >
> > > > Has anyone else encountered this issue?
> > > >
> > > > Any info or suggestions about this would be appreciated.
> > > >
> > > > Regards,
> > > >
> > > > Richard
> > >
>
> --
> Christian Balzer        Network/Systems Engineer
> chibi@xxxxxxx           Global OnLine Japan/Fusion Communications
> http://www.gol.com/

--
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Fusion Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
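[Editor's note: for readers wanting to reproduce the experiments in this
thread, here is a sketch of the relevant commands. The device name
/dev/sdb and mount point /srv/mail are placeholders, not taken from the
thread; the smartctl/sdparm invocations assume smartmontools and sdparm
are installed.]

```shell
# Check the drive's firmware version (the Intel S3610 fix discussed above
# shipped as a firmware update) -- requires smartmontools:
smartctl -i /dev/sdb | grep -i firmware

# Check whether the drive's volatile write cache is enabled (the WCE bit
# of the SCSI caching mode page) -- requires sdparm:
sdparm --get=WCE /dev/sdb

# Example /etc/fstab entry mounting XFS with barriers disabled, so the
# filesystem stops issuing SYNCHRONIZE CACHE on journal commits. Only
# safe when the drive's cache is power-loss protected (e.g. the
# capacitor-backed cache on the S3700/S3710/S3610):
#
#   /dev/sdb1  /srv/mail  xfs  defaults,nobarrier  0  0

# Or toggle it on a live filesystem without unmounting:
mount -o remount,nobarrier /srv/mail

# Note: the barrier/nobarrier XFS mount options were deprecated in
# Linux 4.10 and removed in 4.19; they only apply to kernels of the
# era discussed in this thread.
```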