Re: XFS and nobarriers on Intel SSD

Hello,

It looks like this thread is one of the main Google hits for this issue, so let me post an update. I experienced the same symptoms with Intel S3610 SSDs on an LSI 2208 controller.

The logs had been reporting “task abort!” messages on a daily basis since November:
sd 6:0:1:0: attempting task abort! scmd(ffff8807ef9e9800)
sd 6:0:1:0: [sdf] CDB:
Write(10): 2a 00 0e 92 88 90 00 00 10 00
scsi target6:0:1: handle(0x000a), sas_address(0x4433221101000000), phy(1)
scsi target6:0:1: enclosure_logical_id(0x500304801c84e000), slot(2)
sd 6:0:1:0: task abort: SUCCESS scmd(ffff8805b30fa200)

An OSD would go down from time to time with:
XFS (sdf3): xfs_log_force: error 5 returned.
lost page write due to I/O error on sdf3


I was able to reproduce the “task abort!” messages with "rados -p data bench 30 write -b 1048576". The OSDs going down and the XFS errors, on the other hand, were harder to reproduce systematically.
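
In case it helps someone else reproduce it, the test looked roughly like this (just a sketch, adjust the pool name to your setup):

  # generate 30 seconds of 1 MiB writes against the "data" pool
  rados -p data bench 30 write -b 1048576
  # in a second terminal, watch the kernel log for the controller aborts
  dmesg -w | grep -i "task abort"
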
To solve the problem I followed Christian’s recommendation and updated the S3610 SSDs’ firmware from G2010110 to G2010140 using the isdct utility. It was easy to convert the RPM package released by Intel into a .deb package with "alien"; after that it was just a matter of running "isdct show -intelssd" and "isdct load -intelssd 0".
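
For anyone else who needs to do this, the procedure was roughly the following (the package file names are placeholders, use whichever isdct release Intel currently ships):

  # convert Intel's isdct RPM to a .deb and install it
  alien --to-deb isdct-*.rpm
  dpkg -i isdct_*.deb
  # list the Intel SSDs and their current firmware revision
  isdct show -intelssd
  # flash the bundled firmware onto drive index 0 (repeat for each drive index)
  isdct load -intelssd 0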

The cluster has been running with the latest firmware for a week now and I can’t reproduce the problem any more, so it looks like the issue is solved.

Thank you Christian for the info!

Regards 

Maxime Guyot 
System Engineer



> Hello,
> On Tue, 8 Sep 2015 13:40:36 +1200 Richard Bade wrote:
> > Hi Christian,
> > Thanks for the info. I'm just wondering, have you updated your S3610's
> > with the new firmware that was released on 21/08 as referred to in the
> > thread? 
> I did so earlier today, see below.
> > We thought we weren't seeing the issue on the Intel controller
> > also to start with, but after further investigation it turned out we
> > were, but it was reported as a different log item such as this:
> > ata5.00: exception Emask 0x0 SAct 0x300000 SErr 0x0 action 0x6 frozen
> > ata5.00: failed command: READ FPDMA QUEUED
> > ata5.00: cmd 60/10:a0:18:ca:ca/00:00:32:00:00/40 tag 20 ncq 8192 in
> >           res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
> > ata5.00: status: { DRDY }
> > ata5.00: failed command: READ FPDMA QUEUED
> > ata5.00: cmd 60/40:a8:48:ca:ca/00:00:32:00:00/40 tag 21 ncq 32768 in
> >          res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
> > ata5.00: status: { DRDY }
> > ata5: hard resetting link
> > ata5: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
> > ata5.00: configured for UDMA/133
> > ata5.00: device reported invalid CHS sector 0
> > ata5.00: device reported invalid CHS sector 0
> > ata5: EH complete
> > ata5.00: Enabling discard_zeroes_data
> > 
> Didn't see any of these, but admittedly I tested this with fewer SSDs on
> the onboard controller and with fio/bonnie++, which do not trigger that
> behavior as easily.
> > I believe this to be the same thing as the LSI3008 which gives these log
> > messages:
> > sd 0:0:6:0: attempting task abort! scmd(ffff8804cac00600)
> > sd 0:0:6:0: [sdg] CDB:
> > Read(10): 28 00 1c e7 76 a0 00 01 30 00
> > scsi target0:0:6: handle(0x000f), sas_address(0x4433221106000000), phy(6)
> > scsi target0:0:6: enclosure_logical_id(0x5003048000000000), slot(6)
> > sd 0:0:6:0: task abort: SUCCESS scmd(ffff8804cac00600)
> > sd 0:0:6:0: attempting task abort! scmd(ffff8804cac03780)
> > 
> Yup, I know that message all too well.
> > I appreciate your info with regards to nobarriers. I assume by "alleviate
> > it, but didn't fix" you mean the number of occurrences is reduced?
> > 
> Indeed. But first a word about the setup where I'm seeing this.
> These are 2 mailbox server clusters (2 nodes each), replicating via DRBD
> over Infiniband (IPoIB at this time), LSI 3008 controller. One cluster
> with the Samsung DC SSDs, one with the Intel S3610.
> 2 of these chassis to be precise:
> https://www.supermicro.com/products/system/2U/2028/SYS-2028TP-DC0FR.cfm
> Of course latest firmware and I tried this with any kernel from Debian
> 3.16 to stock 4.1.6. 
> With nobarrier I managed to trigger the error only once yesterday on the
> DRBD replication target, not the machine that actually has the FS mounted.
> Usually I'd be able to trigger quite a bit more often during those tests.
> So this morning I updated the firmware of all S3610s on one node and
> removed the nobarrier flag. It took a lot of punishment, but eventually
> this happened:
> ---
> Sep  8 10:43:47 mbx09 kernel: [ 1743.358329] sd 0:0:1:0: attempting task abort! scmd(ffff880fdc85b680)
> Sep  8 10:43:47 mbx09 kernel: [ 1743.358339] sd 0:0:1:0: [sdb] CDB: Write(10) 2a 00 0e 9a fb b8 00 00 08 00
> Sep  8 10:43:47 mbx09 kernel: [ 1743.358345] scsi target0:0:1: handle(0x000a), sas_address(0x4433221101000000), phy(1)
> Sep  8 10:43:47 mbx09 kernel: [ 1743.358348] scsi target0:0:1: enclosure_logical_id(0x5003048019e98d00), slot(1)
> Sep  8 10:43:47 mbx09 kernel: [ 1743.387951] sd 0:0:1:0: task abort: SUCCESS scmd(ffff880fdc85b680)
> ---
> Note that on the un-patched node (DRBD replication target) I managed to
> trigger this bug 3 times in the same period.
> So unless Intel has something to say (and given that this happens with
> Samsungs as well), I'd still look beady eyed at LSI/Avago...
> Christian
> > Regards,
> > Richard
> > 
> > 
> > On 8 September 2015 at 11:43, Christian Balzer <chibi at gol.com> wrote:
> > 
> > >
> > > Hello,
> > >
> > > Note that I see exactly your errors (in a non-Ceph environment) with
> > > both Samsung 845DC EVO and Intel DC S3610.
> > > Though I need to stress things quite a bit to make it happen.
> > >
> > > Also setting nobarrier did alleviate it, but didn't fix it 100%, so I
> > > guess something still issues flushes at some point.
> > >
> > > From where I stand LSI/Avago are full of it.
> > > Not only does this problem NOT happen with any onboard SATA chipset I
> > > have access to, their task abort and reset is what actually impacts
> > > things (several seconds to recover), not whatever insignificant delay
> > > caused by the SSDs.
> > >
> > > Christian
> > > On Tue, 8 Sep 2015 11:35:38 +1200 Richard Bade wrote:
> > >
> > > > Thanks guys for the pointers to this Intel thread:
> > > >
> > > > https://communities.intel.com/thread/77801
> > > >
> > > > It looks promising. I intend to update the firmware on disks in one
> > > > node tonight and will report back after a few days to a week on my
> > > > findings.
> > > >
> > > > I've also posted to that forum and will update there too.
> > > >
> > > > Regards,
> > > >
> > > > Richard
> > > >
> > > >
> > > > On 5 September 2015 at 07:55, Richard Bade <hitrich at gmail.com> wrote:
> > > >
> > > > > Hi Everyone,
> > > > >
> > > > > We have a Ceph pool that is entirely made up of Intel S3700/S3710
> > > > > enterprise SSDs.
> > > > >
> > > > > We are seeing some significant I/O delays on the disks causing a
> > > > > “SCSI Task Abort” from the OS. This seems to be triggered by the
> > > > > drive receiving a “Synchronize cache command”.
> > > > >
> > > > > My current thinking is that setting nobarriers in XFS will stop the
> > > > > drive receiving a sync command and therefore stop the I/O delay
> > > > > associated with it.
> > > > >
> > > > > In the XFS FAQ it looks like the recommendation is that if you
> > > > > have a Battery Backed raid controller you should set nobarriers for
> > > > > performance reasons.
> > > > >
> > > > > Our LSI card doesn’t have battery backed cache as it’s configured
> > > > > in HBA mode (IT) rather than Raid (IR). Our Intel s37xx SSD’s do
> > > > > have a capacitor backed cache though.
> > > > >
> > > > > So is it recommended that barriers are turned off as the drive has
> > > > > a safe cache (I am confident that the cache will write out to disk
> > > > > on power failure)?
> > > > >
> > > > > Has anyone else encountered this issue?
> > > > >
> > > > > Any info or suggestions about this would be appreciated.
> > > > >
> > > > > Regards,
> > > > >
> > > > > Richard
> > > > >
> > >
> > >
> > > --
> > > Christian Balzer        Network/Systems Engineer
> > > chibi at gol.com           Global OnLine Japan/Fusion Communications
> > > http://www.gol.com/
> > >
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
